Ian Mascarenhas

From SI410

We’ve all probably Googled ourselves before. In the age of the Internet, everyone seems to have some sort of online presence. Whether it’s intentional or not, we have all created this presence by interacting with online artifacts such as social media. In the past ten years, data has become a very important component of the Internet. Some even refer to it as the new oil. As a result, our data isn’t out there only so that we can Google ourselves and have a laugh; companies and other interested parties can also access it and treat it like a commodity. As data has become one of the biggest aspects of our digital world, some websites have learned to respect privacy while others have violated it.


I searched using three different browsers: Safari, Google Chrome, and Firefox. To my surprise, results were very consistent across these browsers, suggesting that the choice of browser doesn't play much of a role in controlling privacy. I also used incognito mode in each browser, but the results were still consistent. Since my name is relatively unique, I first searched it without modifications as “Ian Mascarenhas.” One problem I came across was that some results matched only my last name. To narrow my results, I searched “Ian Mascarenhas Michigan” and “Ian Mascarenhas University of Michigan.” As I went through the search results, I noticed that they fell into one of three categories. The first is pages that I have created and control: my social media accounts. The second is pages that I know exist, but whose information I don’t control. The last is pages that have information on me without my knowledge or permission.

1st Classification: Social Media

The first category, pages that I have control over, consists of my social media pages: LinkedIn, Facebook, and Instagram. My LinkedIn profile is the very first result when I search my name. I have control over all information that shows up on my LinkedIn, and I don’t have an issue with my profile being very visible to the public, as I primarily use it for networking. Unlike most social media, LinkedIn offers a feature to limit profile visibility.
LinkedIn's Privacy Policy
According to the LinkedIn help page, “Viewers who aren't signed in to LinkedIn will see the sections of your profile you choose to display publicly... You can also choose to hide your public profile from non-LinkedIn members and from appearing in search engine results.” They also make it known to their users that “after you make changes or edits to your public profile, it can take several weeks or months at times for search engines like Google, Yahoo, or Bing to detect changes and refresh. LinkedIn doesn’t control that refresh process...” The next social media page is Facebook, which is also on the first page of results. I also control all information on my Facebook page. In addition, Facebook limits the amount of information that a person can see on someone else’s profile without being signed into their own account. Unlike LinkedIn, Facebook doesn’t give the option to hide certain parts of your profile from the public. At the same time, I’m able to give as much or as little information as I please. The last social media account that I have is Instagram, which appeared in the results because my full name is in my profile. My Instagram account is private, so people can only see my profile picture and biography unless they request to follow me. With all of these social media accounts, I have full control over how much information appears on my profile. At any time, I can choose to deactivate any of these accounts and they will no longer show up in search results. Since I am the direct source for these pages, they serve as an accurate portrayal of my digital identity. Though these websites could be using my data behind the scenes, they at least give the impression that I have full control, which is something other websites don't do at all.

2nd Classification: Knowledge of Existence but No Control

The next classification of search results consists of pages that I know exist and have my name on them, but unlike with my social media accounts, I am unable to control the information on these pages.

Search Query: 'Ian Mascarenhas University of Michigan'

When I use “Ian Mascarenhas University of Michigan” as a search query, I see several University of Michigan pages that have my name on them. For example, my MCommunity page is one of the search results. Although I don’t directly have control over this page, the school has decided that this information should be public. There is also a page of EECS tutors that has my name and phone number. Just like with the MCommunity page, I provided my information to the university and they chose to make it publicly available. I think this category of search results is intriguing, because it’s not necessarily an invasion of privacy, but I still don’t have the ability to take the information down if I want to. An interesting commentary on the situation comes from danah boyd and Kate Crawford: “Many are not aware of the multiplicity of agents and algorithms currently gathering and storing their data for future use. Researchers are rarely in a user’s imagined audience. Users are not necessarily aware of all the multiple uses, profits, and other gains that come from information they have posted. Data may be public (or semi-public) but this does not simplistically equate with full permission being given for all uses. Big Data researchers rarely acknowledge that there is a considerable difference between being in public (i.e. sitting in a park) and being public (i.e. actively courting attention) (boyd & Marwick 2011).” Even though people aren’t necessarily doing “research” on my data, the same principle still applies. I have given certain information to the university, but that doesn’t mean I gave permission for all uses. An interested party might use the tutor page to find my name and phone number, and they can then look up my name on the MCommunity site. Without my explicit permission, that person now has access to my full name, phone number, and email address. While these websites aren't necessarily doing anything harmful, they aren't giving me control over my own information.
Even though I am fine with them presenting my information, other people might not be so relaxed. And while these particular pages aren't doing anything bad, other websites might have more malicious intentions.

3rd Classification: Unknown Parties

The last category of search results that I found consists of sites that have my information without my permission. Had I not completed this assignment, I would have had no idea that these sites have my information. One of these sites is called michiganresidentdatabase.com. This site has lots of data on me, including my birth year, my voter registration date, my voter ID, and most importantly, my address. The site also provides the same information about my family and people who live near me. Another site with the exact same information is voterrecords.com.


According to these websites, my information is on the site because the government has released it as public record. Another site that has my information without permission is SignalHire, which is a contacts search engine. The information on this site has been taken directly from my LinkedIn page. According to the site’s privacy policy, SignalHire acts as a search engine and accesses information that is already available: “We do not own this information nor can we remove or change it - we only find and then index what is publicly available to anyone on the WEB.” SignalHire isn’t providing any new information on me; it’s simply acting as a middleman. Still, I never gave explicit permission to SignalHire and I can’t control the information on their website. Unlike the sites in the other two categories, the websites in this classification don't have my permission (direct or indirect) to share my information. I would prefer that these pages didn't exist, although I don't have control over that.


After finding all this information on myself from a few simple internet searches, I started to wonder: if all this personal data is available to the public on the first few pages of search engines, then what kind of data is available behind the scenes? Without my consent, what companies or organizations are secretly obtaining information about me? Some of these websites, such as SignalHire, might just be scraping the web for data that already exists, but does that make it ethical for them to use it? What websites could potentially release secretly obtained information to the public? There’s a difference between being in public and being public, but is that understood by certain websites? One of my biggest concerns is that I can't answer these questions about my own data, and I should be able to. As the global scale of the web increases and big data becomes a bigger phenomenon in our society, we should continue to ask these questions in order to get a better understanding of big data ethics.