We’ve all probably Googled ourselves before. In the age of the Internet, everyone seems to have some sort of online presence. Whether it’s intentional or not, we all have created this presence by interacting with online artifacts such as social media. In the past ten years, data has become a very important component of the internet. Some even refer to it as the new oil. As a result, our data isn’t just out there for the purpose of Googling ourselves and having a laugh, but also so companies and other interested parties can access it and treat it like a commodity. As data has become one of the biggest aspects of our digital world, some websites have learned to respect privacy while others have violated it.
I searched using three different browsers: Safari, Google Chrome, and Firefox. To my surprise, the results were very consistent across these browsers, which suggests that the choice of browser doesn’t play much of a role in controlling privacy. I also repeated the searches in each browser’s incognito mode, and the results were still consistent. Since my name is relatively unique, I first searched it without modifications as “Ian Mascarenhas.” One problem I came across was that many results matched only my last name. To narrow my results, I searched “Ian Mascarenhas Michigan” and “Ian Mascarenhas University of Michigan.” As I went through the search results, I noticed that they fell into one of three categories. The first category is pages that I have created and control: my social media accounts. The second is pages that I know exist, but whose information I don’t control. The last category is pages that have information on me without my knowledge or permission.
1st Classification: Social Media
The first category, pages that I have control over, consists of my social media pages: LinkedIn, Facebook, and Instagram. My LinkedIn profile is the very first result when I search my name. I have control over all the information that shows up on my LinkedIn, and I don’t have an issue with my profile being very visible to the public, as I primarily use it for networking. Unlike most social media, LinkedIn offers the option to limit profile visibility.
2nd Classification: Knowledge of Existence but No Control
The next classification of search results consists of pages that I know exist and have my name on them, but unlike my social media accounts, I am unable to control the information on these pages.
Search Query: 'Ian Mascarenhas University of Michigan'
When I use “Ian Mascarenhas University of Michigan” as a search query, I see several University of Michigan pages that have my name on them. For example, my MCommunity page is one of the search results. Although I don’t directly have control over this page, the school has decided that this information should be public. There is also a page of EECS tutors that has my name and phone number. Just as with the MCommunity page, I provided my information to the university and they chose to make it publicly available. I think this category of search results is intriguing, because it’s not necessarily an invasion of privacy, but I still don’t have the ability to take the information down if I want to.
danah boyd and Kate Crawford offer an interesting commentary on the situation: “Many are not aware of the multiplicity of agents and algorithms currently gathering and storing their data for future use. Researchers are rarely in a user’s imagined audience. Users are not necessarily aware of all the multiple uses, profits, and other gains that come from information they have posted. Data may be public (or semi-public) but this does not simplistically equate with full permission being given for all uses. Big Data researchers rarely acknowledge that there is a considerable difference between being in public (i.e. sitting in a park) and being public (i.e. actively courting attention) (boyd & Marwick 2011).”
Even though people aren’t necessarily doing “research” on my data, the same principle still applies. I have given certain information to the university, but that doesn’t mean I gave permission for all uses. An interested party might use the tutor page to find my name and phone number, and they can then look up my name on the MCommunity site. Without my explicit permission, that person now has access to my full name, phone number, and email address. While these websites aren’t necessarily doing anything harmful, they aren’t giving me control over my own information.
Even though I am fine with the university presenting my information, other people might not be so relaxed about it. And while these university pages aren’t doing anything harmful, other websites might have more malicious intentions.
3rd Classification: Unknown Parties
The last category of search results consists of sites that have my information without my permission. Had I not completed this assignment, I would have had no idea that these sites have my information. One of these sites is called michiganresidentdatabase.com. This site has lots of data on me, including my birth year, my voter registration date, my voter ID, and most importantly, my address. The site also provides the same information about my family and people who live near me. Another site with the exact same information is voterrecords.com.
After finding all this information on myself from a few simple internet searches, I started to wonder: if all this personal data is available to the public on the first few pages of search results, then what kind of data is available behind the scenes? What companies or organizations are secretly obtaining information about me without my consent? Some of these websites, such as SignalHire, might just be scraping the web for data that already exists, but does that make it ethical for them to use it? What websites could potentially release secretly obtained information to the public? There’s a difference between being in public and being public, but do these websites understand it? One of my biggest concerns is that I can’t answer these questions about my own data, and I should be able to. As the global scale of the web increases and big data becomes a bigger phenomenon in our society, we should continue to ask these questions in order to get a better understanding of big data ethics.