Given the prompt of this assignment, I thought about how important data identity is to me and my family. To my surprise, I realized that data identity matters a great deal to my family members, but it is not as important to me. I can give many examples from my past where my dad intentionally withheld information from a form or website, or filled in incorrect names, birthdays, and other details to conceal our data identity. I came to find that these actions have left us with a choppy and incomprehensible data identity. That went against my expectations: although I never think about my data identity or how to protect it, my life revolves heavily around the internet. To rationalize this, I realized that for most of my life I didn’t use social media, and as a child I moved homes frequently.
Consistent with this, searching my name on Google yielded poor results. A search for Aaron Zhang returns many different profiles under that name on Facebook, LinkedIn, and Instagram, to name a few. None of them have any relation to me, and the Images tab did not show any pictures of me either.
Keywords
To have a better chance of finding myself through a Google search, I narrowed my search with specific keywords. The only social media applications I use are Instagram and Facebook, yet even when I search those alongside my name, I am very hard to find. Adding Instagram to the search does not surface my Instagram account. There was a pattern among the accounts that did come up: almost all of the recommended results had Instagram handles that included the full name Aaron Zhang in some form. Mine does not; it includes only my first name. Adding Facebook returned a link to a Facebook search page listing different Aaron Zhang accounts, including mine with my name and profile picture. On Facebook, there is some personal information about me, but not much: at most my birthday, high school, university, hometown, and friends. Using what can be found on Facebook, I widened my keyword search to include my high school, UofM, and my hometown. Searching my name with my hometown turns up a school-published newsletter listing me as one of the scholars of the highest distinction in the graduating class of 2019. Surprisingly, nothing comes up when searching my name together with UofM. The biggest breakthrough in seeing myself from an outsider’s perspective came from searching with my address. The first three results were all related to me; the first was a record of my swimming history on my high school team.
Image search was a huge letdown. When I searched Google Images using a picture of myself or something I had posted in the past, the results showed generic labels like “boy” or pictures of people who look similar to me instead of the places where I had posted the picture.
To see if there was a better way to search for data on a person, I used a free data broker website to check whether it could find anything about me online. Given my name, address, city, and age, the website could not find anything about me.
I play a lot of video games; they are a part of my daily life. Many of them are online multiplayer games that store my progress in an account, so I would consider them part of my data identity as well. It has become apparent to me that, beyond the games themselves, a large amount of data is tied to those accounts. For example, I often play League of Legends, a popular game in which every player must have an account with the parent company. The game also runs through a client that collects user data; for instance, players report other players, bugs, and issues through it. The client also provides an end-of-year recap of player statistics, including the people you played with most, win ratios, most-played characters, new skins acquired, and so on. There is also a website called op.gg that lets people look up players by region and username. There you can find a player’s win ratio for every champion they played in each season they were active, along with their recent games and the other accounts they play with most often. What I think is important about this type of data collection is how it is used to make deductions about players.
Other websites track win ratios, item builds, pick/ban ratios, and other general statistics drawn from thousands of players each day. This is a small example of “Big Dick Data” as defined in Catherine D’Ignazio and Lauren Klein’s chapter, The Numbers Don’t Speak for Themselves. Put simply, the connection between the chapter and this game’s data collection is that the game is too complicated to balance on win ratios alone. A win ratio around 50% does suggest that a champion is probably balanced, but that number by itself misrepresents much of the game. For example, champions that are very mechanically challenging tend to have lower win rates because their high skill cap rewards players who spend time mastering them and punishes those who are new to them. This kind of observation is the same one D’Ignazio and Klein describe: collecting and using “Big Dick Data” without context can have the opposite of its intended effect and create bias. A biased view of which champions are stronger than others matters little in itself, but it is a good example of what can go wrong when big data is not held accountable for proper contextualization. Imagine how important and influential the decisions become when they rest on big data gathered from millions of average workers across the country. The stakes are much higher in that case.
Password management has become a real issue for me at this stage. Since I am just starting my professional career, I’ve been taking steps to prepare to find a job after college, which includes applying for internships to gain experience in my chosen field. I am a computer science major, so I am now registering accounts on many recruiting websites like LinkedIn, Glassdoor, and Handshake. On top of that, many Fortune 500 companies with software development internships post those positions directly on their own websites in addition to third-party posting sites. When I register an account for each application, every site has its own username and password rules with different character, number, and symbol requirements, so it is easier to use Google’s password-saving feature. Google saves the passwords for all of my accounts and automatically fills them in when I sign in to those websites again. In Critical Questions for Big Data, Danah Boyd and Kate Crawford raise this kind of concern and its negative consequences. Does Google’s storage of my passwords put my security at risk? For example, if the passwords stored by Google were stolen, could someone figure out which of them belong to my accounts? All my passwords are very similar to each other, and someone could use information already available about me to rule out possibilities based on the websites I have passwords for. Video game accounts, high school logins, university accounts, recruiting websites, and social media are all good criteria for narrowing down a search for data on me. Worse still would be if Google stores my account data, Drive files, passwords, and more all together. Then, by breaking into any one of those data sectors, someone could also gain access to all the other data associated with my Google account. This is a serious concern, especially for anyone with important passwords, such as those for a bank account.
Until it becomes impossible for unauthorized individuals to access this data, big data systems will always need better protections, or else serious ethical injustices could be committed.
Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662–679.
D’Ignazio, C., & Klein, L. F. (2020). The numbers don’t speak for themselves. In Data Feminism. Cambridge, MA: MIT Press.