Given the prompt of this assignment, I did some thinking about the importance of data identity to me and my family. To my surprise, I realized that data identity matters a great deal to my family members, even though it is not as important to me. I can give many examples from my past where my dad intentionally withheld information from a form or a website, or filled in incorrect names, birthdays, and other information to conceal our data identity. I would come to find that these actions have led to a choppy and incomprehensible data identity. That went against my expectations, since my life heavily revolves around the internet and I never think about my data identity or how to protect it. To explain this, I realized that I have a very common Chinese last name, that for most of my life I didn’t use social media, and that I moved homes frequently as a child.
As I suggested earlier, searching my name on Google yielded poor results. If you search for Aaron Zhang on Google, you get many different Aaron Zhang profiles on Facebook, LinkedIn, and Instagram, to name a few. None of them have any relation to me. The images tab also did not have any pictures relevant to me.
Keywords
To have a better chance of finding myself through a Google search, I started to narrow down my search with certain keywords. The only social media applications I use are Instagram and Facebook. Adding Instagram to the search does not show my Instagram account. Adding Facebook showed a link to a Facebook search page listing different accounts registered under the name Aaron Zhang, with my account and profile picture included. On Facebook, there is some personal information you can find about me, but not much. The most you can find is my birthday, high school, university, hometown, and friends. Using what can be found on Facebook, I widened my keyword search to include my high school, UofM, and my hometown. If you search my name and hometown, you can find a school-published newsletter that lists me as one of the scholars of the highest distinction in the graduating class of 2019. Surprisingly, nothing comes up when searching my name in conjunction with UofM. The biggest breakthrough when trying to learn about myself from an outsider’s perspective was searching with my address. The first three results are all related to me. The first one is a record of my swimming history on my high school team.
Using image search was a huge letdown. Searching Google Images with a picture of me or something I’ve posted in the past returns results like “boy” or pictures of people who look similar to me, instead of the websites where I posted those images.
To see if there was a better way to search for data on a person, I used a free data brokering website to see if there was anything they could find about me on the internet. The website could not find anything about me given my name, address, city, and age.
I play a lot of video games. They are a part of my daily life. Many of the games I play are online multiplayer games that store my progress through an account. Because of this, I would consider them a part of my data identity. It has become more apparent to me that, beyond the games themselves, there is a lot of data associated with those accounts. For example, League of Legends is a popular game I play. Every player must have an account with the parent company to play the game. The game also runs through a client that collects user data, since the client is where you report players, bugs, and issues. The client also has an end-of-year recap of player statistics. This includes data about the people you’ve played with the most, win ratios, most played characters, new skins acquired, etc. There is also a website called op.gg that allows people to search for players by region and username. There you can find win ratios for every champion a player used in each season the player was active. You can also find their recent games and the other accounts they play with the most. What I think is important about this type of data collection is how it is used to make deductions about players.
Other websites track win ratios, item builds, pick/ban ratios, and other general statistics based on thousands of players each day. This is a small example of “Big Dick Data,” as defined in Catherine D’Ignazio and Lauren Klein’s article, The Numbers Don’t Speak for Themselves. A simple explanation of the relationship between this article and the data collection of this video game is that the game is too complicated to balance based on the obvious win ratios alone. A win ratio around 50% implies a champion is balanced. However, this misrepresents much of the game. For example, champions that are very mechanically challenging tend to have lower win rates because their high skill cap rewards players who spend time mastering the champion and punishes those who are new. This type of observation is the same as the one described in D’Ignazio and Klein’s article: inappropriate collection and use of “Big Dick Data” has the unintended side effect of creating bias. Creating bias about which champions are stronger than others is not important at all, but it is a good example of what can go wrong when those who use big data are not held accountable for putting the data in context. You can imagine how important and influential such decisions become when they rely on big data gathered from millions of average workers across the country. The stakes become much higher in that case.
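The win-rate point above can be shown with a small sketch. The numbers below are invented for illustration and do not come from op.gg or any real game data, and the names `games` and `win_rate` are hypothetical. The sketch only shows how a single overall win ratio can hide the difference between new and experienced players.

```python
# Invented match records for one mechanically demanding champion.
# Each tuple is (player_experience, games_played, games_won).
games = [
    ("new", 800, 320),          # new players win 40% of their games
    ("experienced", 200, 130),  # experienced players win 65% of their games
]

def win_rate(records):
    """Return total wins divided by total games for the given records."""
    total_games = sum(played for _, played, _ in records)
    total_wins = sum(won for _, _, won in records)
    return total_wins / total_games

# One number for everyone, then one number per experience level.
overall = win_rate(games)
by_experience = {
    level: win_rate([r for r in games if r[0] == level])
    for level, _, _ in games
}

print(f"overall: {overall:.0%}")
for level, rate in by_experience.items():
    print(f"{level}: {rate:.0%}")
```

With these made-up numbers the overall win ratio is 45%, which looks like a weak champion, even though experienced players win 65% of their games. The context of who is playing changes what the number means.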
Password management is a huge issue during times like this. Since I am just getting started in my professional career, I’ve been taking steps to prepare myself to find a job after college. This includes applying for internships. I am a computer science major, and I’m now registering accounts on many recruiting websites like LinkedIn, Glassdoor, and Handshake. Not only that, but many Fortune 500 companies with software development internships post them directly on their own websites in addition to the third-party posting sites. Therefore, when I apply to internships and register accounts with usernames and passwords that have different character, number, and symbol requirements, it’s easier to use Google’s password save feature. This way Google saves the passwords for all of my accounts and automatically fills them in when I sign in to those websites again. In Critical Questions for Big Data by Danah Boyd and Kate Crawford, this idea is brought up along with its negative consequences. Does the storage of my passwords by Google put my security at risk? For example, if the passwords stored by Google were stolen, could there be a way to find which passwords belong to my accounts? All of my passwords are very similar to each other, and someone could rule out other possibilities based on the websites I have passwords for. Video game accounts, high school logins, universities, recruiting websites, and social media are all good criteria for narrowing down a search for data on me. It could be worse if Google stores the data for my account, drive, passwords, etc. all together. Then, by narrowing down any one of those data sectors, someone could also gain access to all the other data associated with my Google account. This is a big concern, especially for those who store important passwords, like the one for a bank account.
Unless it becomes impossible for unwanted individuals to get access to these things, systems that hold big data will always need stronger protection, or else major ethical injustices could be committed.
My data identity doesn't show off who I am as a person. By getting creative with my ideas of a data identity, I feel I've learned more than I intended. There's more to my data identity than meets the eye. It goes deeper than just Google searches. It's the little things we do every day that slip our minds that build our data identity.
Boyd, D., & Crawford, K. (2012). Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon. Information, Communication & Society.
D'Ignazio, C., & Klein, L. F. (2020). The Numbers Don’t Speak for Themselves. In Data Feminism. Cambridge, MA: MIT Press.