Bias in Information
Bias in information refers to the way searching for information can produce results that differ from one another and, in turn, produce different interpretations of those results. When users search for information, they are searching for "the resolution of uncertainty" [1]. Searchers may become confused as they try to understand their search results, which can lead to discrepancies in their knowledge of the topic. Filtering results in a specific way, or allowing only certain information to be accessible to an observer, can drastically change the value and meaning of the content provided. Online search engines provide a space for biased information to exist. The prevalence of bias in information among search engines leads to ethical concerns regarding privacy, the filtering of search results, and the types of bias that may occur among social groups.

Figure: Bias in searching for information.[2]

Types of Bias

There are different types of bias in information, which can be grouped into three main categories: general bias, research bias, and news bias. General bias includes confirmation bias and groupthink/bandwagon bias. Research bias consists of selection bias, anchoring bias, response bias, and non-response bias. The types of news bias include commercial bias, bad news bias, status quo bias, access bias, visual bias, fairness bias, narrative bias, expediency bias, glory bias, and spin.

General Bias

Confirmation Bias

Confirmation bias occurs when users interpret information as confirmation of their current beliefs. This is common online, particularly on media sites where users and publishers present only information that proves their points. [3] Confirmation bias has the potential to lead to self-fulfilling modes of thinking that may inhibit civil discussion or debate.

Groupthink/Bandwagon Bias

Groupthink or bandwagon bias may occur in settings where large groups of people share a common motive for coming together. Out of fear of becoming isolated from the group, participants typically try to maintain a harmonious environment, so they refrain from sharing their honest opinions on controversial decisions. [3]

Research Bias

Selection Bias

Selection bias is common in research, where researchers decide how many and what type of participants to include in a study. This can result in a non-random sample, which makes it difficult to validate the findings of the research. [3] Researchers can tailor the selection of participants in a way that yields results in line with their bias.

Anchoring Bias

Anchoring bias occurs when users or researchers rely on a single piece of information to make subsequent decisions. Ideally, researchers should weigh a sufficient range of information to decide which information is best, in hopes of avoiding bias. Once an anchor is set, users continue to base all actions and decisions on that anchor, which is particularly difficult to remove once established. [3]

Response Bias

Response bias refers to the wide range of tendencies people have to answer a survey or questionnaire inaccurately[4]. Response bias can drastically affect the validity of surveys and questionnaires. Social norms, the wording of a particular question, and the desire of participants to answer in a way that confirms the researchers' hypothesis are just a few of the possible causes of response bias.

Non-Response Bias

Non-response bias occurs when the results of surveys, questionnaires, or elections become inaccurate because too many of the selected participants did not respond[5]. The portion of the population that did participate is no longer representative of the target population, because too few participants remain to gather reliable data. The most commonly recommended protection against non-response bias is to reduce the amount of non-response itself[6].
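The effect is easy to see in a toy simulation. The sketch below assumes a hypothetical survey in which dissatisfied people are less likely to respond than satisfied ones; all of the rates are invented for illustration:

  import random

  random.seed(0)

  # Hypothetical population: 30% of people are dissatisfied with a service.
  population = [random.random() < 0.30 for _ in range(100_000)]

  # Assumed response rates: dissatisfied people respond 20% of the time,
  # satisfied people 60% of the time (invented numbers for illustration).
  respondents = [
      dissatisfied for dissatisfied in population
      if random.random() < (0.20 if dissatisfied else 0.60)
  ]

  true_rate = sum(population) / len(population)
  observed_rate = sum(respondents) / len(respondents)

  print(f"True dissatisfaction rate: {true_rate:.1%}")      # about 30%
  print(f"Rate among respondents:    {observed_rate:.1%}")  # about 12.5%

Because the non-respondents differ systematically from the respondents, the observed rate understates true dissatisfaction by more than half.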

News Bias

Commercial Bias

Commercial bias arises because news must remain "new"; news outlets rarely double-check their sources, since the information may already have been reported or take the form of stories considered "old." This leads to a bias in the information released in the news as "new" content. [3]

Bad News Bias

Bad news bias occurs when news outlets highlight stories that are scary or threatening because they generate more views. News providers attempt to pique viewers' interest with shocking stories to benefit themselves. This creates a bias in the types of stories covered, as less alarming stories go unshared. [3]

Status Quo Bias

Status quo bias is the preference people have for things to stay the same, which causes news outlets to stick to their typical routines. This type of bias stems from people's fear of the consequences of changing to something "new."[7] News outlets exhibit status quo bias by reporting on the same types of stories, avoiding different stories out of fear of losing viewership.

Access Bias

Journalists may compromise the transparency of the news in order to gain access to powerful people as story sources. This creates a bias in the information news outlets report, as they are leveraging the power of well-known public figures.

Visual Bias

Stories with a visual hook are more likely to attract a larger audience, so news outlets focus on stories that have some type of visual appeal. [8] This biases coverage toward stories that have a visual aspect.

Fairness Bias

Fairness bias occurs when reporters present opposing viewpoints in order to seem "fair," regardless of their own opinions. This bias is most prevalent in political news reporting. [8] News outlets seek to create the impression that politicians are always in opposition and can never agree, which can lead to coverage that targets one party or another.

Narrative Bias

News outlets present a story as a narrative with a beginning, a middle, and an ending. However, many real-life news stories are still in the middle and have no ending to report: what the solution is, what the next steps are, and so on. Viewers therefore do not get the information needed to fully understand the topic, only the main part of the story with a limited conclusion. Journalists try to address this by inserting a provisional ending, making reports seem more conclusive than they actually are; they focus on ending the story neatly rather than telling viewers how it actually ends. This type of bias attempts to create drama throughout the narrative, as drama generally makes stories more interesting and increases the number of viewers.[8]

Expediency Bias

Expediency bias occurs when news outlets seek to report information that can be obtained quickly, easily, and inexpensively.[8] News outlets are extremely competitive and want to report information that seems attractive and appealing to a large audience. Information obtained quickly and easily is prone to this bias, so reporters should fact-check their sources and seek out additional resources to ensure credibility.

Glory Bias

Glory bias is prevalent when news reporters insert themselves into the story they are reporting. [8] This bias leads journalists to try to establish a cultural identity as knowledgeable insiders. Ideally, journalists should observe and "keep track" of the details of a story so it is reported without bias.

Spin

"Spin" involves emphasizing certain aspects of a news story with the hope that other aspects can be ignored.[9] It matters how one discuss a subject as these details and emotions can reflect as the "truth" regardless if its factual or not. An example of this is when house prices are low and people share that it is bad for "sellers". Alternatively, when housing prices are "up", people say this is bad for "buyers". In reality, it is a lose-lose situation regardless of the scenarios. [10]

Search Engine Results

The First 10 Results

A search engine can provide thousands of results as a list of ranked items, where the top items are considered the most "relevant" pages for the user's query. If the top results do not provide users with what they are searching for, they will repeatedly refine the query until they find exactly what they are looking for. As a result, the first few links that appear may repeatedly be ranked highest from query to query, excluding a number of important, potentially opposing pieces of content. An example is Safiya Umoja Noble's research on the misrepresentative results returned for the search term "black girls".[11] In the context of news and media, this may lead to a number of self-fulfilling biases and/or discrimination.

Information Overload

Information overload means that one is overwhelmed by the amount of information one is trying to process, and the amount of information readily available to the public has only increased over time.[12] Information overload is exhibited in the thousands of results returned by search engines, which can make it seemingly impossible for an average user to parse through all of the information. The same is true of libraries and museums: one person could not possibly read all of the books in an extensive library or fully study every aspect of a museum. Excessive information can prevent a user from understanding the information, and therefore from making an informed decision.

Search Engines

A search engine is a software system designed to carry out a web search on a query or phrase provided by a user. Search results can include many different types of media, such as articles, documents, images, videos, and infographics. Search engines provide easy access to information that was previously available only in specific locations such as libraries and museums, and they are the most common way of finding information today. Google, for example, processes 40,000 queries a second [13], which amounts to about 3.5 billion searches a day and 1.2 trillion searches a year.
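The daily and yearly figures follow directly from the per-second rate:

  queries_per_second = 40_000
  per_day = queries_per_second * 60 * 60 * 24  # 3,456,000,000, about 3.5 billion
  per_year = per_day * 365                     # about 1.26 trillion
  print(f"{per_day:,} searches/day, {per_year:,} searches/year")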

How Search Engines Work


A search engine can provide thousands of results in seconds because of the work it does in the background. There are three major background steps: web crawling, indexing, and the ranking algorithm the search engine performs[14]. A web crawler searches the World Wide Web to discover documents to add to the search engine's own collection; every time a document is updated or a new document is found, the crawler adds a copy to the collection. The collection is kept by the search engine in a data center, where it can be organized and searched based on what a user is looking for. The ranking algorithm then decides how to order the documents so the user sees the most relevant results first. The ranking is based conceptually on how many connections a result has to other potential results; this ranking protocol is referred to as PageRank. [15] None of this is visible to the user, who simply submits a query for the engine to evaluate against the indexed collection.
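A minimal sketch of the PageRank idea is shown below, using power iteration over a toy link graph; the graph, damping factor, and iteration count are illustrative assumptions, not real crawl data:

  # Toy link graph: each page maps to the pages it links out to.
  links = {
      "A": ["B", "C"],
      "B": ["C"],
      "C": ["A"],
      "D": ["C"],
  }
  pages = list(links)
  damping = 0.85                           # standard damping factor
  rank = {p: 1 / len(pages) for p in pages}

  for _ in range(50):                      # iterate until ranks stabilize
      new_rank = {p: (1 - damping) / len(pages) for p in pages}
      for page, outlinks in links.items():
          # each page shares its current rank equally among its out-links
          share = damping * rank[page] / len(outlinks)
          for target in outlinks:
              new_rank[target] += share
      rank = new_rank

  # Pages with more (and better-ranked) incoming links score higher.
  for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
      print(page, round(score, 3))

Page C ends up ranked highest because it receives links from the most pages; in a real engine this link signal is combined with many other ranking factors.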

Search Engine Optimization (SEO)

Search engine optimization refers to the process of improving a website or web page's visibility in organic search results. SEO attempts to improve search engine rankings, especially on Google. It is a major internet marketing strategy that examines which keywords and search terms users type into search engines in order to earn a higher-ranked result for a given website.

Ethical Concerns

Searching for information is today's reality, and the process inevitably raises ethical concerns. These concerns stem from the bias involved in search engine design, the filtering of results, and the privacy of the user.

Privacy

Along with the process of finding optimal results, a search engine tracks certain information about a user: the time and date of each query are stored along with the IP address it came from. Although unlikely, pooling queries from the same IP address can yield a list of searches made by a specific user[16]. The IP address shared with the search engine is that of the user's local router, which provides specific information on geolocation. Geolocation can be used against users in specific scenarios; in China, for example, the use of Google is prohibited and different search engines are provided instead.[17]
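The sketch below illustrates the kind of pooling described above; the log format, field names, and addresses (documentation-reserved example IPs) are all hypothetical:

  from collections import defaultdict

  # Hypothetical query-log records: query text, timestamp, client IP.
  log = [
      {"ip": "203.0.113.7",  "time": "2019-04-21T10:02", "query": "knee pain causes"},
      {"ip": "198.51.100.4", "time": "2019-04-21T10:03", "query": "weather ann arbor"},
      {"ip": "203.0.113.7",  "time": "2019-04-21T10:05", "query": "knee surgeons near me"},
  ]

  # Grouping by IP address reconstructs a per-user search history.
  searches_by_ip = defaultdict(list)
  for entry in log:
      searches_by_ip[entry["ip"]].append(entry["query"])

  print(searches_by_ip["203.0.113.7"])
  # ['knee pain causes', 'knee surgeons near me']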

Along with the ability to ban specific phrases in certain locations, a search engine uses past searches and previously viewed documents as part of its algorithms. When a document is viewed frequently, its ranking on the list of results increases because users find it relevant. YouTube and Netflix adopt recommender systems that perform personalized information filtering using search and view history or tracking cookies. The methods these companies use to gather data are problematic because users are not informed, and the notifications used to ask for users' consent are too vague or too hard for users to comprehend.[18]
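A toy version of this kind of scoring is sketched below; the documents, weights, and formula are invented for illustration and are not any company's actual algorithm:

  # Sketch: global view counts and a user's own history both feed the score.
  docs = {
      "doc_a": {"topic": "cooking",  "views": 900},
      "doc_b": {"topic": "politics", "views": 400},
      "doc_c": {"topic": "cooking",  "views": 50},
  }
  user_history = ["cooking", "cooking", "travel"]  # topics the user viewed before

  def score(doc):
      popularity = doc["views"] / 1000                    # frequently viewed -> ranked higher
      personal = 0.5 * user_history.count(doc["topic"])   # matches the user's history
      return popularity + personal

  for name, doc in sorted(docs.items(), key=lambda kv: -score(kv[1])):
      print(name, round(score(doc), 2))
  # doc_a 1.9, doc_c 1.05, doc_b 0.4 -- the cooking pages float to the top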

Bias

Figure: Bias in algorithms.[19]

Bias can be introduced at each step of the process because of the nature of search engines and how they produce their results. In "Values in technology and disclosive computer ethics", Brey discusses the idea that technology has "embedded values", meaning that computers and their software are not "morally neutral" [20]. Computers can favor specific values because of their design and structure.

Brey describes three types of bias that can be identified in search engines:

  • Preexisting Bias
  • Technical Bias
  • Emergent Bias

Preexisting bias occurs when values and attitudes exist prior to the development of the software. In a search engine, this can surface in the order of documents returned after a search: if the system's algorithm consistently favors certain documents over others, even without any interference from outside sponsors, the documents shown first may reflect the values of the algorithm's creator.

Technical bias occurs due to the limitations of the software. Because of the nature of search engines and how people use them, some results can never be displayed or viewed, and the collection of gathered documents has its own limitations. In many situations this leads to bias, because there may be more information available for some topics than for others.

Emergent bias occurs when the system is used in a way not intended by its designers. When a user enters a phrase, the wording of the phrase is very important: different words with the same meaning can have different connotations, which can produce different results.

Social Bias

Studies show that search engines reinforce many social biases and stereotypes. For example, Google Images has been criticized for the lack of diversity in its search results [3]. A search for the word "doctor" returns significantly more men than women; while it is true that there are more male doctors than female doctors, Google Images shows a disproportionate ratio of male to female doctors. Another report shows that searching for "three black teenagers" returned a series of mugshot photos, whereas searching for "three white teenagers" returned images of smiling young adults [3]. Google search results are affected by preexisting bias, technical bias, and emergent bias. This is harmful because these results perpetuate societal stereotypes and values. Engineers and users must be mindful of the implications of these biases and work to overcome them in technology and society.

User Bias
Figure: Bias in autocomplete suggestions for different search engines.[21]

User input plays a huge role in the existence of bias in search engines. When users type the same term in the search box, different search engines like Yahoo, Google, and Bing give different autocomplete suggestions. For example, Google might give positive suggestions for the search term "Hillary Clinton" while Yahoo and Bing might give negative ones. This kind of bias originates from differences in user behavior and features across search engines.[21] The majority of Bing users are between 55 and 64 years old, while Google has much younger users than Bing and Yahoo.[22] Beyond age, users' different economic, social, and cultural backgrounds contribute heavily to the search history and search behavior each engine sees.

Stereotype and Discrimination Reinforcement

Consequences of social and preexisting bias in information can also implicitly undermine disadvantaged groups by associating them with negative traits and failing to associate them with positive traits. One research study found that people with African-American-sounding names needed to send out more resumes than people with white-sounding names in order to get a callback.[23] This showcases the bias that is rampant in corporate hiring, as people are deemed unworthy based on the names on their resumes instead of the skills and experience they possess. In machine learning and facial recognition algorithms created by large technology companies like IBM, Microsoft, and Amazon, women with darker skin were found to be misgendered about a third of the time, far more often than their lighter-skinned counterparts. [24] This reinforces the detrimental notion held by some that women with darker skin are less feminine.[25] Instead of ensuring that groups of people receive equal treatment based on the merit of their individual characteristics, unfair assumptions are attached to their identity based on phenotypical features.

Filtering Results

The search engine's algorithm contributes to the type of results each individual sees. To find the most relevant documents, it filters, identifies, and categorizes documents based on the selected subject. Because results are returned as a ranked list, it is impossible for all of them to be shown or viewed at once. Search results can also be influenced by advertisements, with specific companies or sites sponsoring their own documents so that they are prioritized over others. This causes certain documents to receive less attention, leaving users unaware that the information within them exists.
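The effect of such sponsorship on ranking can be sketched as below; the results, relevance scores, and flat sponsorship boost are invented for illustration:

  # Sketch: a paid boost added to a relevance score reorders the result list.
  results = [
      {"title": "Independent review", "relevance": 0.92, "sponsored": False},
      {"title": "Vendor white paper", "relevance": 0.75, "sponsored": True},
      {"title": "Academic study",     "relevance": 0.88, "sponsored": False},
  ]

  SPONSOR_BOOST = 0.25  # assumed flat boost for paid placement

  ranked = sorted(
      results,
      key=lambda r: r["relevance"] + (SPONSOR_BOOST if r["sponsored"] else 0.0),
      reverse=True,
  )
  for r in ranked:
      print(r["title"])
  # The sponsored page now tops the list despite having the lowest relevance.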

Filtering used to customize information poses privacy concerns and limits the information that users are exposed to. For instance, Netflix's recommender system prioritizes information similar to what users previously watched or searched for. As Xavier Amatriain, the Director of Algorithms Engineering at Netflix, says, "over 75% of what people watch comes from our recommendations."[26] It is very likely that the similarity among the recommended information can trap users in a loop that keeps them from choosing to access new information. [18]

Filter Bubbles

Search engines and other services that use filtering algorithms to tailor users' results toward their interests create "filter bubbles." Eli Pariser, who coined the term, describes these search engines as "creating a unique universe of information for each of us." [27] He believes that filter bubbles introduce three new dynamics to personalization:[27]

  • 1. Each user is alone in their filter bubble.
  • 2. The filter bubble is invisible.
  • 3. Each user does not choose to enter their respective filter bubble.

The implication is that every user of the filtering mechanisms of search engines or other services is unknowingly trapped in their own bubble of personalization. Users do not choose to enter filter bubbles, yet search algorithms create the bubbles for them based on their search history.[27]

Filter bubbles provide the benefit of a personalized universe of information for each user, but they have negative consequences. Once the filter bubble is created, it shows users only information similar to their previous interests.[27] This can confine users' interests and restrict them from searching for anything new or different, preventing them from branching out and exploring other content.[27]

Filter bubbles grow smaller and more precise over time. As filtering and search algorithms gather more data on their users, they are able to provide more accurate recommendations, which in turn leads to more restricted search results. As the filter bubble grows smaller, the amount of unique content that each user sees diminishes as well.[27]
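This feedback loop can be illustrated with a toy simulation, under the assumption that each recommendation is drawn in proportion to the user's click history and that the user clicks whatever is shown; the topics and numbers are invented:

  import random

  random.seed(7)
  topics = ["politics", "sports", "science", "arts", "travel"]
  clicks = {t: 1 for t in topics}  # uniform starting history

  shown = []
  for i in range(1, 501):
      # Recommend in proportion to click history; each click feeds back
      # into the history, so early preferences compound over time.
      pick = random.choices(topics, weights=[clicks[t] for t in topics])[0]
      shown.append(pick)
      clicks[pick] += 1

      if i in (25, 100, 500):
          recent = shown[-25:]
          top = max(set(recent), key=recent.count)
          print(f"step {i}: '{top}' fills {recent.count(top)/len(recent):.0%} "
                f"of the last 25 recommendations")

Which topic comes to dominate is an accident of the earliest clicks, but the share of a single topic in the recent results tends to grow, mirroring how the unique content a user sees diminishes.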

Echo Chambers

Similar to filter bubbles, echo chambers are the result of filtering and personalization algorithms. An echo chamber is described as "a situation where certain ideas, beliefs or data points are reinforced through repetition of a closed system that does not allow for the free movement of alternative or competing ideas or concepts."[28] Echo chambers generally have one view strongly represented while opposing views are excluded, often resulting in a system in which alternative ideas and concepts are never introduced; the system's ideals are "echoed" back and forth without a change in discourse and become more ingrained in participants' beliefs. Echo chambers are primarily found on social media platforms, often surrounding political ideologies.

References

  1. “Information.” Wikipedia, Wikimedia Foundation, 9 Apr. 2019, en.wikipedia.org/wiki/Information.
  2. “Should the Google Search Engine Be Answerable To Competition Regulation Authorities?” Economic and Political Weekly, 7 Sept. 2018, www.epw.in/engage/article/should-google-search-engine-be.
  3. Ching, Teo Choong. “Types of Cognitive Biases You Need to Be Aware of as a Researcher.” UX Collective, 27 Sept. 2016, uxdesign.cc/cognitive-biases-you-need-to-be-familiar-with-as-a-researcher-c482c9ee1d49.
  4. “Response Bias.” Wikipedia, Wikimedia Foundation, 18 Apr. 2019, en.m.wikipedia.org.
  5. “Participation Bias.” Wikipedia, Wikimedia Foundation, 19 Apr. 2019.
  6. Armstrong, J. Scott. “Estimating Nonresponse Bias in Mail Surveys.” Journal of Marketing Research, Vol. 14, No. 3, Special Issue, 1977.
  7. “Status Quo Bias.” Behavioraleconomics.com | The BE Hub, www.behavioraleconomics.com/resources/mini-encyclopedia-of-be/status-quo-bias/.
  8. “Media / Political Bias.” Rhetorica, rhetorica.net/bias.htm.
  9. Allen, Dr. Steven J. “Deception and Misdirection - Media Bias: 8 Types [a Classic, Kinda].” Capital Research Center, 24 Nov. 2015, capitalresearch.org.
  10. Allen, Dr. Steven J. “Deception and Misdirection - Media Bias: 8 Types [a Classic, Kinda].” Capital Research Center, 24 Nov. 2015, capitalresearch.org.
  11. Noble, Safiya Umoja. “Critical Surveillance Literacy in Social Media: Interrogating Black Death and Dying Online.” Black Camera, vol. 9, no. 2, 2018, p. 147, doi:10.2979/blackcamera.9.2.10.
  12. “Information Overload.” Wikipedia, Wikimedia Foundation, 28 Mar. 2019, en.wikipedia.org/wiki/Information_overload.
  13. “Google Search Statistics.” Internet Live Stats, www.internetlivestats.com/google-search-statistics/.
  14. “How Do Search Engines Work? - BBC Bitesize.” BBC News, BBC, 23 Oct. 2018, www.bbc.com/bitesize/articles/ztbjq6f.
  15. “PageRank.” Wikipedia, Wikimedia Foundation, 10 Apr. 2019, en.wikipedia.org/wiki/PageRank.
  16. Weissman, Cale Guthrie. “What Is an IP Address and What Can It Reveal about You?” Business Insider, 18 May 2015, www.businessinsider.com/ip-address-what-they-can-reveal-about-you-2015-5.
  17. “Replacement of Google with Alternative Search Systems in China - Documentation and Screen Shots.” cyber.harvard.edu/filtering/china/google-replacements/.
  18. Fröding, Barbro, and Martin Peterson. “Why Virtual Friendship Is No Genuine Friendship.” SpringerLink, Springer Netherlands, 6 Jan. 2012, link.springer.com/article/10.1007/s10676-011-9284-4.
  19. Noble, Safiya Umoja. Algorithms of Oppression: How Search Engines Reinforce Racism. New York University Press, 2018.
  20. Brey, Philip. “Values in Technology and Disclosive Computer Ethics.” The Cambridge Handbook of Information and Computer Ethics, Cambridge University Press, www.cambridge.org/core/books/cambridge-handbook-of-information-and-computer-ethics/values-in-technology-and-disclosive-computer-ethics/4732B8AD60561EC8C171984E2F590C49.
  21. ipullrank. “Dr. Epstein, You Don't Understand How Search Engines Work.” IPullRank, 14 Sept. 2016, ipullrank.com/dr-epstein-you-dont-understand-how-search-engines-work/.
  22. Sentance, Rebecca. “What Are the Differences in How Age Demographics Search the Internet?” UserZoom, 11 Dec. 2018, www.userzoom.com/blog/what-are-the-differences-in-how-age-demographics-search-the-internet/.
  23. Bertrand, Marianne, and Sendhil Mullainathan. “Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination.” American Economic Review, 2004, www.nber.org/papers/w9873.
  24. Buolamwini, Joy. “Artificial Intelligence Has a Racial and Gender Bias Problem.” Time, 7 Feb. 2019, time.com/5520558/artificial-intelligence-racial-gender-bias/.
  25. Fahs, Breanne. “The Dreaded Body: Disgust and the Production of ‘Appropriate’ Femininity.” Journal of Gender Studies 26.2 (2017): 184-196.
  26. Amatriain, Xavier. “Machine Learning & Recommender Systems at Netflix Scale.” InfoQ, 16 Jan. 2014, www.infoq.com/presentations/machine-learning-netflix.
  27. Pariser, Eli. The Filter Bubble: What the Internet Is Hiding from You. The Penguin Press, New York, 2011.
  28. “What Is an Echo Chamber? - Definition from Techopedia.” Techopedia.com, www.techopedia.com/definition/23423/echo-chamber.