Difference between revisions of "Bias in Information"

From SI410

Revision as of 15:47, 7 April 2019

Bias in information arises when searches for information produce differing results, which in turn lead to different interpretations of those results. A user searching for information is seeking “the resolution of uncertainty” [1]. The information provided to the searcher, combined with the searcher’s understanding of that information, can lead to discrepancies in the searcher's knowledge of the topic. Filtering results in a specific way, or allowing an observer to access only certain information, can drastically change the value and meaning of the content provided. Online search engines, as a product of the computer revolution, provide a space for bias in information to exist. The prevalence of bias in information, especially in search engines, has given rise to ethical concerns regarding privacy, the filtering of search results, and the types of bias involved.

Types of Bias

Confirmation Bias

Confirmation bias is the tendency to interpret new evidence or information as confirmation of one's current beliefs. It is very common online, particularly on media sites, where users and publishers present only the information that supports their arguments. [2]

Groupthink/Bandwagon Bias

Groupthink, or bandwagon bias, occurs when a group of people works on the same project or production. Members typically try to keep the group work harmonious and therefore do not question or challenge information presented by other members of the group. [2]

Selection Bias

Selection bias occurs most commonly in research, when researchers choose the number and type of users who take part in a study. The resulting participants are not a random sample, which makes it nearly impossible to validate the study's findings. [2]

Anchoring Bias

Anchoring bias occurs when users or researchers rely on a single piece of information to make subsequent decisions. Once an anchor is set, users continue to base their actions and decisions on it, and it is particularly difficult to remove once established. [2]

Search Engine Results

The First 10 Results

A search engine returns thousands of results as a list in which the top items are meant to be the most important or relevant to the given query. When researching a topic, a user who does not find what they were looking for in the first few results will retype the search as something more specific, repeating this process until the results are satisfactory. As a result, the first few links that appear for a query are very important, and most documents will never be seen.

Information Overload

Information overload is just what it sounds like: there is too much information. With the rise of technology, the amount of information readily available to the public has increased to the point that a user can be given more than they can process [3]. Information overload is exhibited in the thousands of results returned by search engines, which can make it seemingly impossible for an average user to sift through everything that exists.

Information overload predates modern technology. It can be seen in libraries and museums, for example, where no one person could read every book in an extensive library or fully study every aspect of a museum. Excessive information can prevent a user from understanding the information they need, which in turn may prevent them from making an informed decision.

Search Engines

A search engine is a software system designed to carry out a web search for a query or phrase provided by a user. The results of a search can include many types of media, including articles, documents, images, videos, and infographics. Search engines provide easy access to information that may otherwise be available only in specific locations such as libraries and museums. Search engines are the most common way of finding information today. Google, for example, processes about 40,000 queries a second [4], which amounts to roughly 3.5 billion searches a day and 1.2 trillion searches a year.
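
The daily and yearly figures follow from the per-second rate by simple multiplication. The short check below reproduces that arithmetic; it assumes a constant rate and a 365-day year, which is a simplification rather than a measured value.

```python
# Figure quoted in the text: about 40,000 queries per second.
QUERIES_PER_SECOND = 40_000

# Assumptions for this check: a constant rate and a 365-day year.
SECONDS_PER_DAY = 60 * 60 * 24
DAYS_PER_YEAR = 365

queries_per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY
queries_per_year = queries_per_day * DAYS_PER_YEAR

print(f"{queries_per_day:,} queries per day")    # about 3.5 billion
print(f"{queries_per_year:,} queries per year")  # about 1.2 to 1.3 trillion
```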

How Search Engines Work

[Figure: Search engines.png]

A search engine is able to provide thousands of results in seconds because of the work that occurs in the background. There are three major background steps: web crawling, indexing, and the ranking algorithm [5]. In the first step, a web crawler searches the World Wide Web for documents to add to the search engine’s own collection. Every time a new document is found or an existing one is updated, the crawler adds a copy to the collection. In the second step, indexing, this collection, kept by the search engine in a data center, is organized so that it can be searched according to what a user is looking for. In the last step, the algorithm, the search engine must decide how to order the documents so that the user receives a ranked set of results in which, ideally, the first result is the most relevant to the search. Conceptually, the ranking of these results is based on how many connections a result has to other potential results. This ranking method is referred to as PageRank [6]. Before any results can be computed, however, a user must write a query. Typically this is a phrase, but it can also be another type of media, such as a picture.
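
To make the idea of ranking by connections more concrete, here is a minimal sketch of a PageRank-style calculation on a toy link graph. It is a simplified textbook version, not the algorithm any real search engine runs; the page names, the damping value of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page receives a small baseline share of the total score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score to all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: pages a, b, and d all link to c; c links to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

scores = pagerank(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph the page with the most incoming links, c, ends up first, and a comes second because the highly ranked c links to it. Pages that nothing links to stay at the bottom, which illustrates how documents with few connections can go unseen.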

Ethical Concerns

Searching for information is unavoidable, and it raises many ethical concerns. These concerns stem from the bias involved in search engine design, the filtering of results, and the privacy of the user.

Privacy

While finding optimal results, a search engine also tracks certain information about the user behind the scenes. The time, date, and content of each query, along with the IP address of the computer that sent it, are all stored. Although it is unlikely, pooling the queries recorded for similar IP addresses can produce a list of searches made by a specific user [7]. The IP address shared with the search engine is not that of a personal computer but that of the local router. This reveals geolocation and the types of searches that occur in specific locations. Location information can be used against users in some scenarios; for example, in China the use of Google is prohibited and other search engines are provided for the country instead. [8]
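
As a rough illustration of how stored queries could be pooled, the sketch below groups a made-up query log by IP address. The log format, field order, and addresses are hypothetical and do not describe how any particular search engine stores its data. Because the address usually belongs to a router, each group may represent a household or location rather than one person.

```python
from collections import defaultdict

# Hypothetical log entries: (timestamp, ip_address, query).
# The addresses come from ranges reserved for documentation examples.
query_log = [
    ("2019-04-07 15:40", "203.0.113.5", "flu symptoms"),
    ("2019-04-07 15:42", "198.51.100.9", "pizza near me"),
    ("2019-04-07 15:47", "203.0.113.5", "pharmacy open now"),
]


def group_by_ip(log):
    """Collect the (timestamp, query) pairs recorded for each IP address."""
    history = defaultdict(list)
    for timestamp, ip_address, query in log:
        history[ip_address].append((timestamp, query))
    return dict(history)


histories = group_by_ip(query_log)
for ip_address, entries in histories.items():
    print(ip_address, [query for _, query in entries])
```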

Along with the ability to ban specific phrases in certain locations, a search engine also uses past searches and the documents viewed as part of its algorithms. When a document is viewed frequently, it moves higher up the list of results because users appear to find it relevant. Many websites, such as YouTube and Netflix, adopt recommender systems that perform personalized information filtering using search and viewing history or tracking cookies. The methods that companies use to gather this data are problematic because in most cases users are uninformed, and even when they are informed, the notifications used to ask for their consent are sometimes too vague or too hard to understand. [9]

Bias

Because of the nature of a search engine and the processes it goes through to provide results, bias can be introduced at each step. In “Values in technology and disclosive computer ethics”, Brey discusses the idea that technology has “embedded values”, which means that computers and their software are not “morally neutral” [10]. Somewhere in the process of their design, computers can come to favor specific values.

Brey discusses three types of bias, which can be used to describe bias in search engines:

  • Preexisting Bias
  • Technical Bias
  • Emergent Bias

The first, preexisting bias, arises from values and attitudes that exist prior to the development of the software. In a search engine, this can be seen in the order of the documents returned after a search. If the system's algorithm always favors certain documents over others, without any interference from outside sponsors, the documents shown first may consistently reflect the values of the algorithm's creator.

The second, technical bias, occurs because of the limitations of the software. Because of the nature of search engines and the way people use them, where often only the first results are looked at, it is impossible to display certain results, or for people to ever see them. Taking it a step further, the documents that can be gathered are also limited. Only information that is available can be crawled and added to the collection. In many situations this leads to bias, because more information may exist about some things than about others.

The last, emergent bias, occurs when the system is used in a way not intended by its designers. When a user enters a phrase, its wording can be very important. Different words with the same meaning often have different connotations and can produce different results.

Filtering Results

Deciding which results to show an individual is handled by the search engine's algorithm. The most relevant documents are found by uniquely identifying documents and categorizing them by subject. Because results are received as a list of relevant documents, it is impossible for all of them to be shown at once, or even seen. Knowing this, advertisers and specific companies or sites can influence search results by sponsoring their own documents to be shown above others. Those documents then receive an unfair advantage in placement and can influence which information is viewed.
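
A small sketch can show how sponsorship might change what a user sees first. The scoring rule below, a relevance score plus a flat bonus for sponsored documents, is an assumption made for illustration only; the document titles, scores, and the `sponsored_boost` parameter are hypothetical and do not reflect any real search engine's ranking.

```python
def rank_results(documents, sponsored_boost=0.0):
    """Order document titles by relevance, plus an optional bonus if sponsored."""
    def score(document):
        bonus = sponsored_boost if document["sponsored"] else 0.0
        return document["relevance"] + bonus

    ordered = sorted(documents, key=score, reverse=True)
    return [document["title"] for document in ordered]


# Hypothetical results for a single query.
documents = [
    {"title": "Independent review", "relevance": 0.9, "sponsored": False},
    {"title": "Encyclopedia entry", "relevance": 0.8, "sponsored": False},
    {"title": "Vendor landing page", "relevance": 0.5, "sponsored": True},
]

organic = rank_results(documents)
with_sponsorship = rank_results(documents, sponsored_boost=0.5)

print(organic)
print(with_sponsorship)
```

With no boost, the least relevant document is listed last. With the boost applied, the same document moves to the top, even though nothing about its relevance to the query has changed.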

Customized information filtering not only raises privacy concerns but also limits the information that users are exposed to. For instance, Netflix's recommender system gives priority to content similar to what users have searched for or viewed previously. As Xavier Amatriain, the Director of Algorithms Engineering at Netflix, says, "over 75% of what people watch comes from our recommendations." [11] It is very likely that the similarity among the recommended items can trap users in a loop that isolates them from the choice of accessing new information. [9]
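
The loop described here can be sketched with a deliberately naive recommender that only suggests titles from the genre a user has watched most. This is a toy model under stated assumptions, not Netflix's system: the catalog, the genre labels, and the rule that the user always watches the first suggestion are all invented for the example.

```python
from collections import Counter

# Hypothetical catalog mapping each title to a genre.
catalog = {
    "Drama A": "drama",
    "Drama B": "drama",
    "Drama C": "drama",
    "Comedy A": "comedy",
    "Comedy B": "comedy",
    "Documentary A": "documentary",
}


def recommend(history):
    """Suggest unseen titles from the genre that appears most in the history."""
    genre_counts = Counter(catalog[title] for title in history)
    favorite_genre = genre_counts.most_common(1)[0][0]
    return [
        title
        for title, genre in catalog.items()
        if genre == favorite_genre and title not in history
    ]


# Assumption: the user starts with one drama and always watches the top suggestion.
history = ["Drama A"]
while True:
    suggestions = recommend(history)
    if not suggestions:
        break
    history.append(suggestions[0])

genres_seen = {catalog[title] for title in history}
print(history)
print(genres_seen)
```

Under these assumptions the user watches every drama in the catalog and is never shown a comedy or documentary, which is the kind of isolation from new information the paragraph above describes.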

Consequences

The ethical concerns raised by searching for information can arise either intentionally or unintentionally, but either way they bring consequences. It is important to keep these concerns in mind when searching for anything. Information seekers must also understand how to filter through the information provided in order to make informed and unbiased decisions.

References

  1. https://en.wikipedia.org/wiki/Information
  2. Ching, Teo Choong. “Types of Cognitive Biases You Need to Be Aware of as a Researcher.” UX Collective, 27 Sept. 2016, uxdesign.cc/cognitive-biases-you-need-to-be-familiar-with-as-a-researcher-c482c9ee1d49.
  3. https://en.wikipedia.org/wiki/Information_overload
  4. http://www.internetlivestats.com/google-search-statistics/
  5. https://www.bbc.com/bitesize/articles/ztbjq6f
  6. https://en.wikipedia.org/wiki/PageRank
  7. https://www.businessinsider.com/ip-address-what-they-can-reveal-about-you-2015-5
  8. https://cyber.harvard.edu/filtering/china/google-replacements/
  9. https://link.springer.com/content/pdf/10.1007%2F978-3-319-18609-2_10.pdf
  10. https://www.cambridge.org/core/books/cambridge-handbook-of-information-and-computer-ethics/values-in-technology-and-disclosive-computer-ethics/4732B8AD60561EC8C171984E2F590C49
  11. https://www.infoq.com/presentations/machine-learning-netflix