Search engines

From SI410

A search engine is an information retrieval system that, when prompted, finds information stored in a database or computer system. Search engines generally output the information as a list in which each result, commonly called a hit, contains a link to a site and a few lines of the site’s content indicating why the result was retrieved. Search engines are designed to increase the speed and ease with which one can find information and to increase the amount of information about a topic that one can access at a time. They use various mathematical and machine-learning techniques along with data mining to produce their results. They provide the user with many benefits but also raise ethical concerns regarding privacy, censorship, and security.

search engines[1]

History of search engines

While the concept of information retrieval has been around for a long time, the first search engine for content files, Archie, was created by Alan Emtage [2] in December 1990. Built on the File Transfer Protocol (FTP), Archie enabled users to upload and retrieve files as well as search for them. In 1994, WebCrawler [3] became the first widely used search engine. It was also the first to fully index the content of a web page and make all of the words searchable. One of the primary investors in WebCrawler was Paul Allen, a co-founder of Microsoft. Yahoo was also founded in 1994 and quickly became popular due to its packaging. In 1998, Google popularized term-based search, and with it search engine optimization (SEO), through an algorithm called PageRank. Soon after, the search engine business became increasingly profitable and enticing to others. Search engines primarily earn profits through advertisements, and each hosts algorithms for optimizing the advertisement experience for users.

Companies

Unlike in the earlier days of search engines, the market now has hundreds of search engines that are popular for different purposes and in different places. For example, in the United States, India, and the United Kingdom, Google is by far the most popular search engine, while in China [4] Baidu holds the majority of the market and complies with the government’s censorship laws. In Japan and Taiwan [5], Yahoo is the most popular and controls the largest market share. Some people prefer eco-engines that claim to be carbon neutral as a way to oppose the large quantities of energy that search engines take to run.

Google

The Google search engine is the most widely used search engine in the world and Google’s most popular product. Google was founded by Sergey Brin and Larry Page in 1998 in California. The vast majority of Google’s revenue comes from ads; in Q3 of 2022, almost 60% of Google’s $69.1 billion revenue came from search ads alone [6]. Google search controls over 78% [7] of the market and continues to be a common case in point when people talk about tech giants. Google search is optimized to provide end users with a clean, easy-to-use interface and a personalized experience while retrieving results that are as robust and useful as possible.

Bing

With the second-highest market share in the United States, Bing is the second-largest search engine. Bing is a Microsoft product launched by then-CEO Steve Ballmer in 2009 [8] in response to the rise of Google. While Bing may not be the most used engine, it has a wide reach: it operates on five continents and partners with Yahoo so that Yahoo search traffic is exclusively served [9] by Bing Ads. Its search algorithms differ from those of its competitors, and Bing has received criticism for appearing to promote misinformation more than they do [10]. Bing reportedly ranked articles about conspiracy theories and videos on platforms like TikTok higher in its search results than other engines did. Regardless, as it is the default search engine for Microsoft’s Edge and Internet Explorer browsers, it is quite widely used.

Yahoo! Search

Yahoo is one of the 11 most accessed websites in the world, and while it has a smaller market share than Google, it competes more directly with Bing. Yahoo was created by Jerry Yang and David Filo in 1994 and incorporated in 1995, before either Google or Bing existed. Yahoo has gone through partnerships with both Bing and Google. Since 2011, Yahoo has worked with Bing to power its internet search, and from late 2015 to 2018 it also worked briefly with Google. Yahoo search is the default search engine for Firefox browsers in the US.

Baidu

Baidu is a search engine developed by Robin Li and Eric Xu and based in Beijing, China. It controls less than 2% [11] of the market globally but over 75% [12] of the market in China. Baidu hosts services exclusively in Chinese. Li previously worked for IDD Information Services where, in 1996, he developed the RankDex [13] site-scoring algorithm, which used hyperlinks to measure the quality of indexed websites and ranked them according to popularity and quality. This predated Google’s PageRank, and Larry Page, a co-founder of Google, referenced Li’s work in his patent [14] for PageRank. RankDex’s technology was later used for Baidu’s search engine. Baidu is a private company but is compliant with local laws and with censorship as directed by the Chinese government. In November 2022, Baidu was rated non-compliant [15] with the United Nations Global Compact over concerns related to human rights issues. Baidu is the default search engine for Microsoft Edge in China.

Ecosia

Ecosia is a German search engine developed by Christian Kroll with the mission of being carbon neutral [16]. Ecosia pursues its carbon goals by planting trees based on search traffic and sends about 80% [17] of its profits to environmental efforts. While the company does not often plant trees directly, it funds organizations that lead environmental efforts, including tree planting in various areas. Ecosia also targets areas where biodiversity is at risk and runs its own projects, such as a solar farm. Ecosia’s search results and advertisements are both powered by Microsoft’s Bing, so the profits it uses are derived from the same advertisement algorithms used by Bing. Ecosia can be added to a browser through an extension.

Ethical Implications

Bias

Search engine bias generally refers to the idea that search engines are not neutral but inherently favor certain things, including certain sites, over others. Since results are derived by an algorithm, most indexed attributes in the system are given values. These values are not always known even to the developers of the programs that use them. While the ranking of search results can appear arbitrary, some sites have learned how to game ranking algorithms such as PageRank. Google’s PageRank has been seen to rank highly those pages with numerous connections in the system of hubs and authorities [18]. An authority is a page that is linked to by many other pages, while a hub is a page that links to many other pages. The sites that choose to exploit these features typically belong to large companies such as Amazon or eBay [19]. The implications of this can be seen in multiple areas. First, sites that understand this are able to promote disinformation at a higher rate: if a site that publishes disinformation promotes its pages using these tactics, more people will end up seeing them than would have otherwise. Additionally, because other parts of the algorithm are not known, these pages can appear to be highly ranked because they are more reputable or are referenced by reputable sources when they are not. The higher up a site is pushed, the more people will click on it, which in turn pushes the page up further and perpetuates the information on that site.
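The hub and authority definitions above can be made concrete with a small sketch. It implements the classic hubs-and-authorities (HITS) iteration on a toy link graph; it is not Google’s ranking code, and the page names, the `hits` function, and the iteration count are all assumptions made for illustration.

```python
from math import sqrt


def hits(links, iterations=50):
    """Return (hub, authority) scores for a link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]

        # A page's hub score is the sum of the authority scores it links to.
        hub = {page: sum(auth[t] for t in links.get(page, [])) for page in pages}

        # Normalize so the scores stay bounded across iterations.
        auth_norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {page: v / auth_norm for page, v in auth.items()}
        hub = {page: v / hub_norm for page, v in hub.items()}

    return hub, auth


# Hypothetical four-page web: three pages link to "shop".
toy_links = {
    "directory": ["shop", "news", "blog"],
    "blog": ["shop"],
    "news": ["shop"],
    "shop": [],
}
hub_scores, auth_scores = hits(toy_links)
```

In this toy graph "shop" ends up as the strongest authority and "directory" as the strongest hub. A site that can arrange for many pages to link to it raises its authority score in the same way, which is the kind of gaming described above.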

While there is a general public understanding that search engines follow the principle of search neutrality [20], there are instances, as described above, where this is not the case. A study [21] has also found that search engines do not return neutral information regarding gender. In an image search for the word “professor” across many search engines, only 15% of the images returned are of women, although women currently make up about 44% [22] of tenure-track professors. Since the inner workings of search engine algorithms are unknown, it is difficult to know why this is; however, certain implications are known. Since girls are more likely [23] to go into professions where they see female role models, a search engine’s amplification of gender stereotypes could deter women from pursuing certain professions and could contribute to gender imbalance. Search engines are not built by one person, and pages are not typically promoted on an individual basis by a single developer. Although some people claim to desire neutral search engines, most want the “best” [24] search results, which inherently produces some search engine bias.

Privacy

Search engines run into ethical issues regarding privacy in a couple of ways. The first is that search engines collect information about their users, essentially turning them into data subjects, to support things like advertising and individualized experiences. Many components of search engines feed into this, such as the logging of search queries, browsing history, cookies, and IP addresses. Companies do lay out what will be collected in their privacy agreements, which users need to agree to before use. The second is that when users search for other people on a search engine, various parts of those people’s personal information, including their likeness, can be found.

The Courts

Companies such as Google, Yahoo, Microsoft, and AOL all keep archived logs of every search made by users, including the date and time of the search. The search logs are generally used to build a profile of a user that can later be used to target advertisements. At their most invasive, the search logs and the profiles built from them, which could be inaccurate, can be subpoenaed [25] in court cases.

Advertisers

In the case of advertisements, search engines generally monetize information about their users in two ways: by creating a profile of the user and allowing advertisers to target groups of people, and by allowing “real-time bidding” (https://www.forbes.com/sites/hessiejones/2021/10/18/real-time-bidding-the-ad-industry-has-crossed-a-very-dangerous-line/?sh=3e5b8b6a48ca). Real-time bidding is an efficient way to buy advertisement space. When an impression becomes available in a user’s browser, information about the user and the page is given to an ad exchange, and advertisers are able to bid, in real time, for that impression. In this process, information about the user, including zip code, GPS location, browser history, and device identifiers, is revealed. Such bidstream data is stored in the advertiser’s database unless it is manually deleted. Many [26] people see this as an indicator that laws should be created to make privacy a default rather than an option, so that data collection and sharing are minimal.
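The real-time bidding flow described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not any real ad exchange’s API: the `BidRequest` fields, the advertiser names, the prices, and the highest-bid-wins rule are all hypothetical, and real exchanges apply their own auction rules.

```python
from dataclasses import dataclass


@dataclass
class BidRequest:
    """User and page data the exchange shares with every bidder (hypothetical fields)."""
    page_url: str
    zip_code: str
    device_id: str


@dataclass
class Bid:
    advertiser: str
    price_cpm: float


def make_bidder(name, price_by_zip, bidstream_log):
    """Build a bidder that records every request it sees and bids by zip code."""
    def bidder(request):
        # The bidder keeps the user data whether or not it bids or wins.
        bidstream_log.append((name, request))
        price = price_by_zip.get(request.zip_code)
        return Bid(name, price) if price is not None else None
    return bidder


def run_auction(request, bidders):
    """Send the request to every bidder and return the highest bid, or None."""
    bids = [bid for bid in (bidder(request) for bidder in bidders) if bid is not None]
    return max(bids, key=lambda bid: bid.price_cpm, default=None)


bidstream_log = []
bidders = [
    make_bidder("shoes_inc", {"48104": 2.50}, bidstream_log),
    make_bidder("travel_co", {"48104": 1.75}, bidstream_log),
    make_bidder("local_gym", {"10001": 3.00}, bidstream_log),
]
request = BidRequest("https://example.com/article", "48104", "device-123")
winner = run_auction(request, bidders)
```

Only one advertiser wins the impression, but all three bidders end up with a copy of the request in `bidstream_log`, including the one that never placed a bid. That asymmetry is the privacy concern raised above.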

Privacy Breaches

In 2006, AOL Search was the subject of a large data breach. In it, the search engine revealed [27] a list of about 20 million searches attached to 650,000 users, who were given unique identifiers unrelated to their names. However, many individual identities could still be uncovered from the leaked data, along with those users’ personal searches and information. This could be done through a very simple form of reverse engineering. Google and Yahoo were both targets of a Chinese attack that leaked a large amount of data. Google launched a large cybersecurity effort in response, while Yahoo did less [28]. Yahoo has continued to be the subject of many other data breaches, including a very large one [29], disclosed in 2016, involving 500 million user accounts.
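A rough sketch of why numeric identifiers alone do not anonymize a search log: once every query from the same identifier is grouped together, the combined queries can point to one person. The log rows, names, and helper functions below are invented for illustration and are not taken from the AOL data or from any published re-identification method.

```python
from collections import defaultdict

# Hypothetical log rows: (pseudonymous user id, query text).
search_log = [
    (1001, "landscapers in springfield"),
    (2002, "cheap flights"),
    (1001, "jane doe springfield"),
    (1001, "numb fingers"),
    (2002, "pizza near me"),
]


def queries_by_user(rows):
    """Group every query under the pseudonymous id that issued it."""
    grouped = defaultdict(list)
    for user_id, query in rows:
        grouped[user_id].append(query)
    return dict(grouped)


def users_mentioning(grouped, term):
    """Return the ids whose query history contains the given term."""
    return sorted(
        user_id
        for user_id, queries in grouped.items()
        if any(term in query for query in queries)
    )


profiles = queries_by_user(search_log)
matches = users_mentioning(profiles, "springfield")
```

Here the identifier 1001 says nothing by itself, but its grouped history combines a town, a personal name, and a health concern, which is enough to narrow the search to a single individual.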

Privacy-Minded Search Engines

Some privacy-minded search engines have sprung up over time, including DuckDuckGo. DuckDuckGo claims [30] in its privacy statement to collect much less data than its counterparts and sometimes no data at all. It has also prided itself on being a site that refrains from most censorship. As a result, the engine has found itself hosting disinformation [31] regarding things like vaccines and police brutality.

Censorship and Surveillance

In the United States

Search engine companies have surveillance powers over search queries. In the early days of the internet, search engines were promoted as a technology that would [32] “give voice to diverse social, economic, and cultural groups … [and] empower the traditionally disempowered, giving them access both to typically unreachable modes of power” (Introna and Nissenbaum). However, search engines have the ability to systematically exclude certain sites and broader types of content. This has ethical implications for democracy and the First Amendment in the United States. For search engines that operate outside of the United States, companies and the public have raised the question of the extent to which the laws of other countries should be followed. In the United States, the First Amendment does not allow most censorship, with exceptions for certain things such as child pornography, trafficking, and computer fraud.

Outside of the United States

Search engines have faced problems when offering their products in other countries. One of the first cases of censorship against a search engine came from British Columbia in Canada, where Google was ordered to remove from its results the website of a company that had re-labeled the networking technology of a Canadian company and sold the equipment as its own. The court stated [33] that “[t]his is not an order to remove speech that, on its face, engages freedom of expression values, it is an order to de-index websites that are in violation of several court orders”. This set a precedent that national governments could censor search results based on their own laws.

China

Another example of censorship is in China. China has some of the strictest [34] censorship laws in the world, such that the original versions of search engines cannot operate there. Various search engines operated in China using Chinese versions of the service that were subject to censorship by the government from the start. This included [35] blocking sites and producing warnings for “insensitive” searches. Google specifically had to ban YouTube in China and de-index its videos from the search engine when a video of Chinese officials beating Tibetan protesters was released [36]. Following a series of cyberattacks called “Operation Aurora” [37] that targeted numerous parts of the US private sector, including search engines such as Google and Yahoo Search, many search engines decided that unless they could offer an uncensored version of their product they would pull out of China, which they did in 2010. In August of 2018, The Intercept [38] revealed a leaked Google project called “Dragonfly” that aimed to relaunch Google in China in compliance with the country’s media laws. The search engine was to blacklist certain searches, and the blacklist would be incorporated into autofill and image search. The project created a large stir among Google staff, who said the project raised “urgent moral and ethical issues”. An internal letter [39] stated that the Dragonfly project went against the Google AI ethical code, “which says that the company will not build or deploy technologies ‘whose purpose contravenes widely accepted principles of international law and human rights.’” After the clash within the company over ethical concerns, the project was shut down without being deployed. Currently, China’s largest search engine is Baidu, which fully complies with the censorship laws.
Microsoft’s Bing has a Chinese version that still operates in China, holding about 2% of the market share in the country, and it is the only major foreign search engine available there, as it also complies [40] with the government.

References

  1. https://appleinsider.com/inside/macos-ventura/best/the-best-search-engines-to-use-if-youre-tired-of-google
  2. https://daily.jstor.org/alan-emtage-first-internet-search-engine/
  3. https://carlhendy.com/history-of-search-engines/#lycos
  4. https://gkbooks.in/top-10-search-engines/#Search_Engine_Popularity_by_Country
  5. https://geography.oii.ox.ac.uk/age-of-internet-empires/
  6. https://www.oberlo.com/statistics/how-does-google-make-money#:~:text=Google%20revenue%20breakdown%20(Q3%202022,%25)%20was%20from%20search%20ads.
  7. https://ntelt.cikd.ca/top-5-search-engines-used-in-daily-life/
  8. https://ntelt.cikd.ca/top-5-search-engines-used-in-daily-life/
  9. https://about.ads.microsoft.com/en-us/blog/post/january-2019/microsoft-and-verizon-media-strengthen-search-partnership
  10. https://fsi.stanford.edu/news/bing-search-disinformation
  11. https://www.statista.com/statistics/1219413/market-share-held-by-baidu-worldwide/
  12. https://www.searchenginejournal.com/top-chinese-search-engines/456497/
  13. https://www.rankdex.com/about.html
  14. https://patents.google.com/patent/US6285999B1/en
  15. https://www.reuters.com/article/china-esg-downgrade-idUSL4N32320Q
  16. https://www.ecosia.org/
  17. https://www.ethicalconsumer.org/technology/how-ethical-search-engine-ecosia
  18. https://safecont.com/en/ranking-urls-hubs-authorities/
  19. https://link.springer.com/chapter/10.1007/978-3-540-75829-7_2
  20. https://psu.pb.unizin.org/ist110/chapter/2-4-search-neutrality/
  21. https://link.springer.com/chapter/10.1007/978-3-030-86144-5_19
  22. https://www.aauw.org/resources/article/fast-facts-academia/
  23. https://journals.sagepub.com/doi/10.1111/j.1471-6402.2006.00260.x
  24. https://psu.pb.unizin.org/ist110/chapter/2-4-search-neutrality/
  25. https://www.wiley.com/en-us/Ethics+and+Technology%3A+Controversies%2C+Questions%2C+and+Strategies+for+Ethical+Computing%2C+5th+Edition-p-9781119186571
  26. https://www.eff.org/deeplinks/2020/03/google-says-it-doesnt-sell-your-data-heres-how-company-shares-monetizes-and
  27. https://www.proquest.com/docview/1782998082?parentSessionId=4ns6oTsGraAg9bpwuwnaMr7X%2FWujQI7qXgNO%2FMwT7Tk%3D
  28. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2883607
  29. https://lifelock.norton.com/learn/data-breaches/company-data-breach
  30. https://duckduckgo.com/
  31. https://www.vox.com/recode/22981115/duckduckgo-free-speech-privacy-oops
  32. https://www.researchgate.net/publication/2410076_Shaping_The_Web_Why_The_Politics_Of_Search_Engines_Matters
  33. https://www.cbc.ca/news/canada/british-columbia/google-ruling-1.4181322
  34. https://www.scmp.com/news/china/politics/article/3199997/china-step-internet-censorship-stricter-rules-social-media-and-streaming-sites
  35. http://www.cnn.com/2006/BUSINESS/01/25/google.china/
  36. https://www.theguardian.com/world/2009/mar/25/china-blocks-youtube
  37. https://www.sciencedirect.com/topics/computer-science/operation-aurora
  38. https://theintercept.com/2018/08/01/google-china-search-engine-censorship/
  39. https://theintercept.com/2018/08/16/google-china-crisis-staff-dragonfly/
  40. https://www.reuters.com/technology/microsoft-bing-says-suspended-auto-suggest-function-china-government-behest-2021-12-17/