Wikipedia Bots

Wikipedia bots are internet bots that maintain Wikipedia, a user-edited online encyclopedia. As of March 15, 2019, Wikipedia contains over five million English articles[1], as well as many more across 302 other language editions[2]. Wikipedia depends on volunteers to grow and maintain the site, and bots facilitate that maintenance by automatically performing tasks such as correcting article formatting and detecting violations of Wikipedia policy. In 2014, there were 274 active bots, and they accounted for 15% of all edits on Wikipedia.[3] These bots demonstrate how autonomous agents can improve the infosphere they inhabit. However, because they can also cause widespread damage to the infosphere if left unchecked, policies governing bot operation are in place to safeguard the site.

Internet Bots

An internet bot is a software application that runs automated tasks on the web. Bots allow repetitive tasks to be performed much faster and at a larger scale than they could be done manually, and they are currently an integral part of the web: all search engines, for example, depend on crawlers that follow links from page to page to index sites. Bots can roughly be divided into two categories: good actors, like the search engine crawlers, and malicious bots, which pose a large threat to cybersecurity. Bots can work together in what is known as a botnet to perform large-scale attacks, such as a distributed denial-of-service (DDoS) attack; the Mirai botnet is an infamous example. Well-intentioned bots can cause damage as well through unintended consequences. A crawler that neglects to obey the robots exclusion protocol can overload a web server by making more requests than the server can handle, as the sketch below illustrates. Two notable properties of bots that enable such unintended consequences are that they can be long-running and autonomous: a bot can run continuously for years and can independently make decisions. By Floridi's criteria[4], since internet bots are interactive, autonomous, and adaptable, they can be classified as artificial agents, and further as moral agents if they make morally qualifiable decisions.
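
To make the robots-exclusion point concrete, the following is a minimal sketch of a "polite" crawler built on Python's standard urllib.robotparser; the target site, user-agent string, and one-second delay are illustrative assumptions, not part of any real crawler described above.

 import time
 import urllib.request
 import urllib.robotparser

 ROBOTS_URL = "https://en.wikipedia.org/robots.txt"  # illustrative target site
 USER_AGENT = "ExampleCrawler/0.1 (contact@example.org)"  # hypothetical bot name

 rp = urllib.robotparser.RobotFileParser()
 rp.set_url(ROBOTS_URL)
 rp.read()  # fetch and parse robots.txt once, before crawling anything

 def fetch(url: str) -> bytes | None:
     """Fetch url only if robots.txt allows it, pausing between requests."""
     if not rp.can_fetch(USER_AGENT, url):
         return None  # disallowed path: a well-behaved crawler skips it
     req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
     with urllib.request.urlopen(req) as resp:
         data = resp.read()
     time.sleep(1.0)  # crawl delay so the server is never flooded with requests
     return data

Dropping either the can_fetch check or the delay is exactly how a well-intentioned crawler drifts into the overload scenario described above.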

Bot Policy

Due to the potential for misuse, bots must adhere to Wikipedia's bot policy. Bots must first be approved for usage, which requires the owner to demonstrate that the bot is both harmless and useful to the site.[5] Several components of the policy concern transparency, the degree to which information is made available.[6] A certain degree of transparency is required of the bot, including disclosure of what tasks the bot completes and the rate at which it will complete them. Additionally, the bot must be clearly identifiable as a bot by its username. A bot operator must submit a request to the Bot Approvals Group and receive approval before starting a trial period for the bot. There is a trade-off between the strictness of web policies and user freedom: Wikipedia's strict policy can be contrasted with Twitter's bot policy, which is more encouraging of bots and does not require approval. Twitter's open policy has enabled the proliferation of bots on the platform, including malware and spam bots.[7]
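
As an illustration of what policy-compliant behavior can look like in code, here is a hedged sketch of a bot that identifies itself via its User-Agent header and defers to server load using MediaWiki's maxlag parameter (a real API parameter recommended for bots); the bot name and retry logic are invented for the example and are not a required implementation.

 import time
 import requests

 API = "https://en.wikipedia.org/w/api.php"
 HEADERS = {"User-Agent": "ExampleBot/0.1 (https://example.org/bot; operator contact)"}

 def api_get(params: dict) -> dict:
     """Query the MediaWiki API, backing off whenever the servers report lag."""
     params = {**params, "format": "json", "maxlag": 5}  # defer if replication lag > 5 s
     while True:
         resp = requests.get(API, params=params, headers=HEADERS, timeout=30)
         data = resp.json()
         if data.get("error", {}).get("code") == "maxlag":
             # Servers are overloaded: wait as instructed, then retry politely.
             time.sleep(int(resp.headers.get("Retry-After", 5)))
             continue
         return data

 # Read-only example call: fetch basic metadata about the bot policy page.
 info = api_get({"action": "query", "titles": "Wikipedia:Bot policy", "prop": "info"})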

Benefits to the Infosphere

Bots can improve the quality of Wikipedia articles by performing maintenance tasks such as correcting reference formatting and linking to other articles. They can also help with two issues the website faces: vandalism and copyright infringement. Wikipedia's policy of open editing allows for a range of disruptive actions, including editing pages to insert false information, adding offensive content, and deleting existing high-quality content. ClueBot NG[8] is a bot responsible for detecting vandalism and is highly active, having made over two million edits.[9] ClueBot NG uses machine learning to classify edits as vandalism, along the general lines of the sketch below. Another issue that bots help assuage is that articles can include text copied directly from copyrighted sources; bots can compare edits with copyrighted material and flag duplicates. Since bots are able to perform these checks far more rapidly than humans, they help Wikipedia balance its policy of openness against the associated risks of user freedom.
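
The following toy sketch shows the general shape of ML-based vandalism scoring. It is not ClueBot NG's actual implementation; the features, profanity lexicon, and training examples are all invented for illustration, and a real system would train on many thousands of labelled edits.

 from sklearn.neural_network import MLPClassifier

 def features(edit_text: str, is_anonymous: bool) -> list[float]:
     """Crude edit features: length, shouting, profanity flag, editor anonymity."""
     words = edit_text.split()
     caps_ratio = sum(w.isupper() for w in words) / max(len(words), 1)
     profane = any(w.lower() in {"stupid", "dumb"} for w in words)  # toy lexicon
     return [float(len(words)), caps_ratio, float(profane), float(is_anonymous)]

 # Invented labelled edits: 1 = vandalism, 0 = good-faith edit.
 X = [features("THIS PAGE IS STUPID", True),
      features("Added 2010 census figures with citation", False),
      features("dumb dumb dumb", True),
      features("Fixed broken reference formatting", False)]
 y = [1, 0, 1, 0]

 clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
 flagged = clf.predict([features("YOU ARE ALL DUMB", True)])  # toy model: expect a flag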

Dangers to the Infosphere

Opacity

Another potential issue is the opacity of the bots themselves. The source code for some bots is publicly available, but the bot policy does not require developers to publish a bot's source code. ClueBot NG's vandalism detection algorithm uses a neural network, which, in comparison with other machine learning algorithms, makes it difficult to discern how features are utilized, since a neural network learns its own internal representation of the features used for classification.[10] According to Fleischmann and Wallace's covenant with transparency[11], transparent models allow the outside user to view the model's depiction of reality and the values it encodes. Because outside users cannot inspect either of these in ClueBot NG's detection model, it can be classified as a black box.
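
To make the black-box contrast concrete, this sketch (reusing the invented four-feature toy data from the vandalism example above) compares a decision tree, whose learned rules can be printed and audited, with a neural network, whose learned knowledge is spread across weight matrices.

 from sklearn.neural_network import MLPClassifier
 from sklearn.tree import DecisionTreeClassifier, export_text

 # Toy data with the four features from above: words, caps ratio, profanity, anonymity.
 X = [[4.0, 1.0, 1.0, 1.0], [6.0, 0.0, 0.0, 0.0], [3.0, 0.0, 1.0, 1.0], [4.0, 0.0, 0.0, 0.0]]
 y = [1, 0, 1, 0]

 tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
 print(export_text(tree, feature_names=["words", "caps", "profanity", "anon"]))
 # Prints explicit, auditable if/else thresholds on named features.

 mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
 print([w.shape for w in mlp.coefs_])
 # Prints only weight-matrix shapes; the learned "rules" are distributed and opaque.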

Bias and Neutrality

A black box model, along with presenting the issue of transparency, presents the issue of bias, since a black box can hide algorithmic bias. Machine learning algorithms trained on human language data have been shown to display stereotype bias.[12] An example of bias in Wikipedia bots would be a profiling algorithm that is weighted more heavily against anonymous users.[13] The consequence of a false positive in such an algorithm is an invalid revert of a legitimate edit, as the sketch below illustrates. In 2010, 13% of Wikipedia editors were female.[10] Reverting an edit can discourage future editing, which can exacerbate the gender ratio issue and increase gender bias, as women may more strongly experience the effects of criticism. Related to the concept of bias is the concept of neutrality. Bots have embedded values, meaning their design tends to promote certain values over others.[14] For example, a bot that removes profane language must determine what language is and is not profane.
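
The profiling concern can be illustrated with a deliberately simplified, invented scoring rule: a single constant embeds a value judgment penalizing anonymous editors, so an identical edit can be kept or reverted depending only on who made it.

 def revert_score(suspicion: float, is_anonymous: bool) -> float:
     """Score an edit for reversion; the anonymity penalty is a baked-in value judgment."""
     ANON_WEIGHT = 0.2  # invented constant: this single number embeds the bias
     return suspicion + (ANON_WEIGHT if is_anonymous else 0.0)

 THRESHOLD = 0.5
 for anonymous in (False, True):
     score = revert_score(0.4, anonymous)
     print(anonymous, score, "revert" if score > THRESHOLD else "keep")
 # Identical edit content: kept for a logged-in editor, reverted for an anonymous one.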

Truth

Another category of bots, those which create new articles or add content to preexisting articles, is involved with the issue of truth and trust online. Many users trust Wikipedia as a reliable source of information, and misinformation violates that trust. An early bot, Rambot, used public databases to create articles on U.S. cities; due to errors in the data, 2,000 articles were corrupted.[15] Another case of misinformation involves a 2017 study titled "Even good bots fight: The case of Wikipedia"[16], published in the scientific journal PLOS ONE. The study examined the volume and rate at which Wikipedia bots reverted each other's edits, observing that bots reciprocally reverted one another over periods lasting years. News websites and blogs then published articles with titles such as "Study reveals bot-on-bot editing wars raging on Wikipedia's pages" and "People built AI bots to improve Wikipedia. Then they started squabbling in petty edit wars, sigh". A follow-up study[17] found that the majority of bot reverts suspected of being instances of bot-on-bot conflict were in fact routine and productive, in contrast with the news articles' assertions of bot wars. The articles are themselves an instance of misinformation propagating through exaggeration in the infosphere.
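
As a sketch of how "reciprocal reverts" between bots might be operationalized, the following counts mutual revert pairs in a hypothetical edit log; the log format and bot names are invented, and the studies above worked from far richer edit-history data.

 from collections import Counter
 from itertools import combinations

 # (reverting_bot, reverted_bot) pairs from a hypothetical page history.
 reverts = [("BotA", "BotB"), ("BotB", "BotA"), ("BotA", "BotB"), ("BotC", "BotA")]

 counts = Counter(reverts)
 bots = {bot for pair in reverts for bot in pair}
 for a, b in combinations(sorted(bots), 2):
     mutual = min(counts[(a, b)], counts[(b, a)])
     if mutual:  # both directions occurred: a candidate "bot fight" to inspect
         print(f"{a} <-> {b}: {mutual} reciprocal revert pair(s)")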

References

  1. "Wikipedia:Size of Wikipedia." https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
  2. "List of Wikipedias." https://en.wikipedia.org/wiki/List_of_Wikipedias
  3. Steiner (2014). "Bots vs. Wikipedians, Anons vs. Logged-Ins (Redux): A Global Study of Edit Activity on Wikipedia and Wikidata." https://dl.acm.org/citation.cfm?doid=2641580.2641613
  4. Floridi (2010). "Information Ethics," Chapter 5.
  5. "Wikipedia:Bot policy." https://en.wikipedia.org/wiki/Wikipedia:Bot_policy
  6. Floridi (2009). "The Ethics of Information Transparency."
  7. Gorwa & Guilbeault (2018). "Unpacking the Social Media Bot: A Typology to Guide Research and Policy." Policy & Internet. https://doi.org/10.1002/poi3.184
  8. "User:ClueBot NG." https://en.wikipedia.org/wiki/User:ClueBot_NG
  9. Geiger & Halfaker (2013). "When the Levee Breaks: Without Bots, What Happens to Wikipedia's Quality Control Processes?" https://dl.acm.org/citation.cfm?id=2491061
  10. Benítez, Castro, & Requena (1997). "Are Artificial Neural Networks Black Boxes?" https://www.researchgate.net/publication/5595919_Are_Artifi_cial_Neural_Networks_Black_Boxes
  11. Fleischmann & Wallace (2005). "A Covenant with Transparency: Opening the Black Box of Models."
  12. Caliskan, Bryson, & Narayanan (2017). "Semantics Derived Automatically from Language Corpora Contain Human-like Biases."
  13. de Laat (2015). "The Use of Software Tools and Autonomous Bots against Vandalism: Eroding Wikipedia's Moral Order?" https://link.springer.com/article/10.1007/s10676-015-9366-9
  14. Brey (2010). "Values in Technology and Disclosive Computer Ethics."
  15. Niederer & van Dijck (2010). "Wisdom of the Crowd or Technicity of Content? Wikipedia as a Sociotechnical System." https://www.researchgate.net/publication/249689493_Wisdom_of_the_Crowd_or_Technicity_of_Content_Wikipedia_as_a_Sociotechnical_System
  16. Tsvetkova, García-Gavilanes, Floridi, & Yasseri (2017). "Even Good Bots Fight: The Case of Wikipedia." PLOS ONE.
  17. Geiger & Halfaker (2017). "Operationalizing Conflict and Cooperation between Automated Software Agents in Wikipedia: A Replication and Expansion of 'Even Good Bots Fight'."