Wikipedia Bots

From SI410


Wikipedia bots are internet bots that maintain Wikipedia, a user-edited online encyclopedia. As of March 15, 2019, Wikipedia contains over five million English articles[1], as well as many more across 302 other languages[2]. Wikipedia depends on the contributions and edits of volunteers to grow and maintain its content. Wikipedia bots facilitate maintenance of the site by automatically performing tasks such as editing, formatting, and detecting violations of Wikipedia policy. In 2014, there were 274 active bots, and they accounted for 15% of all edits on Wikipedia.[3] These bots demonstrate how autonomous agents can improve the infosphere they inhabit. However, because unchecked bots can also cause widespread damage to the infosphere, Wikipedia enforces policies governing bot operation to safeguard the site.

Internet bots

An internet bot is a software application that runs automated tasks on the web. Bots allow repetitive tasks to be performed much faster and at a larger scale than if they were performed manually, and they are currently an integral part of the web. For example, all search engines depend on crawlers that follow links from page to page in order to index sites. Bots can roughly be divided into two categories: good actors, like search engine crawlers, and malicious bots, which pose a large threat to cybersecurity. Malicious bots can work together in what is known as a botnet to perform large-scale attacks, such as a distributed denial-of-service (DDoS) attack; the Mirai botnet is an infamous example. Well-intentioned bots can cause damage as well, through unintended consequences. A crawler that neglects to obey the robots exclusion protocol can overload a web server by making more requests than the server can handle. Two notable properties of bots that enable such unintended consequences are that they are long-running and autonomous: a bot can run continuously for years and can independently make decisions. By Floridi's criteria[4], since an internet bot is interactive, autonomous, and adaptable, bots can be classified as artificial agents, and further classified as moral agents if they make morally quantifiable decisions.
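The robots exclusion protocol mentioned above can be checked programmatically. Python's standard library includes a robots.txt parser; the sketch below (the rules, user agent, and URLs are invented for illustration) shows how a well-behaved crawler would test whether a fetch is permitted before making a request:

```python
from urllib.robotparser import RobotFileParser

def is_fetch_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse a robots.txt body and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt that blocks every crawler from /private/.
rules = """
User-agent: *
Disallow: /private/
"""

# A polite crawler checks before fetching:
is_fetch_allowed(rules, "MyCrawler", "https://example.com/private/page")  # disallowed
is_fetch_allowed(rules, "MyCrawler", "https://example.com/public/page")   # allowed
```

A crawler that skips this check, or ignores its result, is exactly the kind of well-intentioned bot that can overload a server despite the site having asked it not to.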

Popular Bots


Cydebot

The most active bot by number of edits made on the platform is Cydebot, with 4.5 million edits to date. This bot is primarily responsible for moving and deleting categories on Wikipedia pages, as well as updating the pages that list articles' categories. These edits would be incredibly mundane for a human editor to make, but they are important given the nature of an encyclopedia, and such changes are demanded constantly[5].


Yobot

Yobot is another popular bot on Wikipedia, with almost 3.6 million edits made to date. Its primary responsibility is tagging articles about people who have died, either fictionally or in real life, as dead, and vice versa. Additionally, the bot updates all the lists of articles that carry the dead and non-dead tags in order to maintain their accuracy[6]. Again, this type of task is necessary to keep Wikipedia a high-functioning encyclopedia; it would take human editors a long time, whereas a bot is incredibly efficient at it.

ClueBot NG

A bot with a little more controversy, but still very active on the platform, is ClueBot NG. With about 2.8 million edits to date, ClueBot NG is among the top five bots by edit count. Its main job is to monitor content and check for possible vandalism. Some of this monitoring is simple, such as detecting inappropriate language, but other types of content are more difficult to evaluate. Using machine learning techniques, the bot trains on past examples of vandalism, applies what it has learned to new content, and assigns each edit a probability of being vandalism[7]. Since it relies on probability and statistics, the bot is not 100% accurate in its predictions, which can lead to the removal of appropriate content and thus spark controversy among both editors and readers. However, the bot has been of great use to Wikipedia and has played a major role in maintaining Wikipedia's trust and value among readers.
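ClueBot NG's real model is a trained neural network, but the score-and-threshold pattern described above can be illustrated with a toy sketch. The word list, scoring rule, and threshold below are all invented for illustration; the point is only that a probabilistic score above a cutoff triggers a revert, and any imperfection in the score produces false positives:

```python
def vandalism_probability(edit_text: str) -> float:
    """Toy score: fraction of words on a flag list. A real system
    would use a trained classifier over many features, not a word list."""
    flagged = {"spam", "hoax", "garbage"}
    words = edit_text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in flagged)
    return hits / len(words)

def classify(edit_text: str, threshold: float = 0.5) -> str:
    """Revert the edit when its vandalism score crosses the threshold."""
    return "revert" if vandalism_probability(edit_text) >= threshold else "keep"

classify("spam spam hoax link")        # scores 0.75 -> "revert"
classify("added a sourced sentence")   # scores 0.0  -> "keep"
```

Raising the threshold trades false positives (good edits reverted) for false negatives (vandalism kept), which is precisely the tuning decision behind the controversy the section describes.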

Bot Policy

Due to the potential for misuse, bots must adhere to Wikipedia's bot policy. A bot must first be approved for use, which requires the owner to demonstrate both the bot's harmlessness and its usefulness to the site.[8] The operator submits a request to the Bot Approvals Group and, once approved, runs the bot through a trial period. Several components of the policy concern transparency, the degree to which information is made available.[9] A certain degree of transparency is required of each bot, including disclosing what tasks the bot performs and the rate at which it performs them. Additionally, the bot must be clearly identifiable as a bot by its username. There exists a trade-off between the strictness of web policies and user freedom: Wikipedia's strict policy can be contrasted with Twitter's bot policy, which is more encouraging of bots and does not require approval. Twitter's open policy has enabled the proliferation of bots on the platform, including malware and spam bots.[10]

Benefits to the Infosphere

Bots can improve the quality of Wikipedia articles by performing maintenance tasks like correcting reference formatting and linking to other articles. Additionally, they can aid in dealing with two issues the website faces: vandalism and copyright infringement. Wikipedia's open-editing policy allows for a range of disruptive actions, including editing pages to insert false information, adding offensive content, and deleting existing high-quality content. ClueBot[11] is a bot responsible for detecting vandalism; it is highly active, having made over 2 million edits[12], and uses machine learning to classify edits as vandalism. Another issue bots help assuage is that articles can include text copied directly from copyrighted sources. Bots can compare edits with copyrighted material, flagging duplicates. Since bots perform these checks far more rapidly than humans can, they help Wikipedia balance its policy of openness against the associated issues of user freedom.
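As a rough illustration of duplicate flagging (not the method any actual Wikipedia bot uses), Python's standard difflib can compute a similarity ratio between an edit and a source text, and an edit can be flagged when the ratio exceeds a threshold; the threshold value here is arbitrary:

```python
from difflib import SequenceMatcher

def copy_similarity(edit_text: str, source_text: str) -> float:
    """Return a similarity ratio in [0, 1] between an edit and a source."""
    return SequenceMatcher(None, edit_text, source_text).ratio()

def flag_if_copied(edit_text: str, source_text: str,
                   threshold: float = 0.8) -> bool:
    """Flag the edit as a likely copy when similarity crosses the threshold."""
    return copy_similarity(edit_text, source_text) >= threshold

flag_if_copied("the quick brown fox jumps", "the quick brown fox jumps")  # flagged
flag_if_copied("an entirely original sentence", "zzz qqq xxx")            # not flagged
```

Production systems would compare against large corpora with indexing and fingerprinting rather than pairwise ratios, but the revert-or-flag decision at the end is the same shape.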

Dangers to the Infosphere


One potential issue is the opaqueness of bot behavior. The source code for some bots is publicly available, though the bot policy does not require a developer to publish a bot's source code. ClueBot's vandalism detection algorithm uses a neural network, which, in comparison with other machine learning algorithms, makes it difficult to discern how features are utilized, since a neural network learns the features it uses for classification.[13] According to Fleischmann and Wallace's covenant with transparency[14], transparent models allow an outside user to view the model's depiction of reality and its values; by that standard, ClueBot's vandalism detection model can be classified as a black box.

Bias and Neutrality

A black box model, along with presenting the issue of transparency, presents the issue of bias, since a black box can hide algorithmic bias. Machine learning algorithms trained on human language data have been shown to display stereotype bias.[15] An example of bias in Wikipedia bots is a profiling algorithm that weighs edits by anonymous users more heavily toward vandalism.[16] The consequence of a false positive in such an algorithm is an invalid revert of an article. In 2010, 13% of Wikipedia editors were female.[10] Reverting an edit can discourage future editing, which can exacerbate the gender ratio issue and increase gender bias, as women may more strongly experience the effects of criticism. Related to the concept of bias is the concept of neutrality. Bots have embedded values, which means they have a tendency to promote certain values.[17] For example, a bot that removes profane language must determine what language is and is not profane.
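A minimal sketch of how such a weighting could work (the function, weight, and threshold here are hypothetical, not drawn from the cited algorithm): if suspicion is scaled up by an anonymity factor, an identical edit can cross the revert threshold only when it comes from an anonymous user, which is exactly the disparate treatment at issue:

```python
def revert_score(base_suspicion: float, is_anonymous: bool,
                 anon_weight: float = 1.5) -> float:
    """Hypothetical profiling score: the same suspicion value is
    scaled upward when the editor is anonymous."""
    return base_suspicion * (anon_weight if is_anonymous else 1.0)

REVERT_THRESHOLD = 0.7  # arbitrary cutoff for illustration

# The same edit, suspicion 0.5, from two kinds of editors:
revert_score(0.5, is_anonymous=False)  # 0.5  -> kept
revert_score(0.5, is_anonymous=True)   # 0.75 -> reverted
```

The weight itself may never be visible to those affected when the model is a black box, which is how the transparency and bias issues compound each other.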


Another category of bots, those that create new articles or add content to preexisting ones, is involved with the issue of truth and trust online. Many users trust Wikipedia as a reliable source of information, and misinformation violates that trust. An early bot, Rambot, used public databases to create articles on U.S. cities; due to errors in the data, 2,000 articles were corrupted.[18] Another case of misinformation involves a 2017 study, "Even good bots fight: The case of Wikipedia"[19], published in the scientific journal PLOS ONE. The study examined the volume and rate at which Wikipedia bots reverted each other's edits, observing that bots reciprocally reverted one another over periods lasting years. News websites and blogs then ran articles with titles such as "Study reveals bot-on-bot editing wars raging on Wikipedia's pages" and "People built AI bots to improve Wikipedia. Then they started squabbling in petty edit wars, sigh". A follow-up study[20] found that the majority of bot reverts suspected of being bot-on-bot conflict were in fact routine and productive, in contrast with the articles' assertions of bot wars. The articles are thus an instance of misinformation propagating through exaggeration in the infosphere.

References


  1. “Wikipedia: Size of Wikipedia.” Wikipedia, Wikimedia Foundation, 20 Apr. 2019.
  2. “List of Wikipedias.” Wikipedia, Wikimedia Foundation, 29 Mar. 2019.
  3. Steiner (2014). "Bots vs. Wikipedians, Anons vs. Logged-Ins (Redux): A Global Study of Edit Activity on Wikipedia and Wikidata"
  4. Floridi (2010). Information Ethics, Chapter 5.
  5. Merrill, Brad. “The Bots Who Edit Wikipedia (And The Humans Who Made Them).” MakeUseOf, 20 July 2015.
  6. Merrill, Brad. “The Bots Who Edit Wikipedia (And The Humans Who Made Them).” MakeUseOf, 20 July 2015.
  7. Merrill, Brad. “The Bots Who Edit Wikipedia (And The Humans Who Made Them).” MakeUseOf, 20 July 2015.
  8. “Wikipedia: Bot policy.” Wikipedia, Wikimedia Foundation, 12 Apr. 2019.
  9. Floridi (2009). "The Ethics of Information Transparency"
  10. Gorwa, Robert & Guilbeault, Douglas (2018). "Unpacking the Social Media Bot: A Typology to Guide Research and Policy." Policy & Internet.
  11. “User: ClueBot NG.” Wikipedia, Wikimedia Foundation, 20 Oct. 2010.
  12. Geiger & Halfaker (2013). "When the Levee Breaks: Without Bots, What Happens to Wikipedia’s Quality Control Processes?"
  13. Benítez, Castro, & Requena (1997). "Are Artificial Neural Networks Black Boxes?"
  14. Fleischmann & Wallace (2005). "A Covenant with Transparency: Opening the Black Box of Models"
  15. Caliskan, Bryson, & Narayanan (2017). "Semantics Derived Automatically from Language Corpora Contain Human-like Biases"
  16. de Laat (2015). "The Use of Software Tools and Autonomous Bots against Vandalism: Eroding Wikipedia’s Moral Order?"
  17. Brey (2010). "Values in Technology and Disclosive Computer Ethics"
  18. Niederer & van Dijck (2010). "Wisdom of the Crowd or Technicity of Content? Wikipedia as a Sociotechnical System"
  19. Tsvetkova, García-Gavilanes, Floridi, & Yasseri (2017). "Even Good Bots Fight: The Case of Wikipedia"
  20. Geiger & Halfaker (2017). 'Operationalizing Conflict and Cooperation between Automated Software Agents in Wikipedia: A Replication and Expansion of “Even Good Bots Fight”'