Wikipedia Bots
Wikipedia is a user-edited online encyclopedia. As of March 2019, Wikipedia contains over five million English articles[1], as well as many more articles in 302 other languages[2]. Wikipedia depends on volunteers to grow and maintain the site. Maintenance of the site is facilitated by internet bots, which have their own user accounts. In 2014, there were 274 active bots, and they accounted for 15% of all edits on Wikipedia.[3] These bots demonstrate how autonomous agents can improve the info-sphere they inhabit. However, because they can also cause widespread damage to the info-sphere if left unchecked, policies are in place to safeguard the website.

Internet Bots

An internet bot is a software application that runs automated tasks on the web. Bots allow repetitive tasks to be performed faster and at a larger scale than would be possible manually. They are an integral part of the web: for example, all search engines depend on crawlers that follow links from page to page to index sites. Bots can roughly be divided into two categories: good actors, like search engine crawlers, and malicious bots, which pose a large threat to cyber security. Bots can work together in what is known as a botnet to perform large-scale attacks, such as a distributed denial of service (DDoS) attack; the Mirai botnet is an infamous example of such an attack. Even well-intentioned bots can cause damage through unintended consequences. A crawler that neglects to obey the robots exclusion protocol can overload a web server by making more requests than the server can handle. Two notable properties of bots that enable such unintended consequences are that they can be long-running and autonomous: a bot can run continuously for years and can make decisions independently.
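As a minimal illustration of the robots exclusion protocol mentioned above, the sketch below (in Python, using the standard library's urllib.robotparser; the crawler name and target page are hypothetical) checks a site's robots.txt before fetching a page.

    from urllib.robotparser import RobotFileParser

    # Hypothetical crawler name; a real bot would use its own registered user agent.
    USER_AGENT = "ExampleCrawler"

    # Fetch and parse the site's robots.txt (the robots exclusion protocol).
    parser = RobotFileParser()
    parser.set_url("https://en.wikipedia.org/robots.txt")
    parser.read()

    page = "https://en.wikipedia.org/wiki/Special:Random"
    if parser.can_fetch(USER_AGENT, page):
        print(USER_AGENT + " may fetch " + page)
    else:
        print(USER_AGENT + " must not fetch " + page)

A crawler that skips this check, or ignores its answer, is exactly the kind of well-intentioned bot that can still overload a server.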

Bot Policy

Due to the potential for misuse, bots must adhere to Wikipedia's bot policy. Bots must first be approved for use, which requires the owner to demonstrate both that the bot is harmless and that it is useful to the site.[4] A certain degree of transparency is required of the bot, including disclosing what tasks the bot performs and the capacity in which it operates. A bot operator must submit a request to the Bot Approvals Group. There is a trade-off between the strictness of a site's policies and user freedom. Wikipedia's policy can be contrasted with Twitter's bot policy, which is more encouraging of bots and does not require approval; this open policy has allowed for a number of malware and spam bots.[5]
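As a rough sketch of the transparency the policy asks for, the hypothetical example below (Python with the requests library; the bot name and contact address are made up) identifies itself through its user agent when making a read-only call to the public MediaWiki API. A real bot would additionally run under its own approved bot account.

    import requests

    # Hypothetical bot name and contact address; Wikipedia's policy expects bots
    # to identify themselves and disclose who operates them.
    HEADERS = {"User-Agent": "ExampleFixBot/0.1 (operator contact: example@example.org)"}

    # A read-only query against the public MediaWiki API.
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "meta": "siteinfo", "format": "json"},
        headers=HEADERS,
        timeout=10,
    )
    print(response.json()["query"]["general"]["sitename"])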

Benefits to Info-sphere

Bots can improve the quality of Wikipedia articles by performing tasks such as spell checking and linking to other articles. They can also aid in dealing with two issues that the website faces: vandalism and copyright infringement. Wikipedia's policy of open editing allows for a range of disruptive actions, including inserting false information, adding offensive content, and deleting existing high-quality content. ClueBot NG[6] is a highly active bot responsible for detecting vandalism, having made over two million edits[7]. ClueBot uses machine learning to classify edits as vandalism. This can produce false positives, and any machine learning algorithm is susceptible to bias, since it must be trained on human-generated data that may itself be biased. Another issue that bots can help assuage is that articles can include text copied directly from copyrighted sources. Bots can compare edits with copyrighted material and flag duplicates. Since bots can perform these checking tasks far more feasibly than humans, Wikipedia is able to balance the policy of openness that enables its large scale against the issues that come with that user freedom.
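ClueBot NG's actual classifier is a neural network trained on human-labeled edits; the toy Python sketch below, with made-up features and weights, only illustrates the general idea of scoring an edit for likely vandalism.

    import re

    # Toy vandalism scorer with hand-picked features and made-up weights, purely to
    # illustrate machine classification of edits. ClueBot NG's real classifier is
    # far more sophisticated.
    PROFANITY = {"stupid", "dumb"}  # placeholder word list

    def vandalism_score(old_text: str, new_text: str, editor_is_anonymous: bool) -> float:
        new_words = new_text.lower().split()
        features = {
            "profane_words": sum(word in PROFANITY for word in new_words),
            "all_caps_runs": len(re.findall(r"\b[A-Z]{4,}\b", new_text)),
            "large_removal": int(len(new_text) < 0.5 * len(old_text)),
            "anonymous": int(editor_is_anonymous),
        }
        weights = {"profane_words": 0.4, "all_caps_runs": 0.2,
                   "large_removal": 0.3, "anonymous": 0.1}
        return sum(weights[name] * value for name, value in features.items())

    # An edit is flagged for review when its score crosses a chosen threshold.
    print(vandalism_score("A long, sourced paragraph about a city.", "THIS IS A DUMB PAGE", True))

Note that including a feature such as whether the editor is anonymous is precisely the kind of design choice that can encode the bias discussed in the next section.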

Danger to Info-sphere

Bots pose an array of dangers and ethical dilemmas to Wikipedia. First, an intentionally malicious bot could vandalize articles at a much faster rate than any human. A well-intentioned bot may produce false positives and make invalid reverts to articles. In 2010, 13% of Wikipedia editors were female.[8] Reverting an edit can discourage future editing, which can exacerbate the gender ratio issue, as women may more strongly experience the effects of criticism. Another potential issue is the opacity of enforcement. ClueBot's vandalism detection algorithm uses a neural network, which compared with other algorithms is more of a black box: it is difficult to discern how the algorithm makes use of its features.[9] This would be problematic if there were hidden bias in the algorithm. One example of such bias would be a profiling algorithm weighted more heavily against anonymous users.[10] Additionally, enforcing an anti-vandalism policy requires a moral judgment. For example, for a bot to remove profane language, it must be determined what language counts as profane; an anti-profanity policy may then restrict free speech. Another category of bots, those that create new articles or add content to existing articles, raises the issue of truth. Many users trust Wikipedia as a reliable source of information. An early bot, Rambot, used public databases to create articles on U.S. cities; due to errors in the data, 2,000 articles were corrupted.[11] Misinformation violates the trust that users place in the website.
  1. https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
  2. https://en.wikipedia.org/wiki/List_of_Wikipedias
  3. ip=35.3.51.231&id=2641613&acc=ACTIVE%20SERVICE&key=93447E3B54F7D979%2E0A17827594E6F2C8%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1552621378_e7b064c4e0c4e92a28e12ac3a1ac3ce1
  4. https://en.wikipedia.org/wiki/Wikipedia:Bot_policy
  5. https://arxiv.org/pdf/1801.06863.pdf
  6. https://en.wikipedia.org/wiki/User:ClueBot_NG
  7. http://files.grouplens.org/papers/geiger13levee-preprint.pdf
  8. https://web.archive.org/web/20100414165445/http://wikipediasurvey.org/docs/Wikipedia_Overview_15March2010-FINAL.pdf
  9. https://www.researchgate.net/publication/5595919_Are_Artifi_cial_Neural_Networks_Black_Boxes
  10. https://link.springer.com/article/10.1007/s10676-015-9366-9
  11. https://www.researchgate.net/publication/249689493_Wisdom_of_the_Crowd_or_Technicity_of_Content_Wikipedia_as_a_Sociotechnical_System