Wikipedia Bots

[[Wikipedia]] is a user-edited online encyclopedia. As of March 15, 2019, Wikipedia contains over five million English articles<ref>https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia</ref>, as well as many more in 302 other languages<ref>https://en.wikipedia.org/wiki/List_of_Wikipedias</ref>. Wikipedia depends on volunteers to grow and maintain the site. Maintenance is facilitated by internet bots, which have their own user accounts. In 2014, there were 274 active bots, and they accounted for 15% of all edits on Wikipedia.<ref>Steiner. 2014. "Bots vs. Wikipedians, Anons vs. Logged-Ins (Redux): A Global Study of Edit Activity on Wikipedia and Wikidata." https://dl.acm.org/citation.cfm?doid=2641580.2641613</ref> These bots demonstrate how autonomous agents can improve the infosphere they inhabit. However, because they can also cause widespread damage to the infosphere if left unchecked, Wikipedia has policies in place to safeguard the site.
  
 
===Internet bots===
An internet bot is a software application that runs automated tasks on the web. Bots allow repetitive tasks to be performed faster and at a larger scale than they could be done manually. Internet bots are an integral part of the web; for example, all search engines depend on crawlers that jump from link to link to index sites. Bots can roughly be divided into two categories: good actors, like search engine crawlers, and malicious bots, which pose a large threat to cybersecurity. Bots can work together in what is known as a botnet to perform large-scale attacks, like a distributed denial of service (DDoS) attack; the [[Mirai_Botnet]] is an infamous example of a botnet used in such attacks. Even well-intentioned bots can cause damage through unintended consequences: a crawler that neglects to obey the robots exclusion protocol can overload a web server by making more requests than the server can handle. Two notable properties that enable these unintended consequences are that bots can be long-running and autonomous: a bot can run continuously for years and can independently make decisions. By Floridi's criteria<ref>Chapter 5, Floridi, Information Ethics, 2010</ref>, an internet bot that is interactive, autonomous, and adaptable can be classified as an [[Artificial Agents|artificial agent]], and further as a moral agent if it makes morally quantifiable decisions.
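
To make the robots exclusion protocol concrete, below is a minimal sketch of a polite crawler fetch in Python's standard library. It is illustrative only: the ExampleCrawlerBot user agent is hypothetical, and a real crawler would add error handling and per-site rate limiting on top of this.

<syntaxhighlight lang="python">
import time
import urllib.robotparser
from urllib.parse import urlparse
from urllib.request import Request, urlopen

def polite_fetch(url, user_agent="ExampleCrawlerBot/1.0"):
    """Fetch a URL only if the site's robots.txt permits it."""
    parsed = urlparse(url)
    robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"

    # Download and parse the site's robots exclusion rules.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()

    if not rp.can_fetch(user_agent, url):
        return None  # disallowed: a well-behaved crawler backs off

    # Honor any requested crawl delay so the server is not overloaded.
    delay = rp.crawl_delay(user_agent)
    if delay:
        time.sleep(delay)

    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return response.read()
</syntaxhighlight>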
  
 
=== Bot Policy ===
Due to the potential for misuse, bots must adhere to Wikipedia's bot policy. Bots must first be approved for use, which requires the owner to demonstrate both the bot's harmlessness and its usefulness to the site.<ref>https://en.wikipedia.org/wiki/Wikipedia:Bot_policy</ref> A certain degree of transparency is required of the bot, including disclosure of what tasks the bot performs and in what capacity it operates. A bot operator must submit a request to the Bot Approvals Group and receive approval before starting a trial period for the bot. There is a trade-off between the strictness of web policies and user freedom: Wikipedia's strict policy can be contrasted with Twitter's bot policy, which is more encouraging of bots and does not require approval. That open policy has allowed for a number of malware and spam bots.<ref>Robert Gorwa & Douglas Guilbeault (2018). "Unpacking the Social Media Bot: A Typology to Guide Research and Policy." Policy & Internet. https://doi.org/10.1002/poi3.184</ref>
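
One concrete piece of this consent-and-transparency machinery is Wikipedia's bot exclusion convention, under which a page can opt out of bot edits with the nobots and bots-deny templates. The Python sketch below is a simplified check of that convention; the real templates accept more parameters (such as allow lists) than shown here.

<syntaxhighlight lang="python">
import re

def allowed_to_edit(wikitext, bot_name):
    """Simplified check of the {{bots}}/{{nobots}} exclusion
    convention: a compliant bot skips pages that opt out."""
    # {{nobots}} opts the page out of all bot edits.
    if re.search(r"\{\{nobots\}\}", wikitext):
        return False
    # {{bots|deny=NameA,NameB}} opts out of specific bots ("all" = every bot).
    deny = re.search(r"\{\{bots\s*\|\s*deny\s*=\s*([^}]*)\}\}", wikitext)
    if deny:
        denied = {name.strip() for name in deny.group(1).split(",")}
        return bot_name not in denied and "all" not in denied
    return True

# Example: this page denies a (hypothetical) ExampleBot but no one else.
page = "Some article text. {{bots|deny=ExampleBot}}"
print(allowed_to_edit(page, "SpellCheckBot"))  # True
print(allowed_to_edit(page, "ExampleBot"))     # False
</syntaxhighlight>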
  
 
=== Benefits to Info-sphere ===
Bots can improve the quality of Wikipedia articles by doing maintenance tasks like spell checking articles and linking to other articles. Additionally, they can aid in dealing with two issues the website faces: vandalism and copyright infringement. Wikipedia's policy of open editing allows for a range of disruptive actions, including editing pages to insert false information, adding offensive content, and deleting existing high-quality content. ClueBot<ref>https://en.wikipedia.org/wiki/User:ClueBot_NG</ref> is a highly active bot responsible for detecting vandalism, having made over 2 million edits.<ref>Geiger, Halfaker. 2013. "When the Levee Breaks: Without Bots, What Happens to Wikipedia’s Quality Control Processes?" https://dl.acm.org/citation.cfm?id=2491061</ref> ClueBot uses machine learning to classify edits as vandalism. This can produce false positives, and any machine learning algorithm is susceptible to bias, since it must be trained on human data that may itself contain bias. Another issue bots help address is that articles can include text copied directly from copyrighted sources; bots can compare edits against copyrighted material, flagging duplicates. Because bots can perform these checking tasks more feasibly than humans, Wikipedia is able to balance its policy of openness, which enables the site's large scale, against the associated issues of user freedom.
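
As a rough illustration of how a machine-learned vandalism filter works, the sketch below trains a classifier on a handful of invented edit features. It is not ClueBot's actual model: ClueBot NG uses a neural network over a much richer feature set, while this toy swaps in logistic regression, and the features, training data, and threshold are all made up.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented per-edit features:
# [fraction of added text in ALL CAPS, profane-word count,
#  bytes removed, editor is anonymous (1) or logged in (0)]
X_train = np.array([
    [0.02, 0,   10, 0],  # ordinary copyedit
    [0.90, 3,    0, 1],  # shouting plus profanity
    [0.01, 0, 5000, 1],  # large unexplained deletion
    [0.05, 0,    0, 0],  # adding sourced content
    [0.70, 1,  200, 1],  # page defacement
    [0.03, 0,   40, 0],  # routine maintenance
])
y_train = np.array([0, 1, 1, 0, 1, 0])  # 1 = vandalism

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score a new edit; revert only above a threshold tuned to keep the
# false-positive rate low, since bad reverts drive editors away.
new_edit = np.array([[0.80, 2, 0, 1]])
p_vandalism = model.predict_proba(new_edit)[0, 1]
if p_vandalism > 0.9:
    print(f"revert (P = {p_vandalism:.2f})")
else:
    print(f"leave for human review (P = {p_vandalism:.2f})")
</syntaxhighlight>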
 
=== Dangers to Info-sphere ===
Bots pose an array of dangers and ethical dilemmas to Wikipedia. First, an intentionally malicious bot could vandalize articles at a much faster rate than any human. Second, even a well-intentioned bot may produce false positives and make invalid reverts to articles. In 2010, 13% of Wikipedia editors were female.<ref>Glott, Schmidt, & Ghosh. (2010). "Wikipedia Survey – Overview of Results." https://web.archive.org/web/20100414165445/http://wikipediasurvey.org/docs/Wikipedia_Overview_15March2010-FINAL.pdf</ref> Reverting an edit can discourage future editing, which can exacerbate the gender ratio issue if women more strongly experience the effects of criticism. Another potential issue is the opaqueness of policy enforcement. ClueBot's vandalism detection algorithm uses a neural network, which, compared with other algorithms, presents more of a black box: it is difficult to discern how the algorithm uses its features.<ref>Benítez, Castro, & Requena. (1997). "Are Artificial Neural Networks Black Boxes?" https://www.researchgate.net/publication/5595919_Are_Artifi_cial_Neural_Networks_Black_Boxes</ref> This would be problematic if there were hidden bias in the algorithm; one example of such bias would be a profiling algorithm weighted more heavily against anonymous users.<ref>Laat. (2015). "The use of software tools and autonomous bots against vandalism: eroding Wikipedia’s moral order?" https://link.springer.com/article/10.1007/s10676-015-9366-9</ref> Additionally, enforcing an anti-vandalism policy requires moral judgement: for a bot to remove profane language, someone must determine what language counts as profane, and an anti-profanity policy may then restrict free speech. Finally, bots which create new articles or add content to preexisting articles present the issue of truth and trust online. Many users trust Wikipedia as a reliable source of information. An early bot, Rambot, used public databases to create articles on U.S. cities; due to errors in the data, 2,000 articles were corrupted.<ref>Niederer & Dijck (2010). "Wisdom of the Crowd or Technicity of Content? Wikipedia as a Sociotechnical System." https://www.researchgate.net/publication/249689493_Wisdom_of_the_Crowd_or_Technicity_of_Content_Wikipedia_as_a_Sociotechnical_System</ref> Such misinformation violates the trust that users place in the website.
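
To make the black-box concern concrete, the sketch below fits invented edit features (all data hypothetical) two ways. A linear model's learned weight on the is-anonymous flag can be read off directly, so profiling of anonymous users would show up in an audit; a neural network spreads the same influence across hidden-layer weights, where no single number answers the question.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

feature_names = ["caps_fraction", "profanity_count", "kb_removed", "is_anonymous"]
X = np.array([
    [0.02, 0, 0.1, 0],
    [0.90, 3, 0.0, 1],
    [0.01, 0, 5.0, 1],
    [0.05, 0, 0.0, 0],
    [0.70, 1, 0.2, 1],
    [0.03, 0, 0.1, 0],
])
y = np.array([0, 1, 1, 0, 1, 0])  # 1 = vandalism (invented labels)

# Linear model: one inspectable weight per feature. A reviewer can see
# directly how much being anonymous raises the vandalism score.
linear = LogisticRegression(max_iter=1000).fit(X, y)
for name, weight in zip(feature_names, linear.coef_[0]):
    print(f"{name}: {weight:+.3f}")

# Neural network: the same influence is spread across a 4x8 matrix of
# input-to-hidden weights (plus a later layer), so there is no single
# "anonymous user" coefficient to audit; this is the black-box problem.
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=0).fit(X, y)
print(mlp.coefs_[0].shape)  # (4, 8)
</syntaxhighlight>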
