Difference between revisions of "CAPTCHA"

From SI410

Revision as of 04:21, 12 February 2023

CAPTCHA is shorthand for an assortment of automated systems used to distinguish between humans and computers. The concept was first developed in 2000 by students and researchers at Carnegie Mellon University.[1][2] The earliest version of the system is credited to Luis von Ahn, Manuel Blum, Nicholas Hopper, and John Langford, whose joint academic paper introducing CAPTCHA, entitled "CAPTCHA: Using Hard AI Problems for Security", appeared in early 2003.[3] The quartet also coined the term "CAPTCHA", an acronym for "Completely Automated Public Turing Test To Tell Computers and Humans Apart."[1] Since their introduction, CAPTCHAs of various forms have become a ubiquitous security measure across the internet. They are employed by Wikipedia,[4] Google,[5] and other companies.[6]
An example of a CAPTCHA implementation on Wikipedia's website. This particular CAPTCHA is used to ensure that only human users are able to create new Wikipedia accounts.[7] Image Credit: Wikipedia

CAPTCHAs are designed around the limitations of current artificial intelligence, which struggles with particular tasks that humans generally have no difficulty with.[3] However, as artificial intelligence technology progresses, new programs are often able to reliably evade existing security measures. Computer scientists view this as a win-win: either a CAPTCHA version cannot be defeated and a site remains secure, or the program employed to evade the existing CAPTCHA has solved an open problem in artificial intelligence and significantly advanced the field.[1][2] Regardless, CAPTCHAs themselves must continually evolve in order to remain a step ahead of malicious programs and evasive strategies, which has led to the spread of new challenge systems such as reCAPTCHA.

CAPTCHA and its successors have been criticized for presenting a barrier to web accessibility and creating undue challenges for users with disabilities or atypical skill and knowledge sets. Such users, in addition to potentially lacking abilities that CAPTCHA designers presume human users possess, often rely on assistive technology to enable or supplement their online experiences (e.g., screen reading or speech recognition software). Because these assistive tools are themselves programs, they are often unable to interpret or complete certain CAPTCHAs, leaving their users unable to verify personhood and access the site.

Background

Humans typically outperform computers on image recognition tasks, especially those that require manipulation or categorization.

Evolution of Programs

Turing Test

Alan Turing, in his 1950 paper “Computing Machinery and Intelligence”, first proposed the idea of an imitation test for distinguishing between computers and humans. Turing describes an old parlor game, the Imitation Game, wherein a man and a woman go into a room together, and an interrogator remains in a separate room. The interrogator may pose input questions to the two, and, based solely on their output answers—ideally, typewritten so as to anonymize answers and disguise any effects of handwriting or vocal timbre—must decide which answers belong to the man and which to the woman. Turing proposes that it be modified for use as a metric of computer development and intelligence. Rather than a man and a woman, the interrogator attempts to distinguish between a human and a computer.[8]

Turing argues that if the functions and outputs of a human and a machine are indistinguishable, then for all intents and purposes the machine is indeed thinking in the way that a human does. He refers back to the solipsistic philosophical notion that the only human one can be truly certain exists or is thinking in the way that they do is oneself, and says that machines are no different in terms of thought.[8] Consequently, this metric can be used not only to determine whether an unknown entity is a computer or a human, but also as a benchmark of artificial intelligence progression generally.[8][9] Such imitation games are now commonly (and eponymously) known as Turing Tests, and are widely used within the domains of computer science and artificial intelligence.[9]

Upside-Down or Reverse Turing Test

The original role of a Turing test was for humans to be able to distinguish between a human and a computer pretending to be a human.[8][10] As such, CAPTCHAs, which are designed for computers (those running the program) to be able to distinguish between a human and a computer pretending to be a human, have been described as an upside-down Turing test of sorts.[10] Conversely, a reverse Turing test would perhaps more aptly be described as one wherein a human attempts to convince a computer moderator that they are a computer, although this terminology is contested.[10]

This role has been criticized as computers exercising undue authority over humans and barring access to sites created for humans by humans. That is, computers are taking on an active role in the digital landscape, rather than remaining tools exercised by human actors, and potentially exerting dominance over humans in a way that will only continue to escalate.[10] By the most extreme interpretations, artificial general intelligence and artificial super intelligence could pose an existential threat to humanity.[10][11][12][13]

Characteristics

cogsci human features/skills/abilities?

compsci computer features/skills/abilities? - optical character recognition—must explain

Dual Uses

Digitizing Information

reCAPTCHA (see below) served an innovative dual purpose beyond that of the typical CAPTCHA program. In addition to guarding websites against bots posing as human, reCAPTCHA took advantage of the massive amounts of human labor devoted to solving CAPTCHAs to read and digitize scanned archival texts.[14][15]

About 200 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into "reading" books.[15]
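The aggregate figure in the quotation above can be sanity-checked with a few lines of arithmetic. The sketch below assumes exactly 200 million solves per day at exactly ten seconds each; the quotation says "about" and "roughly", so the result is only an order-of-magnitude estimate, and the constant and function names are illustrative.

```python
# Assumed inputs, taken from the quotation's approximate figures.
SOLVES_PER_DAY = 200_000_000
SECONDS_PER_SOLVE = 10


def daily_hours(solves: int, seconds_each: float) -> float:
    """Total human time spent per day, converted from seconds to hours."""
    return solves * seconds_each / 3600


hours = daily_hours(SOLVES_PER_DAY, SECONDS_PER_SOLVE)
print(f"{hours:,.0f} hours per day")
```

Under these assumptions the total is on the order of half a million hours per day, which is consistent with the quoted lower bound of "more than 150,000 hours".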

reCAPTCHA contributed to interpreting and digitizing the New York Times’ historical archive and, after its acquisition by Google, assisted with Google Books efforts.[15][16][17] The project team reports that reCAPTCHA-based text recognition achieved accuracy of over 99.1% in correctly identifying individual words. By contrast, the optical character recognition software available at the time accurately identified only 83.5% of words.[14]

In precisely one year of reCAPTCHA deployment, human users collectively solved over 1.2 billion CAPTCHA tasks, deciphering over 440 million suspicious words that had been unreadable to optical character recognition software. (The same words were presented to multiple users, whose responses were then cross-verified before a word was confirmed.) Consequently, the authors ultimately characterized reCAPTCHA as a successful use of broad human computational power.[14]
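The cross-verification idea can be illustrated with a simple agreement vote. This is a minimal sketch, not the procedure reCAPTCHA actually used: the function name `confirm_word` and the thresholds `min_votes` and `min_share` are hypothetical values chosen for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def confirm_word(responses: Iterable[str],
                 min_votes: int = 3,
                 min_share: float = 0.6) -> Optional[str]:
    """Return the agreed transcription, or None if there is no consensus yet.

    min_votes and min_share are hypothetical thresholds, not published values.
    """
    # Normalize case and whitespace so trivially different answers agree.
    normalized = [r.strip().lower() for r in responses if r.strip()]
    if not normalized:
        return None
    word, votes = Counter(normalized).most_common(1)[0]
    # Require both enough raw votes and a clear majority of all answers.
    if votes >= min_votes and votes / len(normalized) >= min_share:
        return word
    return None


print(confirm_word(["morning", "Morning", "morning ", "mourning"]))
print(confirm_word(["upon", "open"]))
```

In this sketch a word stays unconfirmed until enough independent users agree, so a single mistaken or malicious answer cannot settle a transcription on its own.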

reCAPTCHA digitization efforts may be deemed exploitative of human users who, in order to access sites and private accounts, are forced to dedicate their own unpaid time and brainpower, in however minuscule increments, to deciphering words for the archives of for-profit companies, even if the dissemination of that knowledge ultimately benefits them. Additionally, the prevalence of human computation and artificial intelligence digitization can contribute to the decline of professional human transcribers, whose jobs are rendered largely obsolete because reCAPTCHA deployment can deliver higher accuracy at lower cost.[14][16][17]

Training Artificial Intelligence

In a similar vein, there has been speculation that image-based CAPTCHAs have been and are still used to tag image databases and train artificial intelligence systems in image recognition. This suspicion has been particularly heightened by Google's gradual shift towards street view images for its reCAPTCHA v2 tasks; rather than identifying fruit or animals, users are typically asked to identify crosswalks, traffic signals, cars and vans, or other street features. Such imagery could easily be used to train self-driving cars, which are a tech focus for Google and other companies. It's also consistent with standard machine learning feedback and training practices: humans label images for the computer to review, and the system decides for itself what features cause that image to fall under that categorization. This use of nonconsensual human computation and trials has been viewed as exploitative and misleading on the part of tech companies.

Google self-driving car controversy?

Web Accessibility Concerns

social and cultural disparities (e.g., unfamiliarity with American traffic lights) legality? (e.g., internet inexperience, social access disparities, and/or different cultural backgrounds)

presumption of experience

age

disabilities

reCAPTCHA (see below) offers both auditory and visual tasks, but remains unsupported for deaf-blind users.

CAPTCHA Evasion

Artificial Intelligence Progression

Human Computation

Because most humans can reliably and easily solve CAPTCHAs, mass-scale human labor can be used to bypass them. The solutions are then handed immediately back to computer and bot accounts, in a delivery process that averages under twenty seconds. Middlemen recruit labor from low-income countries in Asia; in particular, Russia, India, China, Vietnam, and Bangladesh were indicated as probable sources.[18] They are able to take advantage of a sizeable, low-cost workforce to sell one thousand solved CAPTCHAs for nominal, single-digit sums in United States dollars. This more or less defeats the purpose of CAPTCHAs and transforms anti-spam measures into a purely economic barrier. Furthermore, the wages and labor conditions of such workers present a potential ethical concern.[18][19]
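The "purely economic barrier" can be made concrete with a small calculation. The sketch below is illustrative only: the price of 2 US dollars per thousand solves and the revenue figures are hypothetical placeholders within the single-digit range described above, not values reported by the cited study.

```python
def cost_per_solve(price_per_thousand_usd: float) -> float:
    """Cost of one solved CAPTCHA when solves are sold in lots of 1,000."""
    return price_per_thousand_usd / 1000


def is_worth_bypassing(price_per_thousand_usd: float,
                       expected_revenue_per_solve_usd: float) -> bool:
    """A CAPTCHA deters abuse only while one solve costs more than it earns."""
    return expected_revenue_per_solve_usd > cost_per_solve(price_per_thousand_usd)


# Hypothetical figures for illustration.
print(cost_per_solve(2.0))
print(is_worth_bypassing(2.0, expected_revenue_per_solve_usd=0.01))
print(is_worth_bypassing(2.0, expected_revenue_per_solve_usd=0.001))
```

Under this framing, a CAPTCHA no longer asks whether a visitor is human; it only asks whether the abuse being attempted earns more than a fraction of a cent per attempt.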

Timeline and Versions

The logo for Google's reCAPTCHA division and services.[5] Image Credit: Google.
An example provided by Google of its reCAPTCHA v2 further confirmation task. The user is shown nine images depicting various foods and asked to select the images depicting oranges.[6] Image Credit: Google.

While other programs exist and remain in use, reCAPTCHA iterations have become fairly dominant. This is at least in part due to Google's own ubiquity and dominance across the internet—Google offers a host of design and security services for other companies,[20] in addition to holding 90.8% of the market share for web searches.[21] One tech consulting firm estimates that Google reCAPTCHA captures 97.5% of the market share for CAPTCHAs.[22]

original versions still in use btw

key features? text-id, item identification, item categorization, other models?


reCAPTCHA

reCAPTCHA is a notable CAPTCHA service and company originally affiliated with Carnegie Mellon University's School of Computer Science.[14][23] Luis von Ahn, who was a part of the original CAPTCHA team in 2000, is credited as the project’s executive producer.[1][23] The project is the original team’s officially recommended CAPTCHA iteration.[1]

von Ahn received a prestigious MacArthur Fellowship in 2006, primarily in recognition and support of his work with CAPTCHA.[24][25] He stated that he planned to use the award money to further his human computation ambitions.[25] von Ahn ultimately accomplished this with his novel reCAPTCHA text-identification project (see Digitizing Information, above).[14]

Google acquired the project in September 2009[26] and has spearheaded successive versions (reCAPTCHA v2, reCAPTCHA v3, and reCAPTCHA Enterprise).[27]

reCAPTCHA v1

reCAPTCHA v1 was a text-based implementation that tasked users with interpreting distorted and/or broken text.[14][15] It has been used in the past to digitize text archives (see Digitizing Information, above); it is unclear whether it is still used for this purpose.[14][15]

Google discontinued support for reCAPTCHA v1 in 2018, but continues to support and offer successive reCAPTCHAs.[28]

reCAPTCHA v2

reCAPTCHA v2 is an image-recognition implementation. Users are first prompted to check a box, at which point some users will be verified as human and allowed to continue, while others will be presented with an additional task prior to verification. This distinction is based upon prior user activity. If a user is prompted to complete an additional task, that task will take the form of images the user must identify: for instance, by selecting all the images containing orange slices from a grid of nine food-related images.[6][28]
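The two-stage flow described above can be sketched as follows. This is a hypothetical model, not Google's implementation, whose internal logic is not public: the `looks_human_score` input, the `threshold` value, and all function names are assumptions made for illustration.

```python
from typing import Iterable, Optional


def passes_checkbox(looks_human_score: float, threshold: float = 0.7) -> bool:
    """Stage 1: hypothetical score derived from prior user activity."""
    return looks_human_score >= threshold


def passes_image_task(selected: Iterable[int], targets: Iterable[int]) -> bool:
    """Stage 2: the user must select exactly the tiles showing the target."""
    return set(selected) == set(targets)


def verify(looks_human_score: float,
           selected: Optional[Iterable[int]] = None,
           targets: Iterable[int] = ()) -> bool:
    if passes_checkbox(looks_human_score):
        return True   # verified from the checkbox alone
    if selected is None:
        return False  # additional task required but not yet answered
    return passes_image_task(selected, targets)


# Tiles are numbered 0-8 in a 3x3 grid; tiles 1, 4, and 8 show oranges.
print(verify(0.9))
print(verify(0.2, selected=[1, 4, 8], targets=[1, 4, 8]))
print(verify(0.2, selected=[1, 4], targets=[1, 4, 8]))
```

The point of the design is that users judged low-risk never see the image grid at all, while everyone else must answer it exactly.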

reCAPTCHA v2 allows users to opt for an auditory task, rather than a visual task, and Google reports that it is compatible with major screen-reading services (see Web Accessibility Concerns, above).[6]

reCAPTCHA v3

reCAPTCHA Enterprise


Other Versions

SQUIGL-PIX

ESP-PIX

NuCAPTCHA

hCAPTCHA

ASIRRA

text

Future Versions

enabled an open problem in artificial intelligence.

References

  1. Carnegie Mellon University. (2000–2010). The Official CAPTCHA Site. http://www.captcha.net/
  2. Robinson, S. (2002, December 10). Human or Computer? Take This Test. The New York Times, F1. https://www.nytimes.com/2002/12/10/science/human-or-computer-take-this-test.html
  3. von Ahn, L., Blum, M., Hopper, N.J., Langford, J. (2003). CAPTCHA: Using Hard AI Problems for Security. In: Biham, E. (eds) Advances in Cryptology – EUROCRYPT 2003. EUROCRYPT 2003. Lecture Notes in Computer Science, vol 2656. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39200-9_18
  4. Wikipedia. Create Account. https://en.wikipedia.org/w/index.php?title=Special:CreateAccount
  5. Google. reCAPTCHA. https://www.google.com/recaptcha/about/
  6. Google. reCAPTCHA Help. https://support.google.com/recaptcha/?hl=en
  7. Wikipedia. https://en.wikipedia.org/w/index.php?title=Special:CreateAccount
  8. Turing, A.M. (1950, October). Computing Machinery and Intelligence. Mind, 59(236), 433–460. https://phil415.pbworks.com/f/TuringComputing.pdf
  9. Metz, C. (2023, January 25). How Smart Are the Robots Getting? The New York Times. https://www.nytimes.com/2023/01/20/technology/chatbots-turing-test.html
  10. Eliot, L. (2020, July 20). The Famous AI Turing Test Put In Reverse And Upside-Down, Plus Implications For Self-Driving Cars. Forbes. https://www.forbes.com/sites/lanceeliot/2020/07/20/the-famous-ai-turing-test-put-in-reverse-and-upside-down-plus-implications-for-self-driving-cars
  11. Gibbs, S. (2014, October 27). Elon Musk: artificial intelligence is our biggest existential threat. The Guardian. https://www.theguardian.com/technology/2014/oct/27/elon-musk-artificial-intelligence-ai-biggest-existential-threat
  12. Clark, S. (2014, December 2). Artificial intelligence could spell end of human race – Stephen Hawking. The Guardian. https://www.theguardian.com/science/2014/dec/02/stephen-hawking-intel-communication-system-astrophysicist-software-predictive-text-type
  13. Dredge, S. (2015, January 29). Artificial intelligence will become strong enough to be a concern, says Bill Gates. The Guardian. https://www.theguardian.com/technology/2015/jan/29/artificial-intelligence-strong-concern-bill-gates
  14. von Ahn, L., Maurer, B., McMillen, C., Abraham, D., & Blum, M. (2008, September 12). reCAPTCHA: Human-Based Character Recognition via Web Security Measures. Science, 321, 1465–1468. https://doi.org/10.1126/science.1160379
  15. reCAPTCHA. (2009). What Is reCAPTCHA. https://web.archive.org/web/20100611210239/http://recaptcha.net/learnmore.html
  16. Gugliotta, G. (2011, March 28–29). Deciphering Old Texts, One Woozy, Curvy Word at a Time. The New York Times, D3. https://www.nytimes.com/2011/03/29/science/29recaptcha.html
  17. Stone, B. (2009, September 16). Google Buys Service That Uses Humans to Digitize Books. The New York Times. https://archive.nytimes.com/bits.blogs.nytimes.com/2009/09/16/google-buys-service-that-uses-humans-to-digitize-books/
  18. Motoyama, M., Levchenko, K., Kanich, C., McCoy, D., Voelker, G., & Savage, S. (2010). Re: CAPTCHAs – Understanding CAPTCHA-Solving Services in an Economic Context. Proceedings of the 19th USENIX Security Symposium, Washington, DC, USA. https://klevchen.ece.illinois.edu/pubs/mlkmvs-usesec10.pdf
  19. International Labour Organization. (2019, February 13). World Employment and Social Outlook: Trends 2019. United Nations. https://www.ilo.org/wcmsp5/groups/public/---dgreports/---dcomm/---publ/documents/publication/wcms_670542.pdf
  20. Google. Google Cloud. https://cloud.google.com/
  21. Desjardins, J. (2018, April 23). How Google retains more than 90% of market share. Business Insider. https://www.businessinsider.com/how-google-retains-more-than-90-of-market-share-2018-4
  22. Slintel. (2023, February 1). reCAPTCHA. https://www.slintel.com/tech/captcha/recaptcha-market-share?
  23. reCAPTCHA. (2009). About Us. https://web.archive.org/web/20100611210259/http://recaptcha.net/aboutus.html
  24. MacArthur Foundation. (2006, September 1). Luis von Ahn. https://www.macfound.org/fellows/class-of-2006/luis-von-ahn#searchresults
  25. Spice, B. (2006, September 18). Brilliant Young Scientist Luis von Ahn Earns $500,000 MacArthur Foundation "Genius Grant". Carnegie Mellon Today, Pittsburgh, PA. https://www.cmu.edu/cmnews/extra/060918_ahn.html
  26. von Ahn, L., & Cathcart, W. (2009, September 16). Teaching computers to read: Google acquires reCAPTCHA. Google. https://googleblog.blogspot.com/2009/09/teaching-computers-to-read-google.html
  27. Google. reCAPTCHA: About. https://www.google.com/recaptcha/about/
  28. Google. (2021, June 1). Choosing the type of reCAPTCHA. https://developers.google.com/recaptcha/docs/versions