How Captcha Turned You Into a Data Entry Robot

If you’ve ever purchased tickets on Ticketmaster, made a free email account with Google, or created a Facebook page, chances are you’ve seen a reCaptcha: a little box with two words in a distorted image that you have to type in to prove you’re not a robot. Though sometimes inconvenient, this program is surprisingly effective at stopping computers from making transactions meant for people. But did you know that every time you fill out a reCaptcha you are contributing to humanity’s knowledge base?

A History of Captcha

First developed in the early 2000s by a team of Carnegie Mellon computer scientists, Captchas were a simple way for companies offering online transactions to avoid computerized scams. For example, Ticketmaster didn’t want scalpers buying up thousands of concert tickets, something easy to do with the right program, and needed a way to weed out the programs from the people. With Captcha, users took only about ten seconds to type out wavy, fragmented characters, completing a simple task that computer algorithms of the time just couldn’t figure out. This easily proved that users were people and not robots.

The novel concept spread quickly to thousands of websites; however, after a few years, Dr. Luis von Ahn, one of Captcha’s creators, looked at the numbers and started feeling guilty. He realized that people across the world were collectively wasting hundreds of thousands of hours per day proving they were human. So he and his team came up with a brilliant idea: what if those ten seconds could produce a useful result, something that gives back to society? Thus was born reCaptcha, which Google acquired in 2009.

Making it Meaningful with reCaptcha

As Dr. von Ahn’s team looked for other problems computers fail to solve, they stumbled upon the challenge of digital archiving.
According to “The New York Times,” the typical process for converting old books, newspapers, and other print materials to digital format involves scanning the pages and running text-recognition software. That software, however, reads only about 70 to 90 percent of the words correctly. The remaining words then have to be transcribed manually by people. Von Ahn and his team decided to add snapshots of these rejected words to Captchas, turning wasted time into a useful scholarly endeavor.

Called “reCaptcha,” this new human-verifying program shows you two words: one control word, typically hard-to-read gibberish whose answer the system already knows, and one real word that is a digital archive reject. The former proves you’re a human and the latter lets you contribute to human knowledge. You can test this next time you get a reCaptcha by spelling the control word perfectly, then misspelling the real word slightly. Your “wrong” answer will still be accepted. Don’t worry, you won’t ruin the digital archives either: your deliberately misspelled answer will be rejected after comparison with a few others from around the world, and once the correct answer is sorted out it will be incorporated into the now digitized book.

Benefits to Society

But how effective is reCaptcha? According to a 2011 TED talk by Dr. von Ahn, “The number of words digitized is about 100 million per day, which is the equivalent of about 2.5 million books per year. And this is all being done one word at a time by just people typing Captchas on the internet.” So what began as a simple tool for telling humans from machines has now converted millions of books, pamphlets, and newspapers into searchable, digital text archives. And books are just the beginning: recently Google Maps has used reCaptcha to decipher street addresses, making its mapping technology more accurate than ever. Basically any image that’s too blurry, angled, or faded for a computer to read can be put into a reCaptcha, where someone in the world will figure it out.
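To make the two-word check described above more concrete, here is a minimal Python sketch. It is an illustration only, not reCaptcha’s actual code: the names UnknownWord and check_challenge, the three-vote minimum, and the 60 percent agreement threshold are all assumptions made up for this example. It shows the two ideas from the article: only the control word decides whether you pass, and answers for the unknown word are compared across several people before one is accepted.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UnknownWord:
    """A scanned word the text-recognition software could not read (hypothetical record)."""
    image_id: str
    votes: Counter = field(default_factory=Counter)
    min_votes: int = 3          # assumed threshold, not from the article
    min_agreement: float = 0.6  # assumed share of votes the top answer needs

    def add_vote(self, answer: str) -> None:
        # Normalize so "Upon " and "upon" count as the same answer.
        self.votes[answer.strip().lower()] += 1

    def consensus(self) -> Optional[str]:
        """Return the agreed transcription, or None if people have not agreed yet."""
        total = sum(self.votes.values())
        if total < self.min_votes:
            return None
        answer, count = self.votes.most_common(1)[0]
        return answer if count / total >= self.min_agreement else None


def check_challenge(control_answer: str, typed_control: str,
                    typed_unknown: str, unknown: UnknownWord) -> bool:
    """Pass or fail the user on the control word alone.

    The answer for the unknown word is recorded as a vote only when the
    control word was typed correctly.
    """
    passed = typed_control.strip().lower() == control_answer.lower()
    if passed:
        unknown.add_vote(typed_unknown)
    return passed


# Three people solve the same challenge; one of them misspells the real word.
word = UnknownWord(image_id="scan-0001")
check_challenge("morning", "morning", "upon", word)
check_challenge("morning", "morning", "upon", word)
check_challenge("morning", "morning", "up0n", word)  # still passes
print(word.consensus())  # prints "upon": the misspelling is outvoted
```

In this sketch a misspelled real word never blocks the user, which matches the experiment the article suggests, and a single bad answer cannot reach the archive because it has to agree with what other people typed.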
Adapting to the Present Day

Despite Captcha’s past success, a recent study reveals that Google Street View algorithms can now read reCaptchas with 99 percent accuracy. What does this mean for the program’s future as a verification tool? Luckily, the brains behind reCaptcha have developed new technology that no longer relies solely on image or audio problem-solving. The new system uses advanced risk-analysis tools, watching how you behave before, during, and after encountering the image or audio stimulus. This benefits users as well, because it allows the program to use easier-to-solve puzzles while still providing reliable verification. At the same time, reCaptcha will continue using the contributions of millions of Internet users to build digital archives of rare print materials.

On to the Future

Now that reCaptcha has succeeded in digitizing out-of-print materials, street addresses, and more, what other terrain can the concept explore? In 2012, Dr. Luis von Ahn began a new startup to develop an app called “Duolingo.” It’s a program that teaches foreign languages to users but, like reCaptcha, does something productive at the same time: translating the internet. Companies bring websites to Duolingo for translation; Duolingo breaks the material into exercises for language learners and then gets paid for the completed translation. Meanwhile, users enjoy free language learning. It’s a win-win!

There’s plenty more to say about this technology, but the best source is the inventor himself. For an explanation of how it works, and some hilarious examples of reCaptcha fails, see Luis von Ahn’s TED talk.

Photo: Adam Gerard/Flickr

Author -

With over five years of experience writing about the internet industry, John has developed a deep knowledge of internet providers and technology. Prior to writing professionally, John graduated with a degree in strategic communication from the University of Utah. His education and experience make his writing easy to understand, even when covering complex topics. John’s work has been cited by PCMag, The Washington Post, Los Angeles Times, and more.
