What is a CAPTCHA? More importantly, why does it exist? What does it stand for? Who invented it? Who’s working on this tech?
Consider the following situation. You’ve forgotten the password to one of your social network accounts and, while trying to recall it, you feed the sign-in form every option that comes to mind. After a certain number of attempts, you get a CAPTCHA test, usually an image of distorted letters that you have to type manually into a text box in order to pass.
Let’s look at another example you may also find familiar. You’ve signed up for a new social network account and started adding people you know to your friends list. After a certain number of clicks on the “add to friends” button, you get the same CAPTCHA test, which you have to pass to continue.
WHY IS IT HAPPENING TO ME?
This is the question you may have asked yourself when faced with a CAPTCHA test. Don’t worry. There’s nothing wrong with you. On social networks, you usually get a CAPTCHA after performing a series of repetitive actions (e.g. multiple attempts at guessing your password, or sending friend requests one after another within a short period of time) because they resemble the behavior of a bot. You may feel offended, but remember that it’s only for the sake of your own security.
You can also run into it when an action involves your private information, such as your credit card details or account settings. Another common place where CAPTCHA is a frequent guest is blogging platforms. When you try to comment on a post, you first need to pass a CAPTCHA test to prove that you’re human.
Therefore, to make sure that you are not a malicious script inclined to trolling, and to let you proceed safely, you’re asked to pass the CAPTCHA test.
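To make the “repetitive actions” trigger more concrete, here is a minimal sketch of how a site might decide when to show a CAPTCHA. It is only an illustration: the class name `RepetitionMonitor`, the action label `login_failed`, and the thresholds (5 actions per 60 seconds) are assumptions made up for this example, not how any particular social network does it.

```python
from collections import defaultdict, deque

# Hypothetical thresholds, chosen only for illustration.
MAX_ACTIONS = 5        # actions allowed inside the window before a challenge
WINDOW_SECONDS = 60.0  # length of the sliding window


class RepetitionMonitor:
    """Tracks recent actions per user and flags bursts that look automated."""

    def __init__(self, max_actions=MAX_ACTIONS, window=WINDOW_SECONDS):
        self.max_actions = max_actions
        self.window = window
        self._events = defaultdict(deque)  # (user, action) -> timestamps

    def needs_captcha(self, user, action, now):
        """Record one action at time `now` (in seconds).

        Returns True when the user has repeated the action more than
        `max_actions` times within the window and should be challenged.
        """
        events = self._events[(user, action)]
        events.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while events and now - events[0] > self.window:
            events.popleft()
        return len(events) > self.max_actions


monitor = RepetitionMonitor()
# Six failed logins within six seconds: only the sixth triggers a challenge.
results = [monitor.needs_captcha("alice", "login_failed", float(t)) for t in range(6)]
print(results)  # [False, False, False, False, False, True]
```

A human who mistypes a password twice never sees the test, while a script hammering the form hits the threshold almost immediately.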
WHAT IS IT?
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. Its main task is therefore to tell human users apart from software applications (bots). Developers usually deploy bots to perform repetitive tasks that don’t require human judgment or analysis and that automated scripts can handle more efficiently than a human (thanks to their higher processing speed).
One of the “positive” examples is the web crawlers used by search engines. They examine the content of websites by “crawling” from link to link and bring the information back to the servers. Search engines later use this information to rank websites according to the content they contain.
On the other hand, there are “negative” examples, where software is used for scraping private information, performing dictionary attacks, cheating in online polls, sending email and comment spam, mass-registering accounts, and so on. This is obviously the dark side of the cool automation that frees us from dull, repetitive work.
One of the notable cases is the falsification of an online poll. In fact, it is often cited as a primary reason for creating CAPTCHAs. It happened in 1999, when students of US universities were asked to vote online for the best educational program. Students of Carnegie Mellon University and MIT decided to artificially secure the top positions for their universities.
They came up with the idea of creating bots that would vote automatically and therefore produce high results for those two universities, leaving no chance for the others. Later, in 2000, the first CAPTCHA was created. By the way, it happened in the same place where the falsification originated: Carnegie Mellon University.
HOW DOES IT WORK?
Usually, a CAPTCHA comes in the form of an image that contains distorted letters. Recognizing those letters is not difficult for an average person (unless they have poor eyesight, which is another widely discussed problem with CAPTCHAs), only annoying. For most software, however, this task is out of reach.
THE CLASSICAL CONCEPT OF THIS TYPE OF CAPTCHA IS BASED ON THREE KEY ISSUES:
- Segmentation: the ability to separate letters is very important for correct recognition, as CAPTCHAs usually don’t contain spaces between letters.
- Invariant recognition: the letters in CAPTCHAs look unusual because their shapes vary. Nevertheless, humans can recognize the actual letters despite the warped way they are presented.
- Context: when we cannot recognize a letter, we usually rely on the vocabulary we possess and guess the word by looking at the other letters. For some CAPTCHAs, context plays an essential role.
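The “context” point can be illustrated with a toy sketch: given a word in which some letters are unreadable, we keep only the vocabulary words that fit the letters we did recognize. The function name `guess_word`, the `?` placeholder, and the tiny vocabulary are all assumptions invented for this example; human readers obviously draw on far more than a five-word list.

```python
# Hypothetical mini vocabulary, for illustration only.
VOCABULARY = ["captcha", "capture", "chapter", "turing", "puzzle"]


def guess_word(partial, vocabulary=VOCABULARY):
    """Return the vocabulary words consistent with `partial`.

    A '?' in `partial` marks a letter that could not be read.
    """
    return [
        word
        for word in vocabulary
        if len(word) == len(partial)
        and all(p == "?" or p == w for p, w in zip(partial, word))
    ]


print(guess_word("capt??a"))  # ['captcha']
print(guess_word("capt???"))  # ['captcha', 'capture']
```

The more letters are readable, the fewer candidates remain, which is why a CAPTCHA built from random characters gives the reader no such help.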
Another kind of CAPTCHA you might have come across presents a set of images you’re asked to recognize. It’s similar to the previous one in that only a human can figure out what exactly the pictures contain, thanks to context that bots cannot grasp. The third and rarest form is audio: the user hears a series of letters and numbers. Those recordings usually include some background noise to make automatic speech recognition harder.
WHERE IS TURING HERE?
In the classical Turing test, a human interrogator asks a set of questions to determine whether the respondent is a human or a computer program. CAPTCHA is therefore an implementation of the inverted Turing test, in which it’s a computer’s task to judge whether it is dealing with a human or with another piece of software.
CAPTCHA was made with the intention of being easy for humans and nearly impossible for bots to pass. Creating a CAPTCHA is therefore not as easy as it may appear at first sight. The problem is that every new CAPTCHA is generated by a machine. This machine then checks whether the response is correct, but at the same time it shouldn’t be able to solve the challenge itself. It gets away with this because it picks the answer first and only then builds the distorted challenge around it, so checking a response is a simple comparison rather than recognition.
However, as I said earlier, this holds only for “most software”. The caveat is critical because modern times bring new possibilities. Some applications now have built-in image, speech, and optical character recognition based on advances in AI (which nowadays mostly means various machine learning techniques). Given how fast the latter is developing, improving CAPTCHAs is becoming increasingly urgent.
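The generate-then-check arrangement described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: `CaptchaStore`, `SERVER_KEY`, and the six-character alphabet are hypothetical names and values, and the step that renders the answer as a distorted image is left out entirely.

```python
import hashlib
import hmac
import secrets
import string

# Hypothetical server-side secret; a real deployment would load it from configuration.
SERVER_KEY = secrets.token_bytes(32)
ALPHABET = string.ascii_uppercase + string.digits


def _digest(challenge_id, answer):
    """Keyed hash that binds an answer to one specific challenge."""
    message = f"{challenge_id}:{answer.upper()}".encode()
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()


class CaptchaStore:
    """Issues challenges and checks responses; each challenge can be used once."""

    def __init__(self):
        self._pending = {}  # challenge_id -> digest of the expected answer

    def issue(self, length=6):
        """Return (challenge_id, answer).

        The answer would be drawn as a distorted image and never sent as text.
        """
        challenge_id = secrets.token_hex(8)
        answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
        self._pending[challenge_id] = _digest(challenge_id, answer)
        return challenge_id, answer

    def verify(self, challenge_id, response):
        """Check a response by comparison only; no image recognition is involved."""
        expected = self._pending.pop(challenge_id, None)  # single use
        if expected is None:
            return False
        return hmac.compare_digest(expected, _digest(challenge_id, response.strip()))


store = CaptchaStore()
challenge_id, answer = store.issue()
print(store.verify(challenge_id, answer.lower()))  # True (comparison ignores case)
print(store.verify(challenge_id, answer))          # False (challenge already used)
```

The server never needs to read its own image. All the difficulty lives in the rendering step, which is exactly the part that better recognition software erodes.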
WHO IS WORKING ON THIS?
In April 2014 it was reported that Google had cracked its own CAPTCHA. Google researchers developed an algorithm that turned out to be able to solve Google’s distorted-text CAPTCHAs with 99.8% accuracy (which is significantly higher than humans are capable of).
In fact, the research that led to this result was originally intended for a different purpose, recognizing street numbers and letters, yet it ended up breaking CAPTCHAs and thereby contributed to the creation of a new, more secure CAPTCHA algorithm.
Nowadays Google has replaced most of the distorted-letter CAPTCHA forms with a single “I’m not a robot” checkbox. It’s called reCAPTCHA, and it no longer has anything in common with classical CAPTCHAs. Here’s how Google describes it:
“reCAPTCHA is a free service that protects your site from spam and abuse. It uses advanced risk analysis techniques to tell humans and bots apart. With the new API, a significant number of your valid human users will pass the reCAPTCHA challenge without having to solve a CAPTCHA. reCAPTCHA comes in the form of a widget that you can easily add to your blog, forum, registration form, etc.”
New CAPTCHAs have shown that making technology more secure doesn’t have to mean more challenges for users. Instead, they aim to make online interactions easier and less disruptive.
WHAT IS NEXT?
In my next blog post, we will look at the different versions of Google reCAPTCHA and how to integrate Google reCAPTCHA into a React web app. We will also look at ways to hack reCAPTCHA.