To protect online submission forms from spam bots and other auto-posting programs, webmasters resort more and more to putting a Captcha on these forms. Although it keeps the simplest programs and bots out, it also turns away a lot of real people.
Captcha stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”, and now you know why no one ever uses the unabbreviated form. Sometimes it’s referred to as a Turing test or reverse Turing test. In its most common form, you, the user, are shown some garbled text that you are supposed to type into a box and submit, just to prove that you are a real human.
The problem with Captcha is that sometimes the letters are really difficult to read, especially for people with less than perfect vision. Personally, if the Captcha is too difficult to read, I just click the back button. Take those “report broken link” links you often see in software directories: I will not go through with the report if the Captcha is too difficult. I mean, come on… I’m doing the webmaster a favor here, helping to clean up his site, and he wants me to guess 5 or 6 unreadable codes before I can actually help him? No way, my time is far more valuable than that! I’m pretty sure lots of people do the same thing.
Now the webmaster could make the Captcha easier, but easy Captchas can be broken with standard image-processing tools like the GD library. Actually, even medium-hard ones are being broken. But it gets worse. India and Pakistan now have data centers dedicated to solving Captchas. A spammer can buy access to an API (a piece of code on another computer that you can use remotely from your own software) through which his software sends the Captcha image to one of those data centers, where a real human solves it. The cost of this: a measly $1 or $2 per 1,000 Captchas solved (yes, that is one thousand), or 0.1 to 0.2 cents per Captcha.
Captchas will keep out amateur hackers, but anyone who really wants to spam will still get through. Meanwhile, real humans who want to contribute real and valuable “content” to the site are turned away en masse.
Other forms of Captcha are beginning to appear.
- There’s the math-solving Captcha, which shows the user a simple math problem (5+6=) and asks for the answer. Worthless, since it is easily broken with GD.
- Another one uses different colors. You are shown several letters, some of which are in another color, and you must type only the blue ones, for example. If the letters you should skip are black and the ones you should type are another color, this is easily broken with GD. All a hacker has to do is take the image, select the non-white part (all the letters), place that on a black background (the black letters will disappear) and then OCR the remaining image.
- The funniest one, and in my opinion the most promising one, is the one where you have to select the 3 most attractive people from a list of 9. Although tastes differ, it’s pretty easy to get it right on the first try. It’s also the hardest to crack with software, although I can see it being done with statistical analysis.
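To show how little work the color trick takes, here is a rough Python sketch of the filtering step described for the color Captcha. It is only an illustration: the image is a hand-made toy grid of RGB tuples rather than a real Captcha, the function names `isolate_colored_letters` and `render` are made up for this example, and it assumes the background is pure white and the decoy letters are pure black. A real image would have anti-aliasing and noise, so it would need a color tolerance, and the OCR step is left out.

```python
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
BLUE = (0, 0, 255)


def isolate_colored_letters(pixels):
    """Return a mask with 1 where a pixel is neither white nor black.

    Assumes exact color matches (hypothetical toy data, no anti-aliasing).
    """
    # Step 1: select the non-white part and place it on a black background.
    on_black = [[BLACK if px == WHITE else px for px in row] for row in pixels]
    # Step 2: black letters now blend into the background, so whatever
    # still differs from black belongs to the colored letters.
    return [[0 if px == BLACK else 1 for px in row] for row in on_black]


def render(mask):
    """Draw the mask as text so the surviving shape is visible."""
    return "\n".join("".join("#" if bit else "." for bit in row) for row in mask)


# Toy image: a black vertical bar (decoy) next to a small blue shape (target).
K, B, W = BLACK, BLUE, WHITE
toy_captcha = [
    [K, W, B, W],
    [K, W, B, B],
    [K, W, B, W],
]

print(render(isolate_colored_letters(toy_captcha)))
```

In the printed result the black bar is gone and only the blue shape is left, which is the part that would be handed to an OCR program.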
None of these, however, will withstand the Pakistani or Indian cheap-labor hack…
My suggestion: implement a type of Captcha that is easy and maybe even fun for people, but hard to solve with image manipulation software like GD. This will keep the wanna-be spammers out and your users happy.