psilandia

Pasi Savolainen


Overhead of precalculating figlet captchas

After a recent news item on Slashdot about how easily certain types of CAPTCHA can be defeated, I set out to investigate how easily wbcaptcha, a figlet-based CAPTCHA, could be defeated.

First, the pixelized, image-based approach. This could be feasible, but it needs some kind of rendering system. 'lynx --dump' would probably do the trick: dump the page, search it for something that looks like a captcha, render that with a suitable font whose glyphs are easy to identify, and process the result as a regular image. Most likely this would mean selecting the font color and expanding it so that insignificant gaps disappear.
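A minimal sketch of the "search for something that looks like a captcha" step, assuming the page has already been dumped to plain text as described above. The character set and the function names are hypothetical, and the heuristic (longest run of lines drawn only with figlet-style punctuation) is a guess that would also match things like ASCII separators.

```python
from typing import List

# Characters the standard figlet font draws with. This set is an assumption;
# other fonts use other characters. The trailing space is intentional.
FIGLET_CHARS = set("_|/\\()<>'`,.-=~^ ")

def looks_like_figlet(line: str, min_len: int = 4) -> bool:
    """Heuristic: a non-trivial line made only of figlet drawing characters."""
    stripped = line.strip()
    return len(stripped) >= min_len and set(stripped) <= FIGLET_CHARS

def find_captcha_block(dump: str) -> List[str]:
    """Return the longest run of consecutive figlet-looking lines in a text dump."""
    best: List[str] = []
    run: List[str] = []
    for line in dump.splitlines():
        if looks_like_figlet(line):
            run.append(line)
            if len(run) > len(best):
                best = list(run)  # copy, since run keeps growing
        else:
            run = []              # ordinary text breaks the run
    return best

page = "\n".join([
    "Please type the code below:",
    "  __ _ ",
    " / _` |",
    "| (_| |",
    " \\__,_|",
    "[Submit]",
])
for row in find_captcha_block(page):
    print(row)
```

Blank rows at the top or bottom of a glyph end the run, so the block found is the inked core of the captcha, which is what a later image step would care about anyway.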

With that approach, there's a snag. Visual text recognition has big problems with overlapping characters, and in figlet that means the option called smushing (which wbcaptcha uses with probability 0.5). I'd say that limits this approach to at most a 50% hit rate, and only with heavy (at this point in time) computation and rather complex image analysis. I kinda hope that people capable of doing it are doing better things :)
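To see why smushing is the snag, here is a simplified sketch. Instead of rendering with a real font, it treats every non-space character as one pixel and splits glyphs wherever a column is completely blank. The function names and the two toy renderings are made up for illustration; they are not real figlet output.

```python
from typing import List, Optional, Tuple

def to_grid(art: List[str]) -> List[List[bool]]:
    """Rasterize ASCII art: each non-space character becomes one 'pixel'."""
    width = max(len(row) for row in art)
    return [[ch != " " for ch in row.ljust(width)] for row in art]

def segment_columns(art: List[str]) -> List[Tuple[int, int]]:
    """Split the art into [start, end) column spans separated by blank columns."""
    grid = to_grid(art)
    width = len(grid[0])
    inked = [any(row[x] for row in grid) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, on in enumerate(inked):
        if on and start is None:
            start = x                  # a glyph begins
        elif not on and start is not None:
            spans.append((start, x))   # a blank column ends the glyph
            start = None
    if start is not None:
        spans.append((start, width))   # glyph touching the right edge
    return spans

# Toy renderings of two identical glyphs (not real figlet output).
plain = ["|_| |_|",
         "| | | |"]
smushed = ["|_|_|",    # the shared '|' merges the two glyphs
           "| | |"]
print(segment_columns(plain))    # [(0, 3), (4, 7)]
print(segment_columns(smushed))  # [(0, 5)]
```

The plain rendering splits into one span per glyph, while the smushed one comes out as a single blob, so the attacker would need real recognition of overlapping shapes. If only unsmushed captchas segment cleanly and about half of them are smushed, that is roughly where the 50% ceiling comes from.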

Another approach would be to simply pregenerate every letter combination the captcha can produce, take the MD5 sum of each rendering and build a database mapping each sum to its combination (a dictionary attack).
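A minimal sketch of the dictionary attack, assuming some deterministic renderer. `toy_render` below is a hypothetical stand-in for figlet (a real attack would run figlet itself with each font and smushing mode), and the alphabet is shrunk to two letters so the table stays tiny.

```python
import hashlib
from itertools import product
from typing import Callable, Dict, Iterable, Optional

Renderer = Callable[[str, str, bool], str]

def toy_render(text: str, font: str, smush: bool) -> str:
    """Hypothetical stand-in for figlet: any deterministic text -> art mapping works."""
    sep = "" if smush else " "
    return f"[{font}]" + sep.join(f"<{ch}>" for ch in text)

def build_table(alphabet: str, length: int, fonts: Iterable[str],
                render: Renderer = toy_render) -> Dict[bytes, str]:
    """Precompute MD5(rendering) -> captcha text for every combination."""
    font_list = list(fonts)
    table: Dict[bytes, str] = {}
    for chars in product(alphabet, repeat=length):
        text = "".join(chars)
        for font in font_list:
            for smush in (False, True):
                art = render(text, font, smush)
                table[hashlib.md5(art.encode()).digest()] = text
    return table

def crack(art: str, table: Dict[bytes, str]) -> Optional[str]:
    """Answer a captcha by hashing the served art and looking it up."""
    return table.get(hashlib.md5(art.encode()).digest())

table = build_table("ab", 2, ["std", "big"])
served = toy_render("ba", "big", True)   # what the page would show
print(crack(served, table))              # ba
```

Each table entry is a 16-byte MD5 digest plus the answer text, which for a 4-character captcha is the 20-byte record used in the size estimate.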

As for feasibility: wbcaptcha uses 60 different characters, and consecutive characters are independent of each other (e.g. a captcha could be 'aaaa' as well as 'abcd'). Characters can be rendered in 2 different ways (smushed or plain), and several fonts are available (I have 5 listed). The default captcha length is 4 characters, though it is configurable. This means there are 60^4*2*5 ~= 1.3*10^8 possible figlet renderings of a captcha. An MD5 checksum takes 16 bytes when stored; adding the corresponding 4-character captcha text makes each record 20 bytes. In total this database would weigh approximately 2.4 Gigabytes.
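Multiplying the factors out directly, with the constants taken from the paragraph above and the variable names made up for this sketch:

```python
ALPHABET = 60          # distinct characters wbcaptcha draws from
VARIANTS = 2           # plain or smushed rendering
FONTS = 5              # fonts listed above
LENGTH = 4             # default captcha length
RECORD = 16 + LENGTH   # MD5 digest plus the captcha text, in bytes

renderings = ALPHABET ** LENGTH * VARIANTS * FONTS
size = renderings * RECORD

print(f"{renderings:,} renderings")   # 129,600,000 renderings
print(f"{size:,} bytes")              # 2,592,000,000 bytes
print(f"{size / 2**30:.2f} GiB")      # 2.41 GiB
```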

That's small. Malware running on zombie machines would probably need little time to generate a file of that size and query it locally.

Again, there's a snag. Raising the word length to 5 characters makes the database 144 Gigabytes in size, 6 characters makes it 8 Terabytes, and 7 characters makes it 509 Terabytes.
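Each extra character multiplies the table by 60, so the larger figures follow from the 4-character size. Writing S(n) for the size at n characters and assuming binary units (GiB, TiB), which is what the figures above line up with:

```latex
\[
S(n) = 60^{n} \cdot 2 \cdot 5 \cdot 20\ \text{bytes}, \qquad \frac{S(n+1)}{S(n)} = 60
\]
\[
S(4) = 2{,}592{,}000{,}000\ \text{bytes} \approx 2.41\ \text{GiB}
\]
\[
S(5) = 60\,S(4) \approx 144.8\ \text{GiB}, \quad
S(6) = 60\,S(5) \approx 8.49\ \text{TiB}, \quad
S(7) = 60\,S(6) \approx 509.2\ \text{TiB}
\]
```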

My conclusion is that at this time, with the current default settings, wbcaptcha can be a push-over defense, but with a little hardening it could be rather formidable, even if not entirely equivalent to image-based captchas.

The equation for the database size is 60^(number_of_characters)*2*5*20 bytes. You can easily evaluate its size with Google's calculator, for example: 60^8*2*5*20 bytes in terabytes.
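The same formula can also be evaluated locally. A minimal sketch; the function and parameter names are hypothetical, and `record_bytes` stays at 20 as in the formula, even though storing a longer answer text would add a byte per extra character.

```python
def db_size(length: int, alphabet: int = 60, variants: int = 2,
            fonts: int = 5, record_bytes: int = 20) -> int:
    """Bytes for the precomputed MD5 table: alphabet^length * variants * fonts * record."""
    return alphabet ** length * variants * fonts * record_bytes

print(f"{db_size(8):,} bytes")           # 33,592,320,000,000,000 bytes
print(f"{db_size(8) / 2**40:,.0f} TiB")  # 30,552 TiB
```

Keeping the other factors as parameters makes it easy to see how much a larger alphabet or more fonts would add, alongside simply lengthening the captcha.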

(4¼ years ago)

Comments:

Oct 2006 Tim Another Option
Great article - how rarely do web developers try to think like spammers! There's one other thing I've done with my recent CAPTCHA script (visit my link): I space the letters apart by random amounts, then add clutter characters at random, using characters that exist in the font. As long as I keep them a few spaces away from the actual glyphs, it works well, and when viewing the source it looks like it would be difficult to parse out or to break visually. I'd be curious what you think. - Tim