Captcha Decoded
Take a look at PWNtcha
You can also read Breaking a Visual CAPTCHA
See:
OCR and Neural Nets in JavaScript
Here John Resig (creator of the jQuery JavaScript library) explains exactly how it was done.
I'm an image processing specialist and CAPTCHA decoder, and I've done many CAPTCHA-solving projects before.
OK, let's go through the CAPTCHA-solving steps!
Decoding any kind of CAPTCHA has 3 main steps:
1- Removing background
Clean all noise from the CAPTCHA image (using any suitable image processing method).
Note for CAPTCHA decoding fighters: If you want a good CAPTCHA, add stronger noise. Use a random noisy background with colors similar to those of the characters.
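A minimal sketch of step 1, assuming the CAPTCHA has already been loaded into a grayscale 2D list (0 = black, 255 = white) with dark characters on a light background. The names `binarize` and `despeckle` and the default `threshold` and `min_neighbours` values are hypothetical. A fixed global threshold followed by removal of isolated pixels is only one of many possible noise-removal methods, and it fails when the background noise has the same color as the characters.

```python
from typing import List

Gray = List[List[int]]    # grayscale image: rows of values 0 (black) .. 255 (white)
Binary = List[List[int]]  # binary image: 1 = ink (character), 0 = background


def binarize(img: Gray, threshold: int = 128) -> Binary:
    """Mark pixels darker than `threshold` as ink (1) and everything else as background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def despeckle(bits: Binary, min_neighbours: int = 1) -> Binary:
    """Erase ink pixels with fewer than `min_neighbours` ink pixels among their 8 neighbours."""
    h = len(bits)
    w = len(bits[0]) if bits else 0
    out = [row[:] for row in bits]
    for y in range(h):
        for x in range(w):
            if not bits[y][x]:
                continue
            neighbours = sum(
                bits[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours < min_neighbours:
                out[y][x] = 0  # isolated dot: treat it as noise
    return out


# Toy example: a 2x2 dark "character" plus one isolated noise pixel in the corner.
noisy = [
    [255, 255, 255, 255, 40],
    [255, 20, 30, 255, 255],
    [255, 25, 35, 255, 255],
    [255, 255, 255, 255, 255],
]
clean = despeckle(binarize(noisy))
# clean == [[0, 0, 0, 0, 0], [0, 1, 1, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 0, 0]]
```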
2- Splitting characters
This step is easy when the characters are separate and very hard when they are not.
Note for CAPTCHA decoding fighters: If you want a good CAPTCHA, don't leave the characters separate! Make them overlap. Do NOT use different colors for the characters: decoders can split differently colored characters very easily (most developers are unaware of this and think a colorful CAPTCHA is better). The best choice is an overlapping string in black. For an experienced CAPTCHA decoder, a colorful CAPTCHA is no problem at all; it's just beautiful, not useful! :) Use random curved lines that connect all the characters to each other.
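When the characters are separate, a simple way to split them is vertical projection: scan the binary image column by column and cut wherever a column contains no ink. This sketch assumes a binary image like the one produced by background removal (1 = ink, 0 = background); `split_columns` and `crop` are hypothetical names.

```python
from typing import List, Optional, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background


def split_columns(bits: Binary) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of consecutive columns that contain ink."""
    if not bits:
        return []
    width = len(bits[0])
    # Vertical projection: does any row have ink in column x?
    has_ink = [any(row[x] for row in bits) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                 # a character begins
        elif not ink and start is not None:
            spans.append((start, x))  # an empty column ends it
            start = None
    if start is not None:             # a character touching the right edge
        spans.append((start, width))
    return spans


def crop(bits: Binary, span: Tuple[int, int]) -> Binary:
    """Cut the columns of one character out of the image."""
    start, end = span
    return [row[start:end] for row in bits]


# Toy example: two "characters" separated by one empty column (column 2).
img = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 1],
]
spans = split_columns(img)             # [(0, 2), (3, 6)]
glyphs = [crop(img, s) for s in spans]
```

Vertical projection breaks down as soon as characters overlap or are joined by connecting lines, which is exactly why the note above recommends them; such CAPTCHAs need smarter splitting methods.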
3- Converting separated images into characters
After separation we have a set of character images (no string yet, just images and pixels), and we must convert each character image into a character. But how? There are several ways. If the characters are not rotated and use a fixed font and size (such as the freeglobes CAPTCHA), you can define a pattern set, and your program loops through the patterns to find the best match for each image. If the characters vary a lot and would need a very large pattern set, you should use a "Neural Network" to recognize them.
A neural network for CAPTCHA solving is given a character image together with the answer: for example, we give it an image of "A" and tell the network "this is A". It then "LEARNS" this character and saves what it has learned into a database. This procedure is called "TRAINING". When we later ask the trained network about a new character, it returns the best match from what it has learned. Decoding specialists usually use samples of the CAPTCHA itself to train the neural network. Be careful! The quality of your training data can make or break your results.
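The pattern-set approach can be sketched as template matching: compare each separated character image with every stored pattern and keep the one with the fewest differing pixels. This assumes characters of a fixed font and size that have already been cropped or scaled to the pattern dimensions (that normalization is not shown); the 3x3 patterns and the function names are hypothetical. The `decode` helper also performs the final step of concatenating the recognized characters.

```python
from typing import Dict, List

Binary = List[List[int]]  # 1 = ink, 0 = background


def difference(a: Binary, b: Binary) -> int:
    """Number of pixels that differ; assumes both images have the same size."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def best_match(glyph: Binary, patterns: Dict[str, Binary]) -> str:
    """Loop through the pattern set and return the label with the fewest differing pixels."""
    return min(patterns, key=lambda label: difference(glyph, patterns[label]))


def decode(glyphs: List[Binary], patterns: Dict[str, Binary]) -> str:
    """Recognize each separated glyph, then concatenate the results into one string."""
    return "".join(best_match(g, patterns) for g in glyphs)


# Hypothetical 3x3 pattern set for a fixed font and size.
patterns = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}
noisy_i = [[0, 1, 0], [0, 1, 1], [0, 1, 0]]  # "I" with one extra noise pixel
print(decode([noisy_i, patterns["L"]], patterns))  # prints IL
```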
Note for CAPTCHA decoding fighters: If you want a good CAPTCHA, use any method that stops a decoder from recognizing the characters, even with a neural network: deform the characters randomly, use many fonts instead of one, rotate the characters, etc.
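The TRAINING idea from step 3 can be illustrated with the simplest possible neural network, a single-layer (multi-class) perceptron. This is a hedged sketch, not the author's setup: the `Perceptron` class, the `features` helper, the 3x3 training images, and the epoch count are all hypothetical, and a practical decoder would likely need a multi-layer network and many training samples per character. Each labelled image that the network gets wrong nudges the weights until it answers correctly.

```python
from typing import Dict, List, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background


def features(img: Binary) -> List[int]:
    """Flatten the image into one input vector; the trailing 1 acts as a bias input."""
    return [px for row in img for px in row] + [1]


class Perceptron:
    """Single-layer network: one weight vector per label, trained with the perceptron rule."""

    def __init__(self, labels: List[str], n_pixels: int) -> None:
        self.weights: Dict[str, List[float]] = {
            label: [0.0] * (n_pixels + 1) for label in labels
        }

    def _score(self, label: str, x: List[int]) -> float:
        return sum(w * xi for w, xi in zip(self.weights[label], x))

    def _best(self, x: List[int]) -> str:
        # Pick the label whose weights give the highest score for this input.
        return max(self.weights, key=lambda label: self._score(label, x))

    def train(self, samples: List[Tuple[Binary, str]], epochs: int = 10) -> None:
        for _ in range(epochs):
            for img, label in samples:
                x = features(img)
                guess = self._best(x)
                if guess != label:
                    # Wrong answer: move the correct label towards x, the wrong guess away from it.
                    self.weights[label] = [w + xi for w, xi in zip(self.weights[label], x)]
                    self.weights[guess] = [w - xi for w, xi in zip(self.weights[guess], x)]

    def predict(self, img: Binary) -> str:
        return self._best(features(img))


# Hypothetical training set: 3x3 images, each paired with the character it shows.
img_i = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
img_l = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
img_t = [[1, 1, 1], [0, 1, 0], [0, 1, 0]]
net = Perceptron(["I", "L", "T"], n_pixels=9)
net.train([(img_i, "I"), (img_l, "L"), (img_t, "T")])
print(net.predict([[0, 1, 0], [0, 1, 1], [0, 1, 0]]))  # a noisy "I"; prints I
```

Perceptron training only settles when the classes are linearly separable; randomly deformed, rotated, multi-font characters, as the note above recommends, are what make recognition much harder.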
Finally, we concatenate the recognized characters into one string and return it as the result.
Unfortunately, there is no fixed algorithm that solves every CAPTCHA: each new CAPTCHA needs its own analysis and training. You can't build one decoder that decodes all CAPTCHAs.
What you should know before starting:
1- Image processing fundamentals
2- A general understanding of neural networks
3- Simple image processing functions (in any language)
For PHP:
imagecreate()
imagecreatetruecolor()
imagecolorat()
imagecolorsforindex()
imagesetpixel()
.
.
.
For .NET:
Bitmap class
GetPixel()
SetPixel()
.
.
.
For JavaScript and HTML5:
You should know the Canvas API very well, especially getImageData() and putImageData() for reading and writing pixels.
Lastly, a note for CAPTCHA decoding fighters: if you wonder how someone can decode a CAPTCHA and want to protect yours from being decoded, you should first become a CAPTCHA decoder yourself, or hire someone who knows the weaknesses and attack algorithms very well!
Hope this helps! ;)