Machine Learning Methods
for CAPTCHA Recognition
Rachel Shadoan
Zachery Tidwell, II
Constantine Priemski
Navya Chandana and Shakeeb
CAPTCHA
Completely Automated Public Turing Test to tell Computers and Humans Apart
Why are they interesting?
o Harder than normal text recognition
On par with handwriting recognition,
reading damaged text
o Techniques translate well to other problems
Facial recognition (Gonzaga, 2002)
Weed identification (Yang, 2000)
o Near infinite data sets
Easier to avoid over-fitting
Hypothesis
CAPTCHA recognition can be
accomplished to a high degree
of accuracy using machine
learning methods with minimal
preprocessing of inputs.
Methods
Learning Methods
o Feed-forward Neural
Nets
o Self-Organizing Maps
o K-Means
o Cluster Classification
Segmentation Methods
o Overlapping
o Whitespace
o K-Means
Tools
o JCaptcha
o Image Processing
JCaptcha
o Open-source CAPTCHA
generation software
o Highly configurable
Can produce CAPTCHAs of
many levels of difficulty
o Check it out at:
http://jcaptcha.sourceforge.net
Image Processing
Sparse Image
Represents images as an unbounded set of pixels
Each pixel is a value between 0 and 1 plus a
coordinate pair
Each image is centered before being converted to a matrix of
0s and 1s
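A minimal sketch of the centering step, not the authors' code. It assumes the sparse image is a mapping from (x, y) to intensity, that intensities of at least 0.5 count as ink, that centering uses the bounding box of the ink, and that the target grid is 20 x 20. The name to_centered_matrix and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sparse representation: (x, y) -> intensity in [0, 1].
SparseImage = Dict[Tuple[int, int], float]


def to_centered_matrix(pixels: SparseImage, size: int = 20,
                       threshold: float = 0.5) -> List[List[int]]:
    """Center the inked pixels in a size x size grid of 0s and 1s."""
    grid = [[0] * size for _ in range(size)]
    # Assumption: intensities at or above the threshold count as ink.
    inked = [(x, y) for (x, y), v in pixels.items() if v >= threshold]
    if not inked:
        return grid
    xs = [x for x, _ in inked]
    ys = [y for _, y in inked]
    # Shift so the bounding box of the ink sits in the middle of the grid.
    dx = (size - (max(xs) - min(xs) + 1)) // 2 - min(xs)
    dy = (size - (max(ys) - min(ys) + 1)) // 2 - min(ys)
    for x, y in inked:
        col, row = x + dx, y + dy
        if 0 <= row < size and 0 <= col < size:  # drop ink that does not fit
            grid[row][col] = 1
    return grid
```

Because the coordinates are unbounded, the same glyph drawn anywhere on the canvas maps to the same matrix.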
Figure: Original vs. After Transformation
Feed-Forward Neural Nets
As covered in class
Self-Organizing Maps
Training
Initialize N buckets to 
random values
For each input
Find the bucket that is 
“closest” to the input
Adjust the “closest” 
bucket to more closely 
match the input using 
exponential average
Collection
For many inputs
Sort each input into 
the bucket it most 
closely matches
For each bucket and each 
character
Calculate the 
probability of that 
character going into 
that bucket.
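The training and collection steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: "closest" is taken to mean smallest squared Euclidean distance, the exponential average uses a fixed rate alpha, only the winning bucket is updated (as the slide describes), and the collected probability is read as the fraction of inputs in a bucket that carry each character. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def closest(buckets: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the bucket with the smallest squared Euclidean distance to x."""
    return min(range(len(buckets)),
               key=lambda i: sum((b - v) ** 2 for b, v in zip(buckets[i], x)))


def train(inputs: Sequence[Sequence[float]], n_buckets: int,
          alpha: float = 0.1, seed: int = 0) -> List[Vector]:
    """Initialize buckets randomly, then pull each winner toward its input."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    buckets = [[rng.random() for _ in range(dim)] for _ in range(n_buckets)]
    for x in inputs:
        winner = buckets[closest(buckets, x)]
        for j, v in enumerate(x):
            # Exponential average: keep (1 - alpha) of the old value.
            winner[j] = (1 - alpha) * winner[j] + alpha * v
    return buckets


def collect(buckets: Sequence[Sequence[float]],
            inputs: Sequence[Sequence[float]],
            labels: Sequence[str]) -> Dict[int, Dict[str, float]]:
    """For each non-empty bucket, the fraction of its inputs per character."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for x, label in zip(inputs, labels):
        counts[closest(buckets, x)][label] += 1
    return {i: {ch: n / sum(c.values()) for ch, n in c.items()}
            for i, c in counts.items()}
```

To classify a new chunk under this reading, find its closest bucket and pick the character with the highest collected probability for that bucket.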
K-Means
• Very similar to Self-Organizing Maps (SOMs)
• Can use the same classifying mechanism as used for SOMs
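For comparison with the bucket update used by the SOM, here is a minimal sketch of standard k-means (Lloyd's algorithm). The slides do not say which variant was used, so the initialization (k randomly sampled points), the fixed iteration count, and the names k_means and nearest are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def nearest(centroids: Sequence[Sequence[float]], p: Sequence[float]) -> int:
    """Index of the centroid with the smallest squared distance to p."""
    return min(range(len(centroids)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(centroids[i], p)))


def k_means(points: Sequence[Sequence[float]], k: int, iterations: int = 20,
            seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Return (centroids, cluster index of each point)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct points picked at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iterations):
        assignment = [nearest(centroids, p) for p in points]
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the last set of centroids.
    return centroids, [nearest(centroids, p) for p in points]
```

The centroids play the same role as SOM buckets: whole-batch means replace the per-input exponential average, so the same collection and classification step can be reused on the result.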
Overlapping Segmentation
• Divide image into
fixed number of
overlapping tiles of
the same size
• In our case, 20 x 20
pixels with a 50%
overlap
• Discard chunks
under a certain size
and chunks that are
all white
Note: This is a B with
part of it cut off, not
an E. Therein lies the
rub.
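A minimal sketch of the tiling described above, assuming the image is a list of rows of 0 (white) and 1 (ink), that tiles slide horizontally only, and that "under a certain size" means fewer than min_ink inked pixels. The function name and the min_ink parameter are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of 0 (white) / 1 (ink)


def overlapping_tiles(image: Image, width: int = 20, overlap: float = 0.5,
                      min_ink: int = 1) -> List[Image]:
    """Cut the image into equal-width tiles that overlap horizontally."""
    step = max(1, int(width * (1 - overlap)))  # 20 px at 50% overlap -> 10 px
    total = len(image[0])
    tiles = []
    for left in range(0, max(total - width, 0) + 1, step):
        tile = [row[left:left + width] for row in image]
        # Assumption: tiles with too little ink (including all-white) are dropped.
        if sum(map(sum, tile)) >= min_ink:
            tiles.append(tile)
    return tiles
```

Because the tile positions ignore where letters start and end, a tile can contain only part of a letter, which is the failure the note above points out.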
Whitespace Segmentation
• Iterate through the
image from left to
right, segmenting
when a full column
of whitespace is
encountered
• Works perfectly for
well-spaced text
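A minimal sketch of the left-to-right column scan, assuming the image is a list of rows of 0 (white) and 1 (ink). The function name and the half-open (start, end) return format are illustrative choices, not taken from the slides.

```python
from typing import List, Tuple


def whitespace_segments(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return half-open column ranges separated by fully white columns."""
    segments: List[Tuple[int, int]] = []
    start = None  # first column of the segment currently being read
    width = len(image[0])
    for col in range(width):
        blank = all(row[col] == 0 for row in image)
        if not blank and start is None:
            start = col                      # ink begins: open a segment
        elif blank and start is not None:
            segments.append((start, col))    # white column: close the segment
            start = None
    if start is not None:                    # ink ran to the right edge
        segments.append((start, width))
    return segments
```

Letters that touch or overlap never produce a fully white column, so they come back as a single segment.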
K-Means Segmentation
• Performs better
than heuristic
segmentation on
closely-packed
inputs
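The slides do not say how k-means was applied to segmentation. One plausible reading, sketched here purely as an assumption, is to cluster the x-coordinates of inked pixels into k groups, where k is the known number of letters, starting from evenly spaced centers. The function name and parameters are hypothetical.

```python
from typing import List, Tuple


def k_means_segments(image: List[List[int]], k: int,
                     iterations: int = 20) -> List[Tuple[int, int]]:
    """Split the ink into k column ranges by 1-D k-means on x-coordinates."""
    xs = [x for row in image for x, v in enumerate(row) if v]
    width = len(image[0])
    # Assumption: start with centers evenly spaced across the image.
    centers = [(i + 0.5) * width / k for i in range(k)]
    groups: List[List[int]] = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(centers[i] - x))
            groups[nearest].append(x)
        # Move each center to the mean x of its pixels (unchanged if empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return [(min(g), max(g) + 1) for g in groups if g]
```

Unlike the whitespace scan, this still returns k pieces when adjacent letters touch, at the cost of needing k in advance.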
Segmentation Comparison
Figure: even-width, K-Means, and whitespace segmentation compared on two examples
Experiment 1
Machine Learning Method:
Self-Organizing Map
Topology
200 buckets, initialized randomly
Inputs:
3-letter CAPTCHAs
Random fonts
Letters A-G
“Chunked” using overlapping segmentation
Experiment 1 Results
Buckets fell into three primary categories:
o Distinguishable letters
o Chunks with halves of two letters
o Indistinguishable noise
Experiment 2
ML Method:
Neural Net
Topology:
Fully connected
400 inputs
50 node hidden layer
7 outputs
Inputs:
Single-letter CAPTCHAs
Random fonts
Letters A-G
Figure: fully connected network with 400 input nodes, 50 hidden nodes, and 7 output nodes
Each output node answers "Contains … ?" for one letter, A through G, with 0 or 1
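The Experiment 2 topology can be sketched as a forward pass. This shows only the shape of the network (400 inputs, 50 hidden nodes, 7 outputs, one per letter); the weights are random and untrained, and the sigmoid activation, the bias terms, and the 0.5 cut-off are assumptions, since the slides do not state them. All names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

LETTERS = "ABCDEFG"
Layer = List[List[float]]  # one weight row per node; last entry is the bias


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Fully connected layer with small random weights and a bias per node."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward(layer: Layer, inputs: Sequence[float]) -> List[float]:
    """Weighted sum of all inputs plus bias, squashed by a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(node, inputs)) + node[-1])
            for node in layer]


rng = random.Random(0)
hidden = make_layer(400, 50, rng)   # 400 inputs -> 50 hidden nodes
output = make_layer(50, 7, rng)     # 50 hidden nodes -> 7 outputs


def classify(pixels: Sequence[float]) -> Dict[str, int]:
    """Answer 'contains this letter?' with 0 or 1 for each of A through G."""
    scores = forward(output, forward(hidden, pixels))
    return {letter: int(s >= 0.5) for letter, s in zip(LETTERS, scores)}
```

The 400 inputs correspond to a flattened 20 x 20 matrix of 0s and 1s; training (for example by backpropagation) would be needed before the outputs mean anything.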
Experiment 2 Results
Figure: Neural Net Learning Curve
Figure: Neural Net Accuracy vs. Size of Hidden Layer
Past a certain number of nodes in the hidden layer, adding more nodes no longer has a large impact on accuracy.
Experiment 3
ML Method:
Neural Net
Topology:
Fully connected
400 inputs
1000 node hidden layer
7 outputs
ML Method:
SOM
Topology:
500 buckets
Inputs:
4-letter CAPTCHAs
Random fonts
Letters A-G
Figure: Neural Net vs. SOM on CAPTCHAs of Length 4, Letters A-G
Experiment 4
ML Method:
Neural Net
Topology:
Fully connected
400 inputs
1000 node hidden layer
7 outputs
ML Method:
SOM
Topology:
500 buckets
Inputs:
4-letter CAPTCHAs
Random fonts
Letters A-Z
Figure: Neural Net vs. SOM on CAPTCHAs of Length 4, Letters A-Z
Experiment 5
ML Method:
Neural Net
Topology:
Fully connected
400 inputs
1000 node hidden layer
7 outputs
ML Method:
SOM
Topology:
500 buckets
Inputs:
5-letter CAPTCHAs
Random fonts
Letters A-Z
Figure: Neural Net vs. SOM on CAPTCHAs of Length 5, Letters A-Z
What it all means
• Increasing number of characters
dramatically decreases total accuracy
because segmentation quality decreases
• True positive rate goes down when
segmentation quality decreases
• Hence, better segmentation is the key
Future Work
Improved Segmentation
o Wirescreen segmentation
o Ensemble techniques
Improved True Positive Rates with Current
System
o Ensemble techniques
New problems
o Handwriting recognition
o Bot net of doom
Questions?
