Fighting the WebBots
• A webbot is a program that visits web
  sites for all kinds of purposes.
• For example, Google webbots make
  copies of all web sites for their search
  engines.
• The challenge is to stop malicious
  webbots.

6/7/2011         ITS102-12, Third Class      1
Webbots and Spam
• Spammers send webbots to get e-mail
  accounts from sites that offer them for
  free.
• How can you tell that someone who
  asks for an e-mail account is a person
  or a webbot?



Are you a person or a bot?
• We know that there are certain things
  that computers cannot do.
• Ask the “applicant” to do something that
  computers cannot do.
• Cook a meal?
• Read something impossible for
  computers to read!

CAPTCHA
• Completely
    Automated
    Public
    Turing test to tell
    Computers and
    Humans
    Apart

CAPTCHA
• CAPTCHA does not have to be text, but
  “computer unreadable” text is
  convenient.
• Alternatives include pictures.
• For example, ask if a person in a
  picture is smiling or not. What is wrong
  with such a CAPTCHA?

How Computers Read
Optical Character Recognition (OCR)
• Step 1: Separate print (usually dark) from
  background (usually light).
• Step 2: Pick up individual characters (groups
  of dark pixels).
• Step 3: Identify their shape by looking for
  strokes, loops, corners, etc.
• Step 4: Use rules to classify. For example, an
  H has two vertical strokes and a short
  horizontal stroke.
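
The four steps above can be sketched in a few lines of Python. This is only a toy illustration, not a real OCR engine: the 5x5 image, the threshold of 128, and the helper names (binarize, find_characters, describe, classify) are all assumptions made for this example, and the rule set knows only the letters H and I.

```python
from collections import deque

# Hypothetical tiny picture of "H I": '#' is dark ink, '.' is light paper.
ART = [
    "#.#.#",
    "#.#.#",
    "###.#",
    "#.#.#",
    "#.#.#",
]
# Turn it into grayscale values (0 = black, 255 = white).
IMAGE = [[30 if ch == "#" else 220 for ch in row] for row in ART]


def binarize(image, threshold=128):
    """Step 1: separate dark print (True) from light background (False)."""
    return [[px < threshold for px in row] for row in image]


def find_characters(binary):
    """Step 2: group touching dark pixels; each group is one character."""
    rows, cols = len(binary), len(binary[0])
    seen, blobs = set(), []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and (r, c) not in seen:
                blob, queue = set(), deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    blob.add((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                blobs.append(blob)
    # Read the characters from left to right.
    return sorted(blobs, key=lambda b: min(x for _, x in b))


def describe(blob):
    """Step 3: count full-height vertical and full-width horizontal strokes."""
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    top, bottom, left, right = min(ys), max(ys), min(xs), max(xs)
    vertical = sum(all((y, x) in blob for y in range(top, bottom + 1))
                   for x in range(left, right + 1))
    horizontal = sum(all((y, x) in blob for x in range(left, right + 1))
                     for y in range(top, bottom + 1))
    return vertical, horizontal, right - left + 1


def classify(vertical, horizontal, width):
    """Step 4: apply simple rules (only H and I are known here)."""
    if width == 1:
        return "I"  # a single vertical stroke
    if vertical == 2 and horizontal == 1:
        return "H"  # two vertical strokes joined by one horizontal stroke
    return "?"


def read_image(image):
    return "".join(classify(*describe(b)) for b in find_characters(binarize(image)))


print(read_image(IMAGE))
```

Each function corresponds to one step, so it is easy to see which step a given distortion would have to defeat.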

Frustrating OCR
  Step  What OCR does                     How to frustrate it
  1     Separate background from print    Use messy background.
  2     Pick up individual characters     Have them blend with each other.
  3     Find strokes, loops, etc.         Make the letters “wiggly”.
  4     Apply classification rules        It should be hopeless by this point.
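
A small Python sketch can show why the first two countermeasures work. It assumes a very naive segmenter that treats every group of touching dark pixels as one character; the pictures and the function name count_blobs are made up for this example.

```python
def count_blobs(art):
    """Count groups of touching '#' pixels, i.e. what a naive
    segmenter would report as separate characters."""
    dark = {(y, x) for y, row in enumerate(art)
            for x, ch in enumerate(row) if ch == "#"}
    count = 0
    while dark:
        stack = [dark.pop()]
        count += 1
        while stack:
            y, x = stack.pop()
            for n in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if n in dark:
                    dark.remove(n)
                    stack.append(n)
    return count


# "H I" printed cleanly: two characters, two groups.
CLEAN = [
    "#.#.#",
    "#.#.#",
    "###.#",
    "#.#.#",
    "#.#.#",
]

# A stray stroke along the top row makes the letters blend together.
BLENDED = ["#####"] + CLEAN[1:]

# A messy background adds isolated specks that look like extra characters.
SPECKLED = [
    "#.#.#.#",
    "#.#.#..",
    "###.#..",
    "#.#.#.#",
    "#.#.#..",
]

for name, art in (("clean", CLEAN), ("blended", BLENDED), ("speckled", SPECKLED)):
    print(name, count_blobs(art))
```

In this toy setup the segmenter finds the right number of characters only for the clean picture, so the later steps never get a fair chance on the other two.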

Make your own CAPTCHA
• A web site that offers you the means:
• www.codeproject.com/KB/aspnet/CaptchaImage.aspx
• For a general tutorial see:
• www.theopavlidis.com/technology/captcha/tutorial.htm
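
For readers who want to see the bookkeeping behind a text CAPTCHA, here is a minimal Python sketch. It is not taken from either site above. It covers only creating a random challenge and checking the answer; drawing the distorted image is left out because it needs an imaging library. The names ALPHABET, SERVER_KEY, new_challenge and check_answer are hypothetical.

```python
import hashlib
import hmac
import secrets

# Assumed alphabet: skips look-alike characters such as O/0 and I/1.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

# Secret known only to the web server (generated fresh here for the demo).
SERVER_KEY = secrets.token_bytes(32)


def new_challenge(length=6):
    """Return (text to draw in the image, token to keep with the form)."""
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = hmac.new(SERVER_KEY, text.encode(), hashlib.sha256).hexdigest()
    return text, token


def check_answer(answer, token):
    """True if the typed answer matches the challenge behind the token."""
    normalized = answer.strip().upper()
    expected = hmac.new(SERVER_KEY, normalized.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)


text, token = new_challenge()
print(check_answer(text, token))        # the correct answer is accepted
print(check_answer(text + "X", token))  # a wrong answer is rejected
```

The token lets the server check the answer without sending the text itself to the browser. A real site would also make each token expire after one use so that a webbot cannot replay a solved challenge.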

Some Weak CAPTCHAs

• From PayPal (image not shown)
• From Yahoo’s Briefcase (image not shown)




Some CAPTCHAs that may be too hard for people

• From Yahoo (image not shown)
• From Passport (image not shown)




But Human Vision is Amazing








Non-Text CAPTCHAs
• Use pictures as CAPTCHAs.
• Plus: They are very tough to break.
• Minus:
     – Need to label a huge number of pictures.
     – If we use few pictures the webbot can just
       keep guessing.
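
The guessing problem can be put into numbers with a short Python sketch. It assumes a simplified model that is not in the slides: every challenge has the same number of possible answers, and the webbot picks one at random on each attempt. The function name bot_success_probability is made up for this example.

```python
def bot_success_probability(num_answers, attempts):
    """Chance that blind guessing succeeds at least once.

    Assumes each attempt is an independent random pick among
    num_answers equally likely answers.
    """
    miss_once = 1 - 1 / num_answers
    return 1 - miss_once ** attempts


# Few possible answers: a patient webbot almost always gets in.
print(bot_success_probability(num_answers=10, attempts=50))

# Many possible answers: the same 50 attempts rarely succeed.
print(bot_success_probability(num_answers=1_000_000, attempts=50))
```

Under this model the defender's options are to enlarge the pool of possible answers or to limit how many attempts one visitor may make.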



Synthetic Pictures
(an idea by M. Kaplan)
• “Please click on or enter each letter corresponding to the
  following list in the field below. You must enter them in the
  exact sequence listed.”




                               C           K

