Handwritten and Machine Printed Text Separation in Document Images using the Bag of Visual Words Paradigm

Konstantinos Zagoris (1,2), Ioannis Pratikakis (2), Apostolos Antonacopoulos (1), Basilis Gatos (3), Nikos Papamarkos (2)

(1) Pattern Recognition and Image Analysis (PRImA) Research Lab, School of Computing, Science and Engineering, University of Salford, Greater Manchester, UK

(2) Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece

(3) Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”, Athens, Greece

Current state-of-the-art
Three (3) main approaches
• Text Line Level
• Word Level
• Character Level
Disadvantages
• Different Page Segmentation Algorithms
• Incompatible Feature Set
Bag of Visual Words Model (BoVWM)
• Inspired by Information Retrieval Theory
• The content of an image is described by a set of “visual words”.
• A “visual word” is expressed by a group of local features
• The most well-known local feature is the Scale-Invariant Feature Transform (SIFT)
• Codebook Creation
• A codebook is defined by the set of clusters obtained by clustering the local features
• A “visual word” is denoted as the vector which represents the center of each cluster
• The codebook is analogous to a dictionary
Bag of Visual Words Model (BoVWM)
• Each visual entity is described by a BoVWM descriptor
• Each SIFT point belongs to a “visual word”
• That “visual word” is the one whose cluster center is closest to the point under a distance function (Euclidean, Manhattan)
• The descriptor reflects the frequency of each visual word that appears in the image.
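
The assignment and counting steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the 2-D tuples stand in for 128-dimensional SIFT descriptors, and normalising the counts to relative frequencies is an assumption (the slides only say the descriptor "reflects the frequency").

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_word(feature, codebook, distance=euclidean):
    """Index of the visual word (cluster center) closest to the feature."""
    return min(range(len(codebook)), key=lambda i: distance(feature, codebook[i]))


def bovw_descriptor(features, codebook, distance=euclidean):
    """Histogram of visual-word occurrences, normalised to frequencies.

    Normalisation is an assumption made for this sketch.
    """
    counts = [0] * len(codebook)
    for feature in features:
        counts[nearest_word(feature, codebook, distance)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Toy data: three visual words and four local features (hypothetical values).
codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
features = [(1.0, 1.0), (9.0, 9.5), (0.5, 0.2), (1.0, 9.0)]
descriptor = bovw_descriptor(features, codebook)
descriptor_l1 = bovw_descriptor(features, codebook, distance=manhattan)
```

The descriptor length equals the codebook size, so every block gets a vector of the same dimension regardless of how many keypoints it contains.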
Proposed Method
Original Image → Page Segmentation → Block Descriptor Extraction (BoVW model) → Classification → Final Result
Page Segmentation
Original Image → Locally Adaptive Binarisation Method [1] → Adaptive Run Length Smoothing Algorithm [2] → Final Result
1. B. Gatos, I. Pratikakis, and S. Perantonis. Adaptive degraded document image binarization. Pattern Recognition, 39(3):317–327, 2006.
2. N. Nikolaou, M. Makridis, B. Gatos, N. Stamatopoulos, and N. Papamarkos. Segmentation of historical machine-printed documents using adaptive run length smoothing and skeleton segmentation paths. Image and Vision Computing, 28(4):590–604, 2010.
Block Descriptor Extraction
• This step creates the block descriptor using the BoVW model
• Codebook Properties
• The codebook must be small enough to ensure a low computational cost, yet large enough to provide sufficiently high discrimination performance
• For the clustering stage, the k-means algorithm is employed due to its simplicity and speed.
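
As a rough illustration of the clustering stage, the sketch below runs plain k-means (Lloyd's algorithm) over a list of feature vectors and returns the cluster centers, which play the role of the codebook. It is a sketch under assumptions: the function names, the random initialisation, the iteration cap and the toy 2-D points are all hypothetical, and the parameter `k` corresponds to the codebook size whose trade-off is described above.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def kmeans(points, k, iterations=50, seed=0):
    """Return k cluster centers; each center is one visual word."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initialise from the data
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return centers


# Toy data: two well-separated groups of 2-D points (hypothetical values).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
codebook = kmeans(points, k=2)
```

A larger `k` gives a finer vocabulary at the cost of longer descriptors and slower nearest-center search, which is the trade-off the slide refers to.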
Block Descriptor Extraction
[Figure: an example text block, its initial SIFT keypoints, and its final SIFT keypoints]
• The SIFT keypoints are calculated on the greyscale version of the block
• Keypoints whose position in the binary image does not fall on a foreground pixel are rejected
• Each of the remaining local features is assigned a Visual Word from the Codebook
• A Visual Word Descriptor is formed based on the appearance of each Visual Word of the Codebook in this particular block
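
The keypoint rejection step can be illustrated as follows. This is only a sketch: the function name is hypothetical, keypoints are assumed to be `(x, y)` positions, the binary image is assumed to be a list of rows with foreground pixels encoded as 1, and rounding to the nearest pixel is an assumption.

```python
def filter_keypoints(keypoints, binary_image, foreground=1):
    """Keep only keypoints whose position lands on a foreground pixel.

    keypoints: iterable of (x, y) positions, x = column, y = row.
    binary_image: list of rows; foreground pixels equal `foreground`.
    """
    height = len(binary_image)
    width = len(binary_image[0]) if height else 0
    kept = []
    for x, y in keypoints:
        col, row = int(round(x)), int(round(y))
        inside = 0 <= row < height and 0 <= col < width
        if inside and binary_image[row][col] == foreground:
            kept.append((x, y))
    return kept


# Toy 3x4 binary block and five candidate keypoints (hypothetical values).
binary_block = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
initial_keypoints = [(1.2, 1.1), (3.0, 0.0), (2.4, 2.3), (0.0, 2.0), (5.0, 1.0)]
final_keypoints = filter_keypoints(initial_keypoints, binary_block)
```

Only the surviving keypoints would then be assigned to visual words and counted in the block descriptor.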
Decision System
• A classifier decides whether the block contains handwritten text, machine-printed text, or neither (noise)
• Based on Support Vector Machines (SVMs)
• Conventional approaches: one against one, one against the others
• Two SVMs are trained with the Radial Basis Function (RBF) kernel
• The first (SVM1) separates handwritten text from everything else
• The second (SVM2) separates machine-printed text from everything else.
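
The slides do not spell out how the two SVM outputs are combined, so the following is only one plausible rule, written as an assumption rather than the authors' algorithm. It takes the signed decision value of each SVM (positive meaning "belongs to my class") and returns a single label; the function name and the tie-breaking choice are hypothetical.

```python
def decide(d1, d2):
    """Combine two one-against-the-rest decision values into one label.

    d1: signed decision value from SVM1 (positive = handwritten).
    d2: signed decision value from SVM2 (positive = machine-printed).
    Assumed rule: if neither SVM claims the block it is noise; if both
    claim it, the larger decision value wins.
    """
    if d1 <= 0 and d2 <= 0:
        return "noise"
    if d1 > 0 and d2 <= 0:
        return "handwritten"
    if d2 > 0 and d1 <= 0:
        return "machine-printed"
    return "handwritten" if d1 >= d2 else "machine-printed"


# Hypothetical decision values for four blocks.
labels = [decide(1.3, -0.7), decide(-0.2, 0.9), decide(-1.1, -0.4), decide(0.3, 0.8)]
```

With a library such as scikit-learn, `d1` and `d2` would typically come from each classifier's signed decision function evaluated on the block descriptor.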
Decision System Algorithm
[Figure: two panels, SVM1 (Handwritten Text) and SVM2 (Machine-printed Text), each showing a sample and a support vector, with the distances labelled D1 and D2 respectively]
Examples
[Figure: original image and output of the proposed method]
Evaluation Datasets
• 103 modified document images from the IAM Handwriting Database
• 100 representative images selected from the index cards of the UK Natural History Museum’s card archive (PRImA-NHM)
• The ground truth files adhere to the Page Analysis and Ground-truth Elements (PAGE) format framework
• http://datasets.primaresearch.org
Evaluation
The F-measure of each method.
Method                                                    IAM      PRImA-NHM
Upper Bound (Proposed Segmentation)                       0.9887   0.7985
Proposed Method (Proposed Segmentation and BoVW)          0.9886   0.7689
Gabor Filters (Proposed Segmentation and Gabor Filters)   0.7921   0.5702
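
For readers unfamiliar with the metric, the F-measure is the harmonic mean of precision and recall. The sketch below computes it from raw counts; the slides do not state which units are counted (pixels, blocks or regions), so the counts here are hypothetical and the function name is an assumption.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: half of the predictions are correct, half of the
# ground-truth items are found.
score = f_measure(true_positives=1, false_positives=1, false_negatives=1)
```

A score of 1.0 requires both perfect precision and perfect recall, so the "Upper Bound" row shows the best value reachable given the errors already introduced by the segmentation step.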
Page Segmentation Problems
• Binarization Failures
• Noise – Text Overlapping
• Handwritten – Machine text Overlapping
Thank You!

Ευχαριστώ!

