2916 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 7, JULY 2014
Phase-Based Binarization of Ancient Document
Images: Model and Applications
Hossein Ziaei Nafchi, Reza Farrahi Moghaddam, Member, IEEE, and Mohamed Cheriet, Senior Member, IEEE
Abstract—In this paper, a phase-based binarization model for
ancient document images is proposed, as well as a postprocessing
method that can improve any binarization method and a ground
truth generation tool. Three feature maps derived from the phase
information of an input document image constitute the core of
this binarization model. These features are the maximum moment
of phase congruency covariance, a locally weighted mean phase
angle, and a phase preserved denoised image. The proposed
model consists of three standard steps: 1) preprocessing; 2) main
binarization; and 3) postprocessing. In the preprocessing and
main binarization steps, the features used are mainly phase
derived, while in the postprocessing step, specialized adaptive
Gaussian and median filters are considered. One of the outputs
of the binarization step, which shows high recall performance, is
used in a proposed postprocessing method to improve the performance
of other binarization methodologies. Finally, we develop
a ground truth generation tool, called PhaseGT, to simplify
and speed up the ground truth generation process for ancient
document images. The comprehensive experimental results on the
DIBCO’09, H-DIBCO’10, DIBCO’11, H-DIBCO’12, DIBCO’13,
PHIBD’12, and BICKLEY DIARY data sets show the robustness
of the proposed binarization method on various types of
degradation and document images.
Index Terms—Historical document binarization, phase-derived
features, ground truthing, document enhancement.
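The "high recall performance" mentioned in the abstract can be made concrete with a small illustration. The following is a minimal sketch, not the authors' evaluation code: it assumes that the binarized output and the ground truth are boolean arrays in which True marks a text pixel, and the function name `recall_precision_f` and the toy arrays are hypothetical. Recall is the fraction of ground-truth text pixels that the output recovers, and precision is the fraction of output text pixels that are correct.

```python
import numpy as np


def recall_precision_f(pred_text, gt_text):
    """Pixel-level recall, precision, and F-measure for a binarization.

    Assumption (not from the paper): both inputs are arrays of the same
    shape in which True marks a text (foreground) pixel.
    """
    pred = np.asarray(pred_text, dtype=bool)
    gt = np.asarray(gt_text, dtype=bool)

    tp = np.count_nonzero(pred & gt)    # text pixels correctly detected
    fp = np.count_nonzero(pred & ~gt)   # background marked as text
    fn = np.count_nonzero(~pred & gt)   # text pixels that were missed

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = recall + precision
    f_measure = 2.0 * recall * precision / denom if denom else 0.0
    return recall, precision, f_measure


# Toy example: the output keeps every true text pixel (recall of 1.0)
# but also marks two background pixels as text (precision of 0.6).
gt = np.array([[1, 1, 0, 0],
               [0, 1, 0, 0]], dtype=bool)
pred = np.array([[1, 1, 1, 0],
                 [0, 1, 1, 0]], dtype=bool)

recall, precision, f_measure = recall_precision_f(pred, gt)
print(recall, precision, f_measure)
```

In this toy case the output misses no text pixel but over-detects, which is the typical profile of a high-recall map: few missed strokes, at the cost of some false text pixels.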
I. INTRODUCTION
LIBRARIES and archives around the world store an
abundance of old and historically important documents
and manuscripts. These documents accumulate a significant
amount of human heritage over time. However, many environmental
factors, improper handling, and the poor quality of
the materials used in their creation cause them to suffer a high
degree of degradation of various types. Today, there is a strong
move toward digitization of these manuscripts to preserve their
content for future generations. The huge amount of digital
data produced requires automatic processing, enhancement,
and recognition. A key step in all document image processing
workflows is binarization, but this is not a very sophisticated
process, which is unfortunate, as its performance has
a significant influence on the quality of OCR results. Many
research studies have been carried out to solve the problems
that arise in the binarization of old document images characterized
by many types of degradation [1]–[19], including
faded ink, bleed-through, show-through, uneven illumination,
variations in image contrast, and deterioration of the cellulose
structure [1], [20]. There are also differences in patterns of
hand-written and machine-printed documents, which add to the
difficulties associated with the binarization of old document
images.
Manuscript received August 8, 2013; revised January 29, 2014 and April 22,
2014; accepted April 26, 2014. Date of publication May 7, 2014; date of
current version May 27, 2014. This work was supported by the Natural
Sciences and Engineering Research Council of Canada. The associate editor
coordinating the review of this manuscript and approving it for publication
was Dr. Debargha Mukherjee.
The authors are with the Synchromedia Laboratory for Multimedia
Communication in Telepresence, École de Technologie Supérieure,
Montreal, QC H3C 1K3, Canada (e-mail: hossein.zi@synchromedia.ca;
rfarrahi@synchromedia.ca; mohamed.cheriet@etsmtl.ca).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2014.2322451
Fig. 1. Sample document images selected from the DIBCO’09 [21],
H-DIBCO’10 [22], and DIBCO’11 datasets [23].
To the best of our knowledge, none of the proposed methods
can deal with all types of documents and degradation. For
more details, see the Related Work section. Fig. 1 shows some
of the degraded document images used in this paper.
In this paper, a robust phase-based binarization method is
proposed for the binarization and enhancement of historical
documents and manuscripts. The three main steps in the
proposed method are: preprocessing, main binarization, and
post-processing. The preprocessing step mainly involves image
denoising with phase preservation [24], followed by some
morphological operations. We incorporate the Canny edge
detector [25] and a denoised image to obtain a binarized image
in rough form.
Then, we use the phase congruency features [18], [19],
[26] for the main binarization step. Phase congruency is
widely used in the machine vision and image processing
1057-7149 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
