Rajiv K. Sharma & Dr. Amardeep Singh
International Journal of Image Processing, Volume (2): Issue (3) 12
Segmentation of Handwritten Text in Gurmukhi Script
Rajiv K. Sharma rajiv.patiala@gmail.com
Sr. Lecturer, SMCA
Thapar University,
Patiala, 147002
Punjab, India
Dr. Amardeep Singh amardeep_dhiman@yahoo.com
Reader, UCoE,
Punjabi University,
Patiala, 147004
Punjab, India
Abstract
Character segmentation is an important preprocessing step for text recognition.
The size and shape of characters generally play an important role in the process
of segmentation. For any optical character recognition (OCR) system, however, the
presence of touching characters in printed as well as handwritten documents
drastically decreases the rates of correct segmentation and recognition. Because
the size and shape of characters in handwritten documents cannot be controlled,
segmenting handwritten documents is especially difficult. We attempt to segment
handwritten text using the proposed algorithms, which were implemented and have
shown encouraging results. Algorithms have also been proposed to segment
touching characters. These algorithms have shown a reasonable improvement in
segmenting touching handwritten characters in Gurmukhi script.
Keywords: Character Segmentation, Middle Zone, Upper Zone, Lower Zone, Touching Characters,
Handwritten, OCR
1. INTRODUCTION
In optical character recognition (OCR), accurate segmentation of characters is required before
individual characters can be recognized. Segmentation techniques are therefore applied to word
images before those images are passed to the recognition process. The simplest way to segment
characters is to use the inter-character gap as a segmentation point [1]. However, this
technique partially fails if the text to be segmented contains touching characters. The
situation becomes worse if the text consists of handwritten characters. The motivation behind
this paper is to find a reasonable solution for segmenting handwritten touching characters in
Gurmukhi script. Gurmukhi is one of the popular scripts used to write the Punjabi language,
which is one of the popular spoken languages of northern India. Because our work concerns the
segmentation of Gurmukhi script, we first discuss some characteristics of the script so that
the reader can better follow the work.
2. CHARACTERISTICS OF GURMUKHI SCRIPT
The Gurmukhi script alphabet consists of 41 consonants and 12 vowels [2], as shown in FIGURE 2.
Besides these, some characters in the form of half characters appear at the feet of
characters. The writing style is from left to right. In Gurmukhi, there is no concept of upper
or lowercase characters. A line of Gurmukhi script can be partitioned into three horizontal
zones, namely the upper zone, the middle zone and the lower zone. Consonants are generally
present in the middle zone. These zones are shown in FIGURE 1. The upper and lower zones may
contain parts of vowel modifiers and diacritical markers.
FIGURE 1 : a) Upper zone from line number 1 to 2, b) Middle Zone from
line number 3 to 4, c) lower zone from line number 4 to 5
In Gurmukhi script, most of the characters, as shown in FIGURE 2, contain a horizontal line at
the upper end of the middle zone. This line is called the headline. The characters in a word
are connected through the headline, along with some symbols such as i, I, A, etc. The headline
helps in recognizing script line positions and in character segmentation. The segmentation
problem for Gurmukhi script is entirely different from that for scripts of other common
languages such as English, Chinese and Urdu [3]. In Roman script, the windows enclosing the
characters of a word do not share the same pixel positions in the horizontal direction. But in
Gurmukhi script, as shown in FIGURE 1, two or more characters or symbols of the same word may
share the same pixel positions in the horizontal direction.
This adds to the complexity of the segmentation problem in Gurmukhi script. Because of these
differences in the physical structure of Gurmukhi characters from those of Roman, Chinese,
Japanese and Arabic scripts, the existing algorithms for character segmentation of these
scripts do not work efficiently for handwritten Gurmukhi script.
FIGURE 2 a) : Consonants (Vianjans)
FIGURE 2 b) : Vowels and Vowel diacritics (Laga Matra)
FIGURE 2 c) : Other symbols
3. PREPROCESSING
Preprocessing is applied to the input binary document so that the effect of spurious noise is
minimized in the subsequent processing stages. In the present study, salt-and-pepper noise was
removed using a standard algorithm [4]. The height and width of the document are assumed to be
known. The image is stored as a 2-D array whose number of rows equals the height of the
document and whose number of columns equals its width. The maximum pixel intensity in the
document is calculated using any standard function available in the implementation tool; in
Java, this is the getRGB() method. Every pixel of the document is then scanned and its
intensity is compared with the maximum intensity. If the two are equal, a one is stored in the
array at that location; otherwise a zero is stored.
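The conversion described above can be sketched in Java roughly as follows. This is an illustrative sketch, not the authors' code: the class name Binarizer, the helper darkness(), and the choice to measure intensity as darkness (so that ink pixels, not paper, reach the maximum) are all assumptions made here so that 1's mark ink, as the later procedures expect.

```java
import java.awt.image.BufferedImage;

final class Binarizer {
    // Assumed convention: darkness is 0 for white paper and 255 for black ink.
    static int darkness(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return 255 - (r + g + b) / 3;
    }

    // Returns a height x width array holding 1 where a pixel reaches the
    // maximum intensity found in the image, and 0 everywhere else.
    static int[][] toBinary(BufferedImage image) {
        int height = image.getHeight();
        int width = image.getWidth();

        // First pass: find the maximum intensity in the document.
        int max = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                max = Math.max(max, darkness(image.getRGB(x, y)));
            }
        }

        // Second pass: compare every pixel with that maximum.
        int[][] binary = new int[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                binary[y][x] = (darkness(image.getRGB(x, y)) == max) ? 1 : 0;
            }
        }
        return binary;
    }
}
```

Followed literally, this rule marks every pixel as 1 on a completely blank page, because every pixel then equals the maximum; a practical implementation would probably guard against that case.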
4. PROPOSED PROCEDURES TO SEGMENT LINE, WORD AND CHARACTER
Line Detection
The following procedure is implemented to find the location of lines in the document.
i. Create an array of size equal to height of the document and with two columns.
ii. Start from the first row and count the number of 1's in that row. If it is zero, move to
the next row. If it is not zero, that row is the starting location of a line. Store that
location in the array.
iii. Check consecutive rows until a row with no 1's is reached. The row before it is the
ending location of that line. Store that value in the array.
iv. Also find the location of maximum intensity within each line and store it in the second
column for that line. It is later used as the starting position of characters.
v. Repeat steps (ii) to (iv) for the whole document.
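The line detection steps can be sketched in Java as a scan over row counts. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, a growable list replaces the fixed-size array of step (i), and "location of maximum intensity" is interpreted here as the row with the most 1's inside the line.

```java
import java.util.ArrayList;
import java.util.List;

final class LineDetector {
    // Returns one {startRow, endRow, peakRow} triple per detected text line,
    // where peakRow is the row holding the most 1's within that line.
    static List<int[]> detectLines(int[][] binary) {
        List<int[]> lines = new ArrayList<>();
        int start = -1;      // -1 means we are currently between lines
        int peakRow = -1;
        int peakCount = 0;

        // One extra iteration acts as an empty sentinel row that closes a
        // line touching the bottom edge of the document.
        for (int y = 0; y <= binary.length; y++) {
            int count = 0;
            if (y < binary.length) {
                for (int v : binary[y]) {
                    count += v;
                }
            }
            if (count > 0) {
                if (start < 0) {          // first inked row: the line starts here
                    start = y;
                    peakCount = 0;
                }
                if (count > peakCount) {  // track the densest row of this line
                    peakCount = count;
                    peakRow = y;
                }
            } else if (start >= 0) {      // empty row: the line ended one row above
                lines.add(new int[] {start, y - 1, peakRow});
                start = -1;
            }
        }
        return lines;
    }
}
```

Because a line boundary is declared at every empty row, marks in the lower or upper zone that are separated from the middle zone by a blank row come out as lines of their own.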
Word Detection
The following procedure is implemented to find the location of words in each line.
i. Create a 2-D array.
ii. For each line, move from the 0th pixel up to the width.
iii. Count the number of 1's in the first column, from the starting location of the line to
its ending position.
iv. If the number of 1's is not zero, that column is the starting location of a word. Save
that location in the array. Keep moving to the right until a column with no 1's is reached.
That column is the ending location of the word. Store that location in the array too.
v. Repeat this until the width is reached.
vi. Repeat steps (ii) to (v) for each line.
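A possible Java rendering of the word detection steps is shown below. It is a sketch only: the names are hypothetical, a list stands in for the 2-D array, and the end of a word is recorded as its last inked column rather than the empty column that follows it, which is a small deviation from step (iv) chosen so that a word touching the right margin still gets a valid index.

```java
import java.util.ArrayList;
import java.util.List;

final class WordDetector {
    // Scans the columns of one text line (rows startRow..endRow inclusive)
    // and returns one {startCol, endCol} pair per run of inked columns.
    static List<int[]> detectWords(int[][] binary, int startRow, int endRow) {
        List<int[]> words = new ArrayList<>();
        int width = (binary.length == 0) ? 0 : binary[0].length;
        int start = -1;  // -1 means we are currently in a gap

        // The extra iteration at x == width behaves like an empty column.
        for (int x = 0; x <= width; x++) {
            boolean inked = false;
            if (x < width) {
                for (int y = startRow; y <= endRow; y++) {
                    if (binary[y][x] == 1) {
                        inked = true;
                        break;
                    }
                }
            }
            if (inked) {
                if (start < 0) {
                    start = x;                        // a word begins
                }
            } else if (start >= 0) {
                words.add(new int[] {start, x - 1});  // the word ended one column back
                start = -1;
            }
        }
        return words;
    }
}
```

Since any single empty column ends a word, a word whose parts are not joined is reported as several words under this rule.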
Character Detection
The following procedure is implemented to find the location of characters in each word.
i. Create a 3-D array. Its first index represents the line number, its second index the word
number, and its third index holds the location of the character. This array is created
dynamically.
ii. Repeat steps (iii) to (iv) for each line and each word detected so far.
iii. Move from the starting position of the word to its ending position.
iv. For each column, start from the starting position of the line and move downwards to its
ending position. Count the number of 1's in that column, leaving out the location of maximum
intensity found for the line during line detection. If the count is not zero, that column is
the starting position of a character. Move to the right until a column with no 1's is
reached; that column is the ending location of the character.
This process generates the locations of the characters.
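The character detection steps differ from word detection mainly in that one row is skipped while counting. The Java sketch below illustrates this under assumptions: the names are hypothetical, the skipped row is passed in as headlineRow on the assumption that the row of maximum intensity is the headline, and a list replaces the dynamically created 3-D array.

```java
import java.util.ArrayList;
import java.util.List;

final class CharacterDetector {
    // Splits one word (columns startCol..endCol) of one line (rows
    // startRow..endRow) into characters, ignoring headlineRow when counting.
    // Returns one {startCol, endCol} pair per character.
    static List<int[]> detectCharacters(int[][] binary, int startRow, int endRow,
                                        int headlineRow, int startCol, int endCol) {
        List<int[]> characters = new ArrayList<>();
        int start = -1;  // -1 means we are currently between characters

        // The extra iteration at endCol + 1 closes a character that reaches
        // the last column of the word.
        for (int x = startCol; x <= endCol + 1; x++) {
            int count = 0;
            if (x <= endCol) {
                for (int y = startRow; y <= endRow; y++) {
                    if (y != headlineRow) {  // the joining stroke is left out
                        count += binary[y][x];
                    }
                }
            }
            if (count > 0) {
                if (start < 0) {
                    start = x;
                }
            } else if (start >= 0) {
                characters.add(new int[] {start, x - 1});
                start = -1;
            }
        }
        return characters;
    }
}
```

Skipping the headline row is what lets characters joined only by the headline be separated; characters that also touch below the headline still come out as a single segment under this rule.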
The above approach was applied to a number of documents; the image of one such scanned
document is given below.
FIGURE 3: Scanned Image of a Document
The result of the scanned document after processing is given below.
FIGURE 4: Processed Document
The main objective of the work was to segment the lines, words and touching characters present
in handwritten documents in Gurmukhi script. Handwritten Gurmukhi documents were passed through
the segmentation procedures, and the results are summarized in the following tables:
Document   No. of Lines   Correctly Detected   Inaccurate Segmentation   Accuracy
Doc1       5              4                    1                         80%
Doc2       8              7                    1                         87.5%
Doc3       10             8                    2                         80%
Doc4       13             11                   2                         84.61%
TABLE 1: Accuracy for Line Segmentation
Document   No. of Words   Correctly Detected   Inaccurate Segmentation   Accuracy
Doc1       38             32                   6                         84.21%
Doc2       56             49                   7                         87.5%
Doc3       95             79                   16                        83.15%
Doc4       110            90                   20                        81.81%
TABLE 2: Accuracy for Word Segmentation
Document   No. of Characters   Correctly Detected   Inaccurate Segmentation   Accuracy
Doc1       79                  71                   8                         89.8%
Doc2       168                 145                  23                        86.30%
Doc3       224                 175                  49                        78.12%
Doc4       289                 232                  57                        80.27%
TABLE 3: Accuracy for Character Segmentation
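The accuracy column is not defined explicitly in the text. The tabulated values appear consistent with the ratio of correctly detected units to the total number of units, expressed as a percentage and truncated rather than rounded. As a check, using the line counts for Doc4 in TABLE 1:

```latex
\text{Accuracy} = \frac{\text{correctly detected}}{\text{total}} \times 100\%,
\qquad
\frac{11}{13} \times 100\% = 84.615\ldots\% \;\rightarrow\; 84.61\%
```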
5. CONCLUSION AND FUTURE WORK
This work was carried out to detect lines in scanned documents written in handwritten Gurmukhi
script. The lines present in the document are found first, then the words present in each
detected line, and finally the detected words are used to segment the characters present in
each word. Lines were detected using the line detection algorithm (the first approach). Most
lines were found correctly, but some were not. The reason may be the writing style of Gurmukhi
script, as shown in FIGURE 1: the words present in the lower zone were considered as a
different line. The correctly detected lines were then passed to the word detection algorithm.
Here the results were good, but sometimes, when a word was not joined properly, part of it was
detected as a different word. The locations of the detected words were used to segment the
characters. At some points the segmentation was good, but at others it did not meet
expectations. This may be because of the similarity in the shapes of some characters. These
issues can be addressed in future work on handwritten documents written in 2-dimensional
scripts like Gurmukhi by making a few changes to the proposed work.
6. REFERENCES
1. Y. Lu. “Machine Printed Character Segmentation – an Overview”. Pattern Recognition,
vol. 29(1): 67-80, 1995.
2. M. K. Jindal, G. S. Lehal and R. K. Sharma. “Segmentation Problems and Solutions in
Printed Degraded Gurmukhi Script”. IJSP, vol. 2(4), 2005, ISSN 1304-4494.
3. G. S. Lehal and Chandan Singh. “Text Segmentation of Machine Printed Gurmukhi
Script”. Document Recognition and Retrieval VIII, Proceedings SPIE, USA, vol. 4307:
223-231, 2001.
4. Serban, Rajjan and Raymund. “Proposed Heuristic Procedures to Preprocess
Character Patterns using Line Adjacency Graphs”. Pattern Recognition, vol. 29(6):
951-975, 1996.
5. Veena Bansal and R. M. K. Sinha. “Segmentation of Touching and Fused Devanagari
Characters”. Pattern Recognition, vol. 35: 875-893, 2002.
6. R. G. Casey and E. Lecolinet. “A Survey of Methods and Strategies in Character
Segmentation”. IEEE PAMI, vol. 18: 690-706, 1996.
7. U. Pal and Sagarika Datta. “Segmentation of Bangla Unconstrained Handwritten Text”.
Proceedings of the Seventh International Conference on Document Analysis and
Recognition (ICDAR), 2003.
8. U. Pal, S. Sinha and B. B. Chaudhuri. “Multi-Script Line Identification from Indian
Documents”. Proceedings of the Seventh International Conference on Document
Analysis and Recognition (ICDAR), 2003.
9. Réjean Plamondon and Sargur N. Srihari. “On-Line and Off-Line Handwriting
Recognition: A Comprehensive Survey”. IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 22(1), January 2000.
10. Giovanni Seni and Edward Cohen. “External Word Segmentation of Off-Line Handwritten
Text Lines”. Pattern Recognition, vol. 27(1): 41-52, 1994.
