Advanced Computing: An International Journal ( ACIJ ), Vol.3, No.4, July 2012
DOI : 10.5121/acij.2012.3407
FREEMAN CODE BASED ONLINE HANDWRITTEN
CHARACTER RECOGNITION FOR MALAYALAM
USING BACKPROPAGATION NEURAL NETWORKS
Amritha Sampath¹, Tripti C² and Govindaru V³
¹Department of Computer Science and Engineering, Rajagiri School of Engineering and
Technology, Kochi, India
amrithasampath@yahoo.com
²Department of Computer Science and Engineering, Rajagiri School of Engineering and
Technology, Kochi, India
triptic@rajagiritech.ac.in
³Computational Linguistics, Centre for Development of Imaging Technology, Kerala,
India
neithalloor@gmail.com
ABSTRACT
Handwritten character recognition is the conversion of handwritten text into a machine-readable and
editable form. Online character recognition deals with live conversion of characters as they are
written. Malayalam is a language spoken by millions of people in the state of Kerala and the union
territories of Lakshadweep and Pondicherry in India. Its script is written mostly in a clockwise
direction and consists of loops and curves. The proposed method trains a simple three-layer neural
network using the backpropagation algorithm. Freeman codes are used to represent each character as a
feature vector. These feature vectors are the inputs to the network during its training and testing
phases. The output is the character expressed in the Unicode format.
KEYWORDS
Freeman code; Backpropagation Neural Networks; Unicode
1. INTRODUCTION
Optical character recognition (OCR) can deal either with typewritten or printed characters, as in
textbooks, or with handwritten text, converting it into machine-editable form. Both have their own
applications. Converting handwritten characters is important for bringing documents related to our
history, such as manuscripts, into machine-editable form so that they can be easily accessed and
preserved. Searching is difficult when information is in a form that the machine cannot recognize;
once it is converted into a machine-recognizable form, searching becomes faster and easier.
Conversion of already existing information is called offline character recognition, and systems
that perform it are called OCR systems.
Online recognition is important as an alternative method of data input. Languages like Malayalam
have a large character set, which makes it difficult to design a keyboard that is easy to use. For
online data acquisition, a digital pen or stylus can be used instead. However, because of device
limitations, writing speed or hand tremor, a single character may be broken into several parts,
which creates confusion during recognition.
Handwritten character recognition is generally more difficult than conversion of printed and
typed characters because, in the latter, there is a standard set of fonts to which the input can be
mapped.
Handwriting, however, varies from person to person, and even one person's handwriting varies from
time to time with mood, urgency, etc. Hence handwritten character recognition is a difficult task.
Existing character recognition software for Malayalam focuses on conversion of printed or typed
text.
Handwritten character recognition has been developed for several languages. Its difficulty in
Malayalam can be attributed to the complexity of the characters, the similarity in the way they are
written, and the large character set. Malayalam can be written in either the old lipi or the new
lipi, as shown in Table 1, so the number of characters to be recognized is almost doubled.
Table 1. Characters in old and new lipis.
Both online and offline character recognition basically require four steps:
1. Pre-processing
2. Feature extraction
3. Recognition
4. Post-processing
The method used in each of these basic steps varies according to the application.
2. RELATED WORK
Although handwritten character recognition has been studied extensively in many languages, an
efficient system for Malayalam has not yet been developed. Most of the research has been on offline
character recognition and on typed text. Malayalam consists of characters with loops and curves,
and most of the characters are written in a clockwise direction.
An OCR system for Malayalam has been developed which uses the number of horizontal and vertical
lines to identify the characters [1]. It includes pre-processing, character extraction and
skeletonization phases before the actual recognition takes place. The recognition module includes
functions that calculate the number and position of horizontal and vertical lines, which form the
feature that distinguishes one character from another. Offline recognition of Malayalam characters
using the chain code histogram and the normalised chain code histogram has also been developed [2].
The chain code represents the boundary of the character and is stored as the location and direction
of line segments of a specified length. The centroid of the image is also used to improve the
result. An online system which uses a combination of context bitmap and normalised (x, y)
co-ordinates has also been developed [3]; it uses a Kohonen network for recognition. A recognition
system developed for Tamil [4], another prominent language of south India, uses a post-processing
stage to distinguish between confusing pairs of characters.
3. METHOD OF IMPLEMENTATION
The method proposes different processing techniques for each of the four steps mentioned.
3.1. Pre-processing
Pre-processing includes noise removal.
Noise is a mark made on the writing surface which is not to be taken as part of the input. Noise
differs from the actual input in its characteristics. A stroke can be defined as the set of points
taken from a pen-down position to the next pen-up position. It is the trajectory followed by the
pen tip from the point where it first makes contact with the writing surface to the point where it
leaves the surface. The time taken to make a noise stroke will be either much higher or much lower
than the average time taken to make a stroke of the actual character. Also, if a mark lies far from
the rectangular area and its number of pixels is less than a threshold, it can be removed as noise.
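The timing and size tests described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Stroke` type, the function name and the threshold values `low`, `high` and `min_points` are all assumptions made for the example, and the distance-from-rectangle test is left out.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Stroke:
    points: list      # [(x, y), ...] sampled between pen-down and pen-up
    duration: float   # seconds from pen-down to pen-up


def remove_noise(strokes, low=0.2, high=5.0, min_points=3):
    """Drop strokes whose duration is far from the average, or that are tiny.

    low, high and min_points are illustrative thresholds, not values
    taken from the paper.
    """
    if not strokes:
        return []
    average = mean(s.duration for s in strokes)
    kept = []
    for s in strokes:
        ratio = s.duration / average if average > 0 else 1.0
        odd_timing = ratio < low or ratio > high
        too_small = len(s.points) < min_points
        if not (odd_timing or too_small):
            kept.append(s)
    return kept
```

Note that the average here includes the noise strokes themselves; a more careful version might use the median or a per-writer baseline.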
3.2. Feature Extraction
Feature extraction is the next step after pre-processing. We need to identify features that can be
used to uniquely identify every character in the character set of the language.
Extracted features can be either low level or high level. Low-level features include the width,
height, curliness and aspect ratio of the character. These alone cannot distinguish one character
from another in the character set of the language, so a number of high-level features are also
used, including the number and position of loops, straight lines, head lines and curves.
One feature that can be used for identification is direction information, which is collected online.
It is based on Freeman codes, as shown in Figure 1.
Figure 1. Freeman codes[5]
Starting from the point where first contact is made with the writing surface, the direction in
which the pen tip moves is recorded. The directions are coded as numbers (1 for NE, 2 for E, 3 for
SE, etc.) and stored in a one-dimensional array. A direction is recorded only when the direction
changes, so that the code does not depend on the length of the line segments in the character. In
order to mark crossings of line segments, the code 9 can be used. This array is used as the feature
vector for classification.
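A minimal sketch of this encoding is given below. It assumes that code 0 stands for N (the text only gives 1, 2 and 3), that codes run clockwise, and that points are in screen coordinates with y growing downwards. The names `DIRECTION_CODES` and `freeman_codes` are hypothetical, and the crossing marker 9 is not handled.

```python
# Direction codes, clockwise from north, following "1 for NE, 2 for E, 3 for SE".
# Code 0 = N is an assumption. Screen coordinates: y grows downwards.
DIRECTION_CODES = {
    (0, -1): 0,   # N
    (1, -1): 1,   # NE
    (1, 0): 2,    # E
    (1, 1): 3,    # SE
    (0, 1): 4,    # S
    (-1, 1): 5,   # SW
    (-1, 0): 6,   # W
    (-1, -1): 7,  # NW
}


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def freeman_codes(points):
    """Return direction codes for a stroke, recording only changes of direction."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (sign(x1 - x0), sign(y1 - y0))
        if step == (0, 0):
            continue  # the pen did not move between these samples
        code = DIRECTION_CODES[step]
        if not codes or codes[-1] != code:
            codes.append(code)
    return codes
```

For example, `freeman_codes([(0, 0), (1, 0), (2, 0), (3, 1), (3, 2)])` moves east twice, then south-east, then south, and the repeated east step is stored only once.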
Figure 2. Sample input given during training
Figure 2 shows a sample character ‘ga’ given as input for training. The input is coded into a
feature list, which is stored as a linked list. For the given sample input, the feature vector is
‘[3, 4, 3, 2, 3, 2, 1, 2, 1, 0, 1, 0, 1, 0, 1, 2, 3, 4, 3, 4, 5]’. An issue that arises when
creating the feature vector from the direction of pen movement is that, instead of storing a ‘1’
for the NE direction, the system may store a ‘2’ followed by a ‘0’. This is caused by
irregularities in writing due to the user's inexperience with the device, hand tremor, etc. It can
be avoided by extracting the direction between points 2 pixels apart rather than between adjacent
pixels. This greatly reduces the size of the feature vector and makes it more accurate.
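The effect of sampling points further apart can be illustrated with a small sketch. The `stride` parameter, the function names and the staircase data are assumptions made for the example; as before, code 0 = N and screen coordinates (y growing downwards) are assumed.

```python
def direction_code(dx, dy):
    """Map a displacement to a code 0-7, clockwise from north (0 = N assumed).

    Screen coordinates are assumed, so y grows downwards.
    """
    table = {(0, -1): 0, (1, -1): 1, (1, 0): 2, (1, 1): 3,
             (0, 1): 4, (-1, 1): 5, (-1, 0): 6, (-1, -1): 7}
    sx = (dx > 0) - (dx < 0)
    sy = (dy > 0) - (dy < 0)
    return table.get((sx, sy))  # None when the pen has not moved


def encode(points, stride=1):
    """Encode directions between points `stride` samples apart, changes only."""
    codes = []
    sampled = points[::stride]
    for (x0, y0), (x1, y1) in zip(sampled, sampled[1:]):
        code = direction_code(x1 - x0, y1 - y0)
        if code is not None and (not codes or codes[-1] != code):
            codes.append(code)
    return codes


# A north-east diagonal drawn shakily as a staircase: right, up, right, up, ...
jittery = [(0, 4), (1, 4), (1, 3), (2, 3), (2, 2), (3, 2), (3, 1), (4, 1), (4, 0)]
print(encode(jittery, stride=1))  # alternating E (2) and N (0)
print(encode(jittery, stride=2))  # a single NE (1)
```

With adjacent points the staircase produces eight codes; with every second point it collapses to the single NE code the writer intended.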
3.3. Classification
Several techniques such as k-Nearest Neighbor (k-NN) [6], Bayes classifiers, Neural Networks (NN),
Hidden Markov Models (HMM) and Support Vector Machines (SVM) exist for classification. Neural
networks are one of the commonly used techniques. A neural network consists of a number of nodes
and links arranged in layers, as shown in Figure 3. The links that connect the layers are
associated with weights. The input to a node is the sum, over its incoming links, of the product of
the activation and the weight associated with the link. The weights must be selected such that
inputs map to their corresponding outputs.
Using a neural network usually involves a training phase, a validating phase and a testing phase.
During the training phase, the extracted features are used to train the network to map input to
output. The training set consists of feature vectors for each of the characters that must be
recognized by the network. A training cycle consists of a forward pass and a reverse pass. The
backpropagation algorithm is a supervised training algorithm which trains the network by adjusting
the weights according to the delta rule.
Initially, the weights in the neural network are assigned small random values. The input is
applied to the network. In each layer, the input is multiplied by the corresponding weights and an
activation function such as the sigmoid function is applied at each node. This acts as a squashing
function.
The formula of the sigmoid activation is $f(x) = \dfrac{1}{1 + e^{-x}}$, where $x$ is the input
to the node.
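As a small illustration, the output of a single node can be computed as the sigmoid of the weighted sum of its incoming activations. The function names are chosen for this sketch only, and no bias term is included because the text does not mention one.

```python
import math


def sigmoid(x):
    """Squash any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def node_output(activations, weights):
    """Weighted sum of the incoming activations, followed by the sigmoid."""
    net_input = sum(a * w for a, w in zip(activations, weights))
    return sigmoid(net_input)
```

An input of 0 gives an output of exactly 0.5; large positive inputs approach 1 and large negative inputs approach 0, which is the squashing behaviour described above.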
The output obtained from the last layer is compared with the expected output. This gives the
error. The error is propagated backwards and the weight associated with each link is modified as
$$w_{\text{new}} = w_{\text{old}} + \eta \cdot E \cdot o_i \cdot o_{i+1}\,(1 - o_{i+1})$$
where $\eta$ is the learning rate, $E$ is the output error, $o_i$ is the output of the neuron in
layer $i$ and $o_{i+1}$ is the output of the neuron in layer $i+1$ that the link feeds.
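Read as code, the update for a single link is one line of arithmetic. The function and parameter names below are hypothetical and simply mirror the terms of the rule stated above.

```python
def updated_weight(old_weight, learning_rate, error, out_i, out_next):
    """Apply the update rule stated above to a single link.

    out_i    : output of the neuron feeding the link (layer i)
    out_next : output of the neuron the link feeds (layer i + 1)
    error    : error attributed to the receiving neuron
    """
    sigmoid_slope = out_next * (1.0 - out_next)  # derivative of the sigmoid
    return old_weight + learning_rate * error * out_i * sigmoid_slope
```

The factor `out_next * (1 - out_next)` is the slope of the sigmoid at the receiving neuron, so links into saturated neurons (outputs near 0 or 1) change very little.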
Figure 3. A simple neural network
Figure 4. A node in a neural network[7]
After one training cycle is complete, the next input feature vector is applied, and the process is
repeated for all the feature vectors in the training set. This completes one training epoch. The
network requires several training epochs before it learns to recognize the characters in the
training set.
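The cycle and epoch structure can be sketched for a three-layer network in plain Python. Everything specific here is an assumption made for illustration: the layer sizes, the learning rate of 0.5, the number of epochs, the absence of bias terms, and the two made-up training vectors, which are not real character data.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Forward pass: input -> hidden -> output, sigmoid at every node."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, row))) for row in w_hidden]
    output = [sigmoid(sum(h * w for h, w in zip(hidden, row))) for row in w_out]
    return hidden, output


def train_epoch(samples, w_hidden, w_out, rate=0.5):
    """One epoch: a forward and a reverse pass for every training vector."""
    total_error = 0.0
    for x, target in samples:
        hidden, output = forward(x, w_hidden, w_out)
        # Output-layer deltas: error times the sigmoid slope.
        d_out = [(t - o) * o * (1 - o) for t, o in zip(target, output)]
        # Hidden-layer deltas: output deltas propagated back through w_out.
        d_hid = [h * (1 - h) * sum(d * w_out[k][j] for k, d in enumerate(d_out))
                 for j, h in enumerate(hidden)]
        for k, d in enumerate(d_out):
            for j, h in enumerate(hidden):
                w_out[k][j] += rate * d * h
        for j, d in enumerate(d_hid):
            for i, xi in enumerate(x):
                w_hidden[j][i] += rate * d * xi
        total_error += sum((t - o) ** 2 for t, o in zip(target, output))
    return total_error


def train(samples, n_in, n_hidden, n_out, epochs=2000, seed=0):
    """Start from small random weights and run the given number of epochs."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    errors = [train_epoch(samples, w_hidden, w_out) for _ in range(epochs)]
    return w_hidden, w_out, errors


# Two made-up input vectors, each with a one-hot target (one output per class).
samples = [([1.0, 0.0, 0.0], [1, 0]), ([0.0, 0.0, 1.0], [0, 1])]
w_hidden, w_out, errors = train(samples, n_in=3, n_hidden=3, n_out=2)
print(errors[0], errors[-1])  # summed squared error in the first and last epoch
```

In practice the Freeman code arrays have different lengths for different characters, so they would need to be brought to a fixed length before they can feed a fixed-size input layer; the text does not say how this is done, so it is left out of the sketch.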
After training, the input feature vectors are applied to test the network. In the validating phase,
feature vectors that were not applied during training are given as input and the results are
verified.
3.4. Post-processing
Post-processing [4] involves the steps taken after classification by the neural network is
completed. It may include representing the output in Unicode format and disambiguating confusing
pairs such as ‘Pa’ and ‘Va’ (shown in Figure 5).
Figure 5. Pair of characters in confusion set.
Such a pair will have almost the same direction feature vectors, so an additional disambiguating
technique should be used. For example, in the case of ‘Pa’ and ‘Va’, the number of pixels above and
below the horizontal axis can be compared. A disambiguation technique of this kind is to be devised
as a post-processing mechanism for every confusing pair identified during the training of the
classifier.
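One way to sketch the pixel comparison is shown below. It assumes that the horizontal axis is the mid-line of the character's bounding box and that points are in screen coordinates (smaller y is higher); neither detail is stated in the text. Which member of a pair is the top-heavy one is left to the caller, since the text does not say.

```python
def pixels_above_and_below(points):
    """Count stroke points above and below the horizontal mid-line."""
    ys = [y for _, y in points]
    mid = (min(ys) + max(ys)) / 2.0
    above = sum(1 for y in ys if y < mid)  # screen coordinates: smaller y is higher
    below = sum(1 for y in ys if y > mid)
    return above, below


def disambiguate(points, if_top_heavy, if_bottom_heavy):
    """Choose between two confusable labels from the vertical pixel balance."""
    above, below = pixels_above_and_below(points)
    return if_top_heavy if above >= below else if_bottom_heavy
```

A call such as `disambiguate(points, label_a, label_b)` would be made only when the classifier's answer falls in a known confusion set.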
The entire methodology is shown in Figure 6.
Figure 6. Steps involved in training and testing phases of a classifier network
4. OUTPUT
Figure 7. Handwritten character being converted to machine readable form
The handwritten Malayalam acquired by a digital pen or stylus will be converted into editable
characters in the computer in one of the recognized fonts of the Malayalam language.
Characters are recognised using Unicode 5.01 or above.
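The text does not say how the network output is turned into a code point. One common arrangement, assumed here, is one output neuron per character, with the most active neuron selecting an entry in a lookup table. The table name and its three entries are illustrative only.

```python
# Hypothetical label table: index of the winning output neuron -> code point.
# Only three entries are shown; a real table would cover the whole character set.
CLASS_TO_CODEPOINT = [0x0D15, 0x0D17, 0x0D2A]  # three letters of the Malayalam block


def to_unicode(outputs):
    """Return the character for the output neuron with the highest activation."""
    winner = max(range(len(outputs)), key=outputs.__getitem__)
    return chr(CLASS_TO_CODEPOINT[winner])
```

For example, `to_unicode([0.1, 0.8, 0.3])` selects the second entry and returns the character at code point 0D17.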
5. CONCLUSION AND FUTURE WORK
The method identified recognizes a single character at a time. Each character, whether it is a
consonant, a vowel or a dependent vowel sign (which usually accompanies a consonant), is identified
as a separate character and is assigned a separate Unicode code point, as shown in Figure 8.
Figure 8. Instead of 0D08, we store it as two separate characters with unicodes 0D07 and 0D57
When the system is extended to identify words, an additional post-processing step is required to
combine such pairs of characters, which actually form a single entity, into a single character.
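A combining step of this kind could be a simple table lookup over adjacent characters. The sketch below uses only the pair from the Figure 8 caption; the table and function names are hypothetical, and a full system would need an entry for every such pair.

```python
# Hypothetical composition table, following the Figure 8 example:
# 0D07 followed by 0D57 is stored as the single character 0D08.
COMPOSITIONS = {("\u0D07", "\u0D57"): "\u0D08"}


def combine(chars):
    """Merge adjacent recognised characters that form a single entity."""
    result = []
    for ch in chars:
        if result and (result[-1], ch) in COMPOSITIONS:
            result[-1] = COMPOSITIONS[(result[-1], ch)]
        else:
            result.append(ch)
    return "".join(result)
```

Characters that are not part of any listed pair pass through unchanged.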
Characters can also be disambiguated based on their position and the meaning each character gives
to the word, making the system more efficient. For example, the letter ‘tta’ (Unicode 0D20) and the
Malayalam sign anuswara (Unicode 0D02) have the same visual representation. They can be
disambiguated using the position and neighbouring-character rules of the language. Additional
mechanisms such as automatic word completion and a spell checker can also be incorporated into the
system.
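As a very rough sketch of a position rule for the 0D20 and 0D02 pair, the function below narrows the candidates using only the preceding character. The single rule it encodes, that a dependent sign cannot begin a word, is a simplifying assumption made for illustration; real neighbouring-character rules would be needed to decide the remaining cases.

```python
LETTER = "\u0D20"  # the letter, using the code point given in the text
SIGN = "\u0D02"    # the anuswara sign, using the code point given in the text


def candidates(previous_char):
    """Return the readings of the ambiguous shape that remain possible.

    previous_char is the character recognised just before it, or None at
    the start of the input.
    """
    if previous_char is None or previous_char.isspace():
        return [LETTER]        # assumed: a dependent sign cannot start a word
    return [LETTER, SIGN]      # further language rules are needed here
```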
REFERENCES
[1] Abdul Rahiman M, M S Rajasree, Masha N, Rema M, Meenakshi R, Manoj Kumar G,
“Recognition of Handwritten Malayalam Characters using Vertical & Horizontal Line Positional
Analyzer Algorithm”, IEEE, pp 268-274, 2011.
[2] Jomy John, Pramod K. V, Kannan Balakrishnan, “Offline Handwritten Malayalam Character
Recognition Based on Chain Code Histogram”, Proceedings of ICETECT, pp 736-741, 2011.
[3] Sreeraj M, Sumam Mary Idicula, “On-Line Handwritten Character Recognition using Kohonen
Networks”, World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), pp
1425-1430, 2009.
[4] Suresh Sundaram, A G Ramakrishnan, “An Improved Online Tamil Character Recognition
Engine using Post-Processing Methods”, 10th International Conference on Document Analysis
and Recognition, pp 1216-1220, 2009.
[5] Marwan Ali H. Omer, Shi Long Ma, “Online Arabic Handwriting Character Recognition Using
Matching Algorithm”, IEEE, pp 259-262, 2010.
[6] Sreeraj M, Sumam Mary Idicula, “k-NN based On-Line Handwritten Character recognition
system”, First International Conference on Integrated Intelligent Computing, pp 171-176, 2010.
[7] http://www.learnartificialneuralnetworks.com/
Authors
Amritha Sampath is a postgraduate engineering student in Computer Science and
Engineering at Rajagiri School of Engineering and Technology, Kochi, India. She completed
her graduation in Computer Science and Engineering and secured a high score in the Graduate
Aptitude Test in Engineering.
Tripti C is working as Assistant Professor in the Department of Computer Science and
Engineering at Rajagiri School of Engineering & Technology, India. She is a postgraduate in
Computer Science and Engineering from CDAC Noida, India, and is now pursuing a Ph.D.
at Cochin University of Science and Technology in the area of vehicular and ad hoc
networks. She did her graduation in Electronics and Communication Engineering at
Rajagiri School of Engineering and Technology.
Dr. Govindaru V did his Ph.D. at ISEC, Bangalore, India, and his post-graduation at
Jawaharlal Nehru University, India. He is now working as head of the Research and
Development Division at C-DIT, Thiruvananthapuram, India.

More Related Content

PDF
Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...
PDF
OPTICAL BRAILLE TRANSLATOR FOR SINHALA BRAILLE SYSTEM: PAPER COMMUNICATION TO...
PDF
E123440
PDF
Two Methods for Recognition of Hand Written Farsi Characters
PDF
IRJET - A Survey on Recognition of Strike-Out Texts in Handwritten Documents
PDF
S TRUCTURAL F EATURES F OR R ECOGNITION O F H AND W RITTEN K ANNADA C ...
PDF
SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT
PPT
An OCR System for recognition of Urdu text in Nastaliq Font
Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...
OPTICAL BRAILLE TRANSLATOR FOR SINHALA BRAILLE SYSTEM: PAPER COMMUNICATION TO...
E123440
Two Methods for Recognition of Hand Written Farsi Characters
IRJET - A Survey on Recognition of Strike-Out Texts in Handwritten Documents
S TRUCTURAL F EATURES F OR R ECOGNITION O F H AND W RITTEN K ANNADA C ...
SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT
An OCR System for recognition of Urdu text in Nastaliq Font

What's hot (18)

PDF
Fragmentation of handwritten touching characters in devanagari script
PDF
40120130406014 2
PDF
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
PDF
Handwritten character recognition in
PDF
Preprocessing Phase for Offline Arabic Handwritten Character Recognition
PDF
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
PPTX
Handwriting Recognition
PPTX
Handwritten Character Recognition
PPTX
Handwriting Recognition Using Deep Learning and Computer Version
PDF
Conversion of braille to text in English, hindi and tamil languages
PDF
Automated Bangla sign language translation system for alphabets by means of M...
PDF
Artificial Neural Network For Recognition Of Handwritten Devanagari Character
PDF
DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH
PDF
Character recognition for bi lingual mixed-type characters using artificial n...
PDF
RECOGNITION AND CONVERSION OF HANDWRITTEN MODI CHARACTERS
DOCX
Opticalcharacter recognition
PPTX
Handwritten character recognition using artificial neural network
PDF
Performance Comparison between Different Feature Extraction Techniques with S...
Fragmentation of handwritten touching characters in devanagari script
40120130406014 2
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten character recognition in
Preprocessing Phase for Offline Arabic Handwritten Character Recognition
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
Handwriting Recognition
Handwritten Character Recognition
Handwriting Recognition Using Deep Learning and Computer Version
Conversion of braille to text in English, hindi and tamil languages
Automated Bangla sign language translation system for alphabets by means of M...
Artificial Neural Network For Recognition Of Handwritten Devanagari Character
DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH
Character recognition for bi lingual mixed-type characters using artificial n...
RECOGNITION AND CONVERSION OF HANDWRITTEN MODI CHARACTERS
Opticalcharacter recognition
Handwritten character recognition using artificial neural network
Performance Comparison between Different Feature Extraction Techniques with S...
Ad

Similar to FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USING BACKPROPAGATION NEURAL NETWORKS (20)

PDF
O45018291
PDF
Co4201605611
PDF
IRJET- Optical Character Recognition using Image Processing
PDF
An exhaustive font and size invariant classification scheme for ocr of devana...
PDF
Handwritten Script Recognition
PDF
Online Hand Written Character Recognition
PDF
A Comprehensive Study On Handwritten Character Recognition System
PDF
A017240107
PDF
Isolated Kannada Character Recognition using Chain Code Features
PDF
Comparative Analysis of PSO and GA in Geom-Statistical Character Features Sel...
PDF
OCR-THE 3 LAYERED APPROACH FOR CLASSIFICATION AND IDENTIFICATION OF TELUGU HA...
PDF
OCR-THE 3 LAYERED APPROACH FOR CLASSIFICATION AND IDENTIFICATION OF TELUGU HA...
PDF
A Review On Recognition Of Online Handwriting In Different Scripts
PDF
Character Recognition (Devanagari Script)
PDF
An effective approach to offline arabic handwriting recognition
PDF
An Optical Character Recognition for Handwritten Devanagari Script
PDF
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
PDF
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
PDF
OCR-THE 3 LAYERED APPROACH FOR DECISION MAKING STATE AND IDENTIFICATION OF TE...
PDF
L017116064
O45018291
Co4201605611
IRJET- Optical Character Recognition using Image Processing
An exhaustive font and size invariant classification scheme for ocr of devana...
Handwritten Script Recognition
Online Hand Written Character Recognition
A Comprehensive Study On Handwritten Character Recognition System
A017240107
Isolated Kannada Character Recognition using Chain Code Features
Comparative Analysis of PSO and GA in Geom-Statistical Character Features Sel...
OCR-THE 3 LAYERED APPROACH FOR CLASSIFICATION AND IDENTIFICATION OF TELUGU HA...
OCR-THE 3 LAYERED APPROACH FOR CLASSIFICATION AND IDENTIFICATION OF TELUGU HA...
A Review On Recognition Of Online Handwriting In Different Scripts
Character Recognition (Devanagari Script)
An effective approach to offline arabic handwriting recognition
An Optical Character Recognition for Handwritten Devanagari Script
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
OCR-THE 3 LAYERED APPROACH FOR DECISION MAKING STATE AND IDENTIFICATION OF TE...
L017116064
Ad

More from acijjournal (20)

PDF
Integrating Fractal Dimension and Time Series Analysis for Optimized Hyperspe...
PDF
July 2025-Top 10 Read articles ACIJ Advanced Computing: An International Jour...
PDF
MODEL AND ALGORITHM FOR INCREASING THE EFFICIENCY OF REMOTE SERVICE SYSTEMS S...
PDF
15th International Conference on Computer Science, Engineering and Applicatio...
PDF
4th International Conference on Computer Science and Information Technology (...
PDF
APPLICATION AND ANALYSIS OF ENSEMBLE ALGORITHMS IN SOLVING REGRESSION PROBLEMS
PDF
4th International Conference on Computer Science and Information Technology (...
PDF
Application and Analysis of Ensemble Algorithms in Solving Regression Problems
PDF
17th International Conference on Networks & Communications (NeTCoM 2025)
PDF
METHODS AND ALGORITHMS FOR ASSESSING COMPUTER NETWORK PERFORMANCE
PDF
Advanced Computing: An International Journal (ACIJ)
PDF
6 th International Conference on Data Mining and Software Engineering (DMSE 2...
PDF
ARTICLE :OVERVIEW OF STRUCTURE FROM MOTION
PDF
14th International Conference on Advanced Information Technologies and Applic...
PDF
2nd International Conference on Information Technology Convergence Services &...
PDF
Advanced Computing: An International Journal ( ACIJ )
PDF
3rd International Conference on Computer Science, Engineering and Artificia...
PDF
6th International Conference on Big Data and Machine Learning (BDML 2025)
PDF
METHODS AND ALGORITHMS FOR ASSESSING COMPUTER NETWORK PERFORMANCE
PDF
4th International Conference on Computing and Information Technology Trends (...
Integrating Fractal Dimension and Time Series Analysis for Optimized Hyperspe...
July 2025-Top 10 Read articles ACIJ Advanced Computing: An International Jour...
MODEL AND ALGORITHM FOR INCREASING THE EFFICIENCY OF REMOTE SERVICE SYSTEMS S...
15th International Conference on Computer Science, Engineering and Applicatio...
4th International Conference on Computer Science and Information Technology (...
APPLICATION AND ANALYSIS OF ENSEMBLE ALGORITHMS IN SOLVING REGRESSION PROBLEMS
4th International Conference on Computer Science and Information Technology (...
Application and Analysis of Ensemble Algorithms in Solving Regression Problems
17th International Conference on Networks & Communications (NeTCoM 2025)
METHODS AND ALGORITHMS FOR ASSESSING COMPUTER NETWORK PERFORMANCE
Advanced Computing: An International Journal (ACIJ)
6 th International Conference on Data Mining and Software Engineering (DMSE 2...
ARTICLE :OVERVIEW OF STRUCTURE FROM MOTION
14th International Conference on Advanced Information Technologies and Applic...
2nd International Conference on Information Technology Convergence Services &...
Advanced Computing: An International Journal ( ACIJ )
3rd International Conference on Computer Science, Engineering and Artificia...
6th International Conference on Big Data and Machine Learning (BDML 2025)
METHODS AND ALGORITHMS FOR ASSESSING COMPUTER NETWORK PERFORMANCE
4th International Conference on Computing and Information Technology Trends (...

Recently uploaded (20)

PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPTX
Welding lecture in detail for understanding
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PDF
composite construction of structures.pdf
PPTX
Geodesy 1.pptx...............................................
DOCX
573137875-Attendance-Management-System-original
PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Embodied AI: Ushering in the Next Era of Intelligent Systems
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Welding lecture in detail for understanding
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
UNIT 4 Total Quality Management .pptx
Internet of Things (IOT) - A guide to understanding
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
composite construction of structures.pdf
Geodesy 1.pptx...............................................
573137875-Attendance-Management-System-original
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
Automation-in-Manufacturing-Chapter-Introduction.pdf
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
Model Code of Practice - Construction Work - 21102022 .pdf
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx

FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USING BACKPROPAGATION NEURAL NETWORKS

  • 1. Advanced Computing: An International Journal ( ACIJ ), Vol.3, No.4, July 2012 DOI : 10.5121/acij.2012.3407 51 FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USING BACKPROPAGATION NEURAL NETWORKS Amritha Sampath1 , Tripti C2 and Govindaru V3 1 Department of Computer Science and Engineering, Rajagiri School of Engineering and Technology, Kochi, India amrithasampath@yahoo.com 2 Department of Computer Science and Engineering, Rajagiri School of Engineering and Technology, Kochi, India triptic@rajagiritech.ac.in 3 Computational Linguistics, Centre for Development of Imaging Technology, Kerala, India neithalloor@gmail.com ABSTRACT Handwritten character recognition is conversion of handwritten text to machine readable and editable form. Online character recognition deals with live conversion of characters. Malayalam is a language spoken by millions of people in the state of Kerala and the union territories of Lakshadweep and Pondicherry in India. It is written mostly in clockwise direction and consists of loops and curves. The method aims at training a simple neural network with three layers using backpropagation algorithm. Freeman codes are used to represent each character as feature vector. These feature vectors act as inputs to the network during the training and testing phases of the neural network. The output is the character expressed in the Unicode format. KEYWORDS Freeman code;Backpropagation Neural Networks; Unicode 1. INTRODUCTION Optical character recognition (OCR) can be based on conversion of typewritten or printed characters as in textbooks or it can deal with conversion of handwritten text into machine editable form. Both have their own applications. Conversion of handwritten characters is important for making several important documents related to our history, such as manuscripts, into machine editable form so that it can be easily accessed and preserved. 
A search is difficult when information is available in a form which is not recognizable by the machine. If it is converted into a machine recognizable form, the search becomes fast and easier. The method of conversion of already existing information is called Offline character recognition. Such systems are called OCR systems. Online recognition is important as an alternate method of data input. Languages like Malayalam have large character set, hence difficult to have a keyboard which can be used easily. So for online data acquisition a digital pen or stylus can be used. But due to limitations of the device or speed of writing, or tremble, it is possible for a single character to be broken into different parts, hence creating confusion in recognition. Handwritten character recognition is generally more difficult than conversion of printer and typed characters since, in latter there is a standard set of fonts to which it can be mapped.
  • 2. Advanced Computing: An International Journal ( ACIJ ), Vol.3, No.4, July 2012 52 However, handwritings vary from person to person and also for a person it may vary from time to time according to his/her mood, urgency, etc. Hence handwritten character recognition is a difficult task. Existing character recognition software in Malayalam focuses on conversion of printed or typed texts. Handwritten character recognition has been developed for several languages. But its difficulty in Malayalam can be attributed to several reasons like complexity and similarity in the way characters are written and also due to large character set. Malayalam can be written either in the old lipi or in new lipi as shown in Table 1. Hence number of characters to be recognized will almost be doubled. Table 1. Characters in old and new lipis. Online and offline character recognition requires basically four steps. 1. Pre-processing 2. Feature extraction 3. Recognition 4. Post processing The method used in these basic steps varies according to the application. 2. RELATED WORK Though there has been a lot of study on the handwritten character recognition in many languages, an efficient system in Malayalam has not yet been developed. Most of the research has been based on the offline character recognition and on typed text. Malayalam consists of characters with loops and curves, with most of the characters being written in the clockwise direction. An OCR system for Malayalam has been developed which uses the number of horizontal and vertical lines for the identification of the characters[1]. It includes pre-processing, character extraction and skeltonization phases before the actual recognition takes place. The recognition module include functions which calculate the number and position of horizontal and vertical lines which forms the feature that distinguishes each character from another. 
Offline recognition of Malayalam characters using the chain code histogram and the normalised chain code histogram has also been developed [2]. A chain code represents the boundary of the character and is stored as the location and direction of line segments of a specified length. The centroid of the image is
also taken to improve the result. An online system which uses a combination of context bitmap and normalised (x,y) co-ordinates has also been developed [3]. It uses a Kohonen network for recognition. A recognition system developed for Tamil [4], another prominent language of south India, uses a post-processing stage to distinguish between two confusing characters.
3. METHOD OF IMPLEMENTATION
The method proposes different processing techniques for each of the four steps mentioned.
3.1. Pre-processing
Pre-processing includes noise removal. Noise is a mark made on the writing surface which is not to be taken as a part of the input. Noise differs from the actual input in its characteristics. A stroke can be defined as the set of points taken from a pen-down position to the next pen-up position. It is the trajectory followed by the pen tip from the point where it first makes contact with the writing surface to the point where it leaves the surface. The time taken to make a noise stroke will be either too high or too low compared with the average time taken to make a stroke of the actual character. Also, if a mark lies far from the rectangular area and its number of pixels is less than a threshold, it can be removed as noise.
3.2. Feature Extraction
Feature extraction is the next step after pre-processing. We need to identify features that uniquely identify every character in the character set of the language. The features extracted can be either low level or high level. Low-level features include the width, height, curliness, aspect ratio, etc. of the character. These alone cannot distinguish one character from another in the character set of the language. So a number of high-level features are also used, which include the number and position of loops, straight lines, head lines, curves, etc.
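The duration test for noise strokes described in Section 3.1 can be sketched as follows. This is only an illustration under stated assumptions: the function name, the stroke representation as (duration, points) pairs, and the two ratio thresholds are hypothetical, and the median duration is used here as a robust stand-in for the "average time" mentioned in the text.

```python
from statistics import median

def remove_noise_strokes(strokes, low_ratio=0.2, high_ratio=5.0):
    """Drop strokes whose duration is far from the typical stroke duration.

    strokes: list of (duration_seconds, points) pairs (assumed representation).
    low_ratio, high_ratio: hypothetical thresholds, not taken from the paper.
    """
    if not strokes:
        return []
    # Median instead of mean, so that the noise strokes themselves do not
    # distort the reference duration (an assumption of this sketch).
    typical = median(duration for duration, _ in strokes)
    return [
        stroke for stroke in strokes
        if low_ratio * typical <= stroke[0] <= high_ratio * typical
    ]

strokes = [(0.40, "s1"), (0.50, "s2"), (0.45, "s3"), (0.01, "tap"), (4.0, "rest")]
kept = remove_noise_strokes(strokes)
print([points for _, points in kept])  # ['s1', 's2', 's3']
```

The second rule in the text (few pixels, far from the rectangular area) would need the geometry of that area, which the paper does not specify, so it is left out of the sketch.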
One feature that can be used for identification is direction information, which is collected online. It is based on Freeman codes, as shown in Figure 1.
Figure 1. Freeman codes [5]
Starting from the point where first contact is made with the writing surface, the direction in which the pen tip moves is recorded. The codes (1 for NE, 2 for E, 3 for SE, etc.) are stored in a one-dimensional array. A direction is recorded only when the direction changes, to avoid
dependence on the length of line segments in the character. Also, in order to mark crossings of line segments, the code 9 can be used. This array is used as the feature vector for classification.
Figure 2. Sample input given during training
Figure 2 shows a sample character ‘ga’ given as input for training. The input is coded into a feature list, which is stored as a linked list. For the given sample input, the feature vector is ‘[3, 4, 3, 2, 3, 2, 1, 2, 1, 0, 1, 0, 1, 0, 1, 2, 3, 4, 3, 4, 5]’. An issue that arises when creating the feature vector from the direction of pen movement is that, instead of storing a ‘1’ for the NE direction, the system may store a ‘2’ followed by a ‘0’. This happens because of irregularities in writing caused by the user's inexperience with the device, shivering during writing, etc. It can be avoided by extracting the direction between points 2 pixels apart rather than between adjacent pixels. This greatly reduces the size of the feature vector and makes it more accurate.
3.3. Classification
Several techniques exist for classification, such as k-Nearest Neighbor (k-NN) [6], Bayes classifiers, Neural Networks (NN), Hidden Markov Models (HMM), Support Vector Machines (SVM), etc. One of the commonly used techniques is neural networks. A neural network consists of a number of nodes and links arranged in layers, as shown in Figure 3. The links which connect different layers are associated with weights. The input to a node is the sum, over its incoming links, of the product of the activation and the weight associated with each link. The weights must be selected such that inputs map to their corresponding outputs. Usually, building a neural network involves a training phase, a validating phase and a testing phase. During the training phase, the extracted features are used to train the network to map input to output.
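The direction coding of Section 3.2 can be sketched as follows. The numbering follows the text (1 = NE, 2 = E, 3 = SE); taking 0 as N and continuing clockwise up to 7 = NW is an inference from those values, not something the paper states. The function names, the (x, y) point representation with y growing downward, and the `step` parameter are assumptions made for this illustration. Crossing marks (code 9) are not handled.

```python
import math

def direction_code(dx, dy):
    """Quantise a displacement into one of 8 direction codes.

    Assumes screen coordinates (y grows downward) and the numbering
    0 = N, 1 = NE, 2 = E, ..., 7 = NW (clockwise).
    """
    angle = math.degrees(math.atan2(dx, -dy)) % 360  # 0 deg = north, clockwise
    return int((angle + 22.5) // 45) % 8

def freeman_features(points, step=2):
    """Return the direction codes of a stroke, recorded only on change.

    step: distance (in samples) between the two points used for each
    direction; the text suggests 2 instead of adjacent points.
    """
    codes = []
    for i in range(0, len(points) - step, step):
        (x0, y0), (x1, y1) = points[i], points[i + step]
        if (x0, y0) == (x1, y1):
            continue  # no movement, no direction
        code = direction_code(x1 - x0, y1 - y0)
        if not codes or codes[-1] != code:
            codes.append(code)
    return codes

# A shaky north-east stroke drawn as a staircase of E and N moves.
jitter = [(0, 0), (1, 0), (1, -1), (2, -1), (2, -2), (3, -2), (3, -3)]
print(freeman_features(jitter, step=1))  # [2, 0, 2, 0, 2, 0]
print(freeman_features(jitter, step=2))  # [1]
```

With adjacent points the staircase yields alternating ‘2’ and ‘0’ codes, which is the problem described in the text; sampling points two apart collapses it to a single ‘1’.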
The training set consists of feature vectors for each of the characters that the network must recognize. A training cycle consists of a forward pass and a reverse pass. The backpropagation algorithm is a supervised training algorithm which trains the network by adjusting the weights according to the delta rule. Initially, the weights in the network are assigned small random values. The input is applied to the network. In each layer, the input is multiplied by the corresponding weights and
an activation function such as the sigmoid function is applied at each node. This acts as a squashing function. The sigmoid activation is $f(x) = 1/(1 + e^{-x})$, where $x$ is the input to the node. The output obtained from the last layer is compared with the expected output, which gives the error. This is propagated backwards, and the weight associated with each link is modified as
$w_{new} = w_{old} + \eta \, \delta \, o_i \, o_{i+1} (1 - o_{i+1})$
where $\eta$ is the learning rate, $\delta$ is the output error, $o_i$ is the output of the neuron in layer $i$ and $o_{i+1}$ is the output of the neuron in layer $i+1$.
Figure 3. A simple neural network
Figure 4. A node in a neural network [7]
After one training cycle is complete, the next input feature vector is applied, and the process is repeated for all the feature vectors in the training set. This completes one training epoch. The
network requires several training epochs before it learns to recognize the characters in the training set. After training, the input feature vectors are applied to test the network. In the validating phase, feature vectors that were not applied during training are given as input and the results are verified.
3.4. Post-processing
Post-processing [4] involves the steps taken after classification by the neural network is completed. It may include representing the output in Unicode format and disambiguating confusing pairs such as ‘Pa’ and ‘Va’ (shown in Figure 5).
Figure 5. Pair of characters in confusion set.
The characters in this pair have almost the same direction feature vectors, so an additional disambiguating technique should be used for such pairs. For example, in the case of ‘Pa’ and ‘Va’, the number of pixels above and below the horizontal axis can be compared. Such a disambiguation technique has to be devised as a post-processing mechanism for every confusing pair identified during the training of the classifier. The entire methodology is shown in Figure 6.
Figure 6. Steps involved in training and testing phases of a classifier network
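One training cycle of Section 3.3 (forward pass, error, backward weight update) can be sketched for a three-layer network as follows. This is a minimal illustration rather than the authors' implementation: the function names, the layer sizes, the bias input appended to each layer, the learning rate, and the scaling of direction codes to the range 0 to 1 are all assumptions, since the paper does not specify them.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    # Small random initial weights; the extra column is a bias weight (assumed).
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def layer_output(weights, inputs):
    extended = inputs + [1.0]  # constant bias input
    return [sigmoid(sum(w * a for w, a in zip(row, extended))) for row in weights]

def train_step(w_hidden, w_out, inputs, target, rate=0.5):
    """One forward pass and one backward pass; returns the squared error."""
    hidden = layer_output(w_hidden, inputs)
    output = layer_output(w_out, hidden)
    # Output layer: error * o * (1 - o), the sigmoid derivative term.
    d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
    # Hidden layer: error propagated back through the output weights.
    d_hid = [
        h * (1.0 - h) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j, h in enumerate(hidden)
    ]
    # weight(new) = weight(old) + rate * delta * activation feeding the link
    for k, row in enumerate(w_out):
        for j, a in enumerate(hidden + [1.0]):
            row[j] += rate * d_out[k] * a
    for j, row in enumerate(w_hidden):
        for i, a in enumerate(inputs + [1.0]):
            row[i] += rate * d_hid[j] * a
    return sum((t - o) ** 2 for t, o in zip(target, output))

rng = random.Random(0)
w_hidden = init_layer(4, 3, rng)
w_out = init_layer(3, 2, rng)
features = [code / 8 for code in [3, 4, 3, 2]]  # toy fixed-length input
target = [1.0, 0.0]                             # toy one-hot class label
errors = [train_step(w_hidden, w_out, features, target) for _ in range(200)]
print(errors[-1] < errors[0])  # True
```

In a full training epoch the same step would be applied once to every feature vector in the training set; how variable-length direction vectors are brought to a fixed input size is left open here, as in the text.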
4. OUTPUT
Figure 7. Handwritten character being converted to machine readable form
The handwritten Malayalam acquired by a digital pen or stylus will be converted into editable characters in the computer in one of the recognized fonts of the Malayalam language. Characters are recognised using Unicode 5.01 or above.
5. CONCLUSION AND FUTURE WORK
The method identified is used to recognize a single character at a time. Here each character, whether it is a consonant, a vowel or a dependent vowel symbol (which usually accompanies a consonant), is identified as a separate character and is assigned a separate Unicode value, as shown in Figure 8.
Figure 8. Instead of 0D08, we store it as two separate characters with Unicode values 0D07 and 0D57
When extending the system to identify words, an additional post-processing step is required which combines these two characters, which actually form a single entity, into a single character. Disambiguation of characters can be done based on the position of each character and the meaning it gives to the word, which makes the system more efficient. For example, the letter ‘ttha’ (Unicode 0D20) and the Malayalam sign anuswara (Unicode 0D02) have the same representation. They can be disambiguated based on position and the neighboring-character rules of the language. Also, additional mechanisms such as automatic word completion, a spell checker, etc. can be incorporated into the system.
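The merging step mentioned above for word-level recognition can be sketched as a table lookup over adjacent recognized characters. The function and table names are hypothetical, and the table holds only the single pair given in Figure 8; a real system would need one entry for every such pair in the script.

```python
# Hypothetical merge table; its only entry is the pair from Figure 8:
# 0D07 followed by 0D57 is stored as the single character 0D08.
MERGE_TABLE = {("\u0D07", "\u0D57"): "\u0D08"}

def merge_pairs(chars, table=MERGE_TABLE):
    """Combine adjacent recognized characters that form a single entity."""
    merged = []
    for ch in chars:
        if merged and (merged[-1], ch) in table:
            merged[-1] = table[(merged[-1], ch)]
        else:
            merged.append(ch)
    return "".join(merged)

recognized = ["\u0D15", "\u0D07", "\u0D57"]  # classifier output, one code per symbol
word = merge_pairs(recognized)
print([f"{ord(c):04X}" for c in word])  # ['0D15', '0D08']
```

An explicit table is used because the mapping comes from the recognizer's own symbol inventory; the sketch does not rely on any Unicode normalization form to perform the merge.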
REFERENCES
[1] Abdul Rahiman M, M S Rajasree, Masha N, Rema M, Meenakshi R, Manoj Kumar G, “Recognition of Handwritten Malayalam Characters using Vertical & Horizontal Line Positional Analyzer Algorithm”, IEEE, pp 268-274, 2011.
[2] Jomy John, Pramod K. V, Kannan Balakrishnan, “Offline Handwritten Malayalam Character Recognition Based on Chain Code Histogram”, Proceedings of ICETECT, pp 736-741, 2011.
[3] Sreeraj M, Sumam Mary Idicula, “On-Line Handwritten Character Recognition using Kohonen Networks”, World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), pp 1425-1430, 2009.
[4] Suresh Sundaram, A G Ramakrishnan, “An Improved Online Tamil Character Recognition Engine using Post-Processing Methods”, 10th International Conference on Document Analysis and Recognition, pp 1216-1220, 2009.
[5] Marwan Ali H. Omer, Shi Long Ma, “Online Arabic Handwriting Character Recognition Using Matching Algorithm”, IEEE, pp 259-262, 2010.
[6] Sreeraj M, Sumam Mary Idicula, “k-NN based On-Line Handwritten Character recognition system”, First International Conference on Integrated Intelligent Computing, pp 171-176, 2010.
[7] http://www.learnartificialneuralnetworks.com/
Authors
Amritha Sampath is a postgraduate engineering student in Computer Science and Engineering at Rajagiri School of Engineering and Technology, Kochi, India. She completed her graduation in Computer Science and Engineering and secured a high score in the Graduate Aptitude Test in Engineering.
Tripti C is working as Assistant Professor in the Department of Computer Science and Engineering at Rajagiri School of Engineering & Technology, India. She is a postgraduate in Computer Science and Engineering from CDAC Noida, India and is now pursuing a Ph.D. at Cochin University of Science and Technology, in the area of vehicular and ad hoc networks.
She did her graduation in electronics and communication engineering at Rajagiri School of Engineering and Technology.
Dr. Govindaru V did his Ph.D. at ISEC, Bangalore, India. He did his post graduation at Jawaharlal Nehru University, India. He is now working as head of the Research and Development Division in C-DIT, Thiruvananthapuram, India.