Innovative Systems Design and Engineering                                                    www.iiste.org
ISSN 2222-1727 (Paper) ISSN 2222-2871 (Online)
Vol 3, No 3, 2012




   Development of a Feature Extraction Technique for Online
               Character Recognition System
                                           Fenwa Olusayo Deborah*
                              Department of Computer Science and Engineering,
                  Ladoke Akintola University of Technology, P.M.B 4000, Ogbomoso, Nigeria.
                        *E-mail of the corresponding author: odfenwa@lautech.edu.ng


                                           Omidiora Elijah Olusayo
                              Department of Computer Science and Engineering,
                  Ladoke Akintola University of Technology, P.M.B 4000, Ogbomoso, Nigeria.
                                      E-mail: omidiorasayo@yahoo.co.uk


                                          Fakolujo Olaosebikan Alade
                              Department of Computer Science and Engineering,
                  Ladoke Akintola University of Technology, P.M.B 4000, Ogbomoso, Nigeria.
                                           E-mail: ola@fakolujo.com


Abstract
Character recognition has been a popular research area for many years because of its various application
potentials. Some of its application areas are postal automation, bank cheque processing, automatic data
entry, signature verification and so on. However, recognition of handwritten characters remains a difficult
problem that attracts considerable attention, owing to the high variability and ambiguity of the character
shapes written by individuals. Many researchers have proposed approaches to this complex problem, but
none has solved it completely in all settings. Problems encountered by researchers include the selection of
an efficient feature extraction method, long network training time, long recognition time and low
recognition accuracy. This paper presents a feature extraction technique for online character recognition
systems that uses a hybrid of geometrical and statistical features. Through the integration of geometrical
and statistical features, insights were gained into new character properties, since these two types of features
are considered complementary.
Keywords: Character recognition, Feature extraction, Geometrical Feature, Statistical Feature, Character.


1. Introduction
Character recognition is the process of applying pattern-matching methods to character shapes that have
been read into a computer to determine which alphanumeric characters, punctuation marks, or symbols
the shapes represent. The classes of recognition systems that are usually distinguished are online systems
for which handwriting data are captured during the writing process (which makes available the information
on the ordering of the strokes) and offline systems for which recognition takes place on a static image
captured once the writing process is over (Anoop and Anil, 2004; Liu et al., 2004; Mohamad and Zafar,
2004; Naser et al., 2009; Pradeep et al., 2011). Online methods have been shown to be superior to their
offline counterparts in recognising handwritten characters due to the temporal information available with
the former (Pradeep et al., 2011). Handwriting recognition systems can further be broken down into two
categories: a writer-independent recognition system, which recognizes a wide range of possible writing
styles, and a writer-dependent recognition system, which recognizes writing styles only from specific users
(Santosh and Nattee, 2009).
Online handwriting recognition is of special interest today due to the increased usage of hand-held devices.
Since incorporating a keyboard into hand-held devices is difficult, alternatives are demanded, and in this
respect the online method of giving input with a stylus is gaining popularity (Gupta et al., 2007).
Recognition of handwritten characters with respect to any language is difficult due to variability of writing
styles, state of mood of individuals, multiple patterns to represent a single character, cursive representation
of character and number of disconnected and multi-stroke characters (Shanthi and Duraiswamy, 2007).
Current technologies supporting pen-based input devices include the Digital Pen by Logitech, the Smart Pad
by Pocket PC, Digital Tablets by Wacom and the Tablet PC by Compaq (Manuel and Joaquim, 2001). Although
systems with handwriting recognition capability are already widely available in the market, further
improvements can be made to their recognition performance.
The challenges posed by online handwritten character recognition systems are to increase the
recognition accuracy and to reduce the recognition time (Rejean and Sargur, 2000; Gupta et al., 2007).
Various approaches have been used by many researchers to develop character recognition systems; these
include the template matching, statistical, structural, neural network and hybrid approaches. The hybrid
approach (combination of multiple classifiers) has become a very
active area of research recently (Kittler and Roli, 2000; 2001). It has been demonstrated in a number of
applications that using more than a single classifier in a recognition task can lead to a significant
improvement of the system’s overall performance. Hence, hybrid approach seems to be a promising
approach to improve the recognition rate and recognition accuracy of current handwriting recognition
systems (Simon and Horst, 2004). However, Selection of a feature extraction method is probably the single
most important factor in achieving high recognition performance in character recognition system (Pradeep,
Srinivasan and Himavathi, 2011). No matter how sophisticated the classifiers and learning algorithms, poor
feature extraction will always lead to poor system performance (Marc, Alexandre76 and Christian, 2001).


2. Research Methodology
The hundreds of features available in the literature can be categorized as follows: global transformation
and series expansion features, statistical features, and geometrical and topological features.
Many feature extraction techniques have been proposed in literature to improve overall recognition rate;
however, most of the techniques used only one property of the handwritten character. This research focuses
on developing a feature extraction technique that combines three characteristics (stroke information,
contour pixels and zoning) of the handwritten character to create a global feature vector. Hence, in this
research work, a hybrid feature extraction algorithm was developed to alleviate the problem of poor feature
extraction in online character recognition systems.


2.1 Development of the Proposed Hybrid Feature Extraction Algorithm
The most important aspect of a handwriting character recognition scheme is the selection of a good feature
set which is reasonably invariant with respect to shape variations caused by various writing styles. Feature
extraction is the process of extracting from the raw data the information which is the most relevant for
classification purposes, in the sense of minimizing the within-class pattern variability while enhancing the
between-class pattern variability (Naser et al., 2009). Features are unique characteristics that can represent
an image, i.e. a character in this case. Each character is represented as a feature vector, which becomes its
identity. The goal of feature extraction is to extract a set of features, which maximizes the recognition rate
with the least number of elements. Many feature extraction techniques have been proposed to improve the
overall recognition rate; however, most of them use only one property of the handwritten character. This
research focuses on a feature extraction technique that combines three characteristics of the handwritten
character to create a global feature vector. A hybrid feature extraction algorithm was developed using
Geometrical and Statistical features, as shown in Figure 2.10. Integration of Geometrical and Statistical
features was used to highlight different character properties, since these types of features are considered to
be complementary (Heute and Paquet, 1998; Cai and Liu, 1998).
2.1.1 Geometrical Features
Various global and local properties of characters can be represented by geometrical and topological
features with high tolerance to distortions and style variations. This type of representation may also encode
some knowledge about the structure of the object or may provide some knowledge as to what sort of
components make up that object. The geometrical features used in this research work were the Stroke
Information and Contour pixels of the characters.


2.1.1.1 Stroke Information
Stroke information is a combination of local and global features, which aim to capture the geometrical and
topological properties of the characters and to efficiently distinguish and identify a character from a small
subset of characters. A stroke is the stored record of pen movements in online handwriting recognition.
These movements appear at various positions on the viewing surface, and joining these positions on a
first-come-first-served basis reproduces the appearance of the drawn text. A character may consist of a
single stroke or multiple strokes. The list formed during data collection consists of nodes, where each node
includes two fields, namely, point and stroke number. Here, the point represents a coordinate on the
viewing surface and the stroke number represents the identity and sequential order of the stroke. Higher
recognition performance would be possible if online recognition methods were able to exploit drawing
motion vector (stroke) information (Nishimura and Timikawa, 2003). The feature sets consist of:
(i)      Stroke Number
Stroke number helps in identifying similar points, gaps and crossings. The pen movement consists of three
functions, namely, Pen-Down, Pen-Move and Pen-Up. When one consecutively presses, moves and lifts the
pen, and more than one point is collected, the stroke number is incremented. The Pen-Move function stores
the movements of the pen on the writing pad. An example of a digital pen for generating stroke information
is shown in Figure 2.1, and Figure 2.2 shows a typical example of how different stroke numbers are
generated. However, the stroke number alone is not enough, because different characters may often have
the same number of strokes. Therefore, in this research, PEN-UP is used as a feature to check how well the
character matches the standard one (i.e. the average for the same character in the database). This feature is
calculated using the average strokes of a specific character as input to the membership function in
Equation 2.1:
         PEN-UP = e^|average - x|                                                                          (2.1)
         where x is the actual number of strokes for the specific character.
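As a minimal sketch, Equation 2.1 can be computed directly; the function and parameter names below are hypothetical, and the exponent follows the equation as printed:

```python
import math

def pen_up_feature(average_strokes, actual_strokes):
    """PEN-UP membership value of Equation 2.1: e^|average - x|, where x is
    the observed stroke count of the sample and average_strokes is the mean
    stroke count stored for that character class in the database."""
    return math.exp(abs(average_strokes - actual_strokes))
```

A sample whose stroke count equals the stored average gives the minimum value 1.0; larger stroke-count mismatches grow the value exponentially.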
(ii)     Pressure of the Stroke
This is the pressure representing pen-ups and pen-downs in a continuous manner. Pen pressure is used as a
feature to improve the basic performance of writer-independent online character recognition; the value of
the pen pressure exerted on the writing pad was therefore also used as a feature. Moreover, recognition
performance can be raised by using writing pressure information in online writer identification systems
and online character recognition systems (Nishimura and Timikawa, 2003).
(iii)    Number of Junctions and their Location
A black pixel is considered to be a junction if there are more than two black pixels in its neighbourhood;
the character image is divided into 35 (5 x 7) quadrants, and the number of junctions as well as their
positions in terms of these quadrants are recorded. For example, the character image of Figure 2.3 has 2
junctions, in quadrants 2 and 17. Junctions lying within a pre-defined radial distance are merged into a
single junction, and the junctions associated with the headline are ignored.
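The quadrant bookkeeping can be sketched as follows. The more-than-two-black-neighbours test over the 8-neighbourhood and the 7-row by 5-column grid layout are assumptions read off the description above, and both helpers are hypothetical, not the authors' code:

```python
def quadrant_index(row, col, height, width, grid_rows=7, grid_cols=5):
    """Map a pixel to one of the 35 (5 x 7) quadrants, numbered 1..35 row-wise."""
    qr = min(row * grid_rows // height, grid_rows - 1)
    qc = min(col * grid_cols // width, grid_cols - 1)
    return qr * grid_cols + qc + 1

def find_junctions(image):
    """Return quadrant numbers of black pixels (value 1) that have more
    than two black pixels among their 8 neighbours."""
    h, w = len(image), len(image[0])
    junctions = []
    for r in range(h):
        for c in range(w):
            if image[r][c] != 1:
                continue
            neighbours = sum(
                image[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            )
            if neighbours > 2:
                junctions.append(quadrant_index(r, c, h, w))
    return junctions
```

Merging junctions within a radial distance and ignoring headline junctions, as described above, would be applied as a post-processing pass on the returned list.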
(iv)     Horizontal Projection Count
Horizontal Projection Count is defined as HPC(i) = Σ_j F(i, j), where F(i, j) is the pixel value (1 for a black
foreground pixel and 0 for the white background) of the character image, and i and j denote the row and
column positions of a pixel, with the image’s top left corner set to F(0,0). It is calculated by scanning the
image row-wise and finding the sum of foreground pixels in each row (Figure 2.4). To take care of
variations in character sizes, the horizontal projection count of a character image is represented as
percentages instead of absolute values; in this work it is stored as a 4-component vector whose components
give the percentage of rows with 1 pixel, 2 pixels, 3 pixels and more than 3 pixels. The components of this
vector for the character image given in Figure 2.4 will be [50, 0, 10, 10], as there are 5 rows with 1 pixel,
no rows with 2 pixels, 1 row with 3 pixels and 1 row with more than 3 pixels.
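The row-wise count described above can be sketched as a small function, assuming a binary image stored as a list of rows with 1 for foreground pixels:

```python
def horizontal_projection_vector(image):
    """4-component HPC vector: the percentage of rows containing exactly
    1, 2, 3, or more than 3 foreground pixels (value 1)."""
    counts = [0, 0, 0, 0]  # rows with 1, 2, 3, >3 foreground pixels
    for row in image:
        s = sum(row)
        if 1 <= s <= 3:
            counts[s - 1] += 1
        elif s > 3:
            counts[3] += 1
    return [100.0 * c / len(image) for c in counts]
```

For a 10-row image with five 1-pixel rows, one 3-pixel row and one row of more than 3 pixels, this reproduces the [50, 0, 10, 10] vector of the Figure 2.4 example.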


2.1.1.2 Contour Pixels
Correct extraction of the contour will produce more accurate features that will increase the chances of
correctly classifying a given character or pattern. But the question that might arise is why first extract the
contour of a pattern and then collect its features? Why not collect features directly from the pattern? One
answer is that the contour pixels are generally a small subset of the total number of pixels representing a
pattern. Therefore, the amount of computation is greatly reduced when feature extracting algorithms are run
on the contour instead of the whole pattern. Because the contour shares a lot of features with the original
pattern but has fewer pixels, the feature extraction process becomes much more efficient when performed
on the contour rather than on the original pattern. Contour tracing is often a major contributor to the efficiency
of the feature extraction process, which is an essential process in pattern recognition (Liu et al., 2003; Liu
et al., 2004).
In order to extract the contour of the pattern, the following actions must be taken: every time a black pixel is
encountered, turn left, and every time a white pixel is encountered, turn right, until the starting pixel is met
again. All the black pixels traced out constitute the contour of the pattern. The contour tracing algorithm used in
this research is based on the model developed by Yamaguchi et al. (2003). This is demonstrated in Figure
2.5 (Liu et al., 2003; 2004).
Contour Tracing Algorithm (Yamaguchi et al., 2003):
Input: An image I containing a connected component P of black pixels.
Output: A sequence B (b1, b2, . . . , bk) of boundary pixels, that is, the outer contour.
Begin
         Set B to be empty
         From bottom to top and left to right, scan the cells of I until a black pixel, S, of P is found
         Insert S in B
         Set the current pixel, P, to be the starting pixel, S
         Turn left, that is, visit the left adjacent pixel of P
         Update P, that is, set it to be the current pixel
         While P not equal to S do
                  If the current pixel P is black
                           Insert P in B and turn left (visit the left adjacent pixel of P)
                           Update P, that is, set it to be the current pixel
                  Else
                           Turn right (visit the right adjacent pixel of P)
                           Update P, that is, set it to be the current pixel
End
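A runnable sketch of this square-tracing procedure, assuming a binary image as a list of rows with 1 for black, pixels outside the image treated as white, and (row, col) coordinates with rows growing downward:

```python
def trace_contour(image):
    """Square tracing: from the bottom-left-most black pixel, turn left on
    black and right on white until the starting pixel is reached again.
    Returns the traced boundary pixels as (row, col) tuples."""
    h, w = len(image), len(image[0])

    def is_black(r, c):
        return 0 <= r < h and 0 <= c < w and image[r][c] == 1

    # Scan bottom to top, left to right for the starting pixel S.
    start = next(((r, c) for r in range(h - 1, -1, -1)
                  for c in range(w) if image[r][c] == 1), None)
    if start is None:
        return []

    boundary = [start]
    r, c = start
    dr, dc = -1, 0                    # initially facing "up" the image
    dr, dc = -dc, dr                  # turn left ...
    r, c = r + dr, c + dc             # ... and step to the first neighbour
    while (r, c) != start:
        if is_black(r, c):
            boundary.append((r, c))
            dr, dc = -dc, dr          # turn left on black
        else:
            dr, dc = dc, -dr          # turn right on white
        r, c = r + dr, c + dc
    return boundary
```

Square tracing is known to miss parts of some 8-connected patterns; stopping criteria such as Jacob's (stop when the start pixel is re-entered in the original direction) are common refinements and are omitted here for brevity.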


2.1.2 Statistical Features



Statistical features are derived from the statistical distribution of points. They provide high speed and low
complexity and take care of style variations to some extent. They may also be used for reducing the
dimension of the feature set. The statistical feature adopted in this research is ‘Zoning’. The zone-based
feature extraction method provides good results even when certain pre-processing steps like filtering,
smoothing and slant removal are not considered.
Image Centroid and Zone-based (ICZ) distance metric feature extraction and Zone Centroid and Zone-based
(ZCZ) distance metric feature extraction algorithms were proposed by Rajashekararadhya and Vanaja
(2008a; 2008b) for the recognition of numerals in four popular Indian scripts (Kannada, Telugu, Tamil and
Malayalam). In this research, a hybrid of modified Image Centroid and Zone-based (ICZ) distance metric
feature extraction and modified Zone Centroid and Zone-based (ZCZ) distance metric feature extraction
methods was used. The two algorithms were modified in terms of:
(i)      Number of zones being used
(ii)     Measurement of the distances from both the Image Centroid and Zone Centroid
(iii)    The area of application.


2.1.2.1 The Zoning Algorithm
The most important aspect of a handwriting recognition scheme is the selection of a good feature set, which is
reasonably invariant with respect to shape variations caused by various writing styles. The zoning method
is used to compute the percentage of black pixels in each zone. The rectangle circumscribing the character is
divided into several overlapping, or non-overlapping regions and the densities of black points within these
regions are computed and used as features as shown in Figure 2.6. The major advantage of this approach
stems from its robustness to small variations, ease of implementation and good recognition rate. The
detailed description of the zoning algorithm is as follows:
The image (character) is divided into ‘n’ equal parts, twenty-five in this case, as shown in Figure 2.7. The
character centroid (i.e. the centre of gravity of the character) is computed, and the average distance from
the character centroid to each pixel present in a zone is computed. Similarly, each zone centroid is
computed, along with the average distance from the zone centroid to each pixel present in that zone. This
procedure is repeated for all the zones/grids/boxes present in the character image. Some zones may be
empty, in which case the value of that particular zone in the feature vector is zero. Finally, 2 x 25 (i.e. fifty
in this case) such features were used to represent the character image.
Algorithm 1 and Algorithm 2 were proposed by Rajashekararadhya and Vanaja (2008a; 2008b) for the
recognition of numerals in four popular Indian scripts (Kannada, Telugu, Tamil and Malayalam).
Algorithm 1: Image Centroid and Zone-based (ICZ) Distance Metric Feature Extraction System
(Rajashekararadhya and Vanaja, 2008a)
Input: Pre processed numeral image
Output: Features for classification and recognition
Begins
         Step 1: Divide the input image into ‘‘n’’ equal zones
         Step 2: Compute the input image centroid
         Step 3: Compute the distance between the image centroid and each pixel present in the zone
         Step 4: Repeat step 3 for every pixel present in the zone
         Step 5: Compute the average of these distances
         Step 6: Repeat this procedure sequentially for every zone
         Step 7: Finally, ‘‘n’’ such features will be obtained for classification and recognition



Ends
Algorithm 2: Zone Centroid and Zone-based (ZCZ) Distance Metric Feature Extraction System
(Rajashekararadhya and Vanaja, 2008b)
Input: Pre processed numeral image
Output: Features for classification and recognition
Begins
          Step 1: Divide the input image into n equal zones
          Step 2: Compute the zone centroid
          Step 3: Compute the distance between the zone centroid and each pixel present in the zone
          Step 4: Repeat step 3 for every pixel present in the zone
          Step 5: Compute the average of these distances
          Step 6: Repeat this procedure sequentially for every zone
          Step 7: Finally, n such features will be obtained for classification and recognition
Ends


2.1.2.2 Hybrid of the Modified Zoning Feature Extraction Algorithms
The following are the algorithms to show the working procedure of the modified hybrid zoning feature
extraction methods:
Modified Algorithm 1: Image Centroid and Zone-based (ICZ) distance metric feature extraction algorithm.
Input: Pre processed character images
Output: Features for classification and recognition
Method Begins
Step 1:   Divide the input image into 25 equal zones as shown in Figure 2.7
Step 2:   Compute the input image centroid as shown in Figure 2.8 using the formula:

          Centre of gravity in the horizontal direction (x–axis) = (1/n) Σ_{i=0}^{n−1} x_i, where n = width    (2.2)
          Centre of gravity in the vertical direction (y–axis) = (1/m) Σ_{i=0}^{m−1} y_i, where m = height     (2.3)
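Interpreting Equations 2.2 and 2.3 as the mean coordinates of the foreground pixels, the centroid can be sketched as follows (a hypothetical helper, with the image as a list of rows and 1 marking foreground pixels):

```python
def image_centroid(image):
    """Centre of gravity (Equations 2.2-2.3): the mean x (column) and
    y (row) coordinates of the foreground pixels (value 1)."""
    xs = [x for y, row in enumerate(image) for x, v in enumerate(row) if v == 1]
    ys = [y for y, row in enumerate(image) for x, v in enumerate(row) if v == 1]
    if not xs:
        return None  # blank image: centroid undefined
    return sum(xs) / len(xs), sum(ys) / len(ys)
```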
Step 3: Compute the distance between the image centroid to each pixel present in the zone as shown in
        Figure 2.8
Step 4: Repeat step 3 for every pixel present in the zone (five points in this case):
          d = d1 + d2 + d3 + d4 + d5                                                                       (2.4)
Step 5: Compute the average of these distances:
          Average Image Centroid Distance DI = d/5                                                         (2.5)
          where d = total distance between the image centroid and the pixels, measured at an angle of 20°
Step 6: Repeat this procedure sequentially for all 25 zones.
          Total Distance (P) = DI1 + DI2 + DI3 + . . . + DIm                                               (2.6)
          Total Average Distance j = (1/m) Σ_{z=0}^{m−1} DI_z                                              (2.7)
          where m = 25 (total number of zones)
Step 7: Finally, 25 such features were obtained for classification and recognition.
Method Ends.



Modified Algorithm 2: Zone Centroid and Zone-based (ZCZ) distance metric feature extraction algorithm
Input: Pre processed character image
Output: Features for classification and recognition
Method Begins
Step 1: Divide the input image into 25 equal zones as shown in Figure 2.7
Step 2: Compute the zone centroid from all pixels present in the zone as shown in Figure 2.9 using the
        formula:

          Centre of gravity in the horizontal direction (x–axis) = (1/n) Σ_{i=0}^{n−1} x_i, where n = width    (2.8)
          Centre of gravity in the vertical direction (y–axis) = (1/m) Σ_{i=0}^{m−1} y_i, where m = height     (2.9)
Step 3: Compute the distance between the zone centroid and each pixel present in the zone
Step 4: Repeat step 3 for every pixel present in the zone (5 points in this case):
          Total Distance D = D1 + D2 + D3 + D4 + D5                                                    (2.10)
Step 5: Compute the average of these distances:
          Average Zone Centroid Distance DZ = D/5                                                      (2.11)
          where D = total distance between the zone centroid and the pixels in the zone, measured at an angle of 20°
Step 6: Repeat this procedure sequentially for all 25 zones.
          Total Distance (Q) = DZ1 + DZ2 + DZ3 + . . . + DZm                                           (2.12)
          Total Average Distance k = (1/m) Σ_{z=0}^{m−1} DZ_z                                          (2.13)
          where m = 25 (total number of zones)
Step 7: Finally, 25 such features were obtained for classification and recognition
Method Ends


The Hybrid Zoning Algorithm: Hybrid of Modified ICZ and Modified ZCZ
Input: Pre processed character image
Output: Features for Classification and Recognition
Method Begins
Step 1:   Divide the input image into 25 equal zones
Step 2:   Compute the input image centroid
Step 3:   Compute the distance between the image centroid and each pixel present in the zone
Step 4:   Repeat step 3 for every pixel present in the zone
Step 5:   Compute the average of these distances
Step 6:   Compute the zone centroid
Step 7:   Compute the distance between the zone centroid and each pixel present in the zone
Step 8:   Repeat step 7 for every pixel present in the zone
Step 9:   Compute the average of these distances
Step 10: Repeat steps 3-9 sequentially for all the zones
Step 11: Finally, 2 x n (50) such features were obtained for classification and recognition.



Method Ends
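Under the assumption that the centroids are the mean foreground-pixel coordinates and the distances are Euclidean, the hybrid ICZ + ZCZ procedure above can be sketched as follows (a hypothetical helper, not the authors' implementation):

```python
import math

def hybrid_zoning_features(image, grid=5):
    """Hybrid of modified ICZ and ZCZ (Steps 1-11): for each of the
    grid x grid (25) zones, the average distance of the zone's foreground
    pixels to the image centroid (ICZ) and to that zone's own centroid
    (ZCZ), giving 2 x 25 = 50 features. Empty zones contribute 0."""
    h, w = len(image), len(image[0])
    zones = [[] for _ in range(grid * grid)]
    for y in range(h):
        for x in range(w):
            if image[y][x] == 1:
                zy = min(y * grid // h, grid - 1)
                zx = min(x * grid // w, grid - 1)
                zones[zy * grid + zx].append((x, y))
    points = [p for zone in zones for p in zone]
    if not points:
        return [0.0] * (2 * grid * grid)
    icx = sum(x for x, _ in points) / len(points)   # image centroid (Eq. 2.2-2.3)
    icy = sum(y for _, y in points) / len(points)

    def avg_dist(pts, cx, cy):
        return sum(math.hypot(x - cx, y - cy) for x, y in pts) / len(pts)

    icz, zcz = [], []
    for pts in zones:
        if not pts:
            icz.append(0.0)
            zcz.append(0.0)
            continue
        icz.append(avg_dist(pts, icx, icy))
        zcx = sum(x for x, _ in pts) / len(pts)     # zone centroid (Eq. 2.8-2.9)
        zcy = sum(y for _, y in pts) / len(pts)
        zcz.append(avg_dist(pts, zcx, zcy))
    return icz + zcz
```

The returned list concatenates the 25 ICZ averages with the 25 ZCZ averages, matching the 50-element feature vector of Step 11.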


2.2 The Developed Hybrid (Geom-Statistical) Feature Extraction Algorithm
Step 1: Get the stroke information of the input characters from the digitizer (G-pen 450)
         These include:
         (i)      Pressure used in writing the strokes of the characters
         (ii)     Number(s) of strokes used in writing the characters
         (iii)    Number of junctions and their locations in the written characters
         (iv)     The horizontal projection count of the character
Step 2: Apply Contour tracing algorithm to trace out the contour of the characters
Step 3: Run Hybrid Zoning algorithm on the contours of the characters
Step 4: Feed the outputs of the extracted features of the characters into the digitization stage in order to
        convert all the extracted features into digital forms
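Steps 1-4 amount to assembling one flat vector from the feature groups before digitization. The function and parameter names below are hypothetical, standing in for the outputs of the stroke-information, contour and zoning stages:

```python
def build_global_feature_vector(pen_up, pressure, stroke_count,
                                junction_quadrants, hpc_vector,
                                zoning_features):
    """Concatenate the geometrical (stroke information, contour-based
    zoning) and statistical features into one global feature vector,
    ready for the digitization stage (Step 4)."""
    vector = [float(pen_up), float(pressure), float(stroke_count)]
    vector.extend(float(q) for q in junction_quadrants)
    vector.extend(float(v) for v in hpc_vector)        # 4 HPC components
    vector.extend(float(v) for v in zoning_features)   # 50 values (2 x 25 zones)
    return vector
```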


3. Conclusion and Future Work
In this paper, we have developed a feature extraction technique for online character recognition systems
using a hybrid of geometrical and statistical features. Specifically, a hybrid feature extraction algorithm
was developed to alleviate the problem of poor feature extraction in online character recognition systems.
Future research may be geared towards developing a hybrid of a modified counter-propagation and a
modified optical backpropagation neural network model for the aforementioned system. Also, the
performance of the online character recognition system under consideration could be evaluated based on
learning rates, image sizes and database sizes.


References
Anoop, M. N. and Anil K. J. (2004). Online Handwritten Script Recognition. IEEE Trans. PAMI, 26, 1,
124-130.
Cai, J. and Liu, Z. (1998). Integration of Structural and Statistical information for Unconstrained
Handwritten Numerals Recognition. Proceedings of the 14th International Conference on Pattern
Recognition, 1, 378-380.
Gupta, K., Rao, S. V. and Viswanath (2007). Speeding up Online Character Recognition. Proceedings of
Image and Vision Computing New Zealand, Hamilton, 41-45.
Heute, L. T. and Paquet, J. V. (1998). A Structural/Statistical Feature-Based Vector for Handwritten
Character Recognition. Pattern Recognition Letters, 19, 629-641.
Kittler, J. and Roli, F. (2000). 1st International Workshop on Multiple Classifier Systems, Cagliari, Italy.
Kittler, J. and Roli, F. (2001). 2nd International Workshop on Multiple Classifier Systems, Cagliari, Italy.
Liu, C. L., Nakashima, K., Sako, H. and Fujisawa, H. (2004). Handwritten Digit Recognition: Investigation
of Normalization and Feature Extraction Techniques. Pattern Recognition, 37, 2, 265-279.
Liu, C. L., Sako, H. and Fujisawa, H (2003). Handwritten Chinese character recognition: Alternatives to
nonlinear normalization, In Proceedings of the 7th International Conference on Document Analysis and
Recognition, Edinburgh, Scotland, 524-528.
Liu, C., Stefan, J. and Masaki N. (2004). Online Recognition of Chinese Characters: The State-of-the- Art.
IEEE Trans. on Pattern Analysis and Machine Intelligence, 26, 2, 198-203.
Manuel, J. F. and Joaquim, A. J. (2001). Experimental Evaluation of an Online Scribble Recognizer.
Pattern Recognition Letters, 22, 12, 1311-1319.


Mohamad, D. and Zafar, M. F. (2004). Comparative Study of Two Novel Feature Vectors for Complex
Image Matching Using Counter propagation Neural Network. Journal of Information Technology, FSKSM,
UTM, 16, 1, 2073-2081.
Naser, M. A., Adnan, M., Arefin, T. M., Golam, S. M. and Naushad, A. (2009). Comparative Analysis of
Radon and Fan-beam based Feature Extraction Techniques for Bangla Character Recognition, IJCSNS
International Journal of Computer Science and Network Security, 9, 9, 120-135.
Nishimura, H. and Timikawa, T. (2003). Offline Character Recognition using Online Character Writing
Information. Proceedings of the seventh IEEE International Conference on Document Analysis and
Recognition.
Pradeep, J., Srinivasan, E. and Himavathi, S. (2011). Diagonal Based Feature Extraction for Handwritten
Alphabets Recognition using Neural Network. International Journal of Computer Science and Information
Technology (IJCS11), 3, 1, 27-37.
Rajashekararadhya, S. V. and Vanaja, P. R. (2008a). Handwritten numeral recognition of three popular
South Indian Scripts: A Novel Approach. Proceedings of the Second International Conference on
information processing ICIP, 162-167.
Rajashekararadhya, S. V. and Vanaja, P. R. (2008b). Isolated Handwritten Kannada Digit Recognition: A
Novel Approach. Proceedings of the International Conference on Cognition and Recognition, 134-140.
Rejean, P. and Sargur, S. N. (2000). On-line and Off-line Handwriting Recognition: A Comprehensive
Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 1, 63-84.
Santosh, K. C. and Nattee, C. (2009). A Comprehensive Survey on Online Handwriting Recognition
Technology and Its Real Application to the Nepalese Natural Handwriting. Kathmandu University Journal
of Science, Engineering and Technology, 5, 1, 31-55.
Shanthi, N. and Duraiswamy, K. (2007). Performance Comparison of Different Image Sizes for
Recognizing Unconstrained Handwritten Tamil Characters Using SVM. Journal of Computer Science, 3, 9,
760-764.
Simon, G. and Horst, B. (2004). Feature Selection Algorithms for the Generalization of Multiple Classifier
Systems and their Application to Handwritten Word Recognition. Pattern Recognition Letters, 25, 11,
1323-1336.
Yamaguchi, T., Nakano, Y., Maruyama, M., Miyao, H. and Hananoi, T. (2003). Digit Classification on
Signboards for Telephone Number Recognition. In Proceedings of the 7th International Conference on
Document Analysis and Recognition, Edinburgh, Scotland, 359-363.




         Figure 2.1: Snapshot of the Genius Pen (G-Pen 450) Digitizer for Character Acquisition





                        Figure 2.2: Writing character “A” with 3 Strokes




                   Figure 2.3: Division of character image into 35 quadrants








                  Figure 2.4: Horizontal Projection Count of character image “F”
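The horizontal projection count illustrated in Figure 2.4 can be sketched in code as follows. This is a simplified illustration rather than the authors' implementation: a binary image is assumed, with 1 marking a character pixel, and the feature is the percentage of rows containing exactly 1, 2, 3, or more than 3 such pixels.

```python
def horizontal_projection_count(img):
    """Sum of character pixels in each row of a binary image
    (img is a list of rows of 0/1 values; 1 marks a character pixel)."""
    return [sum(row) for row in img]

def hpc_feature(img):
    """4-component vector: percentage of rows with exactly 1, 2, 3,
    and more than 3 character pixels, as in Figure 2.4."""
    counts = horizontal_projection_count(img)
    rows = len(counts)
    def pct(match):
        return 100.0 * sum(1 for c in counts if match(c)) / rows
    return [pct(lambda c: c == 1), pct(lambda c: c == 2),
            pct(lambda c: c == 3), pct(lambda c: c > 3)]
```

For example, a 10-row image with five 1-pixel rows, one 3-pixel row and one 4-pixel row yields the vector [50.0, 0.0, 10.0, 10.0].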




                 Figure 2.5: The Contour-tracing Algorithm (Yamaguchi et al., 2003)
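The contour-tracing procedure of Figure 2.5 (turn left on a black pixel, right on a white one, until the start pixel is reached again) can be sketched as below. This is a minimal illustration of the square-tracing idea, not the authors' code; a 4-connected pattern that does not touch the image border is assumed, and the simple stopping criterion is known to fail on some patterns.

```python
def trace_contour(img):
    """Square tracing of an outer contour (after Yamaguchi et al., 2003).
    img: 2-D list where 1 marks a black pixel.  Returns the boundary
    pixels as (row, col) pairs, starting from the black pixel found
    by a bottom-to-top, left-to-right scan."""
    rows, cols = len(img), len(img[0])

    def left(d):   # rotate heading 90 degrees left (row axis points down)
        return (-d[1], d[0])

    def right(d):  # rotate heading 90 degrees right
        return (d[1], -d[0])

    # Scan bottom-to-top, left-to-right for the start pixel S.
    start = None
    for c in range(cols):
        for r in range(rows - 1, -1, -1):
            if img[r][c]:
                start = (r, c)
                break
        if start:
            break
    if start is None:
        return []

    boundary = [start]
    d = left((-1, 0))                       # we were moving up; turn left
    p = (start[0] + d[0], start[1] + d[1])
    while p != start:
        r, c = p
        if 0 <= r < rows and 0 <= c < cols and img[r][c]:
            boundary.append(p)              # black pixel: record, turn left
            d = left(d)
        else:
            d = right(d)                    # white pixel: turn right
        p = (p[0] + d[0], p[1] + d[1])
    return boundary
```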








                           Figure 2.6: Feature Extraction using Zoning




                        Figure 2.7: Character “n” in 5 by 5 (25 equal zones)








                           Figure 2.8: Image Centroid of character “n” in zoning




                         Figure 2.9: Zone Centroid of character “n” in zoning
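The zone-centroid distance features of Figure 2.9 can be sketched as follows. This is a simplified illustration of the ZCZ idea rather than the implementation reported in the paper: the zone centroid is taken as the mean position of the character pixels in the zone, image dimensions are assumed divisible by the zone count, and empty zones contribute a feature value of 0.

```python
import math

def zone_centroid_features(img, n=5):
    """Split a binary image into n x n zones; for each zone, return the
    average distance from the zone centroid to the character pixels in
    that zone (0.0 for empty zones).  img: 2-D list of 0/1 values."""
    rows, cols = len(img), len(img[0])
    zh, zw = rows // n, cols // n          # zone height/width (assumes divisibility)
    feats = []
    for zr in range(n):
        for zc in range(n):
            pts = [(r, c)
                   for r in range(zr * zh, (zr + 1) * zh)
                   for c in range(zc * zw, (zc + 1) * zw)
                   if img[r][c]]
            if not pts:
                feats.append(0.0)          # empty zone
                continue
            cy = sum(p[0] for p in pts) / len(pts)
            cx = sum(p[1] for p in pts) / len(pts)
            feats.append(sum(math.hypot(p[0] - cy, p[1] - cx)
                             for p in pts) / len(pts))
    return feats
```

With n = 5 this produces the 25 zone features described in the text; the companion ICZ features would be obtained the same way but measuring distances from the whole-image centroid instead.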








                   Figure 2.10: The Developed Hybrid Feature Extraction Model





  • 1. Innovative Systems Design and Engineering www.iiste.org ISSN 2222-1727 (Paper) ISSN 2222-2871 (Online) Vol 3, No 3, 2012 Development of a Feature Extraction Technique for Online Character Recognition System Fenwa Olusayo Deborah* Department of Computer Science and Engineering, Ladoke Akintola University of Technology, P.M.B 4000, Ogbomoso, Nigeria. *E-mail of the corresponding author: odfenwa@lautech.edu.ng Omidiora Elijah Olusayo Department of Computer Science and Engineering, Ladoke Akintola University of Technology, P.M.B 4000, Ogbomoso, Nigeria. E-mail: omidiorasayo@yahoo.co.uk Fakolujo Olaosebikan Alade Department of Computer Science and Engineering, Ladoke Akintola University of Technology, P.M.B 4000, Ogbomoso, Nigeria. E-mail: ola@fakolujo.com Abstract Character recognition has been a popular research area for many years because of its various application potentials. Some of its application areas are postal automation, bank cheque processing, automatic data entry, signature verification and so on. Nevertheless, recognition of handwritten characters is a problem that is currently gathering a lot of attention. It has become a difficult problem because of the high variability and ambiguity in the character shapes written by individuals. A lot of researchers have proposed many approaches to solve this complex problem but none has been able to solve the problem completely in all settings. Some of the problems encountered by researchers include selection of efficient feature extraction method, long network training time, long recognition time and low recognition accuracy. This paper developed a feature extraction technique for online character recognition system using hybrid of geometrical and statistical features. Thus, through the integration of geometrical and statistical features, insights were gained into new character properties, since these types of features were considered to be complementary. 
Keywords: Character recognition, Feature extraction, Geometrical Feature, Statistical Feature, Character. 1. Introduction Character recognition is the process of applying pattern-matching methods to character shapes that has been read into a computer to determine which alpha-numeric character, punctuation marks, and symbols the shapes represent. The classes of recognition systems that are usually distinguished are online systems for which handwriting data are captured during the writing process (which makes available the information on the ordering of the strokes) and offline systems for which recognition takes place on a static image captured once the writing process is over (Anoop and Anil, 2004; Liu et al., 2004; Mohamad and Zafar, 2004; Naser et al., 2009; Pradeep et al., 2011). The online methods have been shown to be superior to their offline counterpart in recognising handwriting characters due the temporal information available with the formal (Pradeep et al., 2011). Handwriting recognition system can further be broken down into two 10
  • 2. Innovative Systems Design and Engineering www.iiste.org ISSN 2222-1727 (Paper) ISSN 2222-2871 (Online) Vol 3, No 3, 2012 categories: writer independent recognition system which recognizes wide range of possible writing styles and a writer dependent recognition system which recognizes writing styles only from specific users (Santosh and Nattee, 2009). Online handwriting recognition today has special interest due to increased usage of the hand held devices. The incorporation of keyboard being difficult in the hand held devices demands for alternatives, and in this respect, online method of giving input with stylus is gaining quite popularity (Gupta et al., 2007). Recognition of handwritten characters with respect to any language is difficult due to variability of writing styles, state of mood of individuals, multiple patterns to represent a single character, cursive representation of character and number of disconnected and multi-stroke characters (Shanthi and Duraiswamy, 2007). Current technology supporting pen-based input devices include: Digital Pen by Logitech, Smart Pad by Pocket PC, Digital Tablets by Wacom and Tablet PC by Compaq (Manuel and Joaquim, 2001). Although these systems with handwriting recognition capability are already widely available in the market, further improvements can be made on the recognition performances for these applications. The challenges posed by the online handwritten character recognition systems are to increase the recognition accuracy and to reduce the recognition time (Rejean and Sargurl, 2000; Gupta et. al., 2007). Various approaches that have been used by many researchers to develop character recognition systems, these include; template matching approach, statistical approach, structural approach, neural networks approach and hybrid approach. Hybrid approach (combination of multiple classifiers) has become a very active area of research recently (Kittler and Roli, 2000; 2001). 
It has been demonstrated in a number of applications that using more than a single classifier in a recognition task can lead to a significant improvement of the system’s overall performance. Hence, hybrid approach seems to be a promising approach to improve the recognition rate and recognition accuracy of current handwriting recognition systems (Simon and Horst, 2004). However, Selection of a feature extraction method is probably the single most important factor in achieving high recognition performance in character recognition system (Pradeep, Srinivasan and Himavathi, 2011). No matter how sophisticated the classifiers and learning algorithms, poor feature extraction will always lead to poor system performance (Marc, Alexandre76 and Christian, 2001). 2. Research Methodology Hundreds of features which are available in the literature can be categorized as follows: Global transformation and series expansion features, Statistical features and Geometrical and topological features. Many feature extraction techniques have been proposed in literature to improve overall recognition rate; however, most of the techniques used only one property of the handwritten character. This research focuses on developing a feature extraction technique that combined three characteristics (stroke information, contour pixels and zoning) of the handwritten character to create a global feature vector. Hence, in this research work, a hybrid feature extraction algorithm was developed to alleviate the problem of poor feature extraction algorithm of online character recognition system. 2.1 Development of the Proposed Hybrid Feature Extraction Algorithm The most important aspect of handwriting character recognition scheme is the selection of good feature set which is reasonably invariant with respect to shape variation caused by various writing styles. 
Feature extraction is the process of extracting from the raw data the information which is the most relevant for classification purposes, in the sense of minimizing the within-class pattern variability while enhancing the between-class pattern variability (Naser et al., 2009). Features are unique characteristics that can represent an image, i.e. a character in this case. Each character is represented as a feature vector, which becomes its identity. The goal of feature extraction is to extract a set of features, which maximizes the recognition rate with the least amount of elements. Many feature extraction techniques have been proposed to improve overall recognition rate; however, most of the techniques used only one property of the handwritten character. This research focuses on a feature extraction technique that combined three characteristics of the handwritten character to create a global feature vector. A hybrid feature extraction algorithm was developed using Geometrical and Statistical features as shown in Figure 2.10. Integration of Geometrical 11
  • 3. Innovative Systems Design and Engineering www.iiste.org ISSN 2222-1727 (Paper) ISSN 2222-2871 (Online) Vol 3, No 3, 2012 and Statistical features was used to highlight different character properties, since these types of features are considered to be complementary (Heute and Paquet, 1998; Cai and Liu, 1998). 2.1.1 Geometrical Features Various global and local properties of characters can be represented by geometrical and topological features with high tolerance to distortions and style variations. This type of representation may also encode some knowledge about the structure of the object or may provide some knowledge as to what sort of components make up that object. The geometrical features used in this research work were the Stroke Information and Contour pixel of the characters. 2.1.1.1 Stroke Information Stroke Information is a combination of local and global features, which are aimed to capture the geometrical and topological features of the characters and efficiently distinguish and identify the character from a small subset of characters. Stroke is storage of pen movements in online handwriting recognition. These movements appear at various positions on view point and joining these positions in first-come-first- serve basis shows the appearance of drawn text. A character may consist of single or multiple strokes. The list formed in data collection includes nodes, where each node includes two fields, namely, point and stroke number. Here, the point represents a coordinate of view point and stroke number represents identity and sequential order of stroke. Higher recognition performance would be possible if on-line recognition methods were able to address drawing motion vector (stroke) information (Nishimura and Timikawa, 2003). The feature sets consist of: (i) Stroke Number Stroke number helps in identifying similar points, gaps and crossings. The pen movement consists of three functions, namely, Pen-Down, Pen-Move and Pen-Up. 
When one presses, moves, lifts the pen up consecutively, and more than one point collected, the stroke number is incremented. Pen-Move function stores movements of pen on writing pad. An example of a digital pen for generating stroke information is as shown in figure 2.1. Figure 2.2 shows a typical example of how different stroke numbers are generated. However, only stroke is not enough because most of the time different character may get the same no of strokes. Therefore, in this research, PEN-UP is used as a feature to check how well the character matches the standard one (i.e. the average for the same character in the database). This feature is calculated by using the average strokes of a specific character as an input using the membership function as in Equation 2.1: PEN-UP = e|average - x| (2.1) where x is the real strokes for the specific character. (ii) Pressure of the Stroke This is the pressure representing Pen Ups and Downs in a continuous manner. The use of pen pressure as a feature is used for the improvement of a basic performance of the writer- independent online character recognition. The value of the pen pressure exerted on the writing pad was also used as feature. Moreover, recognition performance could be raised using writing pressure information of on-line writer identification systems and on-line character recognition systems (Nishimura and Timikawa, 2003). (iii) Number of Junctions and their Location A black pixel is considered to be a junction if there are more than two black pixels in its 5 by 7 neighbourhood in the resolution of the character image. The number of junctions as well as their positions in terms of 35(5x7) quadrants are recorded. For example the character image of Figure 2.3 has 2 junctions in quadrants 2 and 17. Junctions lying within a pre-defined radial distance are merged into a single junction and the junctions associated with the headline are ignored. 
(iv) Horizontal Projection Count Horizontal Projection Count is represented as HPC(i) = ∑ , , where F(i,j) is a pixel value (1 for black background and 0 for white foreground) of a character image, and i and j denote row and column positions 12
  • 4. Innovative Systems Design and Engineering www.iiste.org ISSN 2222-1727 (Paper) ISSN 2222-2871 (Online) Vol 3, No 3, 2012 of a pixel, with the image’s top left corner set to F(0,0). It is calculated by scanning the image row-wise and finding the sum of background pixels in each row (Figure 2.4). To take care of variations in character sizes, the horizontal projection count of a character image is represented by percentage instead of an absolute value and in this present work it is stored as a 4 component vector where the four components symbolize the percentage of rows with 1 pixel, 2 pixels, 3 pixels and more than 3 pixels. The components of this vector for the character image given in Figure 2.4 will be [50, 0, 10, 10], as there are 5 rows with 1 pixel; no rows with 2 pixels; 1 row with 3 pixels and 1 row with more than 3 pixels. 2.1.1.2 Contour Pixels Correct extraction of the contour will produce more accurate features that will increase the chances of correctly classifying a given character or pattern. But the question that might arise is why first extract the contour of a pattern and then collect its features? Why not collect features directly from the pattern? One answer is, the contour pixels are generally a small subset of the total number of pixels representing a pattern. Therefore, the amount of computation is greatly reduced when feature extracting algorithms are run on the contour instead of the whole pattern. Because the contour shares a lot of features with the original pattern, but has fewer pixels, the feature extraction process becomes much more efficient when performed on the contour rather on the original pattern. Contour tracing is often a major contributor to the efficiency of the feature extraction process, which is an essential process in pattern recognition (Liu et al., 2003; Liu et al., 2004). 
In order to extract the contour of the pattern, he following actions must be taken: every time a black pixel is encountered, turn left, and every time a white pixel is encountered, turn right, until the starting pixel is met again. All the black pixels traced out is the contour of the pattern. The contour tracing algorithm used in this research is based on the model developed by Yamaguchi et al. (2003). This is demonstrated in Figure 2.5 (Liu et al., 2003; 2004). Contour Tracing Algorithm (Yamaguchi et al., 2003): Input: An image I containing a connected component P of black pixels. Output: A sequence B (b1, b2, . . . , bk) of boundary pixels, that is, the outer contour. Begin Set B to be empty From bottom to top and left to right scan the cells of I until a black pixel, S, of P is found Insert S in B Set the current pixel, P, to be the starting pixel, S Turn left, that is, visit the left adjacent pixel of P Update P, that is, set it to be the current pixel While P not equal to S do If the current pixel P is black Insert P in B and turn left (visit the left adjacent pixel of P) Update P, that is, set it to be the current pixel Else Turn right (visit the right adjacent pixel of P) Update P, that is, set it to be the current pixel. End 2.1.2 Statistical Features 13
  • 5. Innovative Systems Design and Engineering www.iiste.org ISSN 2222-1727 (Paper) ISSN 2222-2871 (Online) Vol 3, No 3, 2012 Statistical features are derived from the statistical distribution of points. They provide high speed and low complexity and take care of style variations to some extent. They may also be used for reducing the dimension of the feature set. The statistical feature adopted in this research is ‘Zoning’. Zone-based feature extraction method provides good result even when certain pre processing steps like filtering, smoothing and slant removing are not considered. Image Centroid and zone-based (ICZ) distance metric feature extraction and Zone Centroid and zone-based (ZCZ) distance metric feature extraction algorithms were proposed by Vanajah and Rajashekararadhya in 2008 for the recognition of four popular Indian scripts (Kannada, Telugu, Tamil and Malayalam) numerals. In this research, hybrid of modified Image Centroid and zone-based (ICZ) distance metric feature extraction and modified Zone Centroid and zone-based (ZCZ) distance metric feature extraction methods was used. Modifications of the two algorithms are in terms of: (i) Number of zones being used (ii) Measurement of the distances from both the Image Centroid and Zone Centroid (iii) The area of application. 2.1.2.1 The Zoning Algorithm The most important aspect of handwriting recognition scheme is the selection of good feature set, which is reasonably invariant with respect to shape variations caused by various writing styles. The zoning method is used to compute the percentage of black pixel in each zone. The rectangle circumscribing the character is divided into several overlapping, or non-overlapping regions and the densities of black points within these regions are computed and used as features as shown in Figure 2.6. The major advantage of this approach stems from its robustness to small variation, ease of implementation and good recognition rate. 
Zone-based feature extraction method provides good result even when certain pre processing steps like filtering, smoothing and slant removing are not considered. The detailed description of Zoning Algorithm is given as follows: The image (character) is further divided in to ‘n’ equal parts (Twenty five in this case) as shown in Figure 2.7. The character centroid (i.e. centre of gravity of the character) is computed and the average distance from the character centroid to each pixel present in the zone is computed. Similarly zone centroid is computed and average distance from the zone centroid to each pixel present in the zone is to be computed. This procedure will be repeated for all the zones/grids/boxes present in the character image. There could be some zones that are empty, and then the value of that particular zone image value in the feature vector is zero. Finally, 2 x 25 (i.e. Fifty in this case) such features were used to represent the character image feature. Algorithm 1 and Algorithm 2 were proposed by Vanajah and Rajashekararadhya in 2008 for the recognition of four popular Indian scripts (Kannada, Telugu, Tamil and Malayalam) numerals. Algorithm 1: Image Centroid and Zone-based (ICZ) Distance Metric Feature Extraction System (Rajashekararadhya and Vanajah, 2008a) Input: Pre processed numeral image Output: Features for classification and recognition Begins Step 1: Divide the input image in to ‘‘n’’ equal zones Step 2: Compute the input image centroid Step 3: Compute the distance between the image centroid to each pixel present in the zone Step 4: Repeat step 3 for the entire pixel present in the zone Step 5: Compute average distance between these points Step 6: Repeat this procedure sequentially for the entire zone Step 7: Finally, ‘‘n’’ such features will be obtained for classification and recognition 14
  • 6. Innovative Systems Design and Engineering www.iiste.org ISSN 2222-1727 (Paper) ISSN 2222-2871 (Online) Vol 3, No 3, 2012 Ends Algorithm 2: Zone Centroid and Zone-based (ZCZ) Distance Metric Feature Extraction System (Rajashekararadhya and Vanajah, 2008b) Input: Pre processed numeral image Output: Features for classification and recognition Begins Step 1: Divide the input image in to n equal zones Step 2: Compute the zone centroid Step 3: Compute the distance between the zone centroid to each pixel present in the zone Step 4: Repeat step 3 for the entire pixel present in the zone Step 5: Compute average distance between these points Step 6: Repeat this procedure sequentially for the entire zone Step 7: Finally, n such features will be obtained for classification and recognition Ends 2.1.2.2 Hybrid of the Modified Zoning Feature Extraction Algorithms The following are the algorithms to show the working procedure of the modified hybrid zoning feature extraction methods: Modified Algorithm 1: Image Centroid and Zone-based (ICZ) distance metric feature extraction algorithm. 
Input: Preprocessed character image
Output: Features for classification and recognition
Method Begins
Step 1: Divide the input image into 25 equal zones as shown in Figure 2.7
Step 2: Compute the input image centroid as shown in Figure 2.8 using the formulae:
Centre of gravity in the horizontal direction (x-axis) = (1/n) Σ_{i=0}^{n-1} x_i, where n = width (2.2)
Centre of gravity in the vertical direction (y-axis) = (1/m) Σ_{i=0}^{m-1} y_i, where m = height (2.3)
Step 3: Compute the distance from the image centroid to each pixel present in the zone as shown in Figure 2.8
Step 4: Repeat Step 3 for all the pixels present in the zone (five points in this case):
d = d1 + d2 + d3 + d4 + d5 (2.4)
Step 5: Compute the average of these distances as:
Average Image Centroid Distance DI = d/5 (2.5)
where d = total distance from the image centroid to the pixels, measured at an angle of 20°
Step 6: Repeat this procedure sequentially for all 25 zones:
Total Distance (P) = DI1 + DI2 + DI3 + . . . + DIm (2.6)
Total Average Distance j = (Σ_{z=0}^{m-1} DI_z) / m (2.7)
where m = 25 (total number of zones)
Step 7: Finally, 25 such features were obtained for classification and recognition.
Method Ends.
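The modified ICZ steps above can be sketched in code. The following Python function is a hypothetical illustration, not an implementation from the paper: the function name, the binary list-of-rows image representation, and the choice to average over every foreground pixel in a zone (rather than sampling five points at 20° intervals as in Steps 4 and 5) are all assumptions made for clarity.

```python
import math

def icz_features(image, grid=5):
    """Sketch of ICZ feature extraction: for each of the grid x grid
    zones, return the average distance from the image centroid to the
    foreground pixels of that zone (zero for empty zones).

    `image` is a binary raster given as a list of rows of 0/1 ints.
    """
    h, w = len(image), len(image[0])
    # Image centroid: centre of gravity of all foreground pixels.
    pixels = [(x, y) for y in range(h) for x in range(w) if image[y][x]]
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    zh, zw = h // grid, w // grid          # zone height and width
    features = []
    for zy in range(grid):
        for zx in range(grid):
            # Foreground pixels that fall inside this zone.
            zone = [(x, y) for x, y in pixels
                    if zx * zw <= x < (zx + 1) * zw
                    and zy * zh <= y < (zy + 1) * zh]
            if not zone:                   # empty zone -> feature value 0
                features.append(0.0)
            else:
                d = sum(math.hypot(x - cx, y - cy) for x, y in zone)
                features.append(d / len(zone))
    return features
```

For a 5 x 5 grid this yields the 25 ICZ features described above; the companion ZCZ measure would replace the image centroid with each zone's own centroid.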
Modified Algorithm 2: Zone Centroid and Zone-based (ZCZ) distance metric feature extraction algorithm
Input: Preprocessed character image
Output: Features for classification and recognition
Method Begins
Step 1: Divide the input image into 25 equal zones as shown in Figure 2.7
Step 2: Compute the zone centroid from the pixels present in each zone as shown in Figure 2.9 using the formulae:
Centre of gravity in the horizontal direction (x-axis) = (1/n) Σ_{i=0}^{n-1} x_i, where n = width (2.8)
Centre of gravity in the vertical direction (y-axis) = (1/m) Σ_{i=0}^{m-1} y_i, where m = height (2.9)
Step 3: Compute the distance from the zone centroid to each pixel present in the zone
Step 4: Repeat Step 3 for the pixels present in the zone (five points in this case):
Total Distance D = D1 + D2 + D3 + D4 + D5 (2.10)
Step 5: Compute the average of these distances as:
Average Zone Centroid Distance DZ = D/5 (2.11)
where D = total distance from the zone centroid to the pixels in the zone, measured at an angle of 20°
Step 6: Repeat this procedure sequentially for all 25 zones:
Total Distance (Q) = DZ1 + DZ2 + DZ3 + . . .
+ DZm (2.12)
Total Average Distance k = (Σ_{z=0}^{m-1} DZ_z) / m (2.13)
where m = 25 (total number of zones)
Step 7: Finally, 25 such features were obtained for classification and recognition
Method Ends

The Hybrid Zoning Algorithm: Hybrid of Modified ICZ and Modified ZCZ
Input: Preprocessed character image
Output: Features for classification and recognition
Method Begins
Step 1: Divide the input image into 25 equal zones
Step 2: Compute the input image centroid
Step 3: Compute the distance from the image centroid to each pixel present in the zone
Step 4: Repeat Step 3 for all the pixels present in the zone
Step 5: Compute the average of these distances
Step 6: Compute the zone centroid
Step 7: Compute the distance from the zone centroid to each pixel present in the zone
Step 8: Repeat Step 7 for all the pixels present in the zone
Step 9: Compute the average of these distances
Step 10: Repeat Steps 3-9 sequentially for all the zones
Step 11: Finally, 2 x n (50) such features were obtained for classification and recognition.
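The hybrid procedure of Steps 1-11 can be sketched as a single Python function. This is a hypothetical illustration rather than the paper's implementation: the function name, the binary list-of-rows image representation, and averaging over all foreground pixels in a zone (instead of five sampled points) are assumptions.

```python
import math

def hybrid_zone_features(image, grid=5):
    """Hybrid zoning sketch: for each of the grid x grid zones, emit the
    average pixel distance to the image centroid (ICZ part) and to the
    zone's own centroid (ZCZ part), giving 2 * grid * grid features.
    Empty zones contribute zeros. `image` is a list of rows of 0/1 ints."""
    h, w = len(image), len(image[0])
    pixels = [(x, y) for y in range(h) for x in range(w) if image[y][x]]
    icx = sum(x for x, _ in pixels) / len(pixels)   # image centroid
    icy = sum(y for _, y in pixels) / len(pixels)
    zh, zw = h // grid, w // grid
    icz, zcz = [], []
    for zy in range(grid):
        for zx in range(grid):
            zone = [(x, y) for x, y in pixels
                    if zx * zw <= x < (zx + 1) * zw
                    and zy * zh <= y < (zy + 1) * zh]
            if not zone:
                icz.append(0.0)
                zcz.append(0.0)
                continue
            # ICZ: average distance to the image centroid.
            icz.append(sum(math.hypot(x - icx, y - icy)
                           for x, y in zone) / len(zone))
            # ZCZ: average distance to this zone's own centroid.
            zcx = sum(x for x, _ in zone) / len(zone)
            zcy = sum(y for _, y in zone) / len(zone)
            zcz.append(sum(math.hypot(x - zcx, y - zcy)
                           for x, y in zone) / len(zone))
    return icz + zcz   # 2 x 25 = 50 features for a 5 x 5 grid
```

Concatenating the two 25-element vectors mirrors Step 11: the first half carries the ICZ distances, the second half the ZCZ distances.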
Method Ends

2.2 The Developed Hybrid (Geom-Statistical) Feature Extraction Algorithm
Step 1: Get the stroke information of the input characters from the digitizer (G-Pen 450). This includes:
(i) Pressure used in writing the strokes of the characters
(ii) Number of strokes used in writing the characters
(iii) Number of junctions and their locations in the written characters
(iv) The horizontal projection count of the character
Step 2: Apply the contour-tracing algorithm to trace out the contours of the characters
Step 3: Run the hybrid zoning algorithm on the contours of the characters
Step 4: Feed the extracted features of the characters into the digitization stage in order to convert all the extracted features into digital form

3. Conclusion and Future Work
In this paper, we have developed an effective feature extraction technique for an online character recognition system using a hybrid of geometrical and statistical features. Specifically, a hybrid feature extraction method was developed to alleviate the problem of poor feature extraction in online character recognition systems. Future research may be geared towards developing a hybrid of a modified counterpropagation and a modified optical backpropagation neural network model for the aforementioned system. Also, the performance of the online character recognition system under consideration could be evaluated with respect to learning rates, image sizes and database sizes.

References
Anoop, M. N. and Anil, K. J. (2004). Online Handwritten Script Recognition. IEEE Trans. PAMI, 26, 1, 124-130.
Cai, J. and Liu, Z. (1998). Integration of Structural and Statistical Information for Unconstrained Handwritten Numerals Recognition. Proceedings of the 14th International Conference on Pattern Recognition, 1, 378-380.
Gupta, K., Rao, S. V. and Viswanath (2007).
Speeding up Online Character Recognition. Proceedings of Image and Vision Computing, New Zealand, Hamilton, 41-45.
Heute, L. T. and Paquet, J. V. (1998). A Structural/Statistical Feature Based Vector for Handwritten Character Recognition. Pattern Recognition Letters, 19, 629-641.
Kittler, J. and Roli, F. (2000). 1st International Workshop on Multiple Classifier Systems, Cagliari, Italy.
Kittler, J. and Roli, F. (2001). 2nd International Workshop on Multiple Classifier Systems, Cagliari, Italy.
Liu, C. L., Nakashima, K., Sako, H. and Fujisawa, H. (2004). Handwritten Digit Recognition: Investigation of Normalization and Feature Extraction Techniques. Pattern Recognition, 37, 2, 265-279.
Liu, C. L., Sako, H. and Fujisawa, H. (2003). Handwritten Chinese Character Recognition: Alternatives to Nonlinear Normalization. In Proceedings of the 7th International Conference on Document Analysis and Recognition, Edinburgh, Scotland, 524-528.
Liu, C., Stefan, J. and Masaki, N. (2004). Online Recognition of Chinese Characters: The State-of-the-Art. IEEE Trans. on Pattern Analysis and Machine Intelligence, 26, 2, 198-203.
Manuel, J., Fonseca, and Joaquim, A. J. (2001). Experimental Evaluation of an Online Scribble Recognizer. Pattern Recognition Letters, 22, 12, 1311-1319.
Mohamad, D. and Zafar, M. F. (2004). Comparative Study of Two Novel Feature Vectors for Complex Image Matching Using Counterpropagation Neural Network. Journal of Information Technology, FSKSM, UTM, 16, 1, 2073-2081.
Naser, M. A., Adnan, M., Arefin, T. M., Golam, S. M. and Naushad, A. (2009). Comparative Analysis of Radon and Fan-beam Based Feature Extraction Techniques for Bangla Character Recognition. IJCSNS International Journal of Computer Science and Network Security, 9, 9, 120-135.
Nishimura, H. and Timikawa, T. (2003). Offline Character Recognition Using Online Character Writing Information. Proceedings of the Seventh IEEE International Conference on Document Analysis and Recognition.
Pradeep, J., Srinivasan, E. and Himavathi, S. (2011). Diagonal Based Feature Extraction for Handwritten Alphabets Recognition Using Neural Network. International Journal of Computer Science and Information Technology (IJCSIT), 3, 1, 27-37.
Rajashekararadhya, S. V. and Vanaja, P. R. (2008a). Handwritten Numeral Recognition of Three Popular South Indian Scripts: A Novel Approach. Proceedings of the Second International Conference on Information Processing (ICIP), 162-167.
Rajashekararadhya, S. V. and Vanaja, P. R. (2008b). Isolated Handwritten Kannada Digit Recognition: A Novel Approach. Proceedings of the International Conference on Cognition and Recognition, 134-140.
Rejean, P. and Sargur, S. N. (2000). On-line and Off-line Handwriting Recognition: A Comprehensive Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 1, 63-84.
Santosh, K. C. and Nattee, C. (2009). A Comprehensive Survey on Online Handwriting Recognition Technology and Its Real Application to the Nepalese Natural Handwriting. Kathmandu University Journal of Science, Engineering and Technology, 5, 1, 31-55.
Shanthi, N. and Duraiwamy, K. (2007).
Performance Comparison of Different Image Sizes for Recognizing Unconstrained Handwritten Tamil Characters Using SVM. Journal of Computer Science, 3, 9, 760-764.
Simon, G. and Horst, B. (2004). Feature Selection Algorithms for the Generalization of Multiple Classifier Systems and Their Application to Handwritten Word Recognition. Pattern Recognition Letters, 25, 11, 1323-1336.
Yamaguchi, T., Nakano, Y., Maruyama, M., Miyao, H. and Hananoi, T. (2003). Digit Classification on Signboards for Telephone Number Recognition. In Proceedings of the 7th International Conference on Document Analysis and Recognition, Edinburgh, Scotland, 359-363.

Figure 2.1: The snapshot of Genius Pen (G-Pen 450) Digitizer for Character Acquisition
Figure 2.2: Writing character "A" with 3 Strokes
Figure 2.3: Division of character image into 35 quadrants
Figure 2.4: Horizontal Projection Count of character image "F"
Figure 2.5: The Contour-tracing Algorithm (Yamaguchi et al., 2003)
Figure 2.6: Feature Extraction using Zoning
Figure 2.7: Character "n" in 5 by 5 (25 equal zones)
Figure 2.8: Image Centroid of character "n" in zoning
Figure 2.9: Zone Centroid of character "n" in zoning
Figure 2.10: The Developed Hybrid Feature Extraction Model