ISSN 1858-1633 @2006 ICTS
518
A CHINESE CHARACTER RECOGNITION METHOD BASED ON
POPULATION MATRIX AND RELATIONAL DATABASE
Teady Matius Surya Mulyana 1) and Agus Harjoko 2)
1) Central Library, Petra Christian University, Surabaya, Indonesia
2) Electronic and Instrumentation Lab., FMIPA, Gadjah Mada University, Yogyakarta, Indonesia 55281
email: aharjoko@ugm.ac.id
ABSTRACT
A Chinese character has many different forms, so the
information in its features has many variations. A
relational database is therefore needed to store the
many variations of these features. Storing the sets of
features in a relational database makes it possible to
apply distance measurement methods to the sets of
features that belong to a Chinese character in order to
recognize an input Chinese character image.
The feature used in this work is the pixel
population matrix. The sets of features are stored and
queried using the relational database. This paper
discusses how to recognize Chinese character images
and Chinese radical images by using a relational
database and the pixel population matrix.
Keywords: Optical character recognition, relational
database, population matrix, Chinese character
recognition.
1 INTRODUCTION
Hanzi, or Han characters, also known as Kanji,
comprise thousands of characters [1]. As a result of
their evolution, every Kanji has three models: the
ancient, the traditional and the popular character.
Each model can be rendered in various fonts.
Kanji are formed either from a single character or
from a combination of single characters that yields a
new character. The single characters used to form a
Kanji are called "radicals".
When a single character serves as the radical of
another Kanji, it can appear either in a new form or in
its original form. For example, the character 心 is
displayed as 忄 when it is the radical of 情, whereas
in 意 it retains its original form. The placement of
radicals varies: on the right, in the middle, on the
left, at the top or at the bottom. Another example is
the character 唱, which is formed from the radical 口
on the left and two instances of the radical 日 on the
top right and bottom right.
Low [2] explains a method for recognizing a letter
from an image by dividing the pixels of the
character's image into matrix cells. The percentage of
pixels in each matrix cell forms the pattern vector
used to identify a character. Lu [3] explains the use
of a database to store such features.
Based on the ideas of Lu and Low, the authors
combine Low's pixel population matrix features with a
relational database that stores and manipulates the
data and the stored features, in order to accommodate
the font variations of every Kanji needed in Kanji
recognition. The relational database is also meant to
ease the search for Kanji stored in the database from
an input radical image.
2 THE IMAGE PROCESSING SYSTEM
The recognition of Kanji characters is based on
their features. The features used in this research are
the ratio between the width and the height, and the
pixel population matrix [2]. The pixel population
matrix can be 2x2, 3x3, 4x4 or 6x6 cells, as chosen by
the user. The pixel population features are stored in
the relational database as a 12x12 matrix, which is
converted to the matrix size chosen by the user before
it is used to identify the input image.
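The cell percentages and the conversion from the stored 12x12 matrix to a coarser one can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary image (1 = ink pixel) whose sides are divisible by the number of cells, takes each feature to be the percentage of ink pixels inside its cell, and assumes the conversion averages equal blocks of the 12x12 matrix. The function names are hypothetical.

```python
def population_matrix(image, cells):
    """Return a cells x cells matrix of ink-pixel percentages per cell.

    `image` is a list of equal-length rows of 0/1 values (1 = ink pixel).
    Assumption: height and width are both divisible by `cells`.
    """
    height, width = len(image), len(image[0])
    cell_h, cell_w = height // cells, width // cells
    matrix = []
    for r in range(cells):
        row = []
        for c in range(cells):
            ink = sum(
                image[y][x]
                for y in range(r * cell_h, (r + 1) * cell_h)
                for x in range(c * cell_w, (c + 1) * cell_w)
            )
            row.append(100.0 * ink / (cell_h * cell_w))
        matrix.append(row)
    return matrix


def reduce_matrix(matrix12, cells):
    """Convert a stored 12x12 matrix to cells x cells by averaging blocks.

    Assumption: all 12x12 cells cover the same number of pixels, so the
    mean of a block equals the percentage of the merged cell.
    """
    block = len(matrix12) // cells
    return [
        [
            sum(
                matrix12[y][x]
                for y in range(r * block, (r + 1) * block)
                for x in range(c * block, (c + 1) * block)
            ) / (block * block)
            for c in range(cells)
        ]
        for r in range(cells)
    ]


# Toy 12x12 image whose left half is ink.
image = [[1] * 6 + [0] * 6 for _ in range(12)]
stored = population_matrix(image, 12)
coarse = reduce_matrix(stored, 3)
```

Since 12 is divisible by 2, 3, 4 and 6, every coarser cell covers a whole number of 12x12 cells, which is what makes this kind of block conversion possible for all four matrix sizes.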
A Chinese character is recognized by computing
the distance between the input character and the
characters in the database. A number of similarity
measures are widely used in the computer vision
literature [4,5,6,7,8], among them the Minkowski
family of distance measures, which includes the L1
metric and the Euclidean distance. In this research
the L1 metric is used for its simplicity. The L1
distance is expressed in equation (1).
$$d(I,H) = \sum_{l=1}^{n} |i_l - h_l| \qquad (1)$$
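Equation (1) can be sketched in a few lines. The function name is hypothetical, and the feature vectors are the nine 3x3 matrix values of the input image and of the Simhei image from the 章 example in Figure 1; the width : height ratio feature is left out here for simplicity.

```python
def l1_distance(i, h):
    """L1 distance between two feature vectors of equal length, as in equation (1)."""
    if len(i) != len(h):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(i, h))


# 3x3 pixel population matrices from Figure 1, flattened row by row.
input_image = [36, 51, 34, 42, 55, 42, 26, 48, 26]
simhei_image = [39, 58, 38, 39, 50, 41, 32, 59, 35]
distance = l1_distance(input_image, simhei_image)
```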
The use of the relational database makes it
possible to measure the distance to the sets of
features from the several images that a Kanji
character may possess. The distance from the image I
to the s-th image H of a Kanji sample is denoted by
ds(I,H). The value of ds is obtained from equation
(2), which is derived from equation (1).
$$d_s(I,H) = \sum_{l=1}^{n} |i_l - h_l|, \quad s = 1 \ldots m \qquad (2)$$
Based on equation (1), six methods for distance
measurement are obtained. Those six methods are:
• The smallest distance: the smallest of the
distances between the input image and the
images of a Kanji character. This distance is
obtained from equation (3).
$$d^k(I,H) = \mathrm{MIN}(d_s), \quad s = 1 \ldots m \qquad (3)$$
• The average distance: the average of the
distances between the input image and the
images of a Kanji character. This average
distance is obtained from equation (4).
$$d^r(I,H) = \frac{\sum_{s=1}^{m} d_s}{m} \qquad (4)$$
• The biggest distance: the biggest of the
distances between the input image and the
images of a Kanji character. This distance is
obtained from equation (5).
$$d^b(I,H) = \mathrm{MAX}(d_s), \quad s = 1 \ldots m \qquad (5)$$
• The smallest range of each feature: the distance
obtained by taking, for each feature, the
smallest difference between that feature in the
images of a Kanji character and the same
feature of the input image. The subscript s
indicates that the difference is taken against
the s-th image of the Kanji. This distance is
obtained from equation (6).
$$d^u(I,H) = \sum_{l=1}^{n} \mathrm{MIN}(|i_l - h_l|_s), \quad s = 1 \ldots m \qquad (6)$$
• The average range of each feature: the distance
obtained by taking, for each feature, the
average difference between that feature in the
images of a Kanji character and the same
feature of the input image. This distance is
obtained from equation (7).
$$d^v(I,H) = \sum_{l=1}^{n} \frac{\sum_{s=1}^{m} |i_l - h_l|_s}{m} \qquad (7)$$
• The biggest range of each feature: the distance
obtained by taking, for each feature, the
biggest difference between that feature in the
images of a Kanji character and the same
feature of the input image. This distance is
obtained from equation (8).
$$d^w(I,H) = \sum_{l=1}^{n} \mathrm{MAX}(|i_l - h_l|_s), \quad s = 1 \ldots m \qquad (8)$$
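The six measures listed above can be sketched together as follows. This is an illustration under stated assumptions rather than the authors' code: the function name and dictionary keys are hypothetical, and only the nine 3x3 matrix features of the 章 example (Figures 1 and 2) are used, so the values differ from those in the figures, which also include the width : height ratio term.

```python
def six_distances(i, stored):
    """Compute the measures of equations (3)-(8) for input vector `i`
    against the list `stored` of feature vectors of one Kanji."""
    m = len(stored)
    # |i_l - h_l| for every stored image s (rows) and feature l (columns).
    per_image = [[abs(a - b) for a, b in zip(i, h)] for h in stored]
    d_s = [sum(diffs) for diffs in per_image]  # equation (2)
    per_feature = list(zip(*per_image))        # one tuple per feature l
    return {
        "k": min(d_s),                               # (3) smallest distance
        "r": sum(d_s) / m,                           # (4) average distance
        "b": max(d_s),                               # (5) biggest distance
        "u": sum(min(f) for f in per_feature),       # (6) smallest range per feature
        "v": sum(sum(f) / m for f in per_feature),   # (7) average range per feature
        "w": sum(max(f) for f in per_feature),       # (8) biggest range per feature
    }


input_image = [36, 51, 34, 42, 55, 42, 26, 48, 26]
zhang_images = [
    [16, 38, 22, 21, 24, 22, 13, 29, 15],  # Simsun
    [21, 36, 25, 19, 27, 22, 15, 29, 18],  # Batang
    [39, 58, 38, 39, 50, 41, 32, 59, 35],  # Simhei
]
result = six_distances(input_image, zhang_images)
```

As written, equations (4) and (7) differ only in the order of summation, so in exact arithmetic they give the same value; any difference in practice would have to come from rounding or other implementation details that are not specified here.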
The smallest, average and biggest distances are
obtained by first computing the distance to each
image that a Kanji character has. One of the resulting
distances is then used, according to the method
selected beforehand, as illustrated in Figure 1.
The smallest, average and biggest range of each
feature are obtained by computing the difference
between each feature of the Kanji's images and the
same feature of the input image. The result is taken
as the distance between the Kanji's feature and the
feature of the input image. These per-feature results
are then summed to give the distance between the
Kanji and the input image, as illustrated in
Figure 2.
After the distance between each Kanji and the
input image has been obtained with one of the six
methods, the Kanji with the smallest distance to the
input image is taken as the recognized character.
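The final selection step can be sketched as a minimum search over the stored Kanji. The names below are hypothetical, the smallest distance method is used as the default measure, and the feature vector stored for 音 is an invented placeholder for illustration only; the 章 vectors are the 3x3 matrices from Figure 1.

```python
def smallest_distance(i, stored):
    """Smallest L1 distance between `i` and the stored images of one Kanji."""
    return min(sum(abs(a - b) for a, b in zip(i, h)) for h in stored)


def recognize(i, database, distance=smallest_distance):
    """Return the Kanji whose distance to the input vector `i` is smallest.

    `database` maps a Kanji to the list of feature vectors of its images.
    """
    return min(database, key=lambda kanji: distance(i, database[kanji]))


database = {
    "章": [
        [16, 38, 22, 21, 24, 22, 13, 29, 15],
        [21, 36, 25, 19, 27, 22, 15, 29, 18],
        [39, 58, 38, 39, 50, 41, 32, 59, 35],
    ],
    # Placeholder values, not taken from the paper.
    "音": [[10, 10, 10, 10, 10, 10, 10, 10, 10]],
}
input_image = [36, 51, 34, 42, 55, 42, 26, 48, 26]
recognized = recognize(input_image, database)
```

Passing a different `distance` function would switch to any of the other five methods without changing the selection logic.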
The entity relationships are shown in Figure 3. A
Kanji character has several Kanji images with their
features. Every Kanji image uses a font. Moreover,
every Kanji image also has several radical images.
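One possible relational layout for these entities, together with a query for the smallest distance, is sketched below with Python's built-in sqlite3 module. The table and column names are assumptions made for illustration (the paper does not list its schema), the radical image entity is omitted for brevity, and the feature values are the 3x3 matrices of the 章 example in Figure 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kanji (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE font (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE kanji_image (
    id INTEGER PRIMARY KEY,
    kanji_id INTEGER REFERENCES kanji(id),
    font_id INTEGER REFERENCES font(id));
CREATE TABLE feature (
    image_id INTEGER REFERENCES kanji_image(id),
    idx INTEGER,      -- position l of the feature in the vector
    value REAL);
CREATE TABLE query_feature (idx INTEGER, value REAL);
""")

fonts = [(1, "Simsun"), (2, "Batang"), (3, "Simhei")]
images = {
    1: [16, 38, 22, 21, 24, 22, 13, 29, 15],
    2: [21, 36, 25, 19, 27, 22, 15, 29, 18],
    3: [39, 58, 38, 39, 50, 41, 32, 59, 35],
}
conn.execute("INSERT INTO kanji VALUES (1, '章')")
conn.executemany("INSERT INTO font VALUES (?, ?)", fonts)
for image_id, vector in images.items():
    # In this toy data the image id doubles as the font id.
    conn.execute("INSERT INTO kanji_image VALUES (?, 1, ?)", (image_id, image_id))
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?)",
        [(image_id, idx, value) for idx, value in enumerate(vector)],
    )

# Features of the input image, stored so that they can be joined in SQL.
query = [36, 51, 34, 42, 55, 42, 26, 48, 26]
conn.executemany("INSERT INTO query_feature VALUES (?, ?)", list(enumerate(query)))

# Inner query: d_s per stored image; outer query: smallest d_s per Kanji.
best = conn.execute("""
    SELECT k.name, MIN(d.ds) AS dk
    FROM (SELECT f.image_id, SUM(ABS(f.value - q.value)) AS ds
          FROM feature AS f
          JOIN query_feature AS q ON q.idx = f.idx
          GROUP BY f.image_id) AS d
    JOIN kanji_image AS ki ON ki.id = d.image_id
    JOIN kanji AS k ON k.id = ki.kanji_id
    GROUP BY k.id
    ORDER BY dk
    LIMIT 1
""").fetchone()
```

Replacing `MIN` with `AVG` or `MAX` in the outer query would give the average and biggest distance methods, which is the kind of set-based computation the relational storage is meant to enable.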
A Kanji search from an input radical image uses
the distances to the radical images possessed by a
Kanji character. The distance is calculated with the
smallest distance range method, because a Kanji
character can possess different radical images. The
Kanji found are those that have a radical image whose
distance to the input image is within a certain
threshold.
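The radical search can be sketched as a threshold filter on the smallest distance. Everything in this example is illustrative: the function name is hypothetical, the 2x2 feature vectors and the threshold are invented values, and the comparison `<=` is an assumption about how the threshold is applied.

```python
def find_kanji_by_radical(radical, database, threshold):
    """Return (kanji, distance) pairs whose closest stored radical image
    lies within `threshold` of the input radical, nearest first.

    `database` maps a Kanji to the feature vectors of its radical images.
    """
    matches = []
    for kanji, radical_images in database.items():
        nearest = min(
            sum(abs(a - b) for a, b in zip(radical, h)) for h in radical_images
        )
        if nearest <= threshold:
            matches.append((kanji, nearest))
    return sorted(matches, key=lambda match: match[1])


# Invented 2x2 feature vectors, for illustration only.
radical_database = {
    "唱": [[50, 10, 50, 10], [48, 12, 52, 9]],
    "情": [[10, 60, 10, 60]],
}
found = find_kanji_by_radical([49, 11, 51, 10], radical_database, threshold=10)
```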
If every radical image were associated with a
radical entity as a reference in a one-to-many
relation (one radical, many radical images), the
radical reference could be used as a feature of the
Kanji's image. However, this research does not use
any radical features, so no radical entity is added
to the entity relationship diagram.
2nd Information and Communication Technology Seminar, August 2006
Kanji 章, 3x3 pixel population matrix

Input image I: width 78, height 81, width : height = 0.962962963
    36 51 34
    42 55 42
    26 48 26

Stored image H (Simsun): width 87, height 87, width : height = 1
    H            |I-H|
    16 38 22     20 13 12
    21 24 22     21 31 20
    13 29 15     13 19 11
    |I-H| for width : height = 0.037037037
    d1(I,H) = 160.037037

Stored image H (Batang): width 81, height 88, width : height = 0.920454545
    H            |I-H|
    21 36 25     15 15  9
    19 27 22     23 28 20
    15 29 18     11 19  8
    |I-H| for width : height = 20.07954545
    d2(I,H) = 168.0795455

Stored image H (Simhei): width 83, height 85, width : height = 0.976470588
    H            |I-H|
    39 58 38      3  7  4
    39 50 41      3  5  1
    32 59 35      6 11  9
    |I-H| for width : height = 22.02352941
    d3(I,H) = 71.02352941

Results: dk = 71.023529, dr = 133.0467, db = 168.07955

Figure 1. Illustration of the smallest, average and biggest distance methods
Kanji 章, 3x3 pixel population matrix, with the same input image I and
stored images H (Simsun, Batang, Simhei) as in Figure 1

Smallest range of each feature, du(I,H):
     3  7  4
     3  5  1
     6 11  8
    width : height term = 0.037037037
    du = 48.04

Average range of each feature, dv(I,H):
    13 11.7  8
    16 21.3 14
    10 16.3  9
    width : height term = 14.04670397
    dv = 133

Biggest range of each feature, dw(I,H):
    20 15 12
    23 31 20
    13 19 11
    width : height term = 22.02352941
    dw = 186

Figure 2. Illustration of the range of each feature methods
Figure 3. Entity Relationship Diagram

The Kanji image entities of a Kanji character,
with their features, are used to recognise the Kanji
character by applying one of the six distance
measurement methods. The models in the radical image
and font entities are used to restrict the radical
images whose distance is measured.
The radical image entity is also used to search
for a Kanji with a certain radical image, which is
measured by its distance to the set of radical image
features possessed by the Kanji images of a specific
Kanji character.
3 RESULT
There are 51 Kanji characters stored in the
database for the purpose of this research, as shown in
Table 1. Each Kanji has three images. Each Kanji
image has various radical images.
Table 1. Stored Kanji Characters
No. Name Batang Simsun Simhei | No. Name Batang Simsun Simhei | No. Name Batang Simsun Simhei
1 yin 音 音 音 | 18 hao 好 好 好 | 35 xiao 小 小 小
2 zhang 章 章 章 | 19 de 的 的 的 | 36 shao 少 少 少
3 yi 意 意 意 | 20 dian 点 点 点 | 37 tu 土 土 土
4 li 立 立 立 | 21 ren 人 人 人 | 38 ba 吧 吧 吧
5 rui 瑞 瑞 瑞 | 22 ru 入 入 入 | 39 zai 在 在 在
6 lin 林 林 林 | 23 ru 如 如 如 | 40 wen 文 文 文
7 jing 京 京 京 | 24 shui 水 水 水 | 41 ting 听 听 听
8 fei 飞 飞 飞 | 25 dong 东 东 东 | 42 jiao 叫 叫 叫
9 men 们 们 们 | 26 xi 西 西 西 | 43 xie 谢 谢 谢
10 men 门 门 门 | 27 nan 南 南 南 | 44 guang 光 光 光
11 er 儿 儿 儿 | 28 bei 北 北 北 | 45 ta 他 他 他
12 er 而 而 而 | 29 kou 口 口 口 | 46 ta 她 她 她
13 he 河 河 河 | 30 chang 唱 唱 唱 | 47 ta 它 它 它
14 he 何 何 何 | 31 yin 因 因 因 | 48 jia 家 家 家
15 ge 哥 哥 哥 | 32 hui 回 回 回 | 49 ai 爱 爱 爱
16 ke 可 可 可 | 33 le 了 了 了 | 50 gu 古 古 古
17 shi 是 是 是 | 34 bu 不 不 不 | 51 ni 你 你 你
The tests of the accuracy of the features used
and of the distance measurement methods are shown in
Table 2.
The test shows that the 6 x 6 matrix reaches a
success rate of 100 % with the average distance
method, the smallest range of each feature method
and the average range of each feature method.
With the 3 x 3 and 4 x 4 matrices, the smallest
distance method is the most successful. This is due
to the characteristics of the Kanji characters
themselves, whose pixel dispersion is captured by
the cell segmentation of the 3 x 3 or 4 x 4 matrix.
The 2 x 2 matrix reaches a success rate of only
5 to 20 %. This results from dividing the pixels
into only four cells, which is too coarse to serve
as the feature of an image; the pixel dispersion
cannot be detected in further detail.
Table 2. The Result of Kanji Character Recognition Test
(1 = recognized, 0 = not recognized; each method group lists the 2x2, 3x3, 4x4 and 6x6 matrices in that order;
min, avg and max are the smallest, average and biggest distance methods, and the "sel" groups are the corresponding range of each feature methods)
No Name Kanji | min | avg | max | min sel | avg sel | max sel
1 bei 北 | 0 1 1 1 | 0 0 1 1 | 0 0 0 0 | 0 0 1 1 | 0 0 1 1 | 0 0 0 1
2 de 的 | 1 1 1 1 | 0 0 1 1 | 0 0 0 1 | 0 0 1 1 | 0 0 1 1 | 0 0 0 1
3 dian 点 | 1 1 1 1 | 0 1 1 1 | 0 0 0 1 | 0 0 0 1 | 0 0 0 1 | 0 0 0 1
4 dong 东 | 0 0 0 1 | 0 0 0 1 | 0 0 0 0 | 0 0 0 1 | 0 0 0 1 | 0 0 0 1
5 er 儿 | 0 1 1 1 | 0 1 1 1 | 0 1 0 1 | 0 1 1 1 | 0 1 1 1 | 0 1 1 1
6 er 而 | 0 0 0 0 | 0 0 0 1 | 0 0 0 0 | 0 0 0 1 | 0 0 0 1 | 0 0 0 0
7 fei 飞 | 1 1 1 1 | 1 1 1 1 | 0 1 1 1 | 1 0 1 1 | 1 1 1 1 | 1 1 1 1
8 ge 哥 | 0 0 0 1 | 0 0 0 1 | 0 0 0 0 | 0 0 0 1 | 0 0 0 1 | 0 0 0 0
9 hao 好 | 0 1 1 1 | 0 0 1 1 | 0 0 0 1 | 0 0 1 1 | 0 0 1 1 | 0 0 0 1
10 he 何 | 0 1 1 1 | 0 1 1 1 | 0 0 0 1 | 0 1 1 1 | 0 1 1 1 | 0 0 0 1
11 he 河 | 0 1 1 1 | 0 1 1 1 | 0 0 1 1 | 0 1 1 1 | 0 1 1 1 | 0 0 0 1
12 jing 京 | 0 1 1 1 | 0 0 1 1 | 0 0 0 0 | 0 0 1 1 | 0 0 1 1 | 0 0 1 1
13 ke 可 | 0 1 1 1 | 0 1 1 1 | 0 0 0 1 | 0 0 1 1 | 0 0 0 1 | 0 0 1 1
14 li 立 | 0 0 1 1 | 0 0 0 1 | 0 0 0 0 | 0 0 0 1 | 0 0 0 1 | 0 0 0 1
15 lin 林 | 0 1 1 1 | 0 0 1 1 | 0 0 0 1 | 0 0 1 1 | 0 0 1 1 | 0 0 0 1
16 men 们 | 0 1 1 1 | 0 1 1 1 | 0 1 1 1 | 0 1 1 1 | 0 1 1 1 | 0 1 1 1
17 men 门 | 0 0 1 1 | 0 0 1 1 | 0 1 1 1 | 0 0 1 1 | 0 0 1 1 | 0 1 1 1
18 nan 南 | 0 0 1 1 | 0 0 0 1 | 0 0 0 0 | 0 0 0 1 | 0 0 0 1 | 0 0 0 0
19 ren 人 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1
20 ru 入 | 0 0 0 1 | 0 1 0 1 | 0 0 0 1 | 0 0 1 1 | 0 1 0 1 | 0 1 0 0
Success | 4 13 16 19 | 2 9 14 20 | 1 5 5 13 | 2 5 14 20 | 2 7 12 20 | 2 6 7 16
Success rate | 20% 65% 80% 95% | 10% 45% 70% 100% | 5% 25% 25% 65% | 10% 25% 70% 100% | 10% 35% 60% 100% | 10% 30% 35% 80%
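The success rates in Table 2 follow directly from the success counts over the 20 test characters. As a small check of that arithmetic, the sketch below recomputes the rates for the smallest distance method (2x2, 3x3, 4x4 and 6x6); the variable and function names are hypothetical.

```python
TEST_CHARACTERS = 20  # number of Kanji rows in Table 2


def success_rate(successes, total=TEST_CHARACTERS):
    """Return the success rate as a whole percentage."""
    return round(100 * successes / total)


# Success counts of the smallest distance method for 2x2, 3x3, 4x4 and 6x6.
min_counts = [4, 13, 16, 19]
min_rates = [success_rate(count) for count in min_counts]
```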
The success rate obtained using the radical
images is 73 %. In this test the population matrix is
3x3. Due to space limitations this result is not
presented in this paper; readers are encouraged to
consult [9]. The success rate could very likely be
increased if the 6 x 6 feature matrix were used.
4 CONCLUSION
Based on the research on the use of the pixel
population matrix and a relational database to
recognize a Kanji character and its radicals, the
writers draw several conclusions:
1. Among the 2x2, 3x3, 4x4 and 6x6 pixel
population matrices, the 6x6 matrix is the
most accurate for recognizing a Kanji image.
2. The four cells of the 2x2 pixel population
matrix are not sufficient to characterize the
pixel dispersion.
3. Storing the sets of image features in a
relational database helps the implementation
of the distance range measurement methods,
which need the sets themselves for character
recognition.
The writers propose several suggestions for
future research:
1. The six distance measurement methods are also
applicable to character recognition using
other features.
2. There has not yet been any detailed research
into the conditions that determine when one
method succeeds over the others. This can be
a subject for future research.
REFERENCES
[1] Kasmito and Tanzil, J., Petunjuk Termudah
Belajar Mandarin, Binarupa Aksara, Jakarta,
Indonesia, 1997.
[2] Low, A., Introductory Computer Vision and
Image Processing, McGraw-Hill, Berkshire,
UK, 1991.
[3] Lu, G., Multimedia Database Management
Systems, Artech House, London, 1999.
[4] Gonzalez, R.C. and Woods, R.E., Digital
Image Processing, Addison-Wesley
Publishing Company, 1992.
[5] Hearn, D. and Baker, P.M., Computer
Graphics, Prentice Hall, USA, 1986.
[6] Kasturi, R. and Jain, R.C., Computer Vision:
Collection of Computer Vision Journal
1951-1991, IEEE Computer Society Press, Los
Alamitos, CA, 1991.
[7] Steinmetz, R. and Nahrstedt, K., Multimedia
Computing, Communication & Application,
Innovative Technology Series, Prentice Hall,
USA, 1995.
[8] Liu, C., Jaeger, S., Nakagawa, S., "Online
Recognition of Chinese Characters: The State
of the Art", IEEE Transactions on Pattern
Analysis and Machine Intelligence, 26(2),
2004, 198-213.
[9] Mulyana, T.M.S., Penggunaan Matriks
Populasi Pixel Dan Relational Database
Untuk Mengenali Huruf Kanji Dan
Radikalnya, Masters Thesis, Computer
Science Study Program, Gadjah Mada
University, 2006.

More Related Content

PDF
SVM Based Identification of Psychological Personality Using Handwritten Text
PDF
Textural Feature Extraction of Natural Objects for Image Classification
PDF
IMAGE RETRIEVAL USING QUADRATIC DISTANCE BASED ON COLOR FEATURE AND PYRAMID S...
PDF
EFFECTIVE SEARCH OF COLOR-SPATIAL IMAGE USING SEMANTIC INDEXING
PDF
International Journal of Engineering Research and Development
PDF
Cursive Handwriting Segmentation Using Ideal Distance Approach
PDF
Colour-Texture Image Segmentation using Hypercomplex Gabor Analysis
PDF
A Survey of Modern Character Recognition Techniques
SVM Based Identification of Psychological Personality Using Handwritten Text
Textural Feature Extraction of Natural Objects for Image Classification
IMAGE RETRIEVAL USING QUADRATIC DISTANCE BASED ON COLOR FEATURE AND PYRAMID S...
EFFECTIVE SEARCH OF COLOR-SPATIAL IMAGE USING SEMANTIC INDEXING
International Journal of Engineering Research and Development
Cursive Handwriting Segmentation Using Ideal Distance Approach
Colour-Texture Image Segmentation using Hypercomplex Gabor Analysis
A Survey of Modern Character Recognition Techniques

What's hot (20)

PDF
FACE RECOGNITION USING DIFFERENT LOCAL FEATURES WITH DIFFERENT DISTANCE TECHN...
PDF
E018212935
PDF
50120130405020
PDF
A novel tool for stereo matching of images
PDF
A novel tool for stereo matching of images
PDF
Image segmentation based on color
PDF
F045053236
PDF
A comparative study on content based image retrieval methods
PDF
PDF
FULL PAPER.PDF
PDF
Text Extraction from Image using Python
PDF
HSV Brightness Factor Matching for Gesture Recognition System
PDF
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
PDF
Combined cosine-linear regression model similarity with application to handwr...
PDF
Paper id 24201433
PDF
Feature integration for image information retrieval using image mining techni...
PDF
Ac03401600163.
PDF
ijrrest_vol-2_issue-2_013
PDF
Hybrid Technique for Copy-Move Forgery Detection Using L*A*B* Color Space
FACE RECOGNITION USING DIFFERENT LOCAL FEATURES WITH DIFFERENT DISTANCE TECHN...
E018212935
50120130405020
A novel tool for stereo matching of images
A novel tool for stereo matching of images
Image segmentation based on color
F045053236
A comparative study on content based image retrieval methods
FULL PAPER.PDF
Text Extraction from Image using Python
HSV Brightness Factor Matching for Gesture Recognition System
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
Combined cosine-linear regression model similarity with application to handwr...
Paper id 24201433
Feature integration for image information retrieval using image mining techni...
Ac03401600163.
ijrrest_vol-2_issue-2_013
Hybrid Technique for Copy-Move Forgery Detection Using L*A*B* Color Space
Ad

Viewers also liked (11)

PDF
Project11
PDF
Acta 11
PDF
Negació, empara legal i denúncies inverses.
PDF
AppianBrochure
DOCX
Sistem informasi-pengolahan-nilai-sekolah
PDF
M Patel
DOCX
Motivation
PDF
SEGMENTASI CITRA DENGAN VARIASI RGB DAN ALGORITMA PERCEPTRON
PDF
Binarisasi Citra Menggunakan Pencocokan Piksel
PPTX
KARYA TULIS ANTIDIABETES OLEH ANNA MARIA M
PPTX
Meningitis
Project11
Acta 11
Negació, empara legal i denúncies inverses.
AppianBrochure
Sistem informasi-pengolahan-nilai-sekolah
M Patel
Motivation
SEGMENTASI CITRA DENGAN VARIASI RGB DAN ALGORITMA PERCEPTRON
Binarisasi Citra Menggunakan Pencocokan Piksel
KARYA TULIS ANTIDIABETES OLEH ANNA MARIA M
Meningitis
Ad

Similar to A CHINESE CHARACTER RECOGNITION METHOD BASED ON POPULATION MATRIX AND RELATIONAL DATABASE (20)

PDF
Comparison of Distance Transform Based Features
PDF
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
PDF
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
PDF
A binary graphics recognition algorithm based on fitting function
PDF
I017256165
PDF
A Review on Geometrical Analysis in Character Recognition
PDF
International Journal of Computer Science, Engineering and Information Techno...
PDF
Farsi character recognition using new hybrid feature extraction methods
PDF
FARSI CHARACTER RECOGNITION USING NEW HYBRID FEATURE EXTRACTION METHODS
PDF
Persian character recognition using new
PDF
Dimensionality Reduction and Feature Selection Methods for Script Identificat...
PDF
Recognition of basic kannada characters in scene images using euclidean dis
DOCX
Adaptive membership functions for hand written character recognition by voron...
PDF
FinalReportFoxMelle
PDF
Optical character recognition performance analysis of sif and ldf based ocr
PDF
Two Methods for Recognition of Hand Written Farsi Characters
PDF
Recognition of Handwritten Mathematical Equations
PPTX
Representation and recognition of handwirten digits using deformable templates
PDF
A Comprehensive Study On Handwritten Character Recognition System
PDF
A017240107
Comparison of Distance Transform Based Features
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function
A binary graphics recognition algorithm based on fitting function
I017256165
A Review on Geometrical Analysis in Character Recognition
International Journal of Computer Science, Engineering and Information Techno...
Farsi character recognition using new hybrid feature extraction methods
FARSI CHARACTER RECOGNITION USING NEW HYBRID FEATURE EXTRACTION METHODS
Persian character recognition using new
Dimensionality Reduction and Feature Selection Methods for Script Identificat...
Recognition of basic kannada characters in scene images using euclidean dis
Adaptive membership functions for hand written character recognition by voron...
FinalReportFoxMelle
Optical character recognition performance analysis of sif and ldf based ocr
Two Methods for Recognition of Hand Written Farsi Characters
Recognition of Handwritten Mathematical Equations
Representation and recognition of handwirten digits using deformable templates
A Comprehensive Study On Handwritten Character Recognition System
A017240107

Recently uploaded (20)

PDF
Is Earendel a Star Cluster?: Metal-poor Globular Cluster Progenitors at z ∼ 6
PPTX
Introcution to Microbes Burton's Biology for the Health
DOCX
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
PPTX
Overview of calcium in human muscles.pptx
PPTX
perinatal infections 2-171220190027.pptx
PDF
Lymphatic System MCQs & Practice Quiz – Functions, Organs, Nodes, Ducts
PPTX
7. General Toxicologyfor clinical phrmacy.pptx
PPTX
Microbes in human welfare class 12 .pptx
PDF
. Radiology Case Scenariosssssssssssssss
PPTX
CORDINATION COMPOUND AND ITS APPLICATIONS
PPTX
The Minerals for Earth and Life Science SHS.pptx
PPT
6.1 High Risk New Born. Padetric health ppt
PDF
Sciences of Europe No 170 (2025)
PPTX
Fluid dynamics vivavoce presentation of prakash
PDF
Looking into the jet cone of the neutrino-associated very high-energy blazar ...
PPT
1. INTRODUCTION TO EPIDEMIOLOGY.pptx for community medicine
PPTX
Hypertension_Training_materials_English_2024[1] (1).pptx
PPTX
Introduction to Cardiovascular system_structure and functions-1
PPT
veterinary parasitology ````````````.ppt
PPTX
Pharmacology of Autonomic nervous system
Is Earendel a Star Cluster?: Metal-poor Globular Cluster Progenitors at z ∼ 6
Introcution to Microbes Burton's Biology for the Health
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
Overview of calcium in human muscles.pptx
perinatal infections 2-171220190027.pptx
Lymphatic System MCQs & Practice Quiz – Functions, Organs, Nodes, Ducts
7. General Toxicologyfor clinical phrmacy.pptx
Microbes in human welfare class 12 .pptx
. Radiology Case Scenariosssssssssssssss
CORDINATION COMPOUND AND ITS APPLICATIONS
The Minerals for Earth and Life Science SHS.pptx
6.1 High Risk New Born. Padetric health ppt
Sciences of Europe No 170 (2025)
Fluid dynamics vivavoce presentation of prakash
Looking into the jet cone of the neutrino-associated very high-energy blazar ...
1. INTRODUCTION TO EPIDEMIOLOGY.pptx for community medicine
Hypertension_Training_materials_English_2024[1] (1).pptx
Introduction to Cardiovascular system_structure and functions-1
veterinary parasitology ````````````.ppt
Pharmacology of Autonomic nervous system

A CHINESE CHARACTER RECOGNITION METHOD BASED ON POPULATION MATRIX AND RELATIONAL DATABASE

  • 1. ISSN 1858-1633 @2006 ICTS 518 A CHINESE CHARACTER RECOGNITION METHOD BASED ON POPULATION MATRIX AND RELATIONAL DATABASE Teady Matius Surya Mulyana1) and Agus Harjoko2) 1 Central Library, Petra Christian University, Surabaya, Indonesia 2 Electronic and Instrumentation Lab., FMIPA, Gadjah Mada University, Yogyakarta, Indonesia 55281 email : aharjoko@ugm.ac.id ABSTRACT A Chinese character has many different forms, information in its features will have many variations. Therefore, it needs a relational database to store many variations of their features. The use of the relational database to store the sets of features enables the use of distance measurements methods while measuring the sets of feature that owned by a Chinese character to recognize a Chinese character image inputted. The feature used in this thesis is the pixel population matrix. The sets of the features are stored and queried by using the relational database. This paper discusses about how to recognize the Chinese character image and Chinese radical image by using relational database and pixel population matrix. Keywords: Optical character recognition, relational database, population matrix, Chinese character Recognition. 1 INTRODUCTION Han zhi or the han alphabets, also known as the Kanji, have thousands of characters[1]. Every Kanji, due to the evolution, has three models, namely the ancient, traditional and popular characters. Each model uses various fonts. Kanji can be formed either from single characters or the combinations of the single characters that end as a new character. Those single characters used to form Kanji are called ”radical”. A single character can be either displayed in its new form or in its original form when it is formed into the radical of another Kanji. For example, the character 心 will be displayed as 忄as the radical in the character of 情, meanwhile, in the character of 意, the form retains its original form. 
The placement of these radicals varies: on the right, in the middle, on the left, either up or down. Another example is in the character 唱. That character is formed from the radical 口 on the left, and two radicals of 日 on the top right and bottom right. Low [2], explains the method to recognize the alpahabet from an image by dividing the pixels of the character’s image into matrix cells. The percentage of the pixels towards the matrix cells becomes the pattern vector used to identify a character. Lu [3], explains the implementation of database to store such features. Based on Lu’s and Low’s idea, the authors combine Low’s features of pixel population matrix and the relational database to store and manipulate data and stored features to accomodate the variations of font from every Kanji needed in Kanji recognition. The use of the relational database is meant to ease the searching of Kanji stored in the database from the input of radical image. 2 THE IMAGE PROCESSING SYSTEM The recognition of Kanji characters is based on features of Kanji. Features used in the research is the ratio between the width and height, and the pixel population matrix[2]. The pixel population matrix used is the 2x2, 3x3, 4x4 and 6x6 pixel, which can be picked by the users. The features of the pixel population matrix is stored in the matrix form of 12x12. The features of this pixel matrix is then converted to the matrix desired by the users before it is used to identify the image input. The features are then stored in a relational database. A Chinese character is recognized by computing the distance between the character and the characters in the database. There are a number of similarity measures that are widely used in the computer vision literatures[4,5,6,7,8], among them are the Mahalanobis family of distance measures which inlcudes the L1 metrix and the Euclidean distance. In this research the L1 metric is used for its simplicity. The L1-distance is expressed in equation (1).
  • 2. A Chinese Character Recognition Method Based On Population Matrix And Relational Database – Teady Matius Surya Mulyana ISSN 1858-1633 @2006 ICTS 519 ∑ = −= n l ll hiHId 1 ||),( (1) The use of the relational database brings the possibility for the use of distance measurment towards the sets of features from several image that a Kanji character might possess. The distance of the image I to the image H in a Kanji sample are represented by ds(I,H). The vakue of ds is obtained from equation (2), derived from equation (1). mshiHId n l lls ...1,||),( 1 =−= ∑ = (2) Based on equation (1), six methods for distance measurement are obtained. Those six methods are: • The smallest distance, it is the smallest distance between the Kanji images from a Kanji with an image input This distance is obtained from equation (3). msdMINHId s k ...1),(),( == (3) • The average distance, it is the avarage distance between the Kanji images from a Kanji character with an image input. This avarage distance is saught by using equation (4). m d HId m s s r ∑ = = 1 ),( (4) • The biggest distance, it is the biggest distance between the Kanji images from a Kanji with an image input. This distance is gained from equation (5). msdMAXHId s b ...1),(),( == (5) • The smallest range of each feature, the distance obtained from the smallest range of each featrues between the Kanji images in a Kanji character with the same feature from an image input. The smallest distance range of each features is gained through equation (6). ∑ = =−= n l sll u mshiMINHId 1 ...1),|(|),( (6) • The average distance range of each feature, that is the distance obtained from the average difference of each feature between the images of a Kanji with the same features of the same Kanji from an image input. 
This average range of each feature is resulted from equation (7) ∑ ∑ = = − = n l m s sll v m hi HId 1 1 || ),( (7) • The biggest range of each feature that is the distance resulted from the biggest range of each feature between the images of a Kanji with the same features from an image input. The biggest distance range from each feature is resulted from equation (8). ∑ = =−= n l slls w mshiMAXHId 1 ...1),|(|),( (8) All the smallest distance, average and the biggest distance are obtained by computing the distance of images a Kanji character has. One of the distances resulted from the implementation will be used according to the previously desired method. Kindly examine the illustration on Figure 1. The smallest, average, and the biggest distance range of each feature are implemented by computing the difference between each Kanji’s feature and the same features from the image input. The result will then be considered as the distance between the Kanji’s feature and the Kanji’s image-input. Furthermore, the result is summed up and becomes the distance between Kanji and the Kanji’s image input. Further illustration can be obsereved on Figure 2. After the distance for each Kanji and Kanji’s image input is obtained, by using one of the sixth methods, kanji that has the smallest distance range toward the kanji’s image input will be treated as the kanji recognized as the inputted image. The entity relation is as shown on Figure 3. A kanji character has several Kanji images with their features. Every Kanji image uses a font. Moreover, every Kanji’s image also has several radical images. The Kanji search by using the radical image input is done by using the distance between radical images possessed by a Kanji character. The distance calculation is done by using the smallest distance range. It is done so due to the different radical images a Kanji character could possess. 
The Kanji found is then the one that has a radical image whose distance to the input image is within a certain threshold. If every radical image were associated with a radical entity as the reference of a one-to-many relation (one radical, many radical images), the radical reference could be used as a feature of the Kanji's image; however, this research does not require any radical features. Therefore, no radical entity is added to the entity relation diagram.

2nd Information and Communication Technology Seminar, August 2006

Kanji 章; features: 3 x 3 population matrix (rows separated by "/") and width : height ratio
Input image I: 36 51 34 / 42 55 42 / 26 48 26; width 78, height 81, width : height 0.962962963
SimSun image H: 16 38 22 / 21 24 22 / 13 29 15; width 87, height 87, width : height 1; |I-H| = 20 13 12 / 21 31 20 / 13 19 11 and 0.037037037; d1(I,H) = 160.037037
Batang image H: 21 36 25 / 19 27 22 / 15 29 18; width 81, height 88, width : height 0.920454545; |I-H| = 15 15 9 / 23 28 20 / 11 19 8 and 20.07954545; d2(I,H) = 168.0795455
SimHei image H: 39 58 38 / 39 50 41 / 32 59 35; width 83, height 85, width : height 0.976470588; |I-H| = 3 7 4 / 3 5 1 / 6 11 9 and 22.02352941; d3(I,H) = 71.02352941
Result: dk = 71.023529, dr = 133.0467, db = 168.07955
Figure 1. Illustration of the smallest, average, and biggest distance methods

Kanji 章; same input image I and stored SimSun, Batang, and SimHei images as in Figure 1
Smallest |I-H| of each feature: 3 7 4 / 3 5 1 / 6 11 8 and 0.037037037; du = 48.04
Average |I-H| of each feature: 13 11.7 8 / 16 21.3 14 / 10 16.3 9 and 14.04670397; dv = 133
Biggest |I-H| of each feature: 20 15 12 / 23 31 20 / 13 19 11 and 22.02352941; dw = 186
Figure 2. Illustration of the range of each feature methods
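The six measures in equations (3) to (8) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this research: the function and variable names are hypothetical, and the data are only the nine 3 x 3 cell values for 章 from Figures 1 and 2. The width : height feature is left out, so the totals are smaller than those shown in the figures.

```python
def six_distances(input_features, stored_images):
    """Return the six distances between an input image and one Kanji.

    input_features: list of n feature values i_l of the input image.
    stored_images:  list of m feature lists h_s, one per stored image.
    """
    m = len(stored_images)
    # diffs[s][l] = |i_l - h_sl|
    diffs = [[abs(i - h) for i, h in zip(input_features, image)]
             for image in stored_images]
    per_image = [sum(row) for row in diffs]   # d_s, equation (2)
    per_feature = list(zip(*diffs))           # one column per feature l

    return {
        "d_k": min(per_image),                           # equation (3)
        "d_r": sum(per_image) / m,                       # equation (4)
        "d_b": max(per_image),                           # equation (5)
        "d_u": sum(min(col) for col in per_feature),     # equation (6)
        "d_v": sum(sum(col) / m for col in per_feature), # equation (7)
        "d_w": sum(max(col) for col in per_feature),     # equation (8)
    }


# 3 x 3 population matrices for 章, flattened row by row (from Figures 1 and 2).
input_features = [36, 51, 34, 42, 55, 42, 26, 48, 26]
stored_images = [
    [16, 38, 22, 21, 24, 22, 13, 29, 15],  # SimSun
    [21, 36, 25, 19, 27, 22, 15, 29, 18],  # Batang
    [39, 58, 38, 39, 50, 41, 32, 59, 35],  # SimHei
]

result = six_distances(input_features, stored_images)
for name, value in result.items():
    print(name, value)
```

For these cell values the sketch gives d_k = 49, d_r = 119, d_b = 160, d_u = 48, d_v of about 119, and d_w = 164. To recognize an input image, the same function would be evaluated for every stored Kanji with the chosen measure, and the Kanji with the smallest value would be taken.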
Figure 3. Entity Relationship Diagram

The entity of a Kanji's images of a certain Kanji character, with their features, is used to recognise a Kanji character by applying one of the six distance measurement methods. The models in the radical images and the Font entity are used to restrict which radical images have their distance measured. The radical image entity is also used to search for a Kanji with a certain radical image, which is measured in distance towards the set of radical image features possessed by the Kanji's images of a specific Kanji character.

3 RESULT

There are 51 Kanji characters stored in the database for the purpose of the research, as seen in Table 1. Each Kanji has three images. Each Kanji's image has various radical images. The tests of the accuracy of the features used and of the distance measurement methods are shown in Table 2.

Table 1. Stored Kanji Characters
No.  Name   Batang SimSun SimHei    No.  Name   Batang SimSun SimHei    No.  Name   Batang SimSun SimHei
1    yin    音 音 音                18   hao    好 好 好                35   xiao   小 小 小
2    zhang  章 章 章                19   de     的 的 的                36   shao   少 少 少
3    yi     意 意 意                20   dian   点 点 点                37   tu     土 土 土
4    li     立 立 立                21   ren    人 人 人                38   ba     吧 吧 吧
5    rui    瑞 瑞 瑞                22   ru     入 入 入                39   zai    在 在 在
6    lin    林 林 林                23   ru     如 如 如                40   wen    文 文 文
7    jing   京 京 京                24   shui   水 水 水                41   ting   听 听 听
8    fei    飞 飞 飞                25   dong   东 东 东                42   jiao   叫 叫 叫
9    men    们 们 们                26   xi     西 西 西                43   xie    谢 谢 谢
10   men    门 门 门                27   nan    南 南 南                44   guang  光 光 光
11   er     儿 儿 儿                28   bei    北 北 北                45   ta     他 他 他
12   er     而 而 而                29   kou    口 口 口                46   ta     她 她 她
13   he     河 河 河                30   chang  唱 唱 唱                47   ta     它 它 它
14   he     何 何 何                31   yin    因 因 因                48   jia    家 家 家
15   ge     哥 哥 哥                32   hui    回 回 回                49   ai     爱 爱 爱
16   ke     可 可 可                33   le     了 了 了                50   gu     古 古 古
17   shi    是 是 是                34   bu     不 不 不                51   ni     你 你 你

The tests show that with the 6 x 6 matrix, the average distance, the smallest range of each feature, and the average range of each feature methods reach a success rate of 100%.
The 3 x 3 and 4 x 4 matrices are more successful with the smallest distance method, which reaches 65% and 80% respectively in Table 2. This is due to the characteristics of the Kanji characters themselves, whose pixels are dispersed across the cells when the image is segmented into a 3 x 3 or 4 x 4 matrix.
The 2 x 2 matrix reaches a success rate of only 5 to 20%. This results from dividing the pixels into only four cells, which is not adequate as the feature of an image. Hence, the pixel dispersion cannot be detected.

Table 2. The Result of Kanji Character Recognition Test
(min, avg, max: smallest, average, and biggest distance; min sel, avg sel, max sel: smallest, average, and biggest range of each feature; 1 = recognized, 0 = not recognized)
No  Name  Kanji | min                 | avg                 | max                 | min sel             | avg sel             | max sel
                | 2x2  3x3  4x4  6x6  | 2x2  3x3  4x4  6x6  | 2x2  3x3  4x4  6x6  | 2x2  3x3  4x4  6x6  | 2x2  3x3  4x4  6x6  | 2x2  3x3  4x4  6x6
1   bei   北    | 0    1    1    1    | 0    0    1    1    | 0    0    0    0    | 0    0    1    1    | 0    0    1    1    | 0    0    0    1
2   de    的    | 1    1    1    1    | 0    0    1    1    | 0    0    0    1    | 0    0    1    1    | 0    0    1    1    | 0    0    0    1
3   dian  点    | 1    1    1    1    | 0    1    1    1    | 0    0    0    1    | 0    0    0    1    | 0    0    0    1    | 0    0    0    1
4   dong  东    | 0    0    0    1    | 0    0    0    1    | 0    0    0    0    | 0    0    0    1    | 0    0    0    1    | 0    0    0    1
5   er    儿    | 0    1    1    1    | 0    1    1    1    | 0    1    0    1    | 0    1    1    1    | 0    1    1    1    | 0    1    1    1
6   er    而    | 0    0    0    0    | 0    0    0    1    | 0    0    0    0    | 0    0    0    1    | 0    0    0    1    | 0    0    0    0
7   fei   飞    | 1    1    1    1    | 1    1    1    1    | 0    1    1    1    | 1    0    1    1    | 1    1    1    1    | 1    1    1    1
8   ge    哥    | 0    0    0    1    | 0    0    0    1    | 0    0    0    0    | 0    0    0    1    | 0    0    0    1    | 0    0    0    0
9   hao   好    | 0    1    1    1    | 0    0    1    1    | 0    0    0    1    | 0    0    1    1    | 0    0    1    1    | 0    0    0    1
10  he    何    | 0    1    1    1    | 0    1    1    1    | 0    0    0    1    | 0    1    1    1    | 0    1    1    1    | 0    0    0    1
11  he    河    | 0    1    1    1    | 0    1    1    1    | 0    0    1    1    | 0    1    1    1    | 0    1    1    1    | 0    0    0    1
12  jing  京    | 0    1    1    1    | 0    0    1    1    | 0    0    0    0    | 0    0    1    1    | 0    0    1    1    | 0    0    1    1
13  ke    可    | 0    1    1    1    | 0    1    1    1    | 0    0    0    1    | 0    0    1    1    | 0    0    0    1    | 0    0    1    1
14  li    立    | 0    0    1    1    | 0    0    0    1    | 0    0    0    0    | 0    0    0    1    | 0    0    0    1    | 0    0    0    1
15  lin   林    | 0    1    1    1    | 0    0    1    1    | 0    0    0    1    | 0    0    1    1    | 0    0    1    1    | 0    0    0    1
16  men   们    | 0    1    1    1    | 0    1    1    1    | 0    1    1    1    | 0    1    1    1    | 0    1    1    1    | 0    1    1    1
17  men   门    | 0    0    1    1    | 0    0    1    1    | 0    1    1    1    | 0    0    1    1    | 0    0    1    1    | 0    1    1    1
18  nan   南    | 0    0    1    1    | 0    0    0    1    | 0    0    0    0    | 0    0    0    1    | 0    0    0    1    | 0    0    0    0
19  ren   人    | 1    1    1    1    | 1    1    1    1    | 1    1    1    1    | 1    1    1    1    | 1    1    1    1    | 1    1    1    1
20  ru    入    | 0    0    0    1    | 0    1    0    1    | 0    0    0    1    | 0    0    1    1    | 0    1    0    1    | 0    1    0    0
Success         | 4    13   16   19   | 2    9    14   20   | 1    5    5    13   | 2    5    14   20   | 2    7    12   20   | 2    6    7    16
Success rate    | 20%  65%  80%  95%  | 10%  45%  70%  100% | 5%   25%  25%  65%  | 10%  25%  70%  100% | 10%  35%  60%  100% | 10%  30%  35%  80%

The success rate obtained using the radical images is 73%. In this test the population matrix is 3 x 3. Due to space limitations this result is not presented in this paper; readers are encouraged to consult [9]. The success rate would very likely increase if the 6 x 6 feature matrix were used.

4 CONCLUSION

Based on the research on the use of the pixel population matrix and relational database to recognize a Kanji character and its radicals, the writers come to several conclusions:
1. Among the 2 x 2, 3 x 3, 4 x 4 and 6 x 6 pixel population matrices, the 6 x 6 matrix is the most accurate in recognizing the Kanji's image.
2. The cell features of the 2 x 2 pixel population matrix are not sufficient to characterize the pixel dispersion, since there are only four cells.
3. Storing the sets of image features in a relational database helps the implementation of the distance measurement methods, which need the sets themselves for character recognition.

There are several suggestions that the writers can propose for future research. Those suggestions are:
1. The six distance measurement methods are also applicable to character recognition using other features.
2. There has not been any detailed research on the conditions that determine the success of one method over the others. These methods can be studied further in future research.

REFERENCE
[1] Kasmito and Tanzil, J., Petunjuk Termudah Belajar Mandarin, Binarupa Aksara, Jakarta, Indonesia, 1997.
[2] Low, A., Introductory Computer Vision and Image Processing, McGraw-Hill, Berkshire, UK, 1991.
[3] Lu, G., Multimedia Database Management Systems, Artech House, London, 1999.
[4] Gonzalez, R.C. and Woods, R.E., Digital Image Processing, Addison-Wesley Publishing Company, 1992.
[5] Hearn, D. and Baker, M.P., Computer Graphics, Prentice Hall, USA, 1986.
[6] Kasturi, R. and Jain, R.C., Computer Vision: Collection of Computer Vision Journal 1951-1991, IEEE Computer Society Press, Los Alamitos, CA, 1991.
[7] Steinmetz, R. and Nahrstedt, K., Multimedia Computing, Communication & Application, Innovative Technology Series, Prentice Hall, USA, 1995.
[8] Liu, C., Jaeger, S., Nakagawa, M., "Online Recognition of Chinese Characters: The State of The Art", IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2), 2004, 198-213.
[9] Mulyana, T.M.S., Penggunaan Matriks Populasi Pixel Dan Relational Database Untuk Mengenali Huruf Kanji Dan Radikalnya, Masters Thesis, Computer Science Study Program, Gadjah Mada University, 2006.