David C. Wyld et al. (Eds) : ICAITA, CDKP, CMC, SOFT, SAI - 2016
pp. 53–62, 2016. © CS & IT-CSCP 2016    DOI: 10.5121/csit.2016.61305
A SURVEY OF MARKOV CHAIN MODELS IN
LINGUISTICS APPLICATIONS
Fawaz S. Al-Anzi and Dia AbuZeina
Department of Computer Engineering, Kuwait University, Kuwait City, Kuwait
fawaz.alanzi@ku.edu.kw, abuzeina@ku.edu.kw
ABSTRACT
Markov chain theory is an important tool in applied probability that is quite useful in modeling
real-world computing applications. For a long time, researchers have used Markov chains for
data modeling in a wide range of applications that belong to different fields such as
computational linguistics, image processing, communications, bioinformatics, finance systems,
etc. This paper explores Markov chain theory and its extension, hidden Markov models
(HMM), in natural language processing (NLP) applications. This paper also presents some
aspects related to Markov chains and HMM such as creating transition matrices, calculating
data sequence probabilities, and extracting the hidden states.
KEYWORDS
Markov chains, Hidden Markov Models, computational linguistics, pattern recognition,
statistical
1. INTRODUCTION
Markov chain theory is increasingly being adopted in real-world computing applications since it
provides a convenient way to model temporal, time-series data. At each clock tick, the system
moves into a new state, which can be the same as the previous one. A Markov chain model is
a mathematical tool that captures pattern dependencies in pattern recognition systems. For this
reason, Markov chain theory is appropriate for natural language processing (NLP), which is
naturally characterized by dependencies between patterns such as characters or words.
Markov chains are directed graphs (a graphical model) that are generally used with relatively long
data sequences for data-mining tasks. Such tasks include prediction, classification, clustering,
pattern discovery, software testing, multimedia analysis, networks, etc. Reference [1] indicated
that there are two reasons for the popularity of Markov chains: they are very rich in mathematical
structure, and they work well in practice for several important applications. Hidden Markov
models (HMM) are an extension of Markov chains that is used to find the hidden states of a
system based on the observations.
In order to facilitate research in this direction, this paper provides a survey of this popular
data modeling technique. However, because of the wide range of research domains that use
the technique, we specifically focus on the linguistics-related applications. Reference [2] lists
some domains that utilize Markov chain theory, which include: physics, chemistry, testing,
speech recognition, information sciences, queueing theory, internet applications, statistics,
economics and finance, social sciences, mathematical biology, genetics, games, music, baseball,
Markov text generators, and bioinformatics. Reference [3] lists the five greatest applications of
Markov chains, which include Scherr's application to computer performance evaluation, Brin and
Page's application to PageRank and Web search, Baum's application to HMM, Shannon's
application to information theory, and Markov's application to Eugene Onegin.

This paper is organized as follows. The next section presents a background of Markov chain
theory. Section 3 highlights the main concepts of HMM, followed by a literature review of
Markov chains and HMM in Section 4. Finally, we conclude in Section 5.
2. MARKOV CHAINS
Markov chains are quite useful in modeling computational linguistics. A Markov chain is a
memoryless stochastic model that describes the behaviour of an integer-valued random process.
The behaviour is the simplest form of dependency, in which the next state (or event) depends only
on the current state. According to [4], a random process is said to be Markov if the future of the
process, given the present, is independent of the past. To describe the transitions between states, a
transition diagram is used to describe the model and the probabilities of going from one state to
another. For example, Figure 1 shows a Markov chain diagram with three states (Easy, Ok, and
Hard) that belong to exam cases (i.e. states). In the figure, each arc represents the probability of
the transition from one state to another.

Figure 1. A simple Markov chain with three states

Markov chain diagrams are generally represented using state transition matrices that denote
the transition probabilities from one state to another. Hence, a state transition matrix is created
using the entire set of states in the system. For example, if a particular textual application has
training data that contains N states (e.g. the size of the lexicon), then the state transition matrix is
described by a matrix A = {aij} of size N × N. In matrix A, the element aij denotes the transition
probability from a state i to a state j. Table 1 shows how the state transition matrix is used to
characterize the Markov diagram shown in Figure 1. That is, the matrix carries the state transition
probabilities between the involved states (Easy, Ok, and Hard). For illustration, P(E|H) denotes
the probability of the next exam being Easy given that the previous exam was Hard.

Table 1. A state transition matrix of three states

                                 Next Exam
                      Easy (E)   Ok (O)    Hard (H)
Previous   Easy (E)   P(E|E)     P(O|E)    P(H|E)
Exam       Ok (O)     P(E|O)     P(O|O)    P(H|O)
           Hard (H)   P(E|H)     P(O|H)    P(H|H)

In Table 1, the sum of the probability values in each row is 1, as the sum of the probabilities
coming out of each node should be 1. Hence, P(E|E) + P(O|E) + P(H|E) = 1.
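To make the row-sum constraint concrete, the following minimal Python sketch (ours, not from
the paper) estimates a transition matrix such as the one in Table 1 from a hypothetical sequence
of observed exam difficulties; the example sequence is invented for illustration:

from collections import defaultdict

def transition_matrix(sequence, states):
    # Count how often each state is followed by each other state.
    counts = {s: defaultdict(int) for s in states}
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    # Normalize each row so the outgoing probabilities of a state sum to 1.
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in states}
    return matrix

exams = ["Easy", "Ok", "Hard", "Ok", "Easy", "Easy", "Hard", "Ok"]  # hypothetical history
A = transition_matrix(exams, ["Easy", "Ok", "Hard"])
print(A["Hard"])  # the P(E|H), P(O|H), P(H|H) row of Table 1, estimated from counts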
Markov chains are a worthy topic with many details. For example, the theory covers discrete-time,
continuous-time, time-reversed, reversible, and irreducible Markov chains. The case shown in
Figure 1 is irreducible, also called ergodic, where it is possible to go from every state to every
state.

To illustrate a simple Markov chain data model, a small data set containing two English sentences
is used to create a transition matrix based on the neighbouring character sequences. The sentences
are inspirational English quotes picked from [5]:

(1) Power perceived is power achieved. (2) If you come to a fork in the road, take it.

Figure 2 shows the transition matrix of these quotes, obtained by counting the total number of
occurrences of the adjacent two-character sequences. It is a 19 × 19 matrix, where the value 19 is
the total number of unique characters appearing in the sentences (i.e. the two quotes). In this
example, creating the transition matrix is case insensitive, where D is the same as d, as an
example. In addition, a space between two words is discarded and not considered in the transition
matrix. Figure 2 also shows that the maximum number in the matrix's entries is 3 (a highlighted
underlined value), which means that moving from character e to r (e→r) is the most frequent
sequence in this small corpus. The words that contain this sequence are: {Power (two times) and
perceived}.

Figure 2. A transition matrix of two-character sequences
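The counting step behind Figure 2 can be reproduced in a few lines of Python. This is our sketch
of the procedure described above (case folded, spaces and punctuation discarded), not code from
the paper:

import re
from collections import Counter

quotes = "Power perceived is power achieved. If you come to a fork in the road, take it."
# Case insensitive; spaces and punctuation discarded, leaving the 19 unique letters.
words = re.findall(r"[a-z]+", quotes.lower())
pairs = Counter(p for w in words for p in zip(w, w[1:]))  # adjacent two-character sequences

print(len(set("".join(words))))  # 19 distinct characters, matching the matrix size
print(pairs.most_common(1))      # [(('e', 'r'), 3)]: 'Power' (twice) and 'perceived'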
Based on the information provided in the transition matrix shown in Figure 2, it is possible to
answer some questions related to the given data collection. Among the inquiries: What is the total
number of two-character sequences that appear in the given data set? What are the two-character
sequences that did not appear in the data collection? What is the least frequent two-character
sequence in the data set? Accordingly, Markov chains are used in prediction systems such as
weather forecasting. That is, it is possible to predict tomorrow's weather according to today's
weather. For example, if we have two states (Sunny, Rainy) and the requirement is to find the
probability P(Sunny|Rainy), Markov chains make it possible based on the information provided in
the probability transition matrix. Another example of using Markov chains is the banking
industry. A big portfolio of banks is based on loans; therefore, Markov chains are used to classify
loans into different states such as Good, Risky, and Bad loans.
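As a sketch of the weather example, the two-state chain below uses illustrative probabilities that
we made up (the paper gives no values); one-step prediction is a table lookup, and an n-day
forecast is the n-th power of the transition matrix:

import numpy as np

states = ["Sunny", "Rainy"]
P = np.array([[0.8, 0.2],   # row Sunny: P(Sunny|Sunny), P(Rainy|Sunny)
              [0.4, 0.6]])  # row Rainy: P(Sunny|Rainy), P(Rainy|Rainy)

print(P[1, 0])                             # P(Sunny tomorrow | Rainy today) = 0.4
print(np.linalg.matrix_power(P, 2)[1, 0])  # P(Sunny in two days | Rainy today) = 0.56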
For simplicity, the information presented in Figure 2 shows the transition matrix based on the
total number of occurrences. Figure 3 shows the same information but using probabilities instead
of the number of occurrences. That is, it contains the probability of moving from one character to
another. As previously indicated, the sum of the entries in each row is equal to 1. In Figure 3, any
matrix entry that is 0 means that there is no transition in that case. Similarly, if a matrix entry is 1,
it means that there is only one possible output of that state. For example, the character "o" comes
after "y", and this is the only possible arc of the state "y".

Figure 3. A probability transition matrix of two-character sequences
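Turning the count matrix of Figure 2 into the probability matrix of Figure 3 is a row
normalization. A minimal sketch, with a toy count matrix standing in for the paper's 19 × 19 one:

import numpy as np

def normalize_rows(counts):
    # Divide each row by its total; rows that never occur stay all zeros.
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

C = np.array([[0, 2, 1],
              [1, 0, 0],
              [0, 1, 1]])               # toy counts, not the paper's data
print(normalize_rows(C).sum(axis=1))    # [1. 1. 1.]: each row sums to 1, as in Figure 3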
3. HIDDEN MARKOV MODELS
Hidden Markov models (HMM) are an extension of Markov chain models, as both are used for
temporal data modeling. However, the difference is that the states in Markov chain models are
directly observed, while they are hidden in the case of HMM. We explain the concept of HMM
based on Figure 1, which shows a Markov diagram with three exam states. As a very simple
example, suppose that a student's parents want to know the levels (i.e. the difficulty) of their son's
exams. Naturally, it is possible to recognize an exam as Easy or Ok if the son feels Fine. Similarly,
it is possible to recognize an exam as Hard if the son looks Scared. From the parents' point of
view, the required states (i.e. Easy, Ok, or Hard) are hidden. However, they directly observe the
student's reaction or feeling. Hence, the parents might use the observed reaction as an indication
of the hidden states. An HMM is described using three matrices: the initial probability matrix,
the observation probability matrix, and the state transition matrix. Figure 4 shows an HMM
diagram with the states and the observations. In the figure, each arc represents the probability
between the states, and between the states and the observations.

Figure 4. A HMM diagram with the transition and the observation arcs
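The three matrices can be written down directly for the exam example. The numbers below are
our own illustrative guesses (the paper does not give values); the states are hidden, and the
observations are the son's visible reactions:

import numpy as np

states = ["Easy", "Ok", "Hard"]      # hidden states
observations = ["Fine", "Scared"]    # visible observations

pi = np.array([0.5, 0.3, 0.2])       # initial probability matrix
A = np.array([[0.4, 0.4, 0.2],       # state transition matrix (each row sums to 1)
              [0.3, 0.4, 0.3],
              [0.2, 0.4, 0.4]])
B = np.array([[0.9, 0.1],            # observation probability matrix P(obs | state):
              [0.7, 0.3],            # an Easy exam mostly yields Fine,
              [0.2, 0.8]])           # a Hard exam mostly yields Scared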
Computer Science & Information Technology (CS & IT)
Based on the information provided in the matrices, either Baum
Viterbi (also called best path) algorithms used to find the probability scores during recognition
phase. Figure 5 shows the trellis diagram
used to compute the recognitin probability of a sequence, Viterbi is used to find t
sequence associated with the given observtatin, this procoss is also known as back
Hence, after computing the observations sequence probability and finding the
maximumprobability (supposed the star in Figure 5), the Viterbi
to identify the states (sources) from which the observations sequence have been emitted.
5, the maximum probalities supposed to be
Ok, Easy, Hard, respectively.
Figure 5.Trellis diagram of three states HMM
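The best-path search with back-tracking can be sketched as follows. This is a standard Viterbi
implementation (not the paper's code) that reuses the illustrative pi, A, and B defined in the
previous sketch:

import numpy as np

def viterbi(obs_seq, pi, A, B):
    # delta[t, j]: probability of the best path ending in state j at time t.
    # psi[t, j]: back-pointer to the best predecessor of state j at time t.
    n_states, T = len(pi), len(obs_seq)
    delta = np.zeros((T, n_states))
    psi = np.zeros((T, n_states), dtype=int)
    delta[0] = pi * B[:, obs_seq[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, obs_seq[t]]
    # Back-track from the state with the maximum final probability (the "star").
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

obs = [0, 0, 1]                 # Fine, Fine, Scared (column indices of B)
print(viterbi(obs, pi, A, B))   # [0, 1, 2] -> Easy, Ok, Hard for these numbers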
4. LINGUISTIC APPLICATIONS
In the literature, there is a considerable body of work on modeling content dependencies for
linguistics applications. Markov chain models and HMMs are of great interest to linguistic
scholars who primarily work on data sequences. Even though this study focuses on linguistic
applications, Markov chains have been used to model a variety of phenomena in different fields.
The following are some of the domains whose studies have employed Markov chains; we
intentionally omit the references, as the literature has too many such studies: image processing,
text and image compression, video segmentation, forecasting, networking, signal processing,
communications, software testing, genetics, bioinformatics, genome structure recognition,
anomaly detection, tumour classification, water quality, epidemic spread, wind power, malicious
and cyber-attack detection, traffic management, physics, chemistry, mathematical biology, games,
music, multimedia processing, business activities, and fraud detection.

The following two subsections include some of the linguistic studies that utilized Markov chain
theory. Linguistic application topics mainly include (but are not limited to) speech recognition,
speech emotion recognition, part-of-speech tagging, machine translation, text classification, text
summarization, optical character recognition (OCR), named entity recognition, question
answering, authorship attribution, etc. For the reader who is interested in NLP, Reference [6] is a
good starting point, as it demonstrates a thorough study of NLP (almost) from scratch.
4.1. Markov chain based research
The literature has a large number of studies that employ Markov chains for NLP applications. The
following are some linguistics-related applications. Reference [7] proposed a word-dividing
algorithm based on statistical language models and Markov chain theory for Chinese speech
processing. Reference [8] presented a semantic indexing Markov chains algorithm that uses both
audio and visual information for event detection in soccer programs. Reference [9] investigated
the use of Markov Chains and sequence kernels for the task of authorship attribution. Reference
[10] implemented a probabilistic framework for support vector machine (SVM) that allows for
automatic tuning of the penalty coefficient parameters and the kernel parameters via Markov
chain for web searching via text categorization. Reference [11] demonstrated an automatic video
annotation using multimodal Dirichlet process mixture model by collecting samples from the
corresponding Markov chain. Reference [12] used a linguistic steganography detection method
based on Markov chain models. Reference [13] showed how probabilistic Markov chain models
can be used to detect topical structure in large text corpora.
Reference [14] proposed a method of recognizing location names from Chinese texts based on
Max-Margin Markov Network. Reference [15] utilized Markov chain and statistical language
models in a linguistic steganography detection algorithm. Reference [16] proposed a Markov
chain based algorithm for Chinese word segmentation. Reference [17] presented two new textual
feature selection methods based on Markov chains rank aggregation techniques. Reference [18]
proposed a Markov chain model for radical descriptors in Arabic Text Mining. Reference [19]
presented statistical Markov chain models for the distributions of words in text lines. Reference
[20] proposed a method for handwritten Chinese/Japanese text (character string) recognition
based on semi-Markov conditional random fields (semi-CRFs). Reference [21] presented a
Markov chain method to find authorship attribution on relational data between function words.
Reference [22] utilized a probabilistic Markov chain model to infer the location of Twitter users.
Reference [23] proposed a Markov chain based technique to determine the number of clusters of a
corpus of short-text documents. Reference [24] proposed a Markov chain based method for digital
document authentication. Reference [25] used Markov chain for authorship attribution in Arabic
poetry.
4.2. Hidden Markov models based research
Linguistic HMM based research has long been an active research area due to the rapid
development in NLP applications. The literature has many studies as follows. Reference [26]
proposed to extract acronyms and their meaning from unstructured text as a stochastic process
using HMM. Reference [27] proposed a morphological segmentation method with HMM for
Mongolian. Reference [28] employed HMM for Arabic handwritten word recognition.
Reference [29] presented a scheme for off-line recognition of large-set handwritten
characters in the framework of the first-order HMMs. Reference [30] proposed the use of hybrid
HMM/Artificial Neural Network (ANN) models for recognizing unconstrained offline
handwritten texts. Reference [31] used HMMs for recognizing Farsi handwritten words.
Reference [32] described recent advances in HMM based OCR for machine-printed Arabic
documents. Reference [33] proposed an HMM based method for named entity recognition.
Reference [34] combined text classification and HMM techniques for structuring randomized
clinical trial abstracts. Reference [35] employed HMM for medical text classification. Reference
[36] proposed a text (sequences of pages) categorization architecture based on HMM. Reference
[37] described a model for machine translation based on first-order HMM. Reference [38]
introduced speech emotion recognition by use of HMM. Reference [39] presented an HMM based
method for
speech emotion recognition. Reference [40] discussed the role of HMM in speech recognition.
Reference [41] indicated that almost all present-day large vocabulary continuous speech
recognition (LVCSR) systems are based on HMMs. Reference [42] presented a text summarization
method based on HMM. Reference [43] presented a method for summarizing speech documents
using HMM. Reference [44] used HMM for part-of-speech tagging task. Reference [45]
presented a second-order approximation of HMM for part-of-speech tagging task.
5. CONCLUSIONS
This work demonstrates the potential and the breadth of Markov chain research. The study reveals
that Markov chains and HMMs are of high importance for linguistic applications. Similarly,
Markov chains are also widely used in many other applications. For future work, it is worthwhile
to explore the power of Markov chains in new linguistic and scientific directions in more detail.
ACKNOWLEDGEMENTS
This work is supported by Kuwait Foundation of Advancement of Science (KFAS), Research Grant
Number P11418EO01 and Kuwait University Research Administration Research Project Number EO06/12.
REFERENCES
[1] Rabiner, Lawrence R. "A tutorial on hidden Markov models and selected applications in speech
recognition." Proceedings of the IEEE 77.2 (1989): 257-286.
[2] Markov_chain. (2016, August). Retrieved from https://en.wikipedia.org/wiki/Markov_chain
[3] Von Hilgers, Philipp, and Amy N. Langville. "The five greatest applications of Markov Chains."
Proceedings of the Markov Anniversary Meeting, Boston Press, Boston, MA. 2006.
[4] Leon-Garcia, Alberto. Probability, statistics, and random processes for
electrical engineering. Upper Saddle River, NJ: Pearson/Prentice Hall, 2008.
[5] California Indian Education. (2016, August). Retrieved from
http://www.californiaindianeducation.org/inspire/world/
[6] Collobert, Ronan, et al. "Natural language processing (almost) from scratch."Journal of Machine
Learning Research 12.Aug (2011): 2493-2537.
[7] Bin, Tian, et al. "A Chinese word dividing algorithm based on statistical language models." Signal
Processing, 1996., 3rd International Conference on. Vol. 1. IEEE, 1996.
[8] Leonardi, Riccardo, Pierangelo Migliorati, and Maria Prandini. "Semantic indexing of soccer audio-
visual sequences: a multimodal approach based on controlled Markov chains." IEEE Transactions on
Circuits and Systems for Video Technology 14.5 (2004): 634-643.
[9] Sanderson, Conrad, and Simon Guenter. "On authorship attribution via Markov chains and sequence
kernels." 18th International Conference on Pattern Recognition (ICPR'06). Vol. 3. IEEE, 2006.
[10] Lim, Bresley Pin Cheong, et al. "Web search with text categorization using probabilistic framework
of SVM." 2006 IEEE International Conference on Systems, Man and Cybernetics. Vol. 4. IEEE,
2006.
[11] Velivelli, Atulya, and Thomas S. Huang. "Automatic video annotation using multimodal Dirichlet
process mixture model." Networking, Sensing and Control, 2008. ICNSC 2008. IEEE International
Conference on. IEEE, 2008.
[12] Chen, Zhi-li, et al. "Effective linguistic steganography detection." Computer and Information
Technology Workshops, 2008. CIT Workshops 2008. IEEE 8th International Conference on. IEEE,
2008.
[13] Dowman, Mike, et al. "A probabilistic model of meetings that combines words and discourse
features." IEEE Transactions on Audio, Speech, and Language Processing 16.7 (2008): 1238-1248.
[14] Li, Lishuang, Zhuoye Ding, and Degen Huang. "Recognizing location names from Chinese texts
based on max-margin markov network." Natural Language Processing and Knowledge Engineering,
2008. NLP-KE'08. International Conference on. IEEE, 2008.
[15] Meng, Peng, et al. "Linguistic steganography detection algorithm using statistical language model."
Information Technology and Computer Science, 2009. ITCS 2009. International Conference on. Vol.
2. IEEE, 2009.
[16] Baomao, Pang, and Shi Haoshan. "Research on improved algorithm for Chinese word segmentation
based on Markov chain." Information Assurance and Security, 2009. IAS'09. Fifth International
Conference on. Vol. 1. IEEE, 2009.
[17] Wu, Ou, et al. "Rank aggregation based text feature selection." Web Intelligence and Intelligent Agent
Technologies, 2009. WI-IAT'09. IEEE/WIC/ACM International Joint Conferences on. Vol. 1. IET,
2009.
[18] El Hassani, Ibtissam, Abdelaziz Kriouile, and Youssef BenGhabrit. "Measure of fuzzy presence of
descriptors on Arabic Text Mining." 2012 Colloquium in Information Science and Technology. IEEE,
2012.
[19] Haji, Mehdi, et al. "Statistical Hypothesis Testing for Handwritten Word Segmentation Algorithms."
Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on. IEEE, 2012.
[20] Zhou, Xiang-Dong, et al. "Handwritten Chinese/Japanese text recognition using semi-Markov
conditional random fields." IEEE transactions on pattern analysis and machine intelligence 35.10
(2013): 2413-2426.
[21] Segarra, Santiago, Mark Eisen, and Alejandro Ribeiro. "Authorship attribution using function words
adjacency networks." 2013 IEEE International Conference on Acoustics, Speech and Signal
Processing. IEEE, 2013.
[22] Rodrigues, Erica, et al. "Uncovering the location of Twitter users." Intelligent Systems (BRACIS),
2013 Brazilian Conference on. IEEE, 2013.
[23] Goyal, Anil, Mukesh K. Jadon, and Arun K. Pujari. "Spectral approach to find number of clusters of
short-text documents." Computer Vision, Pattern Recognition, Image Processing and Graphics
(NCVPRIPG), 2013 Fourth National Conference on. IEEE, 2013.
[24] Shen, Jau Ji, and Ken Tzu Liu. "A Novel Approach by Applying Image Authentication Technique on
a Digital Document." Computer, Consumer and Control (IS3C), 2014 International Symposium on.
IEEE, 2014.
[25] Ahmed, Al-Falahi, et al. "Authorship attribution in Arabic poetry." 2015 10th International
Conference on Intelligent Systems: Theories and Applications (SITA). IEEE, 2015.
[26] Osiek, Bruno Adam, Geraldo Xexéo, and Luis Alfredo Vidal de Carvalho. "A language-independent
acronym extraction from biomedical texts with hidden Markov models." IEEE Transactions on
Biomedical Engineering 57.11 (2010): 2677-2688.
[27] He, Miantao, Miao Li, and Lei Chen. "Mongolian Morphological Segmentation with Hidden Markov
Model." Asian Language Processing (IALP), 2012 International Conference on. IEEE, 2012.
[28] Alma'adeed, Somaya, Colin Higgens, and Dave Elliman. "Recognition of off-line handwritten Arabic
words using hidden Markov model approach." Pattern Recognition, 2002. Proceedings. 16th
International Conference on. Vol. 3. IEEE, 2002.
[29] Park, Hee-Seon, and Seong-Whan Lee. "Off-line recognition of large-set handwritten characters with
multiple hidden Markov models." Pattern Recognition 29.2 (1996): 231-244.
[30] Espana-Boquera, Salvador, et al. "Improving offline handwritten text recognition with hybrid
HMM/ANN models." IEEE transactions on pattern analysis and machine intelligence 33.4 (2011):
767-779.
[31] Imani, Zahra, et al. "offline Handwritten Farsi cursive text recognition using Hidden Markov
Models." Machine Vision and Image Processing (MVIP), 2013 8th Iranian Conference on. IEEE,
2013.
[32] Prasad, Rohit, et al. "Improvements in hidden Markov model based Arabic OCR." Pattern
Recognition, 2008. ICPR 2008. 19th International Conference on. IEEE, 2008.
[33] Zhou, GuoDong, and Jian Su. "Named entity recognition using an HMM-based chunk tagger."
proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association
for Computational Linguistics, 2002.
[34] Xu, Rong, et al. "Combining Text Classification and Hidden Markov Modeling Techniques for
Structuring Randomized Clinical Trial Abstracts." AMIA. 2006.
[35] Yi, Kwan, and Jamshid Beheshti. "A hidden Markov model-based text classification of medical
documents." Journal of Information Science (2008).
[36] Frasconi, Paolo, Giovanni Soda, and Alessandro Vullo. "Hidden markov models for text
categorization in multi-page documents." Journal of Intelligent Information Systems 18.2-3 (2002):
195-217.
[37] Vogel, Stephan, Hermann Ney, and Christoph Tillmann. "HMM-based word alignment in statistical
translation." Proceedings of the 16th conference on Computational linguistics-Volume 2. Association
for Computational Linguistics, 1996.
[38] Schuller, Björn, Gerhard Rigoll, and Manfred Lang. "Hidden Markov model-based speech emotion
recognition." Acoustics, Speech, and Signal Processing, 2003. Proceedings.(ICASSP'03). 2003 IEEE
International Conference on. Vol. 2. IEEE, 2003.
[39] Nwe, Tin Lay, Say Wei Foo, and Liyanage C. De Silva. "Speech emotion recognition using hidden
Markov models." Speech communication 41.4 (2003): 603-623.
[40] Juang, Biing Hwang, and Laurence R. Rabiner. "Hidden Markov models for speech recognition."
Technometrics 33.3 (1991): 251-272.
[41] Gales, Mark, and Steve Young. "The application of hidden Markov models in speech recognition."
Foundations and trends in signal processing 1.3 (2008): 195-304.
[42] Conroy, John M., and Dianne P. O'leary. "Text summarization via hidden markov models."
Proceedings of the 24th annual international ACM SIGIR conference on Research and development
in information retrieval. ACM, 2001.
[43] Maskey, Sameer, and Julia Hirschberg. "Summarizing speech without text using hidden markov
models." Proceedings of the Human Language Technology Conference of the NAACL, Companion
Volume: Short Papers. Association for Computational Linguistics, 2006.
[44] Kupiec, Julian. "Robust part-of-speech tagging using a hidden Markov model." Computer Speech &
Language 6.3 (1992): 225-242.
[45] Thede, Scott M., and Mary P. Harper. "A second-order hidden Markov model for part-of-speech
tagging." Proceedings of the 37th annual meeting of the Association for Computational Linguistics on
Computational Linguistics. Association for Computational Linguistics, 1999.

More Related Content

PDF
Search for a substring of characters using the theory of non-deterministic fi...
PDF
A Survey of String Matching Algorithms
PDF
Equirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
PDF
B046021319
PDF
Paper Explained: Understanding the wiring evolution in differentiable neural ...
PDF
Efficient Forecasting of Exchange rates with Recurrent FLANN
PDF
A study and implementation of the transit route network design problem for a ...
PDF
ssc_icml13
Search for a substring of characters using the theory of non-deterministic fi...
A Survey of String Matching Algorithms
Equirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
B046021319
Paper Explained: Understanding the wiring evolution in differentiable neural ...
Efficient Forecasting of Exchange rates with Recurrent FLANN
A study and implementation of the transit route network design problem for a ...
ssc_icml13

What's hot (17)

PDF
The Improved Hybrid Algorithm for the Atheer and Berry-ravindran Algorithms
PDF
Comparison of search algorithms in Javanese-Indonesian dictionary application
PDF
Using Met-modeling Graph Grammars and R-Maude to Process and Simulate LRN Models
PDF
PDF
llorma_jmlr copy
DOC
4 report format
DOCX
A survey of xml tree patterns
PDF
Relevance feature discovery for text mining
PDF
Image-Based Literal Node Matching for Linked Data Integration
PDF
An Application of Pattern matching for Motif Identification
PDF
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663
PDF
A Simple Method for Solving Type-2 and Type-4 Fuzzy Transportation Problems
PDF
Burr Type III Software Reliability Growth Model
DOC
Monoton-working version-1995.doc
PDF
Conceptual similarity measurement algorithm for domain specific ontology[
PDF
Converting UML Class Diagrams into Temporal Object Relational DataBase
PDF
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...
The Improved Hybrid Algorithm for the Atheer and Berry-ravindran Algorithms
Comparison of search algorithms in Javanese-Indonesian dictionary application
Using Met-modeling Graph Grammars and R-Maude to Process and Simulate LRN Models
llorma_jmlr copy
4 report format
A survey of xml tree patterns
Relevance feature discovery for text mining
Image-Based Literal Node Matching for Linked Data Integration
An Application of Pattern matching for Motif Identification
Measuring word alignment_quality_for_statistical_machine_translation_tcm17-29663
A Simple Method for Solving Type-2 and Type-4 Fuzzy Transportation Problems
Burr Type III Software Reliability Growth Model
Monoton-working version-1995.doc
Conceptual similarity measurement algorithm for domain specific ontology[
Converting UML Class Diagrams into Temporal Object Relational DataBase
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...
Ad

Viewers also liked (20)

PDF
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
PDF
ALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENT
PDF
TOPIC BASED ANALYSIS OF TEXT CORPORA
PDF
THE IMPACT OF EXISTING SOUTH AFRICAN ICT POLICIES AND REGULATORY LAWS ON CLOU...
PDF
MODEL CHECKERS –TOOLS AND LANGUAGES FOR SYSTEM DESIGN- A SURVEY
PDF
FORMAL MODELING AND VERIFICATION OF MULTI-AGENTS SYSTEM USING WELLFORMED NETS
PDF
RECOGNITION OF RECAPTURED IMAGES USING PHYSICAL BASED FEATURES
PDF
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSION
PDF
APPROACH MULTI-AGENTS EMBEDDED ALARM IN POTROOMS
PDF
STOCHASTIC MODELING TECHNOLOGY FOR GRAIN CROPS STORAGE APPLICATION: REVIEW
PDF
EFFICIENCY OF SOFTWARE DEVELOPMENT AFTER IMPROVEMENTS IN REQUIREMENTS ENGINEE...
PDF
COMBINING REUSABLE TEST CASES AND CONTINUOUS SECURITY TESTING FOR REDUCING WE...
PDF
PERFORMANCE EVALUATION OF OSPF AND RIP ON IPV4 & IPV6 TECHNOLOGY USING G.711 ...
PDF
ON ESTIMATION OF TIME SCALES OF MASS TRANSPORT IN INHOMOGENOUS MATERIAL
PDF
EVALUATION OF SOFTWARE DEGRADATION AND FORECASTING FUTURE DEVELOPMENT NEEDS I...
PDF
COMPARATIVE STUDY FOR PERFORMANCE ANALYSIS OF VOIP CODECS OVER WLAN IN NONMOB...
PDF
AN INVESTIGATION OF THE MONITORING ACTIVITY IN SELF ADAPTIVE SYSTEMS
PDF
UBIQUITOUS COMPUTING AND SCRUM SOFTWARE ANALYSIS FOR COMMUNITY SOFTWARE
PDF
TRACEABILITY OF UNIFIED MODELING LANGUAGE DIAGRAMS FROM USE CASE MAPS
PDF
CENTROG FEATURE TECHNIQUE FOR VEHICLE TYPE RECOGNITION AT DAY AND NIGHT TIMES
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
ALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENT
TOPIC BASED ANALYSIS OF TEXT CORPORA
THE IMPACT OF EXISTING SOUTH AFRICAN ICT POLICIES AND REGULATORY LAWS ON CLOU...
MODEL CHECKERS –TOOLS AND LANGUAGES FOR SYSTEM DESIGN- A SURVEY
FORMAL MODELING AND VERIFICATION OF MULTI-AGENTS SYSTEM USING WELLFORMED NETS
RECOGNITION OF RECAPTURED IMAGES USING PHYSICAL BASED FEATURES
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSION
APPROACH MULTI-AGENTS EMBEDDED ALARM IN POTROOMS
STOCHASTIC MODELING TECHNOLOGY FOR GRAIN CROPS STORAGE APPLICATION: REVIEW
EFFICIENCY OF SOFTWARE DEVELOPMENT AFTER IMPROVEMENTS IN REQUIREMENTS ENGINEE...
COMBINING REUSABLE TEST CASES AND CONTINUOUS SECURITY TESTING FOR REDUCING WE...
PERFORMANCE EVALUATION OF OSPF AND RIP ON IPV4 & IPV6 TECHNOLOGY USING G.711 ...
ON ESTIMATION OF TIME SCALES OF MASS TRANSPORT IN INHOMOGENOUS MATERIAL
EVALUATION OF SOFTWARE DEGRADATION AND FORECASTING FUTURE DEVELOPMENT NEEDS I...
COMPARATIVE STUDY FOR PERFORMANCE ANALYSIS OF VOIP CODECS OVER WLAN IN NONMOB...
AN INVESTIGATION OF THE MONITORING ACTIVITY IN SELF ADAPTIVE SYSTEMS
UBIQUITOUS COMPUTING AND SCRUM SOFTWARE ANALYSIS FOR COMMUNITY SOFTWARE
TRACEABILITY OF UNIFIED MODELING LANGUAGE DIAGRAMS FROM USE CASE MAPS
CENTROG FEATURE TECHNIQUE FOR VEHICLE TYPE RECOGNITION AT DAY AND NIGHT TIMES
Ad

Similar to A SURVEY OF MARKOV CHAIN MODELS IN LINGUISTICS APPLICATIONS (20)

PPTX
Markov chain-model
PPTX
Lecture 6 - Marcov Chain introduction.pptx
PDF
I05745368
PDF
makov chain_basic
 
PPTX
Artificial Intelligence_MARKOV MODEL.pptx
PDF
A STUDY ON MARKOV CHAIN WITH TRANSITION DIAGRAM
PPTX
Markov Chains.pptx
PPTX
Hidden markov model
PPTX
Markov presentation
PPTX
Hidden Markov Models
PDF
CS-438 COMPUTER SYSTEM MODELINGWK9+10LEC17-19.pdf
PPTX
Markov Model chains
PPT
Markov Chains
PDF
Markov Chains | Edureka
PDF
12 Machine Learning Supervised Hidden Markov Chains
PPTX
NLP_KASHK:Markov Models
PDF
Book chapter-5
PDF
Introduction To Markov Chains | Markov Chains in Python | Edureka
PPTX
Teradata Analytics Meet @ Linkedin - May 2017
PDF
Markor chain presentation
Markov chain-model
Lecture 6 - Marcov Chain introduction.pptx
I05745368
makov chain_basic
 
Artificial Intelligence_MARKOV MODEL.pptx
A STUDY ON MARKOV CHAIN WITH TRANSITION DIAGRAM
Markov Chains.pptx
Hidden markov model
Markov presentation
Hidden Markov Models
CS-438 COMPUTER SYSTEM MODELINGWK9+10LEC17-19.pdf
Markov Model chains
Markov Chains
Markov Chains | Edureka
12 Machine Learning Supervised Hidden Markov Chains
NLP_KASHK:Markov Models
Book chapter-5
Introduction To Markov Chains | Markov Chains in Python | Edureka
Teradata Analytics Meet @ Linkedin - May 2017
Markor chain presentation

Recently uploaded (20)

PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PPTX
Lesson notes of climatology university.
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPTX
PPH.pptx obstetrics and gynecology in nursing
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PPTX
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
PDF
Insiders guide to clinical Medicine.pdf
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PPTX
GDM (1) (1).pptx small presentation for students
PPTX
master seminar digital applications in india
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Lesson notes of climatology university.
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPH.pptx obstetrics and gynecology in nursing
STATICS OF THE RIGID BODIES Hibbelers.pdf
Module 4: Burden of Disease Tutorial Slides S2 2025
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Microbial diseases, their pathogenesis and prophylaxis
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
Insiders guide to clinical Medicine.pdf
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
GDM (1) (1).pptx small presentation for students
master seminar digital applications in india
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
FourierSeries-QuestionsWithAnswers(Part-A).pdf
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
Abdominal Access Techniques with Prof. Dr. R K Mishra
VCE English Exam - Section C Student Revision Booklet
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...

A SURVEY OF MARKOV CHAIN MODELS IN LINGUISTICS APPLICATIONS

  • 1. David C. Wyld et al. (Eds) : ICAITA, CDKP, CMC, SOFT, SAI - 2016 pp. 53– 62, 2016. © CS & IT-CSCP 2016 DOI : 10.5121/csit.2016.61305 A SURVEY OF MARKOV CHAIN MODELS IN LINGUISTICS APPLICATIONS Fawaz S. Al-Anziand Dia AbuZeina Department of Computer Engineering, Kuwait University, Kuwait City, Kuwait fawaz.alanzi@ku.edu.kw,abuzeina@ku.edu.kw ABSTRACT Markov chain theory isan important tool in applied probability that is quite useful in modeling real-world computing applications.For a long time, rresearchers have used Markov chains for data modeling in a wide range of applications that belong to different fields such as computational linguists, image processing, communications,bioinformatics, finance systems, etc. This paper explores the Markov chain theory and its extension hidden Markov models (HMM) in natural language processing (NLP) applications. This paper also presents some aspects related to Markov chains and HMM such as creating transition matrices, calculating data sequence probabilities, and extracting the hidden states. KEYWORDS Markov chains,Hidden Markov Models, computational linguistics, pattern recognition, statistical 1. INTRODUCTION Markov chains theory is increasingly being adopted in real-world computing applications since it provides a convenient way for modeling temporal, time-series data. At each clock tick, the system moves into a new state that can be the same as the previous one. A Markov chain model is amathematical tool that capture the patterns dependencies in pattern recognition systems. For this reason, Markov chain theory is appropriate in natural langue processing (NLP) where it is naturally characterized by dependencies between patterns such as characters or words. Markov chains are directed graphs (a graphical model) that are generally used with relatively long data sequences for data-mining tasks. Such tasks include prediction, classification, clustering, pattern discovery, software testing, multimedia analysis, networks, etc. Reference [1] indicated that there are two reasons of Markov chains popularity; very rich in mathematical structure and work well in practice for several important applications. Hidden Markov models (HMM) is an extension of Markov chains that used to find the hidden system’s states based on the observations. In order to facilitate the research in this direction, this paper provides a survey of this so popular data modeling technique. However, because of the wide range of the research domains that use this technique. We specifically focuson the linguistics related applications. Reference [2] list some domains that utilize Markov chains theory which include: physics, chemistry, testing, speech recognition, information sciences, queueing theory, internet applications, statistics, economics and finance, social sciences, mathematical biology, genetics, games, music, baseball,
  • 2. 54 Computer Science & Information Technology (CS & IT) Markov text generators, bioinformatics. Reference [3] lists the five greatest applicati Markov chains that include Scherr’s application to computer performance evaluation, Brin and Page’s application to PageRank and Web Search, Baum’s application to application to information theory, and Markov’s application to Eugeny On This paper is organized as follows. The next section presents theory. Section 3 highlights the main concepts of HMM followed by a literature review of Markov chains and HMM in section 4 2. MARKOV CHAINS Markov chains are quite useful in modeling memorylessstochastic model that describes the behaviour of an integer The behaviour is the simple form of dependency in w on the current state. According to [4], a random process is said to be Markov if the future of the process, given the present, is independent of the past. To describe the transitions between states, a transition diagram is used to describe the model and the probabilities of going from one state to another. For example, Figure 1 shows a Markov chain Hard) that belong to exam cases(i.e. states) for transition from one state to another. Figure 1. A Simple Markov chain with The Markov chain diagrams are generally represented using the transition probabilities from using the entire states in the system. For example, i datathat contains N states (e.g. the size of lexicon) a matrix A= {aij} of size N*N. In matrix A, a state i to a state j. Table 1 shows how diagram shown in Figure 1. That is, the matrix carries the state transitions the involved states(Easy, Ok, and Hard). For illustration, the P(E| the next exam to be Easy given tha Table State Previous Exam In Table 1, the sum of the probability values at each row is 1 as the the sum of the probabilities coming out of each node should be 1. Hence worthy topic that has many details. For example Computer Science & Information Technology (CS & IT) ioinformatics. Reference [3] lists the five greatest applicati Markov chains that include Scherr’s application to computer performance evaluation, Brin and Page’s application to PageRank and Web Search, Baum’s application to HMM application to information theory, and Markov’s application to Eugeny Onegin. This paper is organized as follows. The next section presents a background of Markov chains highlights the main concepts of HMM followed by a literature review of ection 4. Finally, we conclude in section 5. Markov chains are quite useful in modeling computational linguistics. A Markov chain is a model that describes the behaviour of an integer-valued random process. The behaviour is the simple form of dependency in which the next state (or event) depends only on the current state. According to [4], a random process is said to be Markov if the future of the process, given the present, is independent of the past. To describe the transitions between states, a diagram is used to describe the model and the probabilities of going from one state to another. For example, Figure 1 shows a Markov chain diagram with three states (Easy, Ok, and (i.e. states). In the figure, each arc represents the probability value for transition from one state to another. Figure 1. A Simple Markov chain with three states The Markov chain diagrams are generally represented using state transition matricesthat one state to another. Hence, a state transition matrix is created using the entire states in the system. For example, if a particular textual application has a (e.g. 
the size of lexicon), then the state transition matrix is described . In matrix A, the element aij denote the transition probability from Table 1 shows how the state transition matrix used to characterize the Markov diagram shown in Figure 1. That is, the matrix carries the state transitions probabilities between (Easy, Ok, and Hard). For illustration, the P(E|H) denote to the probability of the next exam to be Easy given that the previous exam was Hard. Table 1. A state transition matrix of three states Next Exam Easy (E) Ok (O) Hard (H) Easy (E) P(E|E) P(O|E) P(H|E) Ok (O) P(E|O) P(O|O) P(H|O) Hard (H) P(E|H) P(O|H) P(H|H) In Table 1, the sum of the probability values at each row is 1 as the the sum of the probabilities coming out of each node should be 1. Hence,P(E|E)+P(O|E)+P(P(H|E) equal 1. Markov chain is a worthy topic that has many details. For examples, it contains discrete-time, continuous ioinformatics. Reference [3] lists the five greatest applications of Markov chains that include Scherr’s application to computer performance evaluation, Brin and HMM, Shannon’s Markov chains highlights the main concepts of HMM followed by a literature review of linguistics. A Markov chain is a valued random process. hich the next state (or event) depends only on the current state. According to [4], a random process is said to be Markov if the future of the process, given the present, is independent of the past. To describe the transitions between states, a diagram is used to describe the model and the probabilities of going from one state to with three states (Easy, Ok, and probability value transition matricesthat denote Hence, a state transition matrix is created a particular textual application has a training is described by the element aij denote the transition probability from characterize the Markov probabilities between the probability of In Table 1, the sum of the probability values at each row is 1 as the the sum of the probabilities P(E|E)+P(O|E)+P(P(H|E) equal 1. Markov chain is a time, continuous-time,
  • 3. Computer Science & Information Technology (CS & IT) time-reversed, reversible, and irreducible case, also called ergodic, where it is possible to go from every state to every state. To illustrate a simple Markov chain used to create a transition matrix based on the are inspirational English quotes picked from [5]: (1) Power perceived is power achieved. Figure 2 shows the transition matrix of these quotes by counting the total number of occurrences of the adjacent two character sequences number of unique characters appeared in creating transition matrix is case insensitive where D is same as d, as an example. In addition, a space between two words discarded and shows that the maximum number in the matrix’s entries is 3 (a highlighted underlined value) which means that moving from character e to r (e this small corpus. The words that cont Figure 2. A transition matrix of Based on the information provided in the transition matrix shown in Figure 2. It is possible to answer some questions related to the give number of the two characterssequences appeared in the given data set characters sequences that did not characters sequences in the data set such as weather forecasting. Therefore, to the today’s weather. For example, if we have two states (Sunny, Rainy), and to find the probability P(Sunny|Rainy) provided in the probability transition matrix. banking industry. A big portfolio of banks is b classify loans to different states such as Good, Risky, and Bad loans. For simplicity, the information presented in Figure 2 shows t number of occurrences. Figure 3 shows Computer Science & Information Technology (CS & IT) , and irreducible Markov chains. The case shown in Figure 1 is ergodic, where it is possible to go from every state to every state. chain data model, a small data set contains two English sentences used to create a transition matrix based on the neighbouringcharacters sequences. The sentences are inspirational English quotes picked from [5]: Power perceived is power achieved. (2) If you come to a fork in the road, take it. Figure 2 shows the transition matrix of these quotes by counting the total number of occurrences of the adjacent two character sequences. It is a 19 × 19 matrix where the value 19 is the number of unique characters appeared in thesentences (i.e the two quotes). In this example, creating transition matrix is case insensitive where D is same as d, as an example. In addition, a discarded and not considered in the transition matrix. Figure 2 also shows that the maximum number in the matrix’s entries is 3 (a highlighted underlined value) which means that moving from character e to r (e r) is the most frequently sequence . The words that contains this sequence are :{ Power (two times) and Figure 2. A transition matrix of two characters sequences Based on the information provided in the transition matrix shown in Figure 2. It is possible to answer some questions related to the given data collection. Among inquires, what sequences appeared in the given data set?What that did not appear in the data collection?What is the least frequently in the data set? Accordingly, Markov chains are used as prediction systems . Therefore, it is possible to predict the tomorrow’s weather according to the today’s weather. For example, if we have two states (Sunny, Rainy), and the requirement is to find the probability P(Sunny|Rainy), Markov chains make it possible based on the information probability transition matrix. Another example of the using Markov chains is banking industry. 
Accordingly, Markov chains are used in prediction systems such as weather forecasting; it is possible to predict tomorrow's weather according to today's weather. For example, if we have two states (Sunny, Rainy) and the requirement is to find the probability P(Sunny|Rainy), a Markov chain makes this possible based on the information provided in the probability transition matrix. Another example of using Markov chains is the banking industry. A big portfolio of banks is based on loans; therefore, Markov chains are used to classify loans into different states such as Good, Risky, and Bad loans.

For simplicity, the information presented in Figure 2 shows the transition matrix based on the total numbers of occurrences. Figure 3 shows the same information but using probabilities instead of the numbers of occurrences; that is, it contains the probability of moving from one character to another. As previously indicated, the sum of the entries in each row equals 1. In Figure 3, a matrix entry of 0 means that there is no transition for that pair of characters. Similarly, a matrix entry of 1 means that there is only one possible successor of that state; for example, the character "o" is the only character that comes after "y", so this is the only outgoing arc of state "y".

Figure 3. A probability transition matrix of two-character sequences
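A small sketch of this normalization step, rebuilding the counts from the previous sketch and dividing each row by its sum; rows without outgoing transitions (e.g. a character that only ends words) are left at zero.

```python
import re
import numpy as np
from collections import Counter

quotes = ("Power perceived is power achieved. "
          "If you come to a fork in the road, take it.")
words = re.findall(r"[a-z]+", quotes.lower())
counts = Counter(pair for w in words for pair in zip(w, w[1:]))

# Arrange the counts in a dense matrix indexed by the 19 characters.
alphabet = sorted(set("".join(words)))
index = {c: i for i, c in enumerate(alphabet)}
M = np.zeros((len(alphabet), len(alphabet)))
for (a, b), n in counts.items():
    M[index[a], index[b]] = n

# Normalize each row so its entries sum to 1 (Figure 3's layout),
# skipping rows whose sum is zero to avoid division by zero.
row_sums = M.sum(axis=1, keepdims=True)
P = np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums > 0)

# An entry of 1 marks a state with a single successor: in this corpus,
# "o" is the only character that ever follows "y".
print(P[index["y"], index["o"]])  # 1.0
```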
3. HIDDEN MARKOV MODELS

Hidden Markov models (HMM) are an extension of Markov chain models; both are used for temporal data modeling. The difference is that the states in Markov chain models are directly observed, while they are hidden in the case of HMM. We explain the concept of HMM based on Figure 1, which shows the three-state exam Markov diagram. As a very simple example, suppose that a student's parents want to know the levels (i.e. the difficulty) of their son's exams. Naturally, it is possible to recognize an exam as Easy or Ok if the son feels Fine. Similarly, it is possible to recognize an exam as Hard if the son looks Scared. From the parents' point of view, the required states (i.e. Easy, Ok, or Hard) are hidden. However, they directly observe the student's reaction or feeling. Hence, the parents might use the observed reaction as an indication of the hidden states. An HMM is described using three matrices: the initial probability matrix, the observation probability matrix, and the state transition matrix; a concrete sketch follows below. Figure 4 shows an HMM diagram with the states and the observations. In the figure, each arc represents a probability, either between two states or between a state and an observation.

Figure 4. A HMM diagram with the transition and the observation arcs
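As a concrete sketch of these three matrices for the exam example; all numeric values below are illustrative assumptions, since the paper does not specify them.

```python
import numpy as np

states = ["Easy", "Ok", "Hard"]       # hidden states
observations = ["Fine", "Scared"]     # visible observations

# Initial probability matrix: the probability of each state at the start.
pi = np.array([1/3, 1/3, 1/3])

# State transition matrix: A[i, j] = P(next state j | current state i).
A = np.array([[0.5, 0.3, 0.2],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])

# Observation probability matrix: B[i, k] = P(observation k | state i).
# For instance, a Hard exam makes the son look Scared with probability 0.8.
B = np.array([[0.9, 0.1],
              [0.7, 0.3],
              [0.2, 0.8]])

# Every row of A and B is a probability distribution.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```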
Based on the information provided in these matrices, either the Baum-Welch (also called any path) or the Viterbi (also called best path) algorithm is used to compute the probability scores during the recognition phase. Figure 5 shows the trellis diagram for the exam-states HMM. While the Baum-Welch algorithm is used to compute the recognition probability of an observation sequence, Viterbi is used to find the best state sequence associated with the given observations; this process is also known as back-tracking. Hence, after computing the observation sequence probability and finding the maximum probability (marked by the star in Figure 5), the Viterbi algorithm traces the process back to identify the states (sources) from which the observation sequence was emitted. In Figure 5, the maximum probabilities are assumed to be achieved at the states shown using the dotted lines: Ok, Easy, and Hard, respectively.

Figure 5. Trellis diagram of a three-state HMM
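The sketch below implements both scoring procedures for the exam HMM, reusing the illustrative pi, A, and B values assumed in Section 3: a forward pass that sums the probability over all paths (the any-path score that Baum-Welch training builds on), and Viterbi decoding with back-pointers for the best-path state sequence.

```python
import numpy as np

# Illustrative HMM parameters (the same assumed values as in Section 3).
pi = np.array([1/3, 1/3, 1/3])
A = np.array([[0.5, 0.3, 0.2],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])
B = np.array([[0.9, 0.1],
              [0.7, 0.3],
              [0.2, 0.8]])

def forward_score(obs, pi, A, B):
    """Any-path probability of the observation sequence (forward pass)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

def viterbi(obs, pi, A, B):
    """Best-path state sequence via dynamic programming and back-tracking."""
    T, n = len(obs), A.shape[0]
    delta = np.zeros((T, n))            # best partial-path probabilities
    psi = np.zeros((T, n), dtype=int)   # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j]: transition i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Back-tracking: start from the most probable final state (the star
    # in Figure 5) and follow the back-pointers to the first observation.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, float(delta[-1].max())

obs = [0, 1, 0]  # observed reactions: Fine, Scared, Fine
states = ["Easy", "Ok", "Hard"]
print(forward_score(obs, pi, A, B))
best_path, best_p = viterbi(obs, pi, A, B)
print([states[s] for s in best_path], best_p)
```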
4. LINGUISTIC APPLICATIONS

In the literature, there are many works on modeling content dependencies for linguistic applications. Markov chain models and HMMs are of great interest to linguistic scholars, who primarily work on data sequences. Even though this study focuses on linguistic applications, Markov chains have been used to model a variety of phenomena in many other fields. The following are some of the fields in which studies have employed Markov chains; we intentionally omit the references here, as the literature contains too many such studies: image processing, text and image compression, video segmentation, forecasting, networking, signal processing, communications, software testing, genetics, bioinformatics, genome structure recognition, anomaly detection, tumour classification, water quality, epidemic spread, wind power, malicious and cyber-attack detection, traffic management, physics, chemistry, mathematical biology, games, music, multimedia processing, business activities, and fraud detection.

The following two subsections include some of the linguistic studies that utilize Markov chain theory. Linguistic application topics mainly include (but are not limited to) speech recognition, speech emotion recognition, part-of-speech tagging, machine translation, text classification, text summarization, optical character recognition (OCR), named entity recognition, question answering, and authorship attribution. For the reader interested in NLP, Reference [6] is a good starting point, as it presents a thorough study of NLP (almost) from scratch.
4.1. Markov chain based research

The literature has a large number of studies that employ Markov chains for NLP applications. The following are some linguistics-related applications. Reference [7] proposed a word-dividing algorithm based on statistical language models and Markov chain theory for Chinese speech processing. Reference [8] presented a semantic indexing Markov chain algorithm that uses both audio and visual information for event detection in soccer programs. Reference [9] investigated the use of Markov chains and sequence kernels for the task of authorship attribution. Reference [10] implemented a probabilistic framework for support vector machines (SVM) that allows automatic tuning of the penalty coefficient and kernel parameters via Markov chains, applied to web searching via text categorization. Reference [11] demonstrated automatic video annotation using a multimodal Dirichlet process mixture model by collecting samples from the corresponding Markov chain. Reference [12] used a linguistic steganography detection method based on Markov chain models. Reference [13] showed how probabilistic Markov chain models can be used to detect topical structure in large text corpora. Reference [14] proposed a method for recognizing location names in Chinese texts based on a Max-Margin Markov Network. Reference [15] utilized Markov chain and statistical language models in a linguistic steganography detection algorithm. Reference [16] proposed a Markov chain based algorithm for Chinese word segmentation. Reference [17] presented two new textual feature selection methods based on Markov chain rank aggregation techniques. Reference [18] proposed a Markov chain model for radical descriptors in Arabic text mining. Reference [19] presented statistical Markov chain models for the distributions of words in text lines. Reference [20] proposed a method for handwritten Chinese/Japanese text (character string) recognition based on semi-Markov conditional random fields (semi-CRFs). Reference [21] presented a Markov chain method for authorship attribution based on relational data between function words. Reference [22] utilized a probabilistic Markov chain model to infer the location of Twitter users. Reference [23] proposed a Markov chain based technique to determine the number of clusters in a corpus of short-text documents. Reference [24] proposed a Markov chain based method for digital document authentication. Reference [25] used Markov chains for authorship attribution in Arabic poetry.

4.2. Hidden Markov model based research

Linguistic HMM-based research has long been an active research area due to the rapid development of NLP applications. The literature has many studies, as follows. Reference [26] proposed to extract acronyms and their meanings from unstructured text as a stochastic process using HMM. Reference [27] proposed a morphological segmentation method based on HMM for Mongolian. Reference [28] employed HMM for Arabic handwritten word recognition. Reference [29] presented a scheme for off-line recognition of large-set handwritten characters in the framework of first-order HMMs. Reference [30] proposed the use of hybrid HMM/Artificial Neural Network (ANN) models for recognizing unconstrained offline handwritten texts. Reference [31] used HMMs for recognizing Farsi handwritten words.
Reference [32] described recent advances in HMM-based OCR for machine-printed Arabic documents. Reference [33] proposed an HMM-based method for named entity recognition. Reference [34] combined text classification and HMM techniques for structuring randomized clinical trial abstracts. Reference [35] employed HMM for medical text classification. Reference [36] proposed a text (sequences of pages) categorization architecture based on HMM. Reference [37] described a model for machine translation based on first-order HMM. Reference [38] introduced speech emotion recognition by use of HMM. Reference [39] presented an HMM-based method for speech emotion recognition. Reference [40] discussed the role of HMM in speech recognition. Reference [41] indicated that almost all present-day large vocabulary continuous speech
recognition (LVCSR) systems are based on HMMs. Reference [42] presented a text summarization method based on HMM. Reference [43] presented a method for summarizing speech documents using HMM. Reference [44] used HMM for the part-of-speech tagging task. Reference [45] presented a second-order approximation of HMM for the part-of-speech tagging task.

5. CONCLUSIONS

This work demonstrates the potential and the breadth of Markov chain research. The study reveals that Markov chains and HMMs are of high importance for linguistic applications. Similarly, Markov chains are also widely used in many other applications. For future work, it is worthwhile to explore the power of Markov chains in new linguistic and scientific directions in more detail.

ACKNOWLEDGEMENTS

This work is supported by the Kuwait Foundation for the Advancement of Sciences (KFAS), Research Grant Number P11418EO01, and by Kuwait University Research Administration, Research Project Number EO06/12.

REFERENCES

[1] Rabiner, Lawrence R. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE 77.2 (1989): 257-286.
[2] Markov chain. (2016, August). Retrieved from https://en.wikipedia.org/wiki/Markov_chain
[3] Von Hilgers, Philipp, and Amy N. Langville. "The five greatest applications of Markov chains." Proceedings of the Markov Anniversary Meeting, Boston Press, Boston, MA, 2006.
[4] Leon-Garcia, Alberto. Probability, Statistics, and Random Processes for Electrical Engineering. Upper Saddle River, NJ: Pearson/Prentice Hall, 2008.
[5] California Indian Education. (2016, August). Retrieved from http://www.californiaindianeducation.org/inspire/world/
[6] Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of Machine Learning Research 12.Aug (2011): 2493-2537.
[7] Bin, Tian, et al. "A Chinese word dividing algorithm based on statistical language models." Signal Processing, 1996, 3rd International Conference on. Vol. 1. IEEE, 1996.
[8] Leonardi, Riccardo, Pierangelo Migliorati, and Maria Prandini. "Semantic indexing of soccer audio-visual sequences: a multimodal approach based on controlled Markov chains." IEEE Transactions on Circuits and Systems for Video Technology 14.5 (2004): 634-643.
[9] Sanderson, Conrad, and Simon Guenter. "On authorship attribution via Markov chains and sequence kernels." 18th International Conference on Pattern Recognition (ICPR'06). Vol. 3. IEEE, 2006.
[10] Lim, Bresley Pin Cheong, et al. "Web search with text categorization using probabilistic framework of SVM." 2006 IEEE International Conference on Systems, Man and Cybernetics. Vol. 4. IEEE, 2006.
[11] Velivelli, Atulya, and Thomas S. Huang. "Automatic video annotation using multimodal Dirichlet process mixture model." Networking, Sensing and Control, 2008. ICNSC 2008. IEEE International Conference on. IEEE, 2008.
[12] Chen, Zhi-li, et al. "Effective linguistic steganography detection." Computer and Information Technology Workshops, 2008. CIT Workshops 2008. IEEE 8th International Conference on. IEEE, 2008.
[13] Dowman, Mike, et al. "A probabilistic model of meetings that combines words and discourse features." IEEE Transactions on Audio, Speech, and Language Processing 16.7 (2008): 1238-1248.
[14] Li, Lishuang, Zhuoye Ding, and Degen Huang. "Recognizing location names from Chinese texts based on max-margin Markov network." Natural Language Processing and Knowledge Engineering, 2008. NLP-KE'08. International Conference on. IEEE, 2008.
[15] Meng, Peng, et al. "Linguistic steganography detection algorithm using statistical language model." Information Technology and Computer Science, 2009. ITCS 2009. International Conference on. Vol. 2. IEEE, 2009.
[16] Baomao, Pang, and Shi Haoshan. "Research on improved algorithm for Chinese word segmentation based on Markov chain." Information Assurance and Security, 2009. IAS'09. Fifth International Conference on. Vol. 1. IEEE, 2009.
[17] Wu, Ou, et al. "Rank aggregation based text feature selection." Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT'09. IEEE/WIC/ACM International Joint Conferences on. Vol. 1. IET, 2009.
[18] El Hassani, Ibtissam, Abdelaziz Kriouile, and Youssef Ben Ghabrit. "Measure of fuzzy presence of descriptors on Arabic text mining." 2012 Colloquium in Information Science and Technology. IEEE, 2012.
[19] Haji, Mehdi, et al. "Statistical hypothesis testing for handwritten word segmentation algorithms." Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on. IEEE, 2012.
[20] Zhou, Xiang-Dong, et al. "Handwritten Chinese/Japanese text recognition using semi-Markov conditional random fields." IEEE Transactions on Pattern Analysis and Machine Intelligence 35.10 (2013): 2413-2426.
[21] Segarra, Santiago, Mark Eisen, and Alejandro Ribeiro. "Authorship attribution using function words adjacency networks." 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013.
[22] Rodrigues, Erica, et al. "Uncovering the location of Twitter users." Intelligent Systems (BRACIS), 2013 Brazilian Conference on. IEEE, 2013.
[23] Goyal, Anil, Mukesh K. Jadon, and Arun K. Pujari. "Spectral approach to find number of clusters of short-text documents." Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2013 Fourth National Conference on. IEEE, 2013.
[24] Shen, Jau Ji, and Ken Tzu Liu. "A novel approach by applying image authentication technique on a digital document." Computer, Consumer and Control (IS3C), 2014 International Symposium on. IEEE, 2014.
[25] Ahmed, Al-Falahi, et al. "Authorship attribution in Arabic poetry." 2015 10th International Conference on Intelligent Systems: Theories and Applications (SITA). IEEE, 2015.
[26] Osiek, Bruno Adam, Geraldo Xexéo, and Luis Alfredo Vidal de Carvalho. "A language-independent acronym extraction from biomedical texts with hidden Markov models." IEEE Transactions on Biomedical Engineering 57.11 (2010): 2677-2688.
[27] He, Miantao, Miao Li, and Lei Chen. "Mongolian morphological segmentation with hidden Markov model." Asian Language Processing (IALP), 2012 International Conference on. IEEE, 2012.
[28] Alma'adeed, Somaya, Colin Higgens, and Dave Elliman. "Recognition of off-line handwritten Arabic words using hidden Markov model approach." Pattern Recognition, 2002. Proceedings. 16th International Conference on. Vol. 3. IEEE, 2002.
[29] Park, Hee-Seon, and Seong-Whan Lee. "Off-line recognition of large-set handwritten characters with multiple hidden Markov models." Pattern Recognition 29.2 (1996): 231-244.
[30] Espana-Boquera, Salvador, et al. "Improving offline handwritten text recognition with hybrid HMM/ANN models." IEEE Transactions on Pattern Analysis and Machine Intelligence 33.4 (2011): 767-779.
[31] Imani, Zahra, et al. "Offline handwritten Farsi cursive text recognition using hidden Markov models." Machine Vision and Image Processing (MVIP), 2013 8th Iranian Conference on. IEEE, 2013.
[32] Prasad, Rohit, et al. "Improvements in hidden Markov model based Arabic OCR." Pattern Recognition, 2008. ICPR 2008. 19th International Conference on. IEEE, 2008.
[33] Zhou, GuoDong, and Jian Su. "Named entity recognition using an HMM-based chunk tagger." Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2002.
[34] Xu, Rong, et al. "Combining text classification and hidden Markov modeling techniques for structuring randomized clinical trial abstracts." AMIA. 2006.
[35] Yi, Kwan, and Jamshid Beheshti. "A hidden Markov model-based text classification of medical documents." Journal of Information Science (2008).
[36] Frasconi, Paolo, Giovanni Soda, and Alessandro Vullo. "Hidden Markov models for text categorization in multi-page documents." Journal of Intelligent Information Systems 18.2-3 (2002): 195-217.
[37] Vogel, Stephan, Hermann Ney, and Christoph Tillmann. "HMM-based word alignment in statistical translation." Proceedings of the 16th Conference on Computational Linguistics, Volume 2. Association for Computational Linguistics, 1996.
[38] Schuller, Björn, Gerhard Rigoll, and Manfred Lang. "Hidden Markov model-based speech emotion recognition." Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP'03). 2003 IEEE International Conference on. Vol. 2. IEEE, 2003.
[39] Nwe, Tin Lay, Say Wei Foo, and Liyanage C. De Silva. "Speech emotion recognition using hidden Markov models." Speech Communication 41.4 (2003): 603-623.
[40] Juang, Biing Hwang, and Laurence R. Rabiner. "Hidden Markov models for speech recognition." Technometrics 33.3 (1991): 251-272.
[41] Gales, Mark, and Steve Young. "The application of hidden Markov models in speech recognition." Foundations and Trends in Signal Processing 1.3 (2008): 195-304.
[42] Conroy, John M., and Dianne P. O'Leary. "Text summarization via hidden Markov models." Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2001.
[43] Maskey, Sameer, and Julia Hirschberg. "Summarizing speech without text using hidden Markov models." Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers. Association for Computational Linguistics, 2006.
[44] Kupiec, Julian. "Robust part-of-speech tagging using a hidden Markov model." Computer Speech & Language 6.3 (1992): 225-242.
[45] Thede, Scott M., and Mary P. Harper. "A second-order hidden Markov model for part-of-speech tagging." Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1999.