The document presents the MTE tagger, a rule-based part-of-speech (POS) tagger for Arabic that is built from grammatical rules and requires no pre-tagged corpora. In testing against the Stanford tagger, the MTE tagger achieved comparable accuracy (87.88% for MTE vs. 86.67% for Stanford) while running significantly faster. The study also highlights the challenges facing Arabic NLP: resources are limited, and the language is more complex to process than European languages, for which NLP tools are better established.