This document describes a lemmatizer for Russian and covers both its algorithm and the technical details of the component. The lemmatizer combines five techniques: dictionary lookup, morphological analysis of unknown words, compound-word analysis, number analysis, and rule-based analysis. Example output demonstrates the lemmatization of Russian text.
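As a rough illustration of how such a combination of techniques might be chained, the sketch below tries each analyzer in turn and returns the first lemma found. Everything in it is an assumption made for illustration: the stage order, the function names, the toy dictionary, and the suffix rules are hypothetical and are not taken from the lemmatizer described here. The rule-based stage is left out because its rules are not specified in this summary.

```python
import re
from typing import Callable, List, Optional

# Toy dictionary: word form -> lemma (hypothetical entries).
DICTIONARY = {
    "кошки": "кошка",
    "столы": "стол",
    "зелёного": "зелёный",
}

# Toy ending rules for out-of-dictionary words: (ending, replacement).
# These are illustrative guesses, not real Russian morphology.
SUFFIX_RULES = [("ами", "а"), ("ов", "")]


def dictionary_lookup(word: str) -> Optional[str]:
    """Return the lemma if the word form is in the dictionary."""
    return DICTIONARY.get(word)


def number_analysis(word: str) -> Optional[str]:
    """Treat digit strings such as '42' or '3,14' as their own lemma."""
    return word if re.fullmatch(r"\d+(?:[.,]\d+)?", word) else None


def compound_analysis(word: str) -> Optional[str]:
    """Lemmatize the last part of a hyphenated compound via the dictionary."""
    if "-" not in word:
        return None
    head, _, tail = word.rpartition("-")
    tail_lemma = dictionary_lookup(tail)
    return f"{head}-{tail_lemma}" if tail_lemma else None


def unknown_word_analysis(word: str) -> Optional[str]:
    """Guess a lemma for an unknown word by rewriting a known ending."""
    for ending, replacement in SUFFIX_RULES:
        if word.endswith(ending) and len(word) > len(ending) + 2:
            return word[: -len(ending)] + replacement
    return None


# Assumed order: cheapest and most reliable analyzers first.
STAGES: List[Callable[[str], Optional[str]]] = [
    dictionary_lookup,
    number_analysis,
    compound_analysis,
    unknown_word_analysis,
]


def lemmatize(word: str) -> str:
    """Run the stages in order; fall back to the lowercased token."""
    token = word.lower()
    for stage in STAGES:
        lemma = stage(token)
        if lemma is not None:
            return lemma
    return token


if __name__ == "__main__":
    for w in ["Кошки", "42", "сине-зелёного", "лампами"]:
        print(w, "->", lemmatize(w))
```

Keeping each technique as a separate function with the same signature makes it easy to reorder the stages or to add a further stage without touching `lemmatize`.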