The document describes prediction models based on maximum stems (max-stems) for labeling text data with imbalanced labels. It covers four episodes: one-word-based models, a combinatorial approach, hyperparameters, and advanced examinations. The first model (Model 1) predicts a document's label as the label with the highest stem count in the training data, or the label with the second-highest count if multiple labels have a count of zero. Stemming and the use of max-stems aim to handle derivation and inflection in agglutinative languages.
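
As a rough illustration of the counting idea behind Model 1, the sketch below tallies how often each stem co-occurs with each label in the training data and predicts the label with the highest total for a new document. It is a minimal sketch under stated assumptions, not the document's implementation: the `stem` function is a hypothetical stand-in that truncates words to a fixed prefix, the names `train` and `predict` are invented for this example, and the second-highest fallback rule is not implemented because the summary does not specify it in enough detail.

```python
from collections import Counter, defaultdict


def stem(word, max_len=3):
    # Hypothetical stand-in for a real stemmer: lowercase the word and
    # keep only its first max_len characters.
    return word.lower()[:max_len]


def train(docs):
    # docs: iterable of (text, label) pairs.
    # Returns a mapping stem -> Counter of labels seen with that stem.
    counts = defaultdict(Counter)
    for text, label in docs:
        for word in text.split():
            counts[stem(word)][label] += 1
    return counts


def predict(counts, text, default=None):
    # Sum the per-label counts of every stem in the document and
    # return the label with the highest total.
    totals = Counter()
    for word in text.split():
        totals.update(counts.get(stem(word), {}))
    if not totals:
        # No stem of this document was seen in training.
        return default
    return totals.most_common(1)[0][0]


training_docs = [
    ("running runs runner", "sport"),
    ("cooking cooked cooks", "food"),
    ("runner ran", "sport"),
]
model = train(training_docs)
print(predict(model, "runs cooked running"))
```

With imbalanced labels, raw counts like these favor the majority label, which is presumably one reason the later episodes introduce a combinatorial approach and hyperparameters.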