The document discusses using support vector machines (SVMs) for automatic document categorization. It proposes training SVMs on a collection of documents that have been manually categorized into fields and groups. Each document is represented as a sparse vector of words weighted by TF-IDF. One SVM is trained per category on a subset of the documents, and the trained SVMs are then used to categorize new documents by predicting how likely each document is to belong to each category. The method achieved good recall and precision on test documents from several sample categories. Possible improvements and future work to extend the approach are also discussed.
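The pipeline summarized above (sparse TF-IDF vectors, one SVM per category, scoring of new documents) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the document's own implementation: the toy training texts, the category names `sports` and `finance`, the whitespace tokenizer, the smoothed IDF formula, and the simple subgradient-descent trainer for a linear SVM. The sketch also trains on all toy documents, not on a subset, and ranks categories by the raw SVM decision score, which is not a calibrated likelihood.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; a stand-in for real preprocessing.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency for every word seen in training.
    n = len(docs)
    df = Counter(w for d in docs for w in set(tokenize(d)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def tfidf(doc, idf):
    # Sparse, L2-normalized TF-IDF vector stored as {word: weight}.
    # Words never seen in training are dropped.
    counts = Counter(w for w in tokenize(doc) if w in idf)
    vec = {w: c * idf[w] for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {w: v / norm for w, v in vec.items()}

def decision(w, b, x):
    # Linear SVM decision score: w . x + b over the sparse vector x.
    return sum(w.get(k, 0.0) * v for k, v in x.items()) + b

def train_linear_svm(vectors, labels, lam=0.01, eta=0.1, epochs=100):
    # Hinge loss with L2 regularization, minimized by plain
    # per-example subgradient descent. Labels are +1 or -1.
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            violated = y * decision(w, b, x) < 1.0
            for k in w:
                w[k] *= 1.0 - eta * lam  # shrink from the L2 penalty
            if violated:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
                b += eta * y
    return w, b

# Hypothetical, manually labeled training documents.
TRAIN = [
    ("the team won the football match", "sports"),
    ("the bank raised interest rates", "finance"),
    ("the coach praised the players after the game", "sports"),
    ("investors sold shares as the market fell", "finance"),
]

docs = [text for text, _ in TRAIN]
idf = fit_idf(docs)
vectors = [tfidf(d, idf) for d in docs]

# One binary SVM per category: that category versus all the others.
models = {}
for cat in sorted({c for _, c in TRAIN}):
    labels = [1 if c == cat else -1 for _, c in TRAIN]
    models[cat] = train_linear_svm(vectors, labels)

def categorize(text):
    # Decision score of a new document under every category's SVM.
    x = tfidf(text, idf)
    return {cat: decision(w, b, x) for cat, (w, b) in models.items()}

def predict(text):
    # Pick the category whose SVM gives the highest score.
    scores = categorize(text)
    return max(scores, key=scores.get)
```

Calling `categorize("football players and the coach")` returns one score per category, and `predict` selects the highest. In practice a library implementation with calibrated outputs would be a more natural choice than this hand-rolled trainer; the sketch only makes the data flow concrete.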