This document describes a summarization system that uses machine learning to determine the best configuration of modules for text summarization. The system combines existing natural language processing modules that perform tasks such as segmentation, keyphrase extraction, and phrase matching, and it generates summaries by extracting representative sentences from the source documents. Summaries are evaluated by comparing the automatically generated output with model summaries written by humans. Machine learning improves summary quality by discovering which configuration of modules performs best for a given document type or set of lexical features. The system was tested on the summarization tasks of the Document Understanding Conferences (DUC) of 2001 and 2002.
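The idea of choosing a module configuration by scoring extracted summaries against human model summaries can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the configuration parameters (`keyphrase_weight`, `position_weight`, `length`), the sentence scorer, and the unigram-overlap F1 metric are all hypothetical, and an exhaustive grid search stands in for the machine learning component.

```python
import re
from itertools import product


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text):
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(document, keyphrases, config):
    """Extract the top-scoring sentences under one (hypothetical) configuration."""
    sentences = split_sentences(document)
    keyset = {k.lower() for k in keyphrases}
    scored = []
    for i, sentence in enumerate(sentences):
        hits = sum(1 for t in tokens(sentence) if t in keyset)
        position_bonus = 1.0 / (i + 1)  # earlier sentences score higher
        score = (config["keyphrase_weight"] * hits
                 + config["position_weight"] * position_bonus)
        scored.append((score, i))
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:config["length"]]
    # Emit the chosen sentences in document order.
    return " ".join(sentences[i] for _, i in sorted(top, key=lambda p: p[1]))


def overlap_f1(candidate, model):
    """Unigram-overlap F1 between a generated summary and a model summary."""
    cand, ref = set(tokens(candidate)), set(tokens(model))
    common = len(cand & ref)
    if common == 0:
        return 0.0
    precision, recall = common / len(cand), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_configuration(examples, grid):
    """Return the configuration with the highest mean F1 over the examples.

    examples: list of (document, keyphrases, model_summary) triples.
    grid: dict mapping each parameter name to its candidate values.
    """
    names = sorted(grid)
    best, best_score = None, -1.0
    for values in product(*(grid[n] for n in names)):
        config = dict(zip(names, values))
        score = sum(
            overlap_f1(summarize(doc, keys, config), model)
            for doc, keys, model in examples
        ) / len(examples)
        if score > best_score:
            best, best_score = config, score
    return best, best_score
```

In this sketch, `best_configuration` would be called once per group of documents (for example, one group per document type), so that each group can end up with a different configuration.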