Text Analytics
TEXT SUMMARIZATION
Text summarization in Natural Language Processing
• Text summarization is the process of condensing a source text into a shorter version that presents its essential information while retaining its key ideas and meaning.
• There are two main types of text summarization: extractive and abstractive.
1. Extractive Summarization:
• Selects and extracts sentences or phrases directly from the source text.
• Uses ranking algorithms to identify the most important sentences.
• Typically relies on sentence scoring based on features like sentence length, word frequency, and
position in the document.
• Preserves the original wording of the selected sentences.
2. Abstractive Summarization:
• Generates a summary by paraphrasing the content in new words, often using natural language generation
techniques.
• Requires a deeper understanding of the content and context.
• Involves the creation of new sentences that convey the main ideas of the source text.
• Can produce more concise and fluent summaries, but may also introduce statements that are not supported by the source text.
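The feature-based sentence scoring described above for extractive summarization can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production method: the stop-word list, the linear position bonus and the names `extractive_summary`, `split_sentences` and `tokenize` are hypothetical choices for this example, and sentence length is used only to normalize the word-frequency score.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "or", "to", "in",
              "on", "that", "it", "for", "with", "as", "by", "this", "be"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def extractive_summary(text, k=2, position_weight=0.1):
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))
    max_freq = max(freqs.values(), default=1)
    scores = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        # Word-frequency feature, averaged over sentence length so that long
        # sentences are not favored just for containing more words.
        freq_score = sum(freqs[w] / max_freq for w in words) / len(words) if words else 0.0
        # Position feature: earlier sentences get a small linear bonus.
        position_score = position_weight * (1 - i / len(sentences))
        scores.append(freq_score + position_score)
    # Take the k best-scoring sentences, then restore document order.
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


doc = ("Text summarization shortens a document. "
       "Extractive summarization selects sentences from the document. "
       "The weather was pleasant. "
       "Summarization systems score each sentence.")
print(extractive_summary(doc, k=2))
# → ['Text summarization shortens a document.', 'Extractive summarization selects sentences from the document.']
```

The selected sentences are returned in their original order rather than score order, which keeps the extracted summary readable; the off-topic weather sentence scores lowest because its words are rare in the document.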
Components and Techniques:
• a. Sentence Extraction:
o Identifying the most important sentences using features such as term frequency and sentence position.
• b. Sentence Compression:
o Reducing the length of sentences while preserving the core information.
• c. Named Entity Recognition (NER):
o Identifying and preserving important entities (such as people, places, and organizations) in the summary.
• d. Coreference Resolution:
o Linking pronouns and other expressions to the entities they refer to, so that references stay consistent throughout the summary.
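Real sentence-compression systems usually decide what to delete using syntactic parse trees or learned models. As a much simpler illustration of the idea, the hypothetical `compress_sentence` function below shortens a sentence with a few hand-written deletion and rewrite rules. The rules are assumptions for this sketch; it shows what compression aims for, not how a real system chooses what to remove.

```python
import re

# Hypothetical deletion/rewrite rules (assumptions for this sketch, not a standard list).
REWRITES = [
    (r"\bit is important to note that\s*", ""),        # drop a hedging preamble
    (r"\bin order to\b", "to"),                         # shorten a wordy phrase
    (r"\b(?:basically|actually|really|very)\s+", ""),   # drop intensifiers
]


def compress_sentence(sentence):
    # Remove parenthetical asides such as "(NLP)".
    s = re.sub(r"\s*\([^)]*\)", "", sentence)
    for pattern, replacement in REWRITES:
        s = re.sub(pattern, replacement, s, flags=re.IGNORECASE)
    # Collapse leftover whitespace and re-capitalize the first character.
    s = re.sub(r"\s{2,}", " ", s).strip()
    return s[:1].upper() + s[1:]


print(compress_sentence(
    "It is important to note that natural language processing (NLP) "
    "is very useful in order to analyze text."))
# → Natural language processing is useful to analyze text.
```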
Code example in Python
Continuation of the above code:
# Example text for processing
input_text = """
Natural language processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction between computers and humans through natural language. NLP techniques enable computers to interpret, understand, and generate human language in a meaningful way.
Sentence compression is a technique that involves shortening sentences while retaining their core meaning. It helps in creating more concise and focused content.
Named Entity Recognition (NER) is a process of identifying and classifying entities such as names of people, locations, organizations, and more in a given text. Coreference resolution is the task of determining when two or more expressions in a text refer to the same entity. It helps in understanding the relationships between different parts of the text. In conclusion, NLP techniques like sentence compression, NER, and coreference resolution contribute to the efficiency and understanding of natural language for various applications.
"""
3__Python - Tool Text summarization.pptx
Challenges in Text Summarization
1. Ambiguity:
Resolving ambiguity in language, where words or phrases may have multiple meanings.
2. Coherence:
Ensuring that the summary maintains a coherent and logical flow of ideas.
3. Information Loss:
Balancing the reduction of information with the retention of key concepts.
4. Handling Diverse Content:
Summarizing content from various domains and languages.
Applications of Text Summarization
• a. News Articles:
• Generating concise summaries of news articles for quick information retrieval.
• b. Legal Documents:
• Extracting key points from lengthy legal documents for efficient review.
• c. Academic Papers:
• Summarizing research papers to quickly understand the main contributions.
• d. Conversational Agents:
• Providing succinct responses in chatbots or virtual assistants.
• e. Document Summarization:
• Condensing lengthy documents for easier comprehension.
Popular Text Summarization Models
• Extractive:
o Latent Semantic Analysis (LSA): Applies singular value decomposition (SVD) to a term-sentence matrix to uncover latent topics and selects the sentences that best represent them.
o TextRank: Builds a graph whose nodes are sentences and whose edges are weighted by sentence similarity, then ranks the sentences with a PageRank-style algorithm.
o BERTSUM: Adapts BERT (Bidirectional Encoder Representations from Transformers) for extractive summarization.
• Abstractive:
o Pointer-Generator Networks: Combine extractive and abstractive approaches by either copying (pointing to) words in the source text or generating words from a fixed vocabulary.
o Transformer-based Models: Models such as GPT-3 and T5 have demonstrated strong performance in abstractive summarization.
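TextRank can be sketched in pure Python. This is a minimal illustration, assuming the word-overlap similarity used in the original TextRank formulation (shared words divided by the sum of the logs of the two sentence lengths), a damping factor of 0.85 and a fixed number of iterations instead of a convergence test. The function names `textrank_scores`, `textrank_summary` and `similarity` are hypothetical, and the naive regex sentence splitter is an assumption.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def similarity(a, b):
    """Word overlap normalized by log sentence lengths (TextRank-style)."""
    ta, tb = re.findall(r"[a-z]+", a.lower()), re.findall(r"[a-z]+", b.lower())
    if not ta or not tb:
        return 0.0
    denom = math.log(len(ta)) + math.log(len(tb))
    return len(set(ta) & set(tb)) / denom if denom > 0 else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    # Symmetric weighted graph: edge weight = sentence similarity, no self-loops.
    w = [[0.0 if i == j else similarity(sentences[i], sentences[j]) for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: each sentence receives rank from its neighbours
        # in proportion to the edge weight over the neighbour's total edge weight.
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out_weight[j] * scores[j] for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def textrank_summary(text, k=2):
    sentences = split_sentences(text)
    scores = textrank_scores(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


text = ("Cats chase mice in the garden. The garden has cats and mice. "
        "Mice hide from cats in the garden. Stock prices rose sharply today.")
print(textrank_summary(text, k=2))
```

A sentence that shares no words with the others has no edges, so it keeps only the baseline score of 1 - damping and is never chosen while better-connected sentences are available.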
Evaluation Metrics
• ROUGE (Recall-Oriented Understudy for Gisting Evaluation):
o Measures the overlap of n-grams (unigrams, bigrams, etc.) between the generated summary and one or more reference summaries, with an emphasis on recall.
• BLEU (Bilingual Evaluation Understudy):
o Measures the precision of n-grams in the generated summary relative to reference summaries; it was originally designed for evaluating machine translation.
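The difference between ROUGE's recall orientation and BLEU's precision orientation is easiest to see on a toy pair. The sketch below computes ROUGE-N recall and BLEU's clipped (modified) n-gram precision against a single reference. It is a simplification under stated assumptions: whitespace tokenization, no stemming, and, for BLEU, a single n-gram order without the brevity penalty and geometric mean that the full metric uses. The function names are hypothetical.

```python
from collections import Counter


def ngrams(text, n):
    """Count the n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Share of the reference's n-grams that also appear in the candidate."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    # Counter '&' keeps the minimum of the two counts, clipping repeated n-grams.
    return sum((cand & ref).values()) / total if total else 0.0


def bleu_modified_precision(candidate, reference, n=1):
    """Share of the candidate's n-grams that also appear in the reference (clipped)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    return sum((cand & ref).values()) / total if total else 0.0


reference = "the cat sat on the mat"
candidate = "the cat sat"
print(rouge_n_recall(candidate, reference, 1))           # 0.5 (3 of 6 reference unigrams)
print(bleu_modified_precision(candidate, reference, 1))  # 1.0 (all 3 candidate unigrams)
print(rouge_n_recall(candidate, reference, 2))           # 0.4 (2 of 5 reference bigrams)
```

The truncated candidate gets perfect precision but only half the unigram recall; full BLEU counters such short outputs with its brevity penalty.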
Future Directions
• a. Multimodal Summarization:
o Integrating information from text, images, and other modalities for more comprehensive summaries.
• b. Context-aware Summarization:
o Considering broader contextual information to generate more informed summaries.
• Text summarization in NLP continues to be an active area of research and development, with ongoing
efforts to improve the quality, coherence, and adaptability of automated summarization systems across
various domains and languages.
