Introduction to Automatic
Keyword Extraction for
Text Summarization
Presented by
Biswarup Das
Roll-092217 No.-02220630
9th semester
Under the guidance of Dr. Rakesh Kumar, Assistant Professor,
Department of Computer Science,
Assam University, Silchar.
Contents:
• Introduction
• Objective
• Different Methods
• Keyword Extraction
• Text Summarization
• Problems
• Conclusion
• Future Work
• References
Introduction:-
• In the internet era, a large amount of online information is freely available
to readers in the form of e-newspapers, journal articles, dialogue transcripts,
etc. There is a need for an automated system that can extract only the relevant
information from these data sources. To achieve this, we need to mine the
text of the documents.
• Summarization is a process in which the most salient content of a text is
extracted and compiled into a short abstract of the original document.
Text summarization can be achieved in two ways: abstractive
summarization and extractive summarization.
Objective:-
• The main objective of automatic text summarization is to
present the source text in a shorter version that preserves
its meaning.
• The main advantage of a summary is that it reduces reading time.
Different Methods for Keyword Extraction:-
• Simple Statistical Methods:- These methods are simple and work without
a training dataset, so they are unsupervised.
• Linguistic Methods:- These methods use semantic features of the words
to detect keywords, which are then extracted.
• Machine Learning Approach:- This approach can use either supervised or
unsupervised learning, but the supervised approach is mainly preferred.
• Hybrid Approach:- This approach combines the methods above
using heuristic values.
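As a minimal sketch of the simple statistical approach, keywords can be picked by raw frequency after removing stop words. The function name `extract_keywords`, the tiny `STOP_WORDS` set, and the sample text are illustrative assumptions, not part of the presentation.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "it",
              "of", "on", "the", "to", "with"}

def extract_keywords(text, top_k=3):
    """Return the top_k most frequent non-stop-word tokens in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_k)]

sample = ("Text summarization shortens a text. "
          "Extractive summarization selects sentences from the text.")
print(extract_keywords(sample, top_k=2))  # ['text', 'summarization']
```

No training data is needed here, which is what makes this kind of method unsupervised.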
Proposed Methods for Text Summarization:-
• Single-Document Text Summarization:- This type of summarization
takes a single document as input and produces a single
output document.
• Multi-Document Text Summarization:- This type of
summarization takes several documents as input and produces a single
output document.
• Query-Based Text Summarization:- In this technique, a
particular portion of a document (the part relevant to the query) is used,
and useful keywords are extracted from it to build the summary of that document.
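One common way to realise query-based summarization is to rank sentences by how many words they share with the query. The sketch below assumes that simple word-overlap criterion; `tokenize`, `query_summary`, and the sample sentences are hypothetical names and data chosen for illustration.

```python
import re

def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def query_summary(sentences, query, n=1):
    """Return the n sentences sharing the most words with the query,
    kept in their original document order."""
    q = tokenize(query)
    overlap = [len(tokenize(s) & q) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: overlap[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]

sentences = [
    "Solar panels convert light into power.",
    "The festival starts in May.",
    "Wind turbines also generate power.",
]
print(query_summary(sentences, "solar power", n=2))
```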
• Extractive Text Summarization:- The extractive
summarization method selects information from the
document exactly as it appears in the source to form the
summary.
• For this task, a sentence scoring system based on
sentence features is applied.
• It assigns a score to each sentence based on its features and
ranks the sentences according to their scores.
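The score-and-rank procedure above can be sketched as follows. The single feature used here (mean document-wide frequency of a sentence's words) is an assumption for illustration; the presentation does not specify which sentence features are used, and the function names are hypothetical.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score_sentences(sentences):
    """Score each sentence by the mean document-wide frequency of its words."""
    words = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    return [sum(freq[w] for w in ws) / len(ws) if ws else 0.0 for ws in words]

def summarize(text, n=1):
    """Pick the n highest-scoring sentences, verbatim, in document order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]

doc = "Cats chase mice. Cats sleep. Dogs bark loudly at night."
print(summarize(doc, n=1))  # ['Cats sleep.']
```

Because the selected sentences are copied unchanged from the source, the output is extractive by construction.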
Methods for Extractive Text Summarization:-
• Term Frequency – Inverse Document Frequency (TF-IDF) Method:
• TF-IDF is a numerical statistic that indicates how important a
word is to a given document.
• Term frequency (TF) is the number of times a term occurs in a
document.
• Inverse document frequency (IDF) increases the weight of terms that
occur rarely across documents and decreases the weight of terms that occur frequently.
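A minimal sketch of the TF-IDF weighting follows. It assumes the common variant weight = tf × log(N / df), where N is the number of documents and df is the number of documents containing the term; the slides give no formula, so this choice and the name `tf_idf` are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = [["text", "summary", "text"], ["text", "keyword"], ["text", "graph"]]
w = tf_idf(docs)
print(w[0])
```

In this toy corpus "text" occurs in every document, so its IDF is log(3/3) = 0 and its weight vanishes, while "summary" occurs in only one document and receives weight log 3.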
Text Summarization Using a Neural Network:-
• A neural network can be defined as a processing system
modeled on the human brain that tries to imitate its learning
process.
• The neural network is trained on the sentences of a paragraph,
and each sentence is checked to decide whether or not it should be
included in the summary.
• There are three phases in this process: neural
network training, feature fusion, and sentence selection.
Graph-Based Method:-
• The graph-based method is an unsupervised method.
• Every sentence of the document is treated as a vertex; two
sentences are connected by an edge if a relation exists between them,
and the connecting edge is given a weight.
• Graphs can be directed or undirected: a directed graph
represents the flow of the text, while in an undirected graph
an edge captures the relation through the co-occurrence of terms.
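The sentence graph described above can be sketched with a PageRank-style ranking, in the spirit of TextRank. The Jaccard word-overlap edge weight, the damping factor 0.85, and the fixed iteration count are assumptions made for illustration, not details given in the presentation.

```python
import re

def similarity(a, b):
    """Edge weight: shared words divided by the size of the union (Jaccard)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_sentences(sentences, damping=0.85, iterations=50):
    """Return one importance score per sentence from a weighted undirected graph."""
    n = len(sentences)
    if n == 0:
        return []
    word_sets = [set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    # Adjacency matrix of edge weights; no self-loops.
    w = [[similarity(word_sets[i], word_sets[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight leaving each vertex
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out[j] * scores[j]
                            for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores

sents = [
    "Graph methods rank sentences.",
    "Sentences are vertices in the graph.",
    "Bananas are yellow.",
]
scores = rank_sentences(sents)
print(max(range(len(sents)), key=lambda i: scores[i]))  # 1
```

The second sentence shares words with both of the others, so it sits at the centre of the graph and receives the highest score.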
• Abstractive Text Summarization:- In this approach, the
machine must grasp the ideas in all the documents that
are used as input before it produces a
summary.
• It uses semantic methods to examine and interpret the
text.
• Abstractive text summarization is broadly divided into two types:
structure-based and semantic-based.
• Structure-Based Approach:-
• The structure-based approach encodes the most important
information from the document through cognitive
schemas such as the tree, ontology, and lead and body phrase
methods.
Tree-Based Method:-
• This method uses dependency trees to represent the
whole text document.
• First, the source text is represented as dependency trees,
which are subsequently merged.
• Finally, the merged dependency tree is turned into a
sentence, which is referred to as the fused sentence.
Lead and Body Phrase Method:-
• In this method, the main sentences are rewritten by inserting and
substituting phrases.
• Its key disadvantages are that parsing errors degrade performance and that
there is no standard model for summarization.
• Ishikawa et al. proposed a hybrid summarization method that combines the TF
and LEAD methods.
Graph-Based Method:-
• This system's distinctive feature is that each node represents a word unit,
and directed edges represent the structure of sentences.
• Opinosis is a framework that generates compact abstractive summaries of
highly redundant opinions.
• The model creates an abstractive summary by repeatedly searching the
Opinosis graph for sub-graphs that encode a valid sentence and have high redundancy
scores.
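The word graph underlying this method can be sketched as a set of directed edges between consecutive words, with a count recording how often each edge recurs. This is only the graph-construction step under simplifying assumptions: nodes here are plain lower-cased words, and the path search and scoring that Opinosis performs are not reproduced. The name `build_word_graph` and the sample reviews are hypothetical.

```python
from collections import defaultdict

def build_word_graph(sentences):
    """Map each directed edge (word, next_word) to how often it occurs."""
    edges = defaultdict(int)
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Each pair of adjacent words becomes a directed edge left -> right.
        for left, right in zip(tokens, tokens[1:]):
            edges[(left, right)] += 1
    return dict(edges)

reviews = [
    "the battery life is great",
    "battery life is very good",
    "the screen is great",
]
graph = build_word_graph(reviews)
print(graph[("battery", "life")])  # 2
```

Edges with high counts mark word sequences repeated across many opinions, which is the redundancy that the search over sub-graphs exploits.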
Semantic-Based Approach:-
• In the semantic-based method, a semantic representation of the
document is given to a natural language generation (NLG)
system.
• This strategy analyses linguistic data and focuses
on recognising noun and verb phrases.
Methods in the Semantic-Based Approach:-
• Multimodal Semantic Model:- This model is built to capture
links between concepts.
• The most essential concepts are graded on a scale of one to ten, and
a single notion is expressed as a summary.
• Greenbacker et al. presented a three-stage technique: ontology
creation, information density matrix, and, lastly, summary generation.
Information Item-Based Method:-
• An abstract representation of the source document is used
to create a summary in this approach.
• The strategy based on information items delivers less
redundant and more concise summaries.
• So the question is: how does this method work?
Figure 1 will show how our method compares to other options in terms of workflow.
Semantic Graph-Based Method:-
• In this method, the input document is represented
semantically using a semantic graph.
• The nouns and verbs of the sentences are represented
as graph nodes, with edges defining the relationships
between them.
• To represent the semantics of a source document,
Moawad et al. created a semantic graph called the rich
semantic graph.
The proposed Architecture:-
The Rich Semantic graph creation phase:-
Problems:-
• Topic Identification.
• Interpretation.
• Summary generation.
• Evaluation of the generated summary.
Conclusion:-
• Extractive and abstractive summarization techniques are
discussed in this presentation. The summarization system
should generate a useful summary in a short amount of time,
with minimal redundancy and grammatically correct sentences.
Depending on the context in which they are used, both extractive
and abstractive methods produce good results.
• Although extractive text summarization is simpler to implement,
it has a number of drawbacks that can lead to ambiguity and
miscommunication. Abstractive summarization can produce
more relevant and precise summaries.
Future Work:-
• There are still a few issues to solve, which will be
addressed in future work. Clearly, the parsing accuracy
that degraded the sentential completeness in our
experiments needs to be improved.
• Our main objective is to work on automatic text
summarization using the extractive method, which
creates a summary by selecting a
subset of sentences from the text.
References:-
• C. Zhang, “Automatic keyword extraction from documents using conditional random
fields,” Journal of Computational Information Systems, vol. 4, no. 3, pp. 1169–1180,
2008.
• S. Mandal, G. K. Singh, and A. Pal, “A Hybrid Text Summarization Approach,”
Journal of Informatics and Mathematical Sciences.
• C. Greenbacker, “Towards a framework for abstractive summarization of multimodal
documents,” in 49th Annual Meeting of the Association for Computational
Linguistics: Human Language Technologies, pp. 75–80, ACL, Jun. 2011.
• K. Ishikawa, S. Ando, and A. Okumura, “Hybrid Text Summarization Method based
on the TF Method and the LEAD Method,” in Proceedings of the 2nd National
Institute of Informatics Test Collection Information Retrieval (NTCIR) Workshop, no.
1, pp. 5–219, 2001.
• E. Hovy and C.-Y. Lin, “Automated text summarization and the summarist system,” in
Proceedings of a workshop on held at Baltimore, ACL, pp. 197–214, 1998.
• A. Hulth, “Improved automatic keyword extraction given more linguistic knowledge,”
in Proceedings of the 2003 Conference on Empirical Methods in Natural Language
Processing, ACL, pp. 216–223, 2003.
• V. R. Embar, S. R. Deshpande, A. K. Vaishnavi, V. Jain, and J. S. Kallimani,
“SArAmsha - A Kannada abstractive summarizer,” in International Conference on
Advances in Computing, Communications and Informatics, pp. 540–544, IEEE, Aug.
2013.
• C. Lee, Z. Jian, and L. Huang, “A Fuzzy Ontology and Its Application to News
Summarization,” IEEE Transactions on Systems, Man and Cybernetics, Part B
(Cybernetics), vol. 35, pp. 859–880, Oct. 2005.
THANK YOU
