International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1146
Review of Topic Modeling and Summarization
Chinmay Patil[1], Parag Wayangankar[2], Pranay Yadav[3], Shweta Sharma[4]
[1], [2], [3] Student, [4] Professor, Department of Computer Engineering, Atharva College of Engineering, Mumbai
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Topic modeling is an unsupervised machine learning technique used to discover the topics that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is one of the most widely used algorithms for topic modeling. It assumes that each document is a mixture of topics and that each topic is a mixture of different tokens or words. When LDA is applied to many documents, the extracted topics describe the collection as a whole. When it is applied to a single text document, however, the extracted topics can be treated as the keywords of that document, since they capture its central ideas in a concise form. This makes them useful for summarizing the document. Text summarization is the shortening of a document so that the result still highlights all of its important points. In this paper, we present an LDA model that identifies the dominant topics in a text document, then identifies the sentences that reflect these dominant topics and stitches them together to form a human-readable summary.
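The pipeline described in the abstract can be sketched in plain Python. This is a minimal, self-contained sketch under stated assumptions, not the authors' implementation: each sentence is treated as a "document" for LDA, the model is fitted with collapsed Gibbs sampling (a standard LDA inference method chosen here for illustration; Gensim, for example, uses variational inference instead), topics are ranked by their total weight in the text, and the sentences most strongly associated with the dominant topics are joined in their original order. The stopword list, the hyperparameters `alpha`, `beta` and `iterations`, and the round-robin selection rule are all hypothetical choices.

```python
import random
import re

# Tiny illustrative stopword list (an assumption; real systems use a fuller one).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "was", "it", "this", "that", "with", "as", "by", "my"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns topic proportions per document."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    n_kw = [[0] * V for _ in range(num_topics)]      # tokens of word w in topic k
    n_k = [0] * num_topics                           # tokens in topic k
    z = []                                           # topic of every token
    for d, doc in enumerate(docs):                   # random initialisation
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][vocab[w]] += 1
            n_k[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], vocab[w]
                n_dk[d][k] -= 1                      # remove the current token
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = j | all other assignments), unnormalised.
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][v] + beta) / (n_k[j] + V * beta)
                           for j in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k                          # add it back under the new topic
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Posterior-mean estimate of theta_d; each row sums to 1.
    return [[(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha)
             for k in range(num_topics)] for d, doc in enumerate(docs)]


def summarize(text, num_topics=2, num_sentences=2, seed=0):
    """Pick sentences that best express the dominant topics, in original order."""
    sentences = split_sentences(text)
    theta = lda_gibbs([tokenize(s) for s in sentences], num_topics, seed=seed)
    # A topic's dominance = its total weight across all sentences.
    mass = [sum(row[k] for row in theta) for k in range(num_topics)]
    ranked_topics = sorted(range(num_topics), key=lambda k: -mass[k])
    limit = min(num_sentences, len(sentences))
    chosen = []
    while len(chosen) < limit:
        # Round-robin over topics: take the unused sentence richest in topic k.
        for k in ranked_topics:
            if len(chosen) == limit:
                break
            unused = [i for i in range(len(sentences)) if i not in chosen]
            chosen.append(max(unused, key=lambda i: theta[i][k]))
    return " ".join(sentences[i] for i in sorted(chosen))


if __name__ == "__main__":
    text = ("The cat sat on the mat. Cats and dogs are popular pets. "
            "Stock markets fell sharply today. Investors sold shares and bonds. "
            "My cat chased the dog.")
    print(summarize(text, num_topics=2, num_sentences=2))
```

The `num_topics` and `num_sentences` arguments correspond to the user's choice of how many topics or points to extract; on very short texts the sampled topics are noisy, so results will vary with the seed.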
Key Words: Natural Language Processing, Text
Summarization, Latent Dirichlet Allocation, Topic Modeling
1. INTRODUCTION
As the amount of available data keeps increasing, it has become important to extract only the meaningful information from it, since not every bit of data is useful. This is where topic modeling and summarization can help. Because the algorithm used here is unsupervised, the model does not need structured data to be provided in order to work. The motivation for this work is to reduce the time required to read or analyze a text document. Text documents come in many forms, including news reports, research papers and legal documents; processing them manually can be tedious, and important information may be missed if it is not done carefully. An advantage of having a model perform the task is that the user can decide how many topics or points to discover in the text. Based on that choice, the extraction is done automatically, reducing the time the same task would take if done manually. Text summarization has two approaches, abstractive and extractive: abstractive summarization generates new sentences that paraphrase the source, whereas extractive summarization selects sentences directly from it. We have chosen the extractive summarization approach.
2. LITERATURE SURVEY
Barde et al. [1] discuss various methods and tools used for topic modeling, along with their features and limitations. The methods discussed include the Vector Space Model (VSM), Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). The tools discussed include Gensim, the Stanford Topic Modeling Toolbox, MALLET and BigARTM.
Surabhi Adhikari et al. [2] discuss different methods that have been used for text summarization. The paper mainly covers two approaches, abstractive (ABS) and extractive (EXT) summarization, and also discusses query-based summarization. Most of the discussion concerns structure-based and semantic-based approaches to summarizing text documents. The summaries produced by these models were tested on various datasets, such as the CNN corpus, DUC2000, and single and multiple text documents.
Kenli Li et al. [3] use the Latent Dirichlet Allocation (LDA) algorithm to automatically generate topics from text corpora and apply it to sentence-extraction-based multi-document summarization. Their approach combines a traditional summary generation algorithm with a deep-learning-based abstract generation algorithm.
David Alfred Ostrowski [4] uses the Latent Dirichlet Allocation algorithm, a generative probabilistic model for collections of discrete data, and evaluates it from the perspective of both classification and the identification of noteworthy topics when applied to a filtered collection of Twitter messages. Experiments show that these methods are effective for identifying sub-topics and for supporting classification within large-scale corpora.
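Since [4] relies on LDA being a generative model, it may help to see the generative story itself. The sketch below follows the standard LDA formulation rather than anything specific to [4]: draw a word distribution per topic, draw topic proportions per document, then draw a topic and a word for every position. The vocabulary, `alpha`, `beta` and document sizes are illustrative assumptions; Dirichlet draws are made by normalising Gamma variates.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution via normalised Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(vocab, num_topics, num_docs, doc_length, alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process."""
    rng = random.Random(seed)
    # 1. For each topic k, draw a word distribution phi_k ~ Dirichlet(beta).
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # 2. For each document, draw topic proportions theta_d ~ Dirichlet(alpha).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        doc = []
        for _ in range(doc_length):
            # 3. For each position, pick a topic z ~ Categorical(theta_d) ...
            z = rng.choices(range(num_topics), weights=theta)[0]
            # ... and then a word w ~ Categorical(phi_z).
            doc.append(rng.choices(vocab, weights=phi[z])[0])
        corpus.append(doc)
    return phi, corpus


if __name__ == "__main__":
    vocab = ["ball", "goal", "team", "vote", "party", "law"]
    phi, corpus = generate_corpus(vocab, num_topics=2, num_docs=3, doc_length=8)
    for doc in corpus:
        print(" ".join(doc))
```

Topic modeling runs this story in reverse: given only the observed words, inference recovers plausible `phi` and `theta` values.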
Jinqiang Bian et al. [5] propose a new LDA-based sentence-ranking method. The method combines the topic distribution of each sentence with the topic importance of the corpus to calculate the posterior probability of the sentence, and then selects sentences based on this posterior probability to form a summary. The topic distribution of a sentence represents the likelihood that the sentence belongs to each topic, while topic importance represents the degree to which a topic covers a significant portion of the corpus. The method highlights the latent topics and optimizes the summary. Experimental results on the DUC2006 dataset show the advantage of the multi-document summarization algorithm proposed in the paper.
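The ranking idea in [5] can be illustrated with a simple stand-in. The exact formula of [5] is not reproduced in this review, so the sketch below assumes that a sentence's score is the sum over topics of P(topic | sentence) weighted by the topic's importance, and it assumes importance is the topic's average share across all sentences. Both choices are hypothetical.

```python
def topic_importance(theta):
    """Assumed importance of topic k: its average share over all sentences."""
    n = len(theta)
    return [sum(row[k] for row in theta) / n for k in range(len(theta[0]))]


def rank_sentences(theta, top_n):
    """Score s = sum_k P(k | s) * importance(k); return the top_n indices in order."""
    importance = topic_importance(theta)
    scores = [sum(p * w for p, w in zip(row, importance)) for row in theta]
    best = sorted(range(len(theta)), key=lambda i: scores[i], reverse=True)[:top_n]
    return sorted(best), scores


if __name__ == "__main__":
    # theta[s][k] = P(topic k | sentence s), e.g. taken from a fitted LDA model.
    theta = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
    selected, scores = rank_sentences(theta, top_n=2)
    print(selected)  # [0, 2]
```

Here topic 0 carries more of the corpus (importance 0.6 versus 0.4), so sentences dominated by topic 0 outrank the one dominated by topic 1.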
J. N. Madhuri et al. [6] propose a system for summarizing documents using sentence ranking. Sentences are given weights and then ranked according to these weights, and the highest-ranked sentences are selected for the summary. The ranking is computed on the preprocessed text, where the weight of a term is its frequency divided by the total number of terms in the document.
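The weighting in [6] (term frequency divided by the total number of terms) can be sketched as follows. How [6] combines term weights into a sentence weight is not stated here, so summing the weights of a sentence's terms is an assumption, as are the naive sentence splitter and the small stopword list standing in for "preprocessing".

```python
import re
from collections import Counter

# Illustrative stopword list used as the "preprocessing" step (an assumption).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "on"}


def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize_by_term_weight(text, num_sentences):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokens = tokenize(text)
    total = len(tokens)
    # Term weight = frequency of the term / total number of terms in the document.
    weight = {w: c / total for w, c in Counter(tokens).items()}
    # Sentence weight = sum of the weights of its terms (assumed aggregation).
    scores = [sum(weight[w] for w in tokenize(s)) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]  # keep the original sentence order


if __name__ == "__main__":
    print(summarize_by_term_weight("Cats purr. Cats and dogs play. Rain falls.", 1))
    # ['Cats and dogs play.']
```

Summing favours longer sentences; averaging the weights instead would favour short, keyword-dense ones, and either is a reasonable reading of the description.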
Shohreh Rad Rahimi et al. [7] explore many methods of text mining and text summarization. Text summarization can be categorized by various criteria; those discussed include the output summary, the level of detail, the content, limitations, the number of input texts and language acceptance. The paper also discusses various similarity measures used in text summarization.
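Cosine similarity over bag-of-words vectors is one widely used similarity measure in extractive summarization, for example to avoid adding a sentence that repeats one already chosen. The sketch below is a generic illustration and is not taken from [7]; the tokenization is deliberately simple.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Term-count vector of a sentence (lowercased alphabetic tokens)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between the term-count vectors of a and b."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(cosine_similarity("LDA finds topics", "LDA finds latent topics"))
    print(cosine_similarity("stock prices fell", "the cat slept"))  # 0.0
```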
Hirohata et al. [8] present automatic speech summarization techniques and their evaluation metrics. The paper focuses mainly on sentence-extraction-based summarization methods for producing abstracts of spontaneous presentations. The metrics discussed include summarization accuracy, sentence F-measure and ROUGE-N, among others.
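Two of the metrics mentioned for [8] can be sketched with their common textbook definitions. ROUGE-N recall counts the reference n-grams that also occur in the candidate (each n-gram clipped by its count in the candidate), and sentence F-measure compares the set of extracted sentences with a reference set. Whitespace tokenization and a single reference summary are simplifying assumptions; [8] may define the metrics with additional details.

```python
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams matched in the candidate (clipped counts)."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    total = sum(ref.values())
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total if total else 0.0


def sentence_f_measure(selected, reference):
    """Harmonic mean of precision and recall over extracted sentence IDs."""
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(selected), hits / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=1))  # 0.5
    print(sentence_f_measure([1, 2, 3], [2, 3, 4, 5]))
```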
Aditya Jain et al. [9] propose a neural-network-based approach to text summarization. The approach extracts a good set of features and then uses a neural network for supervised extractive summarization. The network assigns a predictive score to each sentence, and the sentences with the highest predicted scores are added to the summary.
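The supervised scoring step described for [9] can be illustrated with the smallest possible network: a single sigmoid neuron (logistic regression) trained by gradient descent on the log-loss. This is a stand-in, not the model of [9], which, judging by its title, builds on word-vector embeddings and a larger network. The two features and the labelled training data below are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(x, weights, bias):
    """Predicted probability that a sentence belongs in the summary."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(features, labels, lr=0.5, epochs=500):
    """Stochastic gradient descent on the log-loss of a single sigmoid neuron."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = predict(x, weights, bias) - y  # d(log-loss)/d(pre-activation)
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def top_sentences(features, weights, bias, top_n):
    """Rank sentences by predicted score and return the top_n indices in order."""
    scores = [predict(x, weights, bias) for x in features]
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_n]), scores


if __name__ == "__main__":
    # Hypothetical features per sentence: [position score, keyword overlap];
    # label 1 = the sentence appeared in a human-written reference summary.
    X = [[1.0, 0.8], [0.9, 0.7], [0.2, 0.1], [0.1, 0.3]]
    y = [1, 1, 0, 0]
    w, b = train(X, y)
    print(top_sentences(X, w, b, top_n=2)[0])  # [0, 1]
```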
Liu Na et al. [10] present a system that uses the Latent Dirichlet Allocation topic model for multi-document summarization. It extracts the title and the content of each input document and creates a topic model for each. Finally, it calculates sentence weights according to the topic model and forms a summary based on these weights.
Mahsa Afsharizadeh et al. [11] propose a query-oriented summarization technique. The most important sentences are extracted from the document through a feature extraction process that uses features such as sentence length, normalized sentence length, sentence position in the document and topic frequency. Eleven distinct features are extracted in total; every sentence is scored on these 11 features, and the top-ranked sentences are selected to create the summary.
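A feature-based, query-oriented scorer in the spirit of [11] can be sketched with three features: normalized sentence length and sentence position (both named in [11]), plus overlap with the query terms. The query-overlap feature, the exact feature formulas and the equal weights are assumptions of this sketch; [11] uses 11 features whose definitions are not reproduced here.

```python
import re


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def score_sentences(sentences, query, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of normalized length, position and query overlap (assumed)."""
    query_terms = set(words(query))
    lengths = [len(words(s)) for s in sentences]
    max_len = max(lengths) or 1
    n = len(sentences)
    scores = []
    for i, s in enumerate(sentences):
        normalized_length = lengths[i] / max_len
        position = 1.0 - i / n  # earlier sentences score higher
        overlap = len(query_terms & set(words(s))) / len(query_terms) if query_terms else 0.0
        features = (normalized_length, position, overlap)
        scores.append(sum(w * f for w, f in zip(weights, features)))
    return scores


def query_summary(sentences, query, top_n):
    scores = score_sentences(sentences, query)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = ["Solar power is growing fast.", "The weather was mild.",
           "Wind and solar power cut emissions."]
    print(query_summary(doc, "solar power", top_n=2))
    # ['Solar power is growing fast.', 'Wind and solar power cut emissions.']
```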
Shweta Ganiger and K. M. M. Rajashekharaiah [12] discuss the implementation of several keyword extraction algorithms and evaluate how effective they are at extracting important keywords from a document. The three algorithms discussed are TF-IDF (Term Frequency-Inverse Document Frequency), TextRank and RAKE (Rapid Automatic Keyword Extraction).
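Of the three algorithms compared in [12], RAKE is the simplest to show end to end. The sketch below follows the usual description of RAKE: split the text into candidate phrases at stopwords and punctuation, score each word as degree/frequency (degree counts co-occurrences within phrases, including the word itself), and score each phrase as the sum of its word scores. The short stopword list is an assumption; it is not the implementation evaluated in [12].

```python
import re
from collections import defaultdict

# Small illustrative stopword list (an assumption; RAKE normally uses a long list).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
             "of", "on", "or", "over", "the", "to", "with", "this", "that"}


def candidate_phrases(text):
    """Split text into runs of non-stopwords, also breaking at punctuation."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text, top_n=3):
    phrases = candidate_phrases(text)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences in this phrase, incl. itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    text = "Compatibility of systems of linear constraints over the set of natural numbers."
    print(rake(text, top_n=2))
```

Multi-word phrases accumulate higher scores, which is why RAKE tends to surface compound keywords such as "linear constraints".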
3. CONCLUSIONS
Topic modeling and text summarization are two important tasks in natural language processing. Using the LDA algorithm to extract keywords eliminated the need for structured data, which helped reduce the time required to create the summary. In addition, the extracted keywords or dominant topics can be used for categorization, which could extend the scope of the project: suggestions could be made based on the similarity between the topics of other documents and those of the given document.
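The suggested extension, recommending documents whose topics resemble those of a given document, needs a distance between topic distributions. The conclusion does not name one; the Hellinger distance used below is one common choice for comparing probability distributions and is an assumption, as are the hypothetical catalogue names and topic vectors.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def most_similar(query_topics, catalogue, top_n=2):
    """Names of the catalogue entries whose topic mix is closest to query_topics."""
    ranked = sorted(catalogue.items(), key=lambda kv: hellinger(query_topics, kv[1]))
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical topic distributions, e.g. produced by the same LDA model.
    catalogue = {"doc_a": [0.8, 0.2], "doc_b": [0.1, 0.9], "doc_c": [0.6, 0.4]}
    print(most_similar([0.7, 0.3], catalogue))
```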
4. ACKNOWLEDGEMENT
We sincerely thank our college, Atharva College of Engineering, and especially our Head of Department, Dr. Suvarna Pansambal, and our guide, Prof. Shweta Sharma, for their kind cooperation and guidance, which helped us complete this project; it would have seemed difficult without their motivation, constant support and valuable suggestions. Moreover, this research could not have been completed without the cooperation, suggestions and help of our family and friends.
5. REFERENCES
[1] Bhagyashree Vyankatrao Barde and Anant Madhavrao Bainwad, “An Overview of Topic Modeling Methods and Tools”, International Conference on Intelligent Computing and Control Systems (ICICCS), 2017.
[2] Rahul, Surabhi Adhikari and Monika, “NLP based Machine Learning Approaches for Text Summarization”, Proceedings of the Fourth International Conference on Computing Methodologies and Communication (ICCMC), 2020.
[3] Ying Zhong, Zhuo Tang, Xiaofei Ding, Li Zhu, Yuquan Le, Kenli Li and Keqin Li, “An Improved LDA Multi-Document Summarization Model Based on TensorFlow”, International Conference on Tools with Artificial Intelligence, 2017.
[4] David Alfred Ostrowski, “Using Latent Dirichlet Allocation for Topic Modelling in Twitter”, Proceedings of the IEEE 9th International Conference on Semantic Computing (IEEE ICSC), 2015.
[5] Jinqiang Bian, Zengru Jiang and Qian Chen, “Research on Multi-document Summarization Based on LDA Topic Model”, Sixth International Conference on Intelligent Human-Machine Systems and Cybernetics, 2014.
[6] J. N. Madhuri and R. Ganesh Kumar, “Extractive Text Summarization Using Sentence Ranking”, International Conference on Data Science and Communication, 2019.
[7] Shohreh Rad Rahimi, Ali Toofanzadeh Mozhdehi and Mohamad Abdolahi, “An Overview on Extractive Text Summarization”, IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI), 2017.
[8] M. Hirohata, Y. Shinnaka, K. Iwano and S. Furui, “Sentence extraction-based presentation summarization techniques and evaluation metrics”, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’05), 2005.
[9] A. Jain, D. Bhatia and M. K. Thakur, “Extractive Text Summarization Using Word Vector Embedding”, International Conference on Machine Learning and Data Science (MLDS), 2017.
[10] L. Na, L. Ming-xia, L. Ying, T. Xiao-jun, W. Haiwen and X. Peng, “Mixture of topic model for multi-document summarization”, The 26th Chinese Control and Decision Conference (CCDC), 2014.
[11] H. Ebrahimpour-Komleh, M. Afsharizadeh and A. Bagheri, “Query-oriented text summarization using sentence extraction technique”, 4th International Conference on Web Research (ICWR), 2018.
[12] S. Ganiger and K. M. M. Rajashekharaiah, “Comparative Study on Keyword Extraction Algorithms for Single Extractive Document”, Second International Conference on Intelligent Computing and Control Systems (ICICCS), 2018.
