International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 888
Text Optimization/Summarizer using Natural Language Processing
Mahesh Patil1, Mayur Pawar2, Yatin Rai3, Prof. Satish Kuchiwale4
1-3Student, Computer Engineering, SIGCE, Navi Mumbai, Maharashtra, India
4Asst. Professor, Computer Engineering, Smt. Indira Gandhi College of Engineering, Navi Mumbai, Maharashtra
Abstract - Computers have become a major medium for the
exchange of information. Hence, there is an increasing need
to check thoroughly the information that people intend to
exchange. Accurate grammar and spelling are required not
only in formal business conversations, but also in
conversations on the many platforms available for them. The
project aims at building an intelligent system to optimize
English-language text. The system will perform functions such
as auto completion, summarization, spell check and grammar check.
Key Words: Grammar, Business conversation, Optimize.
1. INTRODUCTION
Man’s quest to make machines as smart as himself will go on
forever. Still, the fact that we are able to design machines
that respond to human language by speaking like humans, and
that pass the Turing test with a fair degree of success, is
worth appreciating as far as man-machine integration is
concerned. The project aims at building an intelligent system
to optimize English-language text.
It will perform the following tasks:
1. Grammar Optimization
2. Spellcheck
3. Summarization
4. Sentence Auto Completion
Let us first define the jobs of a spell checker and an
autocorrector. A word needs to be checked for spelling
correctness and corrected if necessary, often in the context
of the surrounding words. A spell checker points out spelling
errors and possibly suggests alternatives. An autocorrector
usually goes a step further and automatically picks the most
likely word; if the correct word has already been typed, it is
retained. So, in practice, an autocorrector is somewhat more
aggressive than a spell checker, but this is largely an
implementation detail, since tools allow the behaviour to be
configured. In theory there is little difference between the
two, so the discussion in the rest of this paper applies to
both.
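To make the distinction concrete, here is a minimal Python sketch in the style of the well-known edit-distance approach: it generates every string one edit (deletion, transposition, replacement or insertion) away from the typed word and keeps those found in a word-frequency table. The tiny `WORD_FREQ` table and the function names are hypothetical, and, unlike a real corrector, the sketch ignores the surrounding words. `spellcheck` only suggests; `autocorrect` picks the most frequent suggestion automatically and keeps a word that is already correct.

```python
from collections import Counter

# Hypothetical word-frequency table; a real system would count words in a large corpus.
WORD_FREQ = Counter({
    "the": 500, "there": 120, "their": 110, "then": 90,
    "spell": 15, "spelling": 30, "check": 40, "checker": 12,
})

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def spellcheck(word):
    """Spell checker: flag an unknown word and list known alternatives, most frequent first."""
    word = word.lower()
    if word in WORD_FREQ:
        return []  # already correct, nothing to suggest
    candidates = {w for w in edits1(word) if w in WORD_FREQ}
    return sorted(candidates, key=lambda w: (-WORD_FREQ[w], w))


def autocorrect(word):
    """Autocorrector: keep a correct word, otherwise silently pick the top suggestion."""
    suggestions = spellcheck(word)
    return suggestions[0] if suggestions else word
```

For example, `spellcheck("ther")` lists `the`, `there`, `their` and `then` for the user to choose from, while `autocorrect("ther")` simply returns `"the"`.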
Grammar checkers accept input in the form of documents,
paragraphs or sentences. However, they first break the input
down into units, namely sentences. The punctuation marks of
the corresponding language are used to identify the end of
each sentence. Each sentence then has to undergo some kind of
preprocessing.
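A minimal sketch of the sentence-splitting step, assuming that a sentence ends at `.`, `!` or `?` followed by whitespace and then an uppercase letter or a quotation mark. This regular expression is an illustrative simplification; NLTK's `sent_tokenize` (which needs the Punkt model data to be downloaded) copes better with abbreviations such as "Dr." that this rule would split on.

```python
import re

# Split after ., ! or ? when followed by whitespace and an uppercase letter or a quote.
SENTENCE_END = re.compile(r"""(?<=[.!?])\s+(?=["'A-Z])""")


def split_sentences(text):
    """Break a paragraph or document into sentence units using end punctuation."""
    text = " ".join(text.split())  # normalise line breaks and repeated spaces
    if not text:
        return []
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]
```

For example, `split_sentences("This is one. Is this two? Yes!  Done.")` yields the four sentences `"This is one."`, `"Is this two?"`, `"Yes!"` and `"Done."`, each of which can then be preprocessed and checked on its own.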
To discuss text summarization, we first have to define what a
summary is. A summary is a text that is produced from one or
more texts, that conveys the important information in the
original text, and that is shorter. The goal of automatic text
summarization is to present the source text as a shorter
version that preserves its meaning. The most important
advantage of a summary is that it reduces reading time.
Text summarization methods can be classified into extractive
and abstractive summarization. An extractive summarization
method consists of selecting important sentences, paragraphs,
etc. from the original document and concatenating them into a
shorter form. Abstractive summarization involves understanding
the main concepts in a document and then expressing those
concepts in clear natural language. Summaries can also be
divided into two groups: indicative and informative.
Indicative summarization presents only the main idea of the
text to the user; the typical length of this type of summary
is 5 to 10 percent of the main text. Informative summarization
systems, on the other hand, give concise information about the
main text; the length of an informative summary is 20 to 30
percent of the main text.
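The length ranges above translate directly into a sentence budget, and an extractive summary is then just the chosen sentences re-joined in their original order. The sketch below assumes that sentence scores are already available from some scorer (for example, the frequency method of Section 4.1.1); the function names are illustrative, and integer arithmetic is used so that rounding is exact.

```python
# Length ranges quoted in the text, as integer percentages of the original.
SUMMARY_PERCENT = {
    "indicative": (5, 10),
    "informative": (20, 30),
}


def summary_budget(n_sentences, kind):
    """Return (min, max) number of sentences for the requested summary type."""
    low_pct, high_pct = SUMMARY_PERCENT[kind]
    lo = max(1, -(-n_sentences * low_pct // 100))  # ceiling division, at least one sentence
    hi = max(lo, n_sentences * high_pct // 100)    # floor division, never below lo
    return lo, hi


def extract(sentences, scores, k):
    """Extractive step: take the k highest-scoring sentences, keep their original order."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))
```

For a 100-sentence document, `summary_budget(100, "indicative")` gives `(5, 10)` and `summary_budget(100, "informative")` gives `(20, 30)`. Re-sorting the chosen indices keeps the summary in reading order rather than score order.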
1.1 OBJECTIVE:
Through this project, the system aims to achieve the following:
• Provide an intelligent and interactive system for
interactive communication.
• Achieve this using a high-end, high-computation
processing system.
• Be usable at an individual level.
• Offer a highly usable UI.
• Communicate with the user in a human-like way,
presenting itself as a human.
1.2 SCOPE:
• The primary objective of the proposed system is to
build a system that intelligently optimizes English-language
inputs. This will be done with the help of the
Natural Language Toolkit (NLTK), which facilitates
Natural Language Processing (NLP).
• The scope of the proposed system encompasses the
Natural Language Processing (NLP) dimension in the
field of man-machine integration.
• The system can work stand-alone or be integrated
with other systems to provide additional
functionality.
2. LITERATURE SURVEY
Early experimentation in the late 1950s and early 1960s
suggested that text summarization by computer was feasible,
though not straightforward (Luhn, 59; Edmundson, 68). The
methods developed then were fairly unsophisticated, relying
primarily on surface-level phenomena such as sentence
position and word frequency counts, and focused on
producing extracts (passages selected from the text and
reproduced verbatim) rather than abstracts (interpreted
portions of the text, newly generated). After a hiatus of
some decades, the growing presence of large amounts of online
text, in corpora and especially on the Web, renewed interest
in automated text summarization. During the intervening
decades, progress in Natural Language Processing (NLP),
coupled with great increases in computer memory and speed,
made more sophisticated techniques possible, with very
encouraging results. In the late 1990s, some relatively small
research investments in the US (not more than 10 projects,
including commercial efforts at Microsoft, Lexis-Nexis,
Oracle, SRA, and TextWise, and university efforts at CMU,
NMSU, UPenn, and USC/ISI) over three or four years produced
several systems that exhibit potential marketability, as well
as several innovations that promise continued improvement. In
addition, several workshops, a book collection, and several
tutorials from that period testify that automated text
summarization had become a hot area. The best-known early
work is a research paper published by Hans Peter Luhn in the
late 1950s, titled “The automatic creation of literature
abstracts”, which used features such as word frequency and
phrase frequency to extract important sentences from the text
for summarization purposes.
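Luhn's frequency idea can be sketched as follows. Words that are not stop words and occur at least twice count as "significant". A cluster is a run of significant words separated by at most four insignificant words, and it is scored as (number of significant words in the cluster) squared divided by the cluster length in words. A sentence takes its best cluster score. The stop-word list, the `min_freq` threshold and the gap limit of four are illustrative assumptions. Luhn's original method also used frequency cut-offs to discard overly common words.

```python
import re
from collections import Counter

# Hypothetical stop-word list; Luhn excluded very common words in a similar spirit.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "for", "on", "it", "that"}


def significant_words(text, min_freq=2):
    """Words that are not stop words and occur at least min_freq times."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_freq}


def luhn_score(sentence, significant, max_gap=4):
    """Best cluster score: (significant words in cluster) ** 2 / cluster length."""
    words = re.findall(r"[a-z']+", sentence.lower())
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:  # at most max_gap insignificant words in between
            count += 1
        else:  # gap too large: close the current cluster and start a new one
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))
```

Sentences with the highest scores are then extracted verbatim, which matches the early focus on extracts rather than abstracts.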
In paper [1], “A Survey of Text Summarization Techniques”, the
authors note that numerous approaches for identifying
important content for automatic text summarization have been
developed to date. Topic representation approaches first
derive an intermediate representation of the text that
captures the topics discussed in the input. Based on these
representations of topics, sentences in the input document
are scored for importance.
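One simple way to build such a topic representation is to treat as topic words those that are much more frequent in the input than in a background corpus, and then to score each sentence by the proportion of its words that are topic words. The sketch below uses a plain frequency ratio with a threshold of 2 as a simplified stand-in for the statistical tests (such as the log-likelihood ratio used for topic signatures) found in the literature; the threshold, `min_count` and the function names are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def topic_words(document, background, ratio=2.0, min_count=2):
    """Intermediate representation: words far more frequent in the input than in the background."""
    doc = Counter(tokenize(document))
    bg = Counter(tokenize(background))
    doc_total, bg_total = sum(doc.values()), sum(bg.values())
    topics = set()
    for word, count in doc.items():
        if count < min_count:
            continue
        doc_rate = count / doc_total
        bg_rate = (bg[word] + 1) / (bg_total + 1)  # +1 avoids dividing by zero for unseen words
        if doc_rate / bg_rate >= ratio:
            topics.add(word)
    return topics


def score_sentence(sentence, topics):
    """Importance = proportion of the sentence's words that are topic words."""
    words = tokenize(sentence)
    return sum(w in topics for w in words) / len(words) if words else 0.0
```

Normalizing by sentence length keeps long sentences from winning merely because they contain more words.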
Paper [2] studies a predictive-text, simplified personal
computer keyboard with word and phrase auto-completion. It has
a smaller keypad in which each key represents several
letters/characters, so that only 9 keys are required to
represent the entire 26-letter alphabet.
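The exact key layout of paper [2] is not given here, so the sketch below assumes the standard telephone layout, in which keys 2 to 9 carry the 26 letters (key 1 usually holds punctuation, making 9 keys in all). Each word maps to a key sequence. Typing a sequence returns every vocabulary word whose sequence starts with it, which is the basis of word auto-completion on such keypads. The vocabulary is hypothetical and is assumed to be ordered most frequent first.

```python
from collections import defaultdict

# Assumed standard phone layout: letter keys 2-9 cover all 26 letters.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}


def key_sequence(word):
    """Digits a user would press to type the word."""
    return "".join(LETTER_TO_KEY[c] for c in word.lower())


def build_index(vocabulary):
    """Map each key sequence to the words that produce it, in vocabulary order."""
    index = defaultdict(list)
    for word in vocabulary:
        index[key_sequence(word)].append(word)
    return index


def complete(index, keys):
    """Words whose key sequence starts with the keys pressed so far."""
    return [w for seq, words in index.items() if seq.startswith(keys) for w in words]
```

Because "home", "good", "gone" and "hood" all map to `4663`, the system must rank the candidates, which is why frequency order (or user history) matters on reduced keypads.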
In IEEE paper [3], “Spell Checking Techniques in NLP: A
Survey” (Neha Gupta, Pratishta Mathur), the authors observe
that spell checkers are basic tools that still need to be
developed for Indian languages. A spell checker is a software
tool that identifies and corrects spelling mistakes in a
text. Spell checkers can be combined with other applications
or distributed individually; the authors discuss both
approaches and their roles in various applications.
IEEE paper [4] studies grammar checkers. A grammar checker is
a proofing tool used for syntactic analysis of text. Various
techniques are used to develop grammar checkers, including
rule-based, statistical and syntax-based techniques. The
article discusses all three techniques and concludes with the
advantages and disadvantages of each.
3. PROBLEM STATEMENT
The system should be smart enough to correct errors in
English-language text and also to summarize it. We will be
using the NLTK tool available in Python. NLTK stands for
Natural Language Toolkit, and we will use the different tools
it provides for this purpose. The summarization feature will
take a meaningful paragraph in English as input and provide a
"meaningful summary" of that paragraph. The output will be
the summarized paragraph, with the original meaning retained.
The spell check feature will check the spelling of each input
word and suggest a correct one if any word has been misspelt.
The grammar check feature will find errors in the grammar and
suggest possible corrections. The auto-completion feature
completes simple sentences automatically.
NLP can be integrated with a website to provide a more
user-friendly experience. Features like spell check,
autocomplete, and autocorrect in search bars can make it
easier for users to find the information they are looking
for, which in turn keeps them from navigating away from the
site.
Automatic text summarization is one of the most challenging
and interesting problems in the field of Natural Language
Processing (NLP). It is the process of generating a concise and
meaningful summary of text from multiple text resources
such as books, news articles, blog posts, research papers,
emails, and tweets.
The demand for automatic text summarization systems is
rising sharply thanks to the availability of large amounts
of textual data.
4. FLOWCHART
Fig -1: Flow chart of the system
4.1. ALGORITHMS
4.1.1 SUMMARIZATION:
The following steps are used in the process of summarization:
• The user's input is taken in the form of a paragraph.
• The input is passed to the summarizer.
• The summarizer eliminates English stop words from
the text entered by the user.
• It then carries out stemming.
• After stemming, it creates a frequency table that
maintains a count of each distinct word in the text
entered by the user.
• The text is tokenized into sentences.
• After sentence tokenization, weights are assigned to
the words in the frequency table using neural networks.
• The average score of these values is computed.
• The summary is generated based on these values.
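The steps above can be sketched in plain Python. To keep the example self-contained and runnable, it makes several substitutions that should be read as assumptions: a small hand-written stop-word list instead of NLTK's, a crude suffix stripper (`crude_stem`, hypothetical) instead of a real stemmer such as NLTK's `PorterStemmer`, and plain frequency counts instead of neural-network weights. Words are tokenized first so that stop words can be removed, and a sentence is kept if its score exceeds the average sentence score.

```python
import re
from collections import Counter

# Small illustrative stop-word list; NLTK ships a fuller one in nltk.corpus.stopwords.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "is", "are", "was", "were", "be",
    "of", "to", "in", "on", "for", "with", "as", "by", "it", "this", "that",
}


def crude_stem(word):
    """Hypothetical suffix stripper standing in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def summarize(paragraph, threshold=1.0):
    words = re.findall(r"[a-z]+", paragraph.lower())
    # Remove stop words, stem, and build the frequency table.
    freq = Counter(crude_stem(w) for w in words if w not in STOP_WORDS)
    # Split the paragraph into sentences on end punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s.strip()]
    # Weight each sentence by the summed frequency of its stemmed content words
    # (plain counts here, not the neural weighting described in the text).
    scores = []
    for sentence in sentences:
        stems = [crude_stem(w) for w in re.findall(r"[a-z]+", sentence.lower())
                 if w not in STOP_WORDS]
        scores.append(sum(freq[s] for s in stems))
    if not scores:
        return ""
    # Average sentence score.
    average = sum(scores) / len(scores)
    # Keep sentences scoring above threshold * average, in their original order.
    return " ".join(s for s, sc in zip(sentences, scores) if sc > threshold * average)
```

For the paragraph "NLP systems check spelling. Spelling errors confuse readers. The weather is nice today. Systems that check spelling help readers.", the off-topic weather sentence scores lowest, and the summary keeps the first and last sentences. Summing frequencies favours long sentences; dividing each score by the sentence length is a common alternative.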
4.1.2 GRAMMAR CHECK:
Grammar check is carried out in the following steps:
• A grammatically incorrect sentence/paragraph is taken
from the user.
• The language_check tool in Python is used to check
whether the input follows the rules of English
grammar.
• If the rules are not followed, the issues found in the
sentence/paragraph entered by the user are returned.
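The paper relies on the language_check package for this step. The sketch below does not use that package. Instead, it substitutes two toy rules (a repeated word, and "a"/"an" chosen by the next word's first letter) purely to show the flow from input text to a list of issues returned to the user. The `Issue` type and both rules are hypothetical. The article rule looks only at spelling, so it would wrongly flag phrases such as "an hour", which is one reason real checkers use far richer rule sets.

```python
import re
from dataclasses import dataclass


@dataclass
class Issue:
    offset: int   # character position of the problem in the input
    message: str  # human-readable explanation for the user


def check_grammar(text):
    """Apply each toy rule and return the issues found, ordered by position."""
    issues = []
    # Rule 1: the same word written twice in a row ("the the").
    for m in re.finditer(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE):
        issues.append(Issue(m.start(), f"Repeated word: '{m.group(1)}'"))
    # Rule 2: "a" before a vowel letter, or "an" before a consonant letter.
    for m in re.finditer(r"\b(a|an)\s+([A-Za-z]\w*)", text, flags=re.IGNORECASE):
        article, word = m.group(1).lower(), m.group(2)
        starts_with_vowel = word[0].lower() in "aeiou"
        if article == "a" and starts_with_vowel:
            issues.append(Issue(m.start(), f"Use 'an' before '{word}'"))
        elif article == "an" and not starts_with_vowel:
            issues.append(Issue(m.start(), f"Use 'a' before '{word}'"))
    return sorted(issues, key=lambda issue: issue.offset)
```

For "This is a apple and the the end.", the function reports the article problem at offset 8 and the repeated "the" at offset 20.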
4.1.3 SPELL CHECK:
Spell check follows this algorithm:
1. Take a sentence/paragraph from the user.
2. Check the spelling of each word in the
sentence/paragraph for correctness.
3. Return the misspelt words so that they can be
highlighted to the user.
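A minimal sketch of steps 1 to 3 that returns the character offsets of unknown words so that an interface can highlight them. The `DICTIONARY` set is a hypothetical placeholder for a full word list, and the `**` marker stands in for whatever highlighting the UI actually applies.

```python
import re

# Hypothetical dictionary; a deployed system would load a full word list.
DICTIONARY = {"the", "system", "checks", "each", "word", "for", "spelling", "errors"}


def find_misspellings(text, dictionary=DICTIONARY):
    """Scan every word and return (start, end, word) spans that are not in the dictionary."""
    spans = []
    for m in re.finditer(r"[A-Za-z']+", text):
        if m.group().lower() not in dictionary:
            spans.append((m.start(), m.end(), m.group()))
    return spans


def highlight(text, spans, marker="**"):
    """Wrap each misspelt word in a marker, working backwards so offsets stay valid."""
    for start, end, _ in reversed(spans):
        text = text[:start] + marker + text[start:end] + marker + text[end:]
    return text
```

For "The sistem checks each wrod for spelling errors.", `find_misspellings` returns the spans for "sistem" and "wrod", and `highlight` produces "The **sistem** checks each **wrod** for spelling errors.". Returning offsets rather than edited text lets the interface decide how to display the errors.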
Spell check is a form of NLP that everyone is used to by now.
It is unobtrusive, easy to use, and can prevent a lot of
headaches for users and agents alike.
Not every user is going to take the time to compose a
grammatically perfect sentence when contacting a help desk
or sales agent. Salesforce knows this, so it made sure its
contact form was equipped with spell check to make users’
lives easier.
This also makes its employees’ lives easier. Error-ridden
customer messages can be difficult to interpret, leading to
miscommunication and frustration for all involved.
4.1.4 AUTO COMPLETE:
Auto complete is a neural-network-based algorithm that works
on the user’s previous data (from earlier use of the system)
to give auto-completion suggestions to the user. Search
autocomplete is another type of NLP that many people use on a
daily basis and have almost come to expect when searching for
something. This is thanks in large part to pioneers like
Google, who have been using the feature in their search
engine for years. The feature is just as helpful on company
websites.
Salesforce integrated the feature into its own site search
engine. Users interested in learning more about a topic or
function of Salesforce’s product might know one keyword, but
not the full term.
Search autocomplete helps them locate the correct information
and answer their questions faster. This reduces the
likelihood that they will lose interest and navigate away
from the site.
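The paper describes a neural-network model trained on the user's history. The sketch below swaps in a much simpler count-based model so that the idea fits in a few lines. It learns from sentences the user has typed, offers search-box-style completions for a typed prefix, and suggests the next word from bigram counts. The class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class HistoryCompleter:
    """Suggests completions from text the user has typed before
    (a count-based stand-in for the neural model described in the paper)."""

    def __init__(self):
        self.word_counts = Counter()
        self.next_words = defaultdict(Counter)  # word -> counts of the words that followed it

    def learn(self, sentence):
        """Update word and bigram counts from one sentence of user history."""
        words = sentence.lower().split()
        self.word_counts.update(words)
        for current, following in zip(words, words[1:]):
            self.next_words[current][following] += 1

    def complete_word(self, prefix, k=3):
        """Search-box style: most frequent known words starting with the typed prefix."""
        prefix = prefix.lower()
        matches = [w for w in self.word_counts if w.startswith(prefix)]
        return sorted(matches, key=lambda w: (-self.word_counts[w], w))[:k]

    def next_word(self, previous, k=3):
        """Sentence completion: words most often seen after `previous` in the history."""
        counts = self.next_words.get(previous.lower(), Counter())
        return [w for w, _ in counts.most_common(k)]
```

After learning "check the spelling" twice and "check the grammar" once, `next_word("the", 1)` suggests `"spelling"`. Counts adapt to each user's history in the same way the paper's model is meant to, but without any training step.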
5. CONCLUSION
The proposed system, as planned after extensive research
during the literature survey, includes the following features:
implementation of a data mining algorithm for summarization
of a given English-language paragraph. It will also enable
the user to perform spell check and grammar check on the
user’s inputs to the system in English.
ACKNOWLEDGEMENT
The success and final outcome of this research required a lot
of guidance and assistance, and we are extremely privileged
to have received it. All that we have done is due to such
supervision and assistance, and we thank everyone who
provided it.
We respect and thank Prof. Satish Kuchiwale for providing
us with insight and expertise that greatly assisted the
research. We are extremely thankful to him for his support
and guidance.
REFERENCES
[1] Ani Nenkova, Kathleen McKeown, “A Survey of Text
Summarization Techniques.”
[2] David Gikandi, “Predictive text computer simplified
keyboard with word and phrase auto-completion.”
[3] Neha Gupta, Pratishta Mathur, “Spell Checking Techniques
in NLP: A Survey.”
[4] Blossom Manchanda, Vijay Anant Athavale, Sanjeev Kumar
Sharma, “Various Techniques Used For Grammar Checking.”
[5] N. Moratanch, S. Chitrakala, “A Survey on Extractive Text
Summarization.”
[6] Haoran Li, Junnan Zhu, Cong Ma, Jiajun Zhang, Chengqing
Zong, “Read, Watch, Listen and Summarize: Multimodal
Summarization for Asynchronous Text, Image, Audio and Video.”
BIOGRAPHIES
Mahesh Sunil Patil is pursuing the
Bachelor degree (B.E.) in Computer
Engineering from Smt. Indira Gandhi
College of Engineering (SIGCE), Navi
Mumbai. His current research
interests include Web Design and
Machine Learning.
Mayur Anand Pawar is pursuing the
Bachelor degree (B.E.) in Computer
Engineering from Smt. Indira Gandhi
College of Engineering (SIGCE), Navi
Mumbai. His current research
interests include Web Design and
Machine Learning.
Yatin Sitaram Rai is pursuing the
Bachelor degree (B.E.) in Computer
Engineering from Smt. Indira Gandhi
College of Engineering (SIGCE), Navi
Mumbai. His current research
interests include Web Design and
Machine Learning.
Prof. Satish Lalasaheb Kuchiwale
obtained the Bachelor degree (B.E. IT)
in 2007 from Rajarambapu
Institute of Technology (RAIT),
Rajaramnagar, Sakharale, and the Master
degree (M.E. Computer) from
Lokamanya Tilak College of
Engineering (LTCE), Navi Mumbai. He
is an Asst. Professor at Smt. Indira Gandhi
College of Engineering, University of
Mumbai, and has about 12 years of
experience.

More Related Content

PDF
G1803013542
PDF
Natural Language Processing Theory, Applications and Difficulties
PDF
J1803015357
PDF
P1803018289
PDF
A Novel Approach for Rule Based Translation of English to Marathi
PDF
Teachbot teaching robot_using_artificial
PDF
Keyword Extraction Based Summarization of Categorized Kannada Text Documents
PDF
Automatic Text Summarization using Natural Language Processing
G1803013542
Natural Language Processing Theory, Applications and Difficulties
J1803015357
P1803018289
A Novel Approach for Rule Based Translation of English to Marathi
Teachbot teaching robot_using_artificial
Keyword Extraction Based Summarization of Categorized Kannada Text Documents
Automatic Text Summarization using Natural Language Processing

What's hot (18)

PDF
Myanmar news summarization using different word representations
PDF
IRJET- A Pragmatic Supervised Learning Methodology of Hate Speech Detection i...
PDF
A prior case study of natural language processing on different domain
PDF
SCTUR: A Sentiment Classification Technique for URDU
PDF
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
PDF
IRJET- Review of Chatbot System in Marathi Language
PDF
Cl35491494
PDF
NLPinAAC
PDF
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
PDF
Mining Opinion Features in Customer Reviews
PDF
IRJET- Vernacular Language Spell Checker & Autocorrection
PDF
IRJET- Review of Chatbot System in Hindi Language
PDF
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...
PDF
Conceptual framework for abstractive text summarization
PDF
Semantic analyzer for marathi text
PDF
Semantic analyzer for marathi text
PDF
ALGORITHM FOR TEXT TO GRAPH CONVERSION
PDF
AN OVERVIEW OF EXTRACTIVE BASED AUTOMATIC TEXT SUMMARIZATION SYSTEMS
Myanmar news summarization using different word representations
IRJET- A Pragmatic Supervised Learning Methodology of Hate Speech Detection i...
A prior case study of natural language processing on different domain
SCTUR: A Sentiment Classification Technique for URDU
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Review of Chatbot System in Marathi Language
Cl35491494
NLPinAAC
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Mining Opinion Features in Customer Reviews
IRJET- Vernacular Language Spell Checker & Autocorrection
IRJET- Review of Chatbot System in Hindi Language
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...
Conceptual framework for abstractive text summarization
Semantic analyzer for marathi text
Semantic analyzer for marathi text
ALGORITHM FOR TEXT TO GRAPH CONVERSION
AN OVERVIEW OF EXTRACTIVE BASED AUTOMATIC TEXT SUMMARIZATION SYSTEMS
Ad

Similar to IRJET - Text Optimization/Summarizer using Natural Language Processing (20)

PDF
Design of optimal search engine using text summarization through artificial i...
PPTX
Weekairtificial intelligence 8-Module 7 NLP.pptx
PDF
Automation tool for evaluation of the quality of nlp based
PDF
Text Summarization and Conversion of Speech to Text
PPTX
NLP (4) for class 9 (1).pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnn
PDF
IRJET - Text Summarizer.
PDF
A domain specific automatic text summarization using fuzzy logic
DOC
PPTX
nlp-updated-230720173348-d9097e (1).pptx
PDF
Syntactic Indexes for Text Retrieval
PDF
Computer aided environment for drawing (to set) fill in the blank from given ...
PDF
Computer aided environment for drawing (to set) fill in the blank from given ...
PDF
Automatic Text Summarization: A Critical Review
PDF
Text Pre-Processing Techniques in Natural Language Processing: A Review
PDF
IRJET- Text Highlighting – A Machine Learning Approach
PDF
IRJET- Sewage Treatment Potential of Coir Geotextiles in Conjunction with Act...
PDF
A Survey on Automatic Text Summarization
PDF
IRJET- Automatic Recapitulation of Text Document
DOCX
MOST CITED NATURAL LANGUAGECOMPUTING ARTICLESIN 2017
PPTX
Natural language processing
Design of optimal search engine using text summarization through artificial i...
Weekairtificial intelligence 8-Module 7 NLP.pptx
Automation tool for evaluation of the quality of nlp based
Text Summarization and Conversion of Speech to Text
NLP (4) for class 9 (1).pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnn
IRJET - Text Summarizer.
A domain specific automatic text summarization using fuzzy logic
nlp-updated-230720173348-d9097e (1).pptx
Syntactic Indexes for Text Retrieval
Computer aided environment for drawing (to set) fill in the blank from given ...
Computer aided environment for drawing (to set) fill in the blank from given ...
Automatic Text Summarization: A Critical Review
Text Pre-Processing Techniques in Natural Language Processing: A Review
IRJET- Text Highlighting – A Machine Learning Approach
IRJET- Sewage Treatment Potential of Coir Geotextiles in Conjunction with Act...
A Survey on Automatic Text Summarization
IRJET- Automatic Recapitulation of Text Document
MOST CITED NATURAL LANGUAGECOMPUTING ARTICLESIN 2017
Natural language processing
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
Sustainable Sites - Green Building Construction
PPTX
Lecture Notes Electrical Wiring System Components
PPTX
Construction Project Organization Group 2.pptx
PDF
PPT on Performance Review to get promotions
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPTX
OOP with Java - Java Introduction (Basics)
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PDF
Arduino robotics embedded978-1-4302-3184-4.pdf
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
CH1 Production IntroductoryConcepts.pptx
PPTX
Welding lecture in detail for understanding
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
DOCX
573137875-Attendance-Management-System-original
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Internet of Things (IOT) - A guide to understanding
Sustainable Sites - Green Building Construction
Lecture Notes Electrical Wiring System Components
Construction Project Organization Group 2.pptx
PPT on Performance Review to get promotions
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
OOP with Java - Java Introduction (Basics)
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
Arduino robotics embedded978-1-4302-3184-4.pdf
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
CH1 Production IntroductoryConcepts.pptx
Welding lecture in detail for understanding
Embodied AI: Ushering in the Next Era of Intelligent Systems
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
573137875-Attendance-Management-System-original

IRJET - Text Optimization/Summarizer using Natural Language Processing

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 888 Text Optimization/Summarizer using Natural Language Processing Mahesh Patil1, Mayur Pawar2, Yatin Rai3, Prof. Satish Kuchiwale4 1-3Student, Computer Engineering, SIGCE, Navi Mumbai, Maharashtra, India 4Asst. Professor, Computer Engineering, Smt. Indira Gandhi College of Engineering, Navi Mumbai, Maharashtra -------------------------------------------------------------------------***------------------------------------------------------------------------ Abstract - Computers have become a major medium forthe exchange of information. Hence, we have the increasing need to do a thorough check of the information which the masses intend to exchange. The need for accurate grammar and spellings is not only a requirement for the formal business conversations, but also for the formal conversations taking place on the various platforms available for the same. The project aims at building an intelligent system to optimize English language. This system will perform functionslikeAuto Completion, Summarization, spell check, Grammar check. Key Words: Grammar, Business conversation, Optimize. 1. INTRODUCTION Man’s quest for making machines as smart as he is will go on forever. But the fact that we are able to design machines which react to human language by speaking like humans and passing the turing test with considerably fair amount of results is worth appreciating as far as the man-machine integration is concerned. The project aims at building intelligent system to optimize English language. It will perform the following tasks: 1. Grammar Optimization 2. Spellcheck 3. Summarization 4. Sentence Auto Completion Computers have becomea major mediumfortheexchangeof information. 
Hence, we have the increasing need to do a thorough check of the information which the masses intend to exchange. The need for accurate grammar and spellings is not only a requirementfortheformalbusinessconversations, but also for the formal conversations taking place on the various platforms available for the same. The project aims at building an intelligent system to optimize English language. This system will perform functions like Auto Completion, Summarization, Spell Check, Grammar Check Let’s define the job of a spell checker and an autocorrector. A word needs to be checked for spelling correctness and corrected if necessary, many a time in the context of the surrounding words. A spellchecker points to spelling errors and possibly suggests alternatives. An autocorrector usually goes a step further and automatically picks the most likely word. In case of the correct word already having been typed, the same is retained. So, in practice, an autocorrect is a bit more aggressive than a spellchecker, but this is more of an implementation detail — tools allow you to configure the behaviour. There is not much difference between the two in theory. So, the discussion in the rest of the blog post applies to both. Grammar checkers accepts input in form of documents, paragraphs or sentence. However it then break down input into unit form, sentence. Corresponding language punctuation marks are used to identify completion of sentence. The sentence has to undergo some kind of preprocessing. Text summarization, first we, have to know that what a summary is. A summary is a text that is producedfromoneor more texts, that conveys important information in the original text,and it is of a shorter form. The goal of automatic text summarization is presenting the source text into a shorter version with semantics. The most important advantage of using a summary is, it reduces the readingtime. 
Text Summarization methodscanbeclassifiedintoextractive and abstractivesummarization.Anextractivesummarization method consists of selecting important sentences, concatenating them into shorter form. An Abstractive summarization isan understanding of the main conceptsina document and then express those concepts in clear natural language. There are two different groups of text summarization: indicative and informative. Inductive summarization only represent the main ideaofthetexttothe user. The typical length of this type of summarization is 5 to 10 percent of the main text. On the other hand, the informative summarization systems gives concise information of the main text .The length of informative summary is 20 to 30 percent of the main text paragraphs etc. from the original document and 1.1 OBJECTIVE: This System aimtoachievethefollowingthroughthisproject:  Provide an intelligent and interactive system for interactive communications.  This will be done by a high end and high computation processing system.  Can be used on an individual level.  Design a system with a highly usable UI.  The system has an underlying objective of communicating with the user by masking itself as human.
1.2 SCOPE:
• The primary objective of the proposed system is to build a system that intelligently optimizes English-language inputs. This will be done with the help of the Natural Language Toolkit (NLTK), which facilitates Natural Language Processing (NLP).
• The scope of the proposed system encompasses the Natural Language Processing (NLP) dimension in the field of man-machine integration.
• The system can work stand-alone or can be integrated with other systems to provide additional functionality.

2. LITERATURE SURVEY
Early experimentation in the late 1950's and early 60's suggested that text summarization by computer was feasible though not straightforward (Luhn, 59; Edmundson, 68). The methods developed then were fairly unsophisticated, relying primarily on surface-level phenomena such as sentence position and word frequency counts, and focused on producing extracts (passages selected from the text and reproduced verbatim) rather than abstracts (interpreted portions of the text, newly generated). After a hiatus of some decades, the growing presence of large amounts of online text, in corpora and especially on the Web, renewed interest in automated text summarization. During the intervening decades, progress in Natural Language Processing (NLP), coupled with great increases in computer memory and speed, made more sophisticated techniques possible, with very encouraging results.
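The frequency-based extraction described above, which is also the basis of the summarization steps listed in Section 4.1.1, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual code: a small hand-written stop-word list stands in for NLTK's stop-word corpus, a crude suffix stripper (crude_stem) stands in for a real stemmer, plain word frequencies replace the neural-network weights mentioned in Section 4.1.1, and the names summarize, terms and threshold are hypothetical.

```python
import re
from collections import Counter

# Assumption: a small hand-written stop-word list stands in for NLTK's
# English stop-word corpus used by the proposed system.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def crude_stem(word):
    """Rough suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_sentences(text):
    """Approximate sentence boundaries: '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def terms(sentence):
    """Lower-case words with stop words removed, then stemmed."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def summarize(text, threshold=1.0):
    """Keep sentences whose average term frequency is at least
    threshold * (mean score over all sentences)."""
    sentences = split_sentences(text)
    # Frequency table: count of every distinct stemmed, non-stop word.
    freq = Counter(t for s in sentences for t in terms(s))
    if not freq:
        return ""
    scores = []
    for s in sentences:
        ts = terms(s)
        # Divide by length so long sentences are not favoured automatically.
        scores.append(sum(freq[t] for t in ts) / len(ts) if ts else 0.0)
    average = sum(scores) / len(scores)
    return " ".join(s for s, score in zip(sentences, scores) if score >= threshold * average)


text = ("Summarization reduces reading time. "
        "Summarization keeps the important sentences. Cats sleep.")
print(summarize(text))
# Summarization reduces reading time. Summarization keeps the important sentences.
```

Raising threshold above 1.0 keeps only sentences that score well above the average and therefore yields shorter, more indicative summaries; lowering it keeps more sentences.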
In the late 1990's, some relatively small research investments in the US (not more than 10 projects, including commercial efforts at Microsoft, Lexis-Nexis, Oracle, SRA and TextWise, and university efforts at CMU, NMSU, UPenn and USC/ISI) over three or four years produced several systems that exhibit potential marketability, as well as several innovations that promise continued improvement. In addition, several recent workshops, a book collection and several tutorials testify that automated text summarization has become an active research area. Automatic text summarization gained attention as early as the 1950's: a research paper published by Hans Peter Luhn in the late 1950s, titled “The automatic creation of literature abstracts”, used features such as word frequency and phrase frequency to extract important sentences from the text for summarization purposes.

Paper [1], “A Survey of Text Summarization Techniques”, observes that numerous approaches for identifying important content for automatic text summarization have been developed to date. Topic representation approaches first derive an intermediate representation of the text that captures the topics discussed in the input. Based on these representations of topics, the sentences in the input document are scored for importance.

In [2], the authors study a predictive-text personal computer simplified keyboard with word and phrase auto-completion. It has a smaller keypad in which each key represents several letters/characters, so that only 9 keys are required to represent the entire 26-character alphabet.

In IEEE paper [3], “Spell Checking Techniques in NLP: A Survey” (Neha Gupta, Pratishta Mathur), spell checkers are presented as basic tools that need to be developed for Indian languages. A spell checker is a software tool that identifies and corrects spelling mistakes in a text.
Spell checkers can be combined with other applications or distributed individually; the authors discuss both approaches and their roles in various applications. In [4], a grammar checker is described as a proofing tool used for the syntactic analysis of text. Various techniques are used to develop grammar checkers, including rule-based, statistical and syntax-based techniques. That research article discusses all three techniques, along with their advantages and disadvantages.

3. PROBLEM STATEMENT
The system should be smart enough to correct errors in English-language text and also to summarize it. We will be using NLTK (Natural Language Toolkit), which is available in Python, and the different tools it provides for this purpose.

The summarization feature will take a meaningful paragraph in English as input and provide a "meaningful summary" of it. The output will be the summarized paragraph, in which the original meaning is retained. The spell check feature will check the spelling of each input word and suggest a correct one if any word has been misspelt. The grammar check feature will find errors in the grammar and suggest the possible corrections. The auto-completion feature completes simple sentences automatically.

NLP can be integrated with a website to provide a more user-friendly experience. Features like spell check, autocomplete and autocorrect in search bars can make it easier for users to find the information they are looking for, which in turn keeps them from navigating away from the site. Automatic text summarization is one of the most challenging and interesting problems in the field of Natural Language Processing (NLP). It is a process of generating a concise and
meaningful summary of text from multiple text resources such as books, news articles, blog posts, research papers, emails and tweets. The demand for automatic text summarization systems is rising sharply thanks to the availability of large amounts of textual data.

4. FLOWCHART
Fig -1: Flow chart of the system

4.1. ALGORITHMS
4.1.1 SUMMARIZATION:
In the process of summarization, the following steps are used:
• Take user input in the form of a paragraph.
• Pass the user input to the summarizer.
• The summarizer eliminates English stop words from the extract entered by the user.
• It then carries out stemming.
• After stemming, it creates a frequency table that maintains a count of all the distinct words in the extract.
• The sentences of the extract are tokenized.
• After tokenization of the sentences, weights are assigned to the words in the frequency table using neural networks.
• The average score of these values is computed.
• The summary is generated based on these values.

4.1.2 GRAMMAR CHECK:
Grammar check is carried out by the following steps:
• Take a grammatically incorrect sentence/paragraph from the user.
• Use the language_check tool in Python to check whether the input follows all the rules of English grammar.
• If the rules are not followed, return the issues found in the sentence/paragraph entered by the user.

4.1.3 SPELL CHECK:
Spell check is a form of NLP that everyone is used to by now. It follows this algorithm:
1. Take a sentence/paragraph from the user.
2. Check the spelling of each word in the sentence/paragraph.
3. Return the misspelt words so that they can be highlighted to the user.
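The three spell-check steps above can be sketched as follows. This is a hedged illustration that assumes a plain word list as the dictionary; the DICTIONARY set and the find_misspellings and highlight names are hypothetical. Returning character offsets rather than just the words is one simple way to support step 3, since a user interface can use them to highlight each misspelt word in place.

```python
import re

# Hypothetical dictionary; a real deployment would load a full English word list.
DICTIONARY = {"this", "is", "a", "simple", "sentence", "with", "some", "errors", "in", "it"}


def find_misspellings(text, dictionary=DICTIONARY):
    """Steps 1-3: take the text, check each word against the dictionary and
    return (word, start, end) for every word that is not found."""
    misspelt = []
    for match in re.finditer(r"[A-Za-z']+", text):
        if match.group().lower() not in dictionary:
            misspelt.append((match.group(), match.start(), match.end()))
    return misspelt


def highlight(text, misspelt):
    """Wrap each misspelt word in square brackets using the returned offsets."""
    pieces, last = [], 0
    for word, start, end in misspelt:
        pieces.append(text[last:start])
        pieces.append(f"[{word}]")
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


sample = "This is a simpel sentense."
found = find_misspellings(sample)
print(found)                     # [('simpel', 10, 16), ('sentense', 17, 25)]
print(highlight(sample, found))  # This is a [simpel] [sentense].
```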
Spell check is unobtrusive, easy to use, and can save both users and agents a lot of headaches. Not every user is going to take the time to compose a grammatically perfect sentence when contacting a help desk or sales agent. Salesforce knows this, so they made sure their contact form was equipped with spell check to make users' lives easier. This also makes their employees' lives easier. Error-ridden customer messages can be difficult to interpret, leading to miscommunication and frustration for all involved.

4.1.4 AUTO COMPLETE:
Auto complete is a neural-network-based algorithm that works on the user's previous data (from using the system) to give auto-completion suggestions to the user. Search autocomplete is another type of NLP that many people use on a daily basis and have almost come to expect when searching for something. This is thanks in large part to pioneers like Google, who have been using the feature in their search
engine for years. The feature is just as helpful on company websites. Salesforce integrated the feature into its own search engine. Users interested in learning more about a topic or function of Salesforce's product might know one keyword but not the full term; search autocomplete helps them locate the correct information and answer their questions faster. This cuts down on the likelihood that they will become uninterested and navigate away from the site.

5. CONCLUSION
The proposed system, as planned after extensive research during the literature survey, includes the following features: implementation of a data mining algorithm for summarization of a given English-language paragraph, and spell check and grammar check of the user's English-language inputs to the system.

ACKNOWLEDGEMENT
The success and final outcome of this research required a lot of guidance and assistance, and we are extremely privileged to have received it. All that we have done is due to such supervision and assistance, and we will not forget to thank those who provided it. We respect and thank Prof. Satish Kuchiwale for providing the insight and expertise that greatly assisted the research. We are extremely thankful to him for his support and guidance.

REFERENCES
[1] Ani Nenkova, Kathleen McKeown, “A Survey of Text Summarization Techniques.”
[2] David Gikandi, “Predictive Text Computer Simplified Keyboard with Word and Phrase Auto-Completion.”
[3] Neha Gupta, Pratishta Mathur, “Spell Checking Techniques in NLP: A Survey.”
[4] Blossom Manchanda, Vijay Anant Athavale, Sanjeev Kumar Sharma, “Various Techniques Used For Grammar Checking.”
[5] N. Moratanch, S. Chitrakala, “A Survey on Extractive Text Summarization.”
[6] Haoran Li, Junnan Zhu, Cong Ma, Jiajun Zhang, Chengqing Zong, “Read, Watch, Listen and Summarize: Multimodal Summarization for Asynchronous Text, Image, Audio and Video.”

BIOGRAPHIES
Mahesh Sunil Patil is pursuing a Bachelor's degree (B.E.) in Computer Engineering at Smt. Indira Gandhi College of Engineering (SIGCE), Navi Mumbai. His current research interests include Web Designing and Machine Learning.

Mayur Anand Pawar is pursuing a Bachelor's degree (B.E.) in Computer Engineering at Smt. Indira Gandhi College of Engineering (SIGCE), Navi Mumbai. His current research interests include Web Designing and Machine Learning.

Yatin Sitaram Rai is pursuing a Bachelor's degree (B.E.) in Computer Engineering at Smt. Indira Gandhi College of Engineering (SIGCE), Navi Mumbai. His current research interests include Web Designing and Machine Learning.

Prof. Satish Lalasaheb Kuchiwale obtained his Bachelor's degree (B.E. IT) in 2007 from Rajarambapu Institute of Technology (RAIT), Rajaramnagar, Sakharale, and his Master's degree (M.E. Computer) from Lokamanya Tilak College of Engineering (LTCE), Navi Mumbai. He is an Asst. Professor at Smt. Indira Gandhi College of Engineering, University of Mumbai, and has about 12 years of experience.