IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 03 Issue: 06 | Jun-2014, Available @ http://www.ijret.org 285
AN AUTOMATIC TEXT SUMMARIZATION USING LEXICAL
COHESION AND CORRELATION OF SENTENCES
A. R. Kulkarni¹, S. S. Apte²
¹Computer Science & Engineering Department, Walchand Institute of Technology, Solapur – 413006, India
²Head, Computer Science & Engineering Department, Walchand Institute of Technology, Solapur – 413006, India
Abstract
Due to the substantial increase in the amount of information on the Internet, it has become extremely difficult to search for the relevant
documents needed by users. To address this problem, text summarization is used, which produces a summary of a document
that contains its important content. This paper proposes an improved approach to text summarization
using lexical chaining and correlation of sentences. Lexical chains are created using WordNet. The score of each lexical chain is
calculated based on keyword strength, TF-IDF and other features. Lexical chains help to analyze the document
semantically, and correlation of sentences takes into account the relation of a sentence to its preceding or succeeding
sentence. This improves the quality of the summary generated.
In this paper we discuss a summarization method that combines lexical chaining with correlation of sentences, in which the relation
of a sentence to the preceding sentence is considered. Our experiments show that the inclusion of both these features improves
the quality of the summary generated.
Keywords: Text summarization, WordNet, Correlation of sentences, Lexical chains
--------------------------------------------------------------------***-----------------------------------------------------------------
1. INTRODUCTION
1.1 Motivation
These days, the number of Web pages on the Internet almost
doubles every year, as information is now available from
a variety of sources. It takes a considerable amount of time to
find the relevant information. Automatic text
summarization helps users to find the relevant
information rapidly. It generates a summary of the
document, and one can read the summary and
decide the relevance of the document to the information
needed.
1.2 Background Research
Text summarization is the process of producing a condensed
version of an original document. This condensed version
should contain the important content of the original document.
Research has been carried out for many years to generate
coherent and indicative summaries using different
techniques. According to Jones (1993), text
summarization can be described as a two-step process:
i) Building a source representation from the original
document.
ii) Generating a summary from the source representation.
Text summarization can be broadly classified into two types:
single-document summarization and multi-document
summarization. This paper focuses on single-document
summarization, which generates a summary of a single document.
Text summarization can also be categorized as extractive
or abstractive, based on the nature of the text representation in
the summary.
Many methods have been proposed for generating a
coherent summary. The earlier methods were purely statistical
and focused on term frequency [1] for choosing
important sentences. These methods were not found to be
effective, as they did not consider all the contexts of a word
or identify semantically related terms, a property known as cohesion.
Then came methods which used a semantic representation of
the original document supported by a domain-specific
knowledge base. Nowadays, text summarization is
considered a natural language processing task. Lexical
chains, the simplest form of lexical cohesion, were introduced
by Morris & Hirst [2], but it was found that not all possible
senses of a word were taken into account.
Barzilay & Elhadad [3] presented a better algorithm that
constructs all possible interpretations of the source text
using lexical chains. It is an efficient method for text
summarization, as lexical chains identify and capture
the important concepts of the document without going into
deep semantic analysis. Lexical chains are constructed
using a knowledge base that contains nouns and their
various associations.
Our algorithm is based on the method described above. We
use WordNet to generate a domain-specific extractive
summary using lexical chains for the nouns in the
document. The algorithm segments the given content into
sentences and then into tokens. These tokens are tagged using
a POS tagger. The nouns are selected and, for each noun in the
segment, we consider its senses using WordNet. Then we
attempt to merge these senses into all of the existing chains
in all possible ways, hence building every possible
interpretation of the segment. Next, we merge chains between
segments that contain a word in the same sense.
The algorithm then calculates the score of each lexical chain,
determines the strongest chain and uses it to generate a
summary. We also use the concept of correlation of
sentences to generate a good-quality summary. The terms
that occur in the strongest lexical chains are considered
key terms, and the score of each sentence is calculated based
on the presence of key terms in it. All the sentences are
ranked based on their score and the top n sentences are selected
for inclusion in the summary. Then the correlation of
sentences is checked: if a selected sentence is correlated
with the previous sentence, the previous sentence
is also included in the summary, subject to the condition
given in the algorithm of Section 3.
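As a rough illustration of the chaining, key-term scoring and top-n selection described in this paragraph, the sketch below may help. It is not the authors' implementation. The `RELATED` map is a hypothetical stand-in for WordNet lookups, chains are built greedily rather than by exploring every interpretation, and chain strength is assumed to be chain length rather than the distance-based weight used in the paper. All function names are illustrative.

```python
# Hypothetical relation map standing in for WordNet lookups: each noun maps
# to the nouns it is related to (synonym, hypernym, hyponym or meronym).
RELATED = {
    "car": {"vehicle", "wheel"},
    "vehicle": {"car", "truck"},
    "truck": {"vehicle"},
    "wheel": {"car"},
}


def are_related(a, b):
    """Two nouns are related if identical or linked in either direction."""
    return a == b or b in RELATED.get(a, set()) or a in RELATED.get(b, set())


def build_chains(nouns):
    """Greedily add each noun to the first chain holding a related noun."""
    chains = []
    for noun in nouns:
        for chain in chains:
            if any(are_related(noun, member) for member in chain):
                chain.append(noun)
                break
        else:
            # No existing chain accepted the noun, so it starts a new chain.
            chains.append([noun])
    return chains


def strongest_chain(chains):
    """Assumed score: chain length (number of noun occurrences)."""
    return max(chains, key=len)


def top_sentences(sentence_nouns, n):
    """Score each sentence by its key-term count and keep the top n.

    sentence_nouns is a list of noun lists, one per sentence. The returned
    sentence indices are in original document order.
    """
    all_nouns = [noun for nouns in sentence_nouns for noun in nouns]
    key_terms = set(strongest_chain(build_chains(all_nouns)))
    scores = [sum(noun in key_terms for noun in nouns) for nouns in sentence_nouns]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n])
```

Because the greedy pass commits each noun to the first matching chain, the result depends on word order; the all-interpretations strategy described in the text avoids this at a higher computational cost.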
2. ARCHITECTURE OF TEXT
SUMMARIZATION
Preprocessing includes:
• Segmentation
• Tokenization
• POS (part-of-speech) tagging at the lexical level
• Stemming
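The preprocessing stages listed above can be sketched with the Python standard library alone. This is a simplified illustration under stated assumptions, not the pipeline used in the paper: sentence boundaries are guessed from punctuation, the suffix list of the toy stemmer is hypothetical, and POS tagging is left out because it needs a trained tagger.

```python
import re

# Hypothetical suffix list for a toy stemmer; a real system would use an
# established stemmer (for example Porter's algorithm).
_SUFFIXES = ("ing", "es", "s")


def segment(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Return lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in _SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Segment, tokenize and stem; POS tagging is omitted in this sketch."""
    return [[stem(t) for t in tokenize(s)] for s in segment(text)]
```

For example, `preprocess("Chains link words. They capture cohesion!")` yields one token list per sentence, with `chains` and `words` reduced to `chain` and `word`.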
3. LEXICAL CHAIN COMPUTING
ALGORITHM
1. Input the original document for which the summary is to be generated
(.txt file).
2. Divide the document into sentences using
segmentation.
3. Divide each sentence into tokens using a
tokenizer.
4. Tag these tokens using a POS tagger.
5. For each noun, build the synsets.
6. For each sentence, generate a map using four
relations: synonym, hypernym, hyponym and
meronym.
7. Calculate the distance of each word from the other related
words.
8. Build lexical chains using the generated map.
9. Calculate the weight of each chain using the
distances of its words.
10. Select the longest chain, i.e. the best chain, which has the highest
chain weight.
11. From the original document, select the sentences that
contain words in the best chain, retaining their order
of occurrence in the original document.
12. Pick the top n sentences as the summary, where n is based on the
percentage of the original document to be used for
generating the summary.
13. If a selected sentence starts with one of the words
although, however, moreover, also, this, those or
that, then it is related to the preceding
sentence.
14. If the rank of the preceding sentence is equal to or
greater than 70% of the rank of the selected
sentence, then it is included in the summary. In this
way correlation between sentences is maintained.
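Steps 13 and 14 can be sketched as follows. This is an illustrative reading rather than the authors' code: the "rank" of a sentence is assumed to be its numeric score, and the names `CUE_WORDS`, `THRESHOLD`, `first_word` and `add_correlated` are hypothetical.

```python
# Cue words from step 13; a sentence opening with one of them is treated as
# depending on the sentence before it.
CUE_WORDS = {"although", "however", "moreover", "also", "this", "those", "that"}
THRESHOLD = 0.7  # step 14: preceding rank must reach 70% of the selected rank


def first_word(sentence):
    """Return the lowercase first word with trailing punctuation removed."""
    words = sentence.split()
    return words[0].strip(",.;:").lower() if words else ""


def add_correlated(sentences, scores, selected):
    """Extend the selected indices with qualifying preceding sentences.

    sentences: list of sentence strings in document order
    scores:    per-sentence scores, assumed here to be the 'rank' of step 14
    selected:  indices chosen by the chain-based ranking
    """
    result = set(selected)
    for i in selected:
        # The first sentence has no predecessor; others need a cue word.
        if i == 0 or first_word(sentences[i]) not in CUE_WORDS:
            continue
        if scores[i - 1] >= THRESHOLD * scores[i]:
            result.add(i - 1)
    return sorted(result)
```

With scores `[3.0, 4.0, 1.0, 5.0]` and selected indices `[1, 3]`, a sentence 1 that begins with "However" pulls in sentence 0 (3.0 is at least 70% of 4.0), while a sentence 3 that begins with "This" does not pull in sentence 2 (1.0 is below 70% of 5.0).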
4. EVALUATION
Evaluation is an essential part of any research work.
It makes it possible to compare various techniques based on evaluation
metrics.
This paper uses precision and recall [4,5,6] for
evaluation, which are statistical measures. Precision
is the proportion of sentences in the
summary that are relevant, whereas recall is the
proportion of relevant sentences that are included in the summary.
4.1 Precision
$$\text{Precision} = \frac{|\{\text{Retrieved sentences}\} \cap \{\text{Relevant sentences}\}|}{|\{\text{Retrieved sentences}\}|}$$
The higher the precision value, the better the system is
at excluding irrelevant sentences.
4.2 Recall
$$\text{Recall} = \frac{|\{\text{Retrieved sentences}\} \cap \{\text{Relevant sentences}\}|}{|\{\text{Relevant sentences}\}|}$$
The higher the recall value, the better the approach is
at selecting the relevant sentences.
4.3 F-Measure
The harmonic mean of precision and recall, with both weighted equally, is
called the F-measure:
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
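The three measures can be computed directly from two sets of sentence identifiers. The sketch below is a minimal illustration: the function name `precision_recall_f` is hypothetical, and returning 0.0 when a denominator is zero is an assumption made here, not something the paper specifies.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F-measure) for two collections of sentence ids.

    retrieved: sentences placed in the generated summary
    relevant:  sentences in the ideal (manually generated) summary
    """
    retrieved, relevant = set(retrieved), set(relevant)
    overlap = len(retrieved & relevant)
    # Assumption: an empty denominator yields 0.0 instead of raising an error.
    precision = overlap / len(retrieved) if retrieved else 0.0
    recall = overlap / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For instance, if the generated summary contains sentences {0, 2, 5, 7} and the ideal summary contains {0, 2, 3}, two sentences overlap, so precision is 2/4 and recall is 2/3.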
5. EXPERIMENTAL RESULTS
Three documents are taken from the news domain. The original
documents, manually generated summaries and summaries
generated by the above approach are shown below. The
precision, recall and F-measure are calculated for these three
documents and compared with those of two other
summarizers.
Original Document 1
Ideal Summary of Document 1
Summary of Document 1 generated by our Summarizer
Original Document 2
Ideal Summary of Document 2
Summary of Document 2 generated by our summarizer
Original document 3
Ideal summary of document 3
Generated Summary of Document 3
6. COMPARISON
This paper compares the online summarizer from freesummarizer.com [7], the Copernic summarizer and our summarizer based on lexical
chains. The three documents above are used as input to all three summarizers. Precision,
recall and F-measure are used as performance measures for the summaries generated.
[Figure: Document 1. Precision, recall and F-measure of the Copernic Summarizer, the Online Summarizer and the Lexical Chain Summarizer]
[Figure: Document 2. Precision, recall and F-measure of the Copernic Summarizer, the Online Summarizer and the Lexical Chain Summarizer]
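[Figure: Document 3. Precision, recall and F-measure of the Copernic Summarizer, the Online Summarizer and the Lexical Chain Summarizer]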
7. CONCLUSIONS
For document 1 and document 3, our summarizer
performs better than the Copernic summarizer and the online
summarizer. For document 2, it performs on par with the online
summarizer but is less effective than the Copernic summarizer.
Our summarizer is better in these cases because it also considers the semantic
analysis of the document and the correlation of sentences when
generating the summary.
REFERENCES
[1] Canasai Kruengkrai and Chuleerat Jaruskulchai,
"Generic Text Summarization Using Local and Global
Properties of Sentences", Proceedings of the IEEE/WIC
International Conference on Web Intelligence (WI'03),
2003.
[2] J. Morris and G. Hirst, "Lexical cohesion computed by
thesaural relations as an indicator of the structure of
text", Computational Linguistics, 18(1), pp. 21-45,
1991.
[3] Regina Barzilay and Michael Elhadad, "Using Lexical
Chains for Text Summarization", in Proceedings of the
Intelligent Scalable Text Summarization
Workshop (ISTS'97), ACL, Madrid, 1997.
[4] Rene Arnulfo Garcia-Hernandez and Yulia Ledeneva,
"Word Sequence Models for Single Text
Summarization", IEEE, pp. 44-48, 2009.
[5] Jade Goldstein, Mark Kantrowitz, Vibhu Mittal and Jaime
Carbonell, "Summarizing Text Documents: Sentence
Selection and Evaluation Metrics", Language
Technologies Institute, Carnegie Mellon University.
[6] Khosrow Kaikhah, "Automatic Text Summarization
with Neural Networks", in Proceedings of the Second
International Conference on Intelligent Systems, IEEE,
pp. 40-44, Texas, USA, June 2004.
[7] www.freesummarizer.com/summarize/
An automatic text summarization using lexical cohesion and correlation of sentences

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 03 Issue: 06 | Jun-2014, Available @ http://www.ijret.org 285 AN AUTOMATIC TEXT SUMMARIZATION USING LEXICAL COHESION AND CORRELATION OF SENTENCES A.R.Kulkarni1 , S.S.Apte2 1 Computer Science & Engineering Department, Walchand Institute of Technology, Solapur – 413006, India 2 Head, Computer Science & Engineering Department, Walchand Institute of Technology, Solapur – 413006, India Abstract Due to substantial increase in the amount of information on the Internet, it has become extremely difficult to search for relevant documents needed by the users. To solve this problem, Text summarization is used which produces the summary of documents such that the summary contains important content of the document. This paper proposes a better approach for text summarization using lexical chaining and correlation of sentences. Lexical chains are created using Wordnet . The score of each Lexical chain is calculated based on keyword strength, Tf-idf & other features. The concept of using lexical chains helps to analyze the document semantically and the concept of correlation of sentences helps to consider the relation of sentence with preceding or succeeding sentence. This improves the quality of summary generated. In this paper we discuss a summarization method, which combines lexical chaining with correlation of sentences in which relation of a sentence with the preceding sentence is considered. Our experiments show that the inclusion of both these features improves the quality of summary generated. Keywords— Text summarization, Wordnet, Correlation of sentences, Lexical chains --------------------------------------------------------------------***----------------------------------------------------------------- 1. 
INTRODUCTION 1.1 Motivation These days, the number of Web pages on the Internet almost doubles every year as the information is now available from a variety of sources. It takes considerable amount of time to find the relevant information. Automatic Text Summarization will help the users to find the relevant information rapidly. It generates the summary of the document and one can read the read the summary and decide the relevance of the document to the information needed by the user. 1.2 Background Research: Text summarization is the process of producing a condensed version of original document. This condensed version should have important content of the original document. Research is being done since many years to generate coherent and indicative summaries using different techniques. According to (Jones, 1993) the text summarization is described as two step process i) Building a source representation from the original document. ii) Generating summary from the source representation Text summarization can be broadly classified into two types: Single document summarization and multi-document summarization. This paper focuses on single document summarization that generates summary of single document. The text summarization can be categorized into extractive and abstractive based on the nature of text representation in the summary. Many methods have been proposed till now on generating a coherent summary. The earlier methods used only statistical methods that focused on term frequency [1] for choosing important sentences. These methods were not found to be efficient as it did not consider all the contexts of the word or identify semantically related terms known as cohesion. Then came methods which used semantic representation of the original document supported by a domain-specific knowledge base. Now a days text summarization is considered as a natural language processing task . 
Lexical chains a simplest form of lexical cohesion was introduced by Morns & Hirst[2].But it was found that all possible senses of the word were not taken into account. . Berzilay & Elhada [2] presented a better algorithm that constructs all possible interpretations of the source text using lexical chains. It is an efficient method for text summarization as lexical chains identify and capture important concepts of the document without going into deep semantic analyses. Lexical chains are constructed using some knowledge base that contains nouns and its various associations. Our Algorithm is based on the method used above. We have used Wordnet to generate domain-specific extractive summary using Lexical chains for the nouns in the document. The algorithm segments the given content into sentences & then into tokens. These tokens are tagged using POS tagger. The Nouns are selected & for each noun in the segment, we consider its sense using Wordnet. Then we attempt to merge these senses into all of the existing chains in all possible ways, hence building every possible
interpretation of the segment. Next, chains that contain a word in the same sense are merged across segments. The algorithm then calculates the score of each lexical chain, determines the strongest chain, and uses it to generate a summary. We also use the correlation of sentences to improve the quality of the summary. The terms that occur in the strongest lexical chains are treated as key terms, and the score of each sentence is calculated from the key terms it contains. All sentences are ranked by score, and the top n sentences are selected for inclusion in the summary. The correlation of sentences is then checked: if a selected sentence is correlated with the previous sentence, the previous sentence is also included in the summary, subject to the condition given in the algorithm below.

2. ARCHITECTURE OF TEXT SUMMARIZATION
Preprocessing includes:
- Segmentation
- Tokenization
- POS (part-of-speech) tagging at the lexical level
- Stemming

3. LEXICAL CHAIN COMPUTING ALGORITHM
1. Input the original document for generating the summary (.txt file).
2. Divide the document into sentences using segmentation.
3. Divide each sentence into tokens using a tokenizer.
4. Tag the tokens using a POS tagger.
5. For each noun, build the synsets.
6. For each sentence, generate a map using 4 relations: Synonym, Hypernym, Hyponym, Meronym.
7. Calculate the distance of each word from the other related words.
8. Build lexical chains using the generated map.
9. Calculate the weight of each chain using the distance values of its words.
10. Select the best chain, i.e. the chain with the highest chain weight.
11. From the original document, select the sentences that contain words in the best chain, retaining their order of occurrence in the original document.
12. Pick the top n sentences as the summary, where n is based on the percentage of the original document to be used for the summary.
13. If a selected sentence starts with one of the words although, however, moreover, also, this, those, or that, it is related to the preceding sentence.
14. If the rank of the preceding sentence is equal to or greater than 70% of the rank of the selected sentence, the preceding sentence is included in the summary. In this way, correlation between sentences is maintained.

4. EVALUATION
Evaluation is the most important part of any research work. It makes it possible to compare various techniques on the basis of evaluation metrics. This paper uses precision and recall [4,5,6] for evaluation, which are statistical measures. Precision measures the proportion of sentences in the summary that are relevant, whereas recall measures the proportion of relevant sentences that are included in the summary.

4.1 Precision
$$\text{Precision} = \frac{|\{\text{retrieved sentences}\} \cap \{\text{relevant sentences}\}|}{|\{\text{retrieved sentences}\}|}$$
The higher the precision, the better the system is at excluding irrelevant sentences.

4.2 Recall
$$\text{Recall} = \frac{|\{\text{retrieved sentences}\} \cap \{\text{relevant sentences}\}|}{|\{\text{relevant sentences}\}|}$$
The higher the recall, the better the approach is at selecting the relevant sentences.

4.3 F-Measure
The harmonic mean of precision and recall, with both weighted equally, is called the F-measure.
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

5. EXPERIMENTAL RESULTS
Three documents from the news domain are used. The original documents, the manually generated summaries, and the summaries generated by the above approach are shown below. The precision, recall, and F-measure are calculated for these three documents and compared with those of two other summarizers.

Original Document 1
Ideal Summary of Document 1
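Steps 10 to 14 of the lexical chain algorithm in Section 3 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that the lexical chains have already been built and weighted, that a sentence's rank is simply the number of its tokens found in the best chain, and that sentences pulled in by the correlation rule are not themselves re-checked. The names `score_sentence`, `summarize`, and the `chains` dictionary layout are hypothetical.

```python
import re

# Cue words from step 13 that signal a link to the preceding sentence.
CUE_WORDS = {"although", "however", "moreover", "also", "this", "those", "that"}


def score_sentence(sentence, key_terms):
    """Count the tokens of a sentence that are key terms (assumed scoring rule)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(1 for tok in tokens if tok in key_terms)


def summarize(sentences, chains, n, threshold=0.7):
    """Pick the top-n sentences for the best chain, then apply the correlation rule.

    chains maps a hypothetical chain id to a (set_of_words, chain_weight) pair.
    """
    # Step 10: the best chain is the one with the highest chain weight.
    best_words, _ = max(chains.values(), key=lambda chain: chain[1])
    scores = [score_sentence(s, best_words) for s in sentences]

    # Step 11: keep only sentences that contain at least one word of the best chain.
    candidates = [i for i in range(len(sentences)) if scores[i] > 0]

    # Step 12: rank the candidates by score and keep the top n.
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:n])

    # Steps 13-14: if a selected sentence opens with a cue word and the
    # preceding sentence scores at least 70% of it, include the predecessor.
    for i in sorted(selected):
        words = re.findall(r"[a-z]+", sentences[i].lower())
        if i > 0 and words and words[0] in CUE_WORDS:
            if scores[i - 1] >= threshold * scores[i]:
                selected.add(i - 1)

    # Retain the order of occurrence in the original document.
    return [sentences[i] for i in sorted(selected)]
```

Calling `summarize(sentences, chains, n=1)` on a document whose top-ranked sentence begins with "However" returns that sentence together with its predecessor whenever the predecessor reaches the 70% threshold, and the top-ranked sentence alone otherwise.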
Summary of Document 1 generated by our summarizer
Original Document 2
Ideal Summary of Document 2
Summary of Document 2 generated by our summarizer
Original Document 3
Ideal Summary of Document 3
Summary of Document 3 generated by our summarizer

6. COMPARISON
This paper compares three summarizers: the online summarizer from freesummarizer.com [7], Copernic Summarizer, and our summarizer based on lexical chains and correlation of sentences. The three documents above are given as input to all three summarizers. Precision, recall, and F-measure are used as the performance measures for the generated summaries.
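The three performance measures used in this comparison can be computed with a few lines of code. The sketch below assumes the standard set-based definitions, in which the numerator of both precision and recall is the number of sentences that are both retrieved (in the generated summary) and relevant (in the ideal summary). The function name `precision_recall_f` and the use of sentence indices to identify sentences are illustrative assumptions.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F-measure) for two collections of sentence ids.

    retrieved: sentences in the generated summary.
    relevant:  sentences in the ideal (manually generated) summary.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    overlap = len(retrieved & relevant)  # sentences that are both retrieved and relevant

    # Guard against empty summaries to avoid division by zero.
    precision = overlap / len(retrieved) if retrieved else 0.0
    recall = overlap / len(relevant) if relevant else 0.0

    # F-measure is the harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For example, if the generated summary contains sentences {1, 2, 3, 4} and the ideal summary contains {2, 3, 5}, two sentences overlap, so precision is 2/4, recall is 2/3, and the F-measure is 4/7.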
Document 1:
[Bar chart: precision, recall, and F-measure for Copernic Summarizer, Online Summarizer, and Lexical Chain Summarizer]

Document 2:
[Bar chart: precision, recall, and F-measure for Copernic Summarizer, Online Summarizer, and Lexical Chain Summarizer]
Document 3:
[Bar chart: precision, recall, and F-measure for Copernic Summarizer, Online Summarizer, and Lexical Chain Summarizer]

7. CONCLUSIONS
For document 1 and document 3, our summarizer performs better than both Copernic Summarizer and the online summarizer. For document 2, it performs as well as the online summarizer but worse than Copernic Summarizer. The advantage of our summarizer is that it also considers the semantic analysis of the document and the correlation of sentences when generating the summary.

REFERENCES
[1] Canasai Kruengkrai and Chuleerat Jaruskulchai, "Generic Text Summarization Using Local and Global Properties of Sentences", Proceedings of the IEEE/WIC International Conference on Web Intelligence (WI'03), 2003.
[2] J. Morris and G. Hirst, "Lexical cohesion computed by thesaural relations as an indicator of the structure of the text", Computational Linguistics, 18(1), pp. 21-45, 1991.
[3] Regina Barzilay and Michael Elhadad, "Using Lexical Chains for Text Summarization", in Proceedings of the Intelligent Scalable Text Summarization Workshop (ISTS'97), ACL, Madrid, 1997.
[4] Rene Arnulfo Garcia-Hernandez and Yulia Ledeneva, "Word Sequence Models for Single Text Summarization", IEEE, pp. 44-48, 2009.
[5] Jade Goldstein, Mark Kantrowitz, Vibhu Mittal, and Jaime Carbonell, "Summarizing Text Documents: Sentence Selection and Evaluation Metrics", Language Technologies Institute, Carnegie Mellon University.
[6] Khosrow Kaikhah, "Automatic Text Summarization with Neural Networks", in Proceedings of the Second International Conference on Intelligent Systems, IEEE, pp. 40-44, Texas, USA, June 2004.
[7] www.freesummarizer.com/summarize/