CSE 4000: Project/Thesis 
A Study on Fuzzy Relational 
Clustering for Sentence-level Text
• Presented by:
Sikder Tahsin Al-Amin (0907018)
Mahade Hasan (0907047)
• Thesis Supervisor:
Dr. M.M.A. Hashem, Professor
Dept. of CSE, KUET
Outline 
• Introduction 
• General Problem Statement 
• Objectives 
• Methodology 
• Result & Discussion 
• Future Work
Cluster Analysis 
• Creating subsets (clusters) from a collection of documents.
• Objects in the same cluster are similar to each other.
Clustering 
• Hard clustering
– each data object belongs to exactly one cluster.
• Fuzzy (soft) clustering
– a data object can belong to more than one cluster,
– each with an associated membership value.
Sentence Clustering 
• A sentence may relate to more than one theme.
• Hard clustering approaches are therefore generally not applicable.
• So, a fuzzy clustering algorithm is needed.
General Problem Statement 
• Find similar sentences in a document.
• Group them into clusters.
Objectives 
• Apply a fuzzy clustering algorithm to sentences.
• Develop a sentence similarity method to measure the similarity of sentences.
• Evaluate the results on a quotation dataset and a news article dataset.
Flow Chart of Proposed Method 
[Figure: Document (collection of sentences $S_i$, i = 1, 2, …, n) → word-to-word similarity $S_{word}$ (using the WordNet database) and order similarity $S_{order}$ → overall sentence similarity $S_{sim}(i, j)$ → sentence similarity matrix $S_{ij}$ → PageRank algorithm → Expectation Maximization → determine cluster membership values $P_i^m$ → expected result? No: repeat; Yes: stop.]
Fig-1: Flow chart of the proposed method
Sentence Similarity Method 
[Figure: Sentence 1 and Sentence 2 → word-to-word similarity (using the WordNet database) and order similarity (from order vector 1 and order vector 2) → sentence similarity]
Fig 2: Sentence similarity computation diagram
Word-to-Word Similarity 
• The similarity of each word in the first sentence to each word in the second sentence is computed, and these pairwise similarities are summed.
• Word similarity is measured with the Jiang-Conrath (JnC) similarity measure.
Word-to-Word Similarity 
• $S_1 = w_{11}, w_{12}, \ldots, w_{1n}$
• $S_2 = w_{21}, w_{22}, \ldots, w_{2m}$
• $S_{word} = \sum_{i=1}^{n} \sum_{j=1}^{m} JnC(w_{1i}, w_{2j})$
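The double sum above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: `word_to_word_similarity`, `toy_word_sim` and the `TOY_SIM` table are hypothetical names, and the toy table only stands in for real Jiang-Conrath scores (in practice a WordNet-based measure such as NLTK's `Synset.jcn_similarity` with an information-content file could be plugged in as `word_sim`).

```python
from typing import Callable, Sequence

# Hypothetical toy scores standing in for JnC similarities between word pairs.
TOY_SIM = {frozenset(("dog", "fox")): 0.5}


def toy_word_sim(a: str, b: str) -> float:
    """Toy word similarity: 1.0 for identical words, table lookup otherwise."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def word_to_word_similarity(
    s1: Sequence[str],
    s2: Sequence[str],
    word_sim: Callable[[str, str], float],
) -> float:
    """Sum word_sim over every (word of s1, word of s2) pair, as on the slide."""
    return sum(word_sim(a, b) for a in s1 for b in s2)


print(word_to_word_similarity(["dog", "jumps"], ["fox", "jumps"], toy_word_sim))  # 1.5
```

The slide does not mention any normalization, so none is applied; note that a plain sum grows with sentence length.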
Order Similarity 
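• S1: A dog jumps over the fox.
S2: A fox jumps over the dog.
• Joint word set = {A, dog, jumps, over, the, fox}, indexed 1 to 6.
• Each entry of an order vector is the position of the corresponding joint-set word in that sentence:
r1 = {1, 2, 3, 4, 5, 6}
r2 = {1, 6, 3, 4, 5, 2}
• $S_{order} = 1 - \frac{\lVert r_1 - r_2 \rVert}{\lVert r_1 + r_2 \rVert}$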
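The order-similarity computation for the example above can be sketched like this. It is a simplified sketch under stated assumptions: tokens are lowercased alphabetic words, the first occurrence of a repeated word is used, and a joint-set word missing from a sentence gets position 0 (Li et al. instead substitute the most similar word). The function names are hypothetical.

```python
import math
import re


def tokenize(sentence):
    # Assumption: lowercase and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", sentence.lower())


def joint_word_set(t1, t2):
    # Distinct words in order of first appearance across both sentences.
    joint = []
    for w in t1 + t2:
        if w not in joint:
            joint.append(w)
    return joint


def order_vector(joint, tokens):
    # 1-based position of each joint-set word in the sentence; 0 if absent
    # (a simplification of the original method).
    return [tokens.index(w) + 1 if w in tokens else 0 for w in joint]


def order_similarity(s1, s2):
    t1, t2 = tokenize(s1), tokenize(s2)
    joint = joint_word_set(t1, t2)
    r1, r2 = order_vector(joint, t1), order_vector(joint, t2)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    # Two empty sentences are treated as identical in order.
    return 1.0 - diff / total if total else 1.0


print(round(order_similarity("A dog jumps over the fox.", "A fox jumps over the dog."), 4))  # 0.6895
```

Here $\lVert r_1 - r_2 \rVert = \sqrt{32}$ and $\lVert r_1 + r_2 \rVert = \sqrt{332}$, so swapping "dog" and "fox" lowers the order similarity from 1 to about 0.69.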
Overall Sentence Similarity 
• $S_{sim}(s_1, s_2) = r \cdot S_{word} + (1 - r) \cdot S_{order}$
• Where $0.5 < r \le 1$, so word-to-word similarity carries more weight than order similarity.
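The weighted combination can be written as a one-line function. This is a minimal sketch: the name `overall_similarity` is hypothetical, and the default `r = 0.85` is an assumed value, since the slides only constrain $r$ to $0.5 < r \le 1$.

```python
def overall_similarity(s_word: float, s_order: float, r: float = 0.85) -> float:
    """Return r * S_word + (1 - r) * S_order, enforcing 0.5 < r <= 1."""
    if not 0.5 < r <= 1.0:
        raise ValueError("r must satisfy 0.5 < r <= 1")
    return r * s_word + (1.0 - r) * s_order


# Example: combine a word-level score with the order score from the dog/fox example.
score = overall_similarity(0.8, 0.6895, r=0.85)
```

Rejecting $r \le 0.5$ keeps the stated design choice that word similarity dominates; at $r = 1$ word order is ignored entirely.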
Affinity Matrix 
• $w_{ij}^{m} = s_{ij} \cdot p_i^{m} \cdot p_j^{m}$
• $w_{ij}^{m}$ = weight between objects i and j in cluster m
• $s_{ij}$ = similarity between objects i and j
• $p_i^{m}$ = membership value of object i in cluster m
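The per-cluster affinity matrix follows directly from the formula above. A minimal sketch in plain Python, assuming `S` is an n×n similarity matrix and `P` an n×k membership matrix (rows are objects, columns are clusters); `cluster_affinity` is a hypothetical name.

```python
def cluster_affinity(S, P, m):
    """Weights w_ij^m = s_ij * p_i^m * p_j^m for cluster m (0-based index)."""
    n = len(S)
    return [[S[i][j] * P[i][m] * P[j][m] for j in range(n)] for i in range(n)]


S = [[1.0, 0.5], [0.5, 1.0]]
P = [[0.8, 0.2], [0.4, 0.6]]
W0 = cluster_affinity(S, P, 0)  # W0[0][1] is about 0.16, i.e. 0.5 * 0.8 * 0.4
```

Scaling by both memberships means a pair of sentences contributes strongly to cluster m only when both sentences belong to m with high membership.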
Initialization 
• The number of clusters is fixed in advance.
• Membership values and PageRank values are initialized randomly.
• Uniformly distributed random numbers are used.
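A possible initialization step is sketched below. It is an illustration under assumptions: the function name `initialize` is hypothetical, and normalizing each object's memberships to sum to 1 is an assumed convention that the slide does not state.

```python
import random


def initialize(n_objects, n_clusters, seed=None):
    """Uniform random initial memberships (n_objects x n_clusters) and
    PageRank values (n_clusters x n_objects)."""
    rng = random.Random(seed)  # uniform random numbers in [0, 1)
    memberships = []
    for _ in range(n_objects):
        row = [rng.random() for _ in range(n_clusters)]
        total = sum(row)
        # Assumption: each object's memberships are normalized to sum to 1.
        memberships.append([v / total for v in row])
    # One PageRank value per object in each cluster.
    pagerank = [[rng.random() for _ in range(n_objects)] for _ in range(n_clusters)]
    return memberships, pagerank


memberships, pagerank = initialize(n_objects=5, n_clusters=3, seed=42)
```

Seeding the generator makes a run reproducible, which is useful when comparing results across different numbers of clusters.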
PageRank Algorithm 
• PageRank was designed to rank web pages.
• It measures the importance of each page.
• Here, a modified PageRank is used to measure the importance of each sentence in a document.
Modified PageRank Algorithm 
$PR(V_i) = (1 - d) + d \cdot \sum_{j=1}^{N} \left( W_{ji} \cdot \frac{PR(V_j)}{\sum_{k=1}^{N} W_{jk}} \right)$
Damping factor, d = 0.85
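The weighted PageRank update above can be iterated until the scores stop changing. A minimal sketch, assuming `W` is an N×N list of non-negative edge weights; `weighted_pagerank`, the tolerance, and the iteration cap are hypothetical choices, and nodes with zero outgoing weight are simply skipped as sources.

```python
def weighted_pagerank(W, d=0.85, tol=1e-8, max_iter=200):
    """Iterate PR_i = (1 - d) + d * sum_j W_ji * PR_j / sum_k W_jk."""
    n = len(W)
    out = [sum(row) for row in W]  # total outgoing weight sum_k W_jk of node j
    pr = [1.0] * n
    for _ in range(max_iter):
        new = [
            (1 - d) + d * sum(W[j][i] * pr[j] / out[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, pr)) < tol:
            return new
        pr = new
    return pr


# Sentence 0 is linked to both others, so it ends up with the highest score.
scores = weighted_pagerank([[0, 2, 1], [2, 0, 0], [1, 0, 0]])
```

Because the $(1 - d)$ term is not divided by N, scores hover around 1 rather than summing to 1; with uniform weights every node converges to exactly 1.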
Expectation-Maximization Algorithm
• The expectation step (E-step) and the maximization step (M-step) are iterated until convergence.
• The E-step computes the cluster membership values.
• The M-step updates the membership function.
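To show how the pieces fit together, here is a simplified EM-style loop in the spirit of the FRECCA algorithm (Skabar and Abdalgader). It is a sketch, not the thesis implementation: PageRank scores in each cluster's affinity graph act as likelihoods, the E-step turns them into membership values, and the M-step is assumed to re-estimate per-cluster mixing weights. Normalizing PageRank within each cluster, the fixed iteration counts, and all function names are assumptions. A compact PageRank routine is repeated so the block runs on its own.

```python
import random


def weighted_pagerank(W, d=0.85, iters=50):
    """Fixed number of weighted PageRank sweeps, as on the Modified PageRank slide."""
    n = len(W)
    out = [sum(row) for row in W]
    pr = [1.0] * n
    for _ in range(iters):
        pr = [(1 - d) + d * sum(W[j][i] * pr[j] / out[j] for j in range(n) if out[j] > 0)
              for i in range(n)]
    return pr


def em_cluster(S, n_clusters, n_iter=30, seed=0):
    """Return an n x n_clusters membership matrix for similarity matrix S."""
    rng = random.Random(seed)
    n = len(S)
    P = []
    for _ in range(n):  # random initial memberships, rows summing to 1
        row = [rng.random() for _ in range(n_clusters)]
        total = sum(row)
        P.append([v / total for v in row])
    pi = [1.0 / n_clusters] * n_clusters  # mixing weights (assumption)
    for _ in range(n_iter):
        like = []
        for m in range(n_clusters):
            # Affinity graph of cluster m: w_ij^m = s_ij * p_i^m * p_j^m.
            W = [[S[i][j] * P[i][m] * P[j][m] for j in range(n)] for i in range(n)]
            pr = weighted_pagerank(W)
            total = sum(pr)
            like.append([v / total for v in pr])  # normalized within cluster (assumption)
        # E-step: membership of object i in cluster m is proportional to pi_m * likelihood.
        for i in range(n):
            num = [pi[m] * like[m][i] for m in range(n_clusters)]
            z = sum(num)
            P[i] = [v / z for v in num]
        # M-step: each mixing weight becomes the average membership of that cluster.
        pi = [sum(P[i][m] for i in range(n)) / n for m in range(n_clusters)]
    return P


S = [[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]]
memberships = em_cluster(S, n_clusters=2)
```

A fixed iteration count keeps the sketch short; a fuller version would stop once memberships change by less than a tolerance, matching "iterated until convergence".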
Selecting Output Clusters 
• Each sentence is assigned to the output cluster for which its membership value is highest.
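Selecting output clusters amounts to an argmax over each row of the membership matrix. A minimal sketch with hypothetical function names; ties go to the lower-numbered cluster (Python's `max` returns the first maximal item), and sentences are numbered from 1 as in the news-article results.

```python
def hard_assign(P):
    """Index of the highest-membership cluster for each object (row of P)."""
    return [max(range(len(row)), key=row.__getitem__) for row in P]


def group_by_cluster(assignments):
    """Map each cluster index to the 1-based numbers of its sentences."""
    groups = {}
    for sentence_no, cluster in enumerate(assignments, start=1):
        groups.setdefault(cluster, []).append(sentence_no)
    return groups


labels = hard_assign([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]])  # [0, 1, 0]
print(group_by_cluster(labels))  # {0: [1, 3], 1: [2]}
```

This hardening discards the fuzzy information; if overlapping themes matter, one could instead list every cluster whose membership exceeds a chosen threshold.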
Clustering Quotation Dataset 
Knowledge Class 
1. Our knowledge can only be finite, while our ignorance must necessarily be infinite. 
2. Everybody gets so much common information all day long that they lose their common sense…. 
Marriage Class 
11. A husband is what is left of a lover, after the nerve has been extracted. 
12. Marriage has many pains, but celibacy has no pleasures….. 
Nature Class 
21. I have called this principle by which each slight variation if useful is preserved by the term natural selection.
22. Nature is reckless of the individual. When she has points to carry, she carries them….. 
Peace Class 
31. There is no such thing as inner peace there is only nervousness and death. 
32. Once you hear the details of victory, it is hard to distinguish it from a defeat….. 
Food Class 
41. Food is an important part of a balanced diet. 
42. To eat well in England you should have breakfast three times a day…..
Result
Method            No. of Clusters   Purity   Entropy   Rand   F-Measure
Our Method        4                 .350     .550      .750   .380
                  5                 .400     .550      .795   .400
                  6                 .400     .445      .765   .350
                  7                 .400     .550      .770   .340
ARCA              4                 .543     .646      .710   .391
                  5                 .622     .515      .786   .459
                  6                 .680     .451      .815   .462
                  7                 .678     .444      .817   .427
Spectral Clust.   4                 .641     .472      .795   .544
                  5                 .652     .508      .785   .477
                  6                 .690     .475      .800   .444
                  7                 .699     .429      .809   .431
K-Medoids         4                 .560     .605      .691   .392
                  5                 .600     .569      .733   .433
                  6                 .720     .457      .779   .459
                  7                 .740     .425      .805   .461
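Purity and the Rand index in the table are standard external measures that compare cluster labels with the known classes. The sketch below uses their usual definitions; entropy and F-measure are omitted because the slides do not say which variants were used. It assumes each sentence has one (hard) cluster label, and the labels in the example are made-up toy data, not the quotation dataset.

```python
from collections import Counter
from itertools import combinations


def purity(clusters, classes):
    """Fraction of objects that belong to the majority class of their cluster."""
    n = len(classes)
    correct = 0
    for c in set(clusters):
        members = [classes[i] for i in range(n) if clusters[i] == c]
        correct += Counter(members).most_common(1)[0][1]
    return correct / n


def rand_index(clusters, classes):
    """Fraction of object pairs on which clustering and classes agree
    (both same-group or both different-group)."""
    pairs = list(combinations(range(len(classes)), 2))
    agree = sum((clusters[i] == clusters[j]) == (classes[i] == classes[j]) for i, j in pairs)
    return agree / len(pairs)


toy_clusters = [0, 0, 1, 1, 1]
toy_classes = [0, 0, 0, 1, 1]
print(purity(toy_clusters, toy_classes), rand_index(toy_clusters, toy_classes))  # 0.8 0.6
```

For purity, Rand and F-measure higher is better, while lower entropy indicates purer clusters.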
Clustering on News Articles 
News Article 
1. President Barack Obama on Tuesday championed nuclear energy expansion as the latest way that feuding parties can move beyond the broken politics
2. The economic stimulus plan against Republican criticism reflecting White House will remain aggressive in selling its own case to the public
3. Since a January special election in Massachusetts Obama has recalibrated his strategy to advance his agenda
……….
30. A CBS news poll in early February found eighty one percent saying it’s time to elect new people to congress
31. That affects Obama who is not up for reelection but needs allies and votes on Capitol Hill to usher in the domestic change he has promised.
Clustering on News Articles 
Cluster Number   Sentences
1                3, 5, 6, 9, 10, 13, 14, 15, 16, 17, 19, 21, 24
2                2, 7, 12, 18, 20, 22, 25, 26, 27, 28, 29
3                1, 8, 23, 30
Clustering on News Articles 
News Article Dataset 
12. That mission however remains in doubt.
22. Obama in working to change that system he is required to work within it.
26. But the white house is taking an approach that is at once more aggressive and more streamlined.
28. The intended narrative is one in which Obama hears people’s frustrations and is working directly to end them.
29. There is a little doubt the public is angry.
• All of these sentences reside in cluster 2, whose theme is negative.
Scope of Future Research 
• Perform hierarchical fuzzy clustering.
• Improve the text preprocessing step.
References 
• A. Skabar and K. Abdalgader, “Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm,” IEEE Trans. Knowledge and Data Eng., vol. 25, no. 1, Jan. 2013.
• Y. Li, D. McLean, Z.A. Bandar, J.D. O’Shea, and K. Crockett, “Sentence Similarity Based on Semantic Nets and Corpus Statistics,” IEEE Trans. Knowledge and Data Eng., vol. 18, no. 8, pp. 1138-1150, Aug. 2006.
• J.J. Jiang and D.W. Conrath, “Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy,” Proc. 10th Int’l Conf. Research in Computational Linguistics, pp. 19-33, 1997.
• Web resources
Thank you

Editor's Notes

  • #6: Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters)
  • #7: In hard clustering, each data object belongs to exactly one cluster. In fuzzy clustering, a data object can belong to more than one cluster, and each object has a membership grade for each cluster.
  • #8: A sentence is likely to be related to more than one theme or topic present within a document, so a fuzzy clustering algorithm that operates on relational input data is needed.
  • #10: Find similar sentences in a document and group them into clusters.
  • #12: The main objective is to work on text data such as sentences, find similar sentences, and group them into clusters. Develop a sentence similarity method to measure the similarity of sentences, and evaluate the results on a quotation dataset and a news article dataset.
  • #19: Here 0.5 < r ≤ 1, which means word-to-word similarity plays the dominant role in the overall similarity.