CLUSTERING
What is Clustering?
• Organizing data into classes such that there is
  • high intra-class similarity
  • low inter-class similarity
• Finding the class labels and the number of classes directly from the data (in contrast to classification).
• More informally, finding natural groupings among objects.
Also called unsupervised learning; it is sometimes called classification by statisticians, sorting by psychologists, and segmentation by people in marketing.
Defining Distance Measures
Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity)
between O1 and O2 is a real number denoted by D(O1,O2)
[Illustration: pairs of objects, such as the names Peter and Piotr, fed to black-box distance functions that return values like 0.23, 3, and 342.7.]
What properties should a distance measure have?
• D(A,B) = D(B,A) Symmetry
• D(A,A) = 0 Constancy of Self-Similarity
• D(A,B) = 0 iff A = B Positivity (Separation)
• D(A,B) ≤ D(A,C) + D(B,C) Triangular Inequality
Peter vs. Piotr: the edit distance between them is 3.
d('', '') = 0
d(s, '') = d('', s) = |s|   -- i.e. the length of s
d(s1+ch1, s2+ch2) = min( d(s1, s2) + (if ch1 = ch2 then 0 else 1),
                         d(s1+ch1, s2) + 1,
                         d(s1, s2+ch2) + 1 )
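The recursive definition above is the Levenshtein edit distance. As a purely illustrative sketch (not part of the original slides), it can be computed bottom-up in Python with a small dynamic-programming table:

def edit_distance(s1, s2):
    # dp[i][j] = distance between the prefixes s1[:i] and s2[:j]
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                              # d(s, '') = |s|
    for j in range(n + 1):
        dp[0][j] = j                              # d('', s) = |s|
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,   # substitute (or match)
                           dp[i - 1][j] + 1,          # delete from s1
                           dp[i][j - 1] + 1)          # insert into s1
    return dp[m][n]

print(edit_distance('Peter', 'Piotr'))    # prints 3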
When we peek inside one of these black boxes, we see some function on two variables. These functions might be very simple or very complex. In either case, it is natural to ask: what properties should these functions have?
Intuitions behind desirable distance
measure properties
D(A,B) = D(B,A) Symmetry
Otherwise you could claim “Alex looks like Bob, but Bob looks nothing like Alex.”
D(A,A) = 0 Constancy of Self-Similarity
Otherwise you could claim “Alex looks more like Bob than Bob does.”
D(A,B) = 0 iff A = B Positivity (Separation)
Otherwise there are objects in your world that are different, but you cannot tell them apart.
D(A,B) ≤ D(A,C) + D(B,C) Triangular Inequality
Otherwise you could claim “Alex is very like Bob, and Alex is very like Carl, but Bob is very unlike Carl.”
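To make these axioms concrete, here is a small, purely illustrative Python check (not from the slides) that the ordinary Euclidean distance satisfies all four properties on a handful of random 2-D points:

import itertools, random

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

points = [tuple(random.uniform(0, 10) for _ in range(2)) for _ in range(5)]
for a, b, c in itertools.permutations(points, 3):
    assert abs(euclidean(a, b) - euclidean(b, a)) < 1e-9               # Symmetry
    assert euclidean(a, a) == 0                                        # Constancy of Self-Similarity
    assert euclidean(a, b) > 0 or a == b                               # Positivity (Separation)
    assert euclidean(a, b) <= euclidean(a, c) + euclidean(b, c) + 1e-9 # Triangular Inequality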
Two Types of Clustering
• Partitional algorithms: Construct various partitions and then evaluate them by some criterion (we will see an example called BIRCH).
• Hierarchical algorithms: Create a hierarchical decomposition of the set of objects using some criterion.
[Illustration: a dendrogram for hierarchical clustering versus a flat partition for partitional clustering.]
Desirable Properties of a Clustering Algorithm
• Scalability (in terms of both time and space)
• Ability to deal with different data types
• Minimal requirements for domain knowledge to determine
input parameters
• Able to deal with noise and outliers
• Insensitive to order of input records
• Incorporation of user-specified constraints
• Interpretability and usability
Note that hierarchies are commonly used to organize information, for example in a web portal. Yahoo’s hierarchy is manually created; we will focus on the automatic creation of hierarchies in data mining.
Pedro (Portuguese/Spanish)
Petros (Greek), Peter (English), Piotr (Polish), Peadar
(Irish), Pierre (French), Peder (Danish), Peka
(Hawaiian), Pietro (Italian), Piero (Italian Alternative),
Petr (Czech), Pyotr (Russian)
[Dendrogram over countries: Anguilla, Australia, St. Helena & Dependencies, South Georgia & South Sandwich Islands, U.K., Serbia & Montenegro (Yugoslavia), France, Niger, India, Ireland, Brazil.]
Hierarchical clustering can sometimes show patterns that are meaningless or spurious.
• For example, in this clustering, the tight grouping of Australia, Anguilla, St. Helena, etc. is meaningful, since all of these countries are former UK colonies.
• However, the tight grouping of Niger and India is completely spurious; there is no connection between the two.
Partitional Clustering
• Nonhierarchical: each instance is placed in exactly one of K non-overlapping clusters.
• Since only one set of clusters is output, the user
normally has to input the desired number of
clusters K.
Algorithm k-means
1. Decide on a value for k.
2. Initialize the k cluster centers (randomly, if necessary).
3. Decide the class memberships of the N objects by assigning
them to the nearest cluster center.
4. Re-estimate the k cluster centers, by assuming the
memberships found above are correct.
5. If none of the N objects changed membership in the last iteration, exit. Otherwise, go to step 3.
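A minimal NumPy sketch of these five steps (our own illustration, assuming Euclidean distance and random initialization from the data points) might look like this:

import numpy as np

def k_means(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # step 2: initialize
    assignments = np.full(len(X), -1)
    while True:
        # step 3: assign each object to its nearest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        if np.array_equal(new_assignments, assignments):                  # step 5: stop if nothing moved
            return centers, assignments
        assignments = new_assignments
        # step 4: re-estimate each center as the mean of its members
        for j in range(k):
            if np.any(assignments == j):
                centers[j] = X[assignments == j].mean(axis=0)

centers, labels = k_means(np.random.rand(100, 2), k=3)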
Comments on the K-Means Method
• Strength
  • Relatively efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.
  • Often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.
• Weakness
  • Applicable only when a mean is defined (so what about categorical data?)
  • Need to specify k, the number of clusters, in advance
  • Unable to handle noisy data and outliers
  • Not suitable for discovering clusters with non-convex shapes
The K-Medoids Clustering Method
• Find representative objects, called medoids, in clusters
• PAM (Partitioning Around Medoids, 1987)
  • starts from an initial set of medoids and iteratively replaces one of the medoids with one of the non-medoids if doing so improves the total distance of the resulting clustering
  • PAM works effectively for small data sets, but does not scale well to large data sets
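As a rough, purely illustrative sketch of the PAM idea (not the exact 1987 algorithm), the loop below repeatedly tries swapping a medoid with a non-medoid and keeps any swap that lowers the total distance of points to their nearest medoid:

import numpy as np

def total_cost(D, medoids):
    # D is a precomputed (N, N) pairwise distance matrix
    return D[:, medoids].min(axis=1).sum()

def pam_like(D, k, seed=0):
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(D), size=k, replace=False))
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(len(D)):
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]   # swap medoid i for point h
                if total_cost(D, candidate) < total_cost(D, medoids):
                    medoids, improved = candidate, True
    return medoids

The quadratic number of candidate swaps per pass is one reason PAM does not scale well to large data sets.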
Nearest Neighbor Clustering
Not to be confused with Nearest Neighbor Classification
• Items are iteratively merged into the
existing clusters that are closest.
• Incremental
• A threshold, t, is used to determine whether items are added to existing clusters or a new cluster is created.
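A minimal single-pass sketch of this scheme, assuming Euclidean distance and using the closest existing point of each cluster as its representative (details that are our own, not from the slides):

import numpy as np

def nearest_neighbor_clustering(X, t):
    clusters = [[X[0]]]                   # each cluster is a list of points
    labels = [0]
    for x in X[1:]:
        # distance from x to the nearest point in each existing cluster
        dists = [min(np.linalg.norm(x - p) for p in c) for c in clusters]
        best = int(np.argmin(dists))
        if dists[best] <= t:              # close enough: merge into the existing cluster
            clusters[best].append(x)
            labels.append(best)
        else:                             # too far: start a new cluster
            clusters.append([x])
            labels.append(len(clusters) - 1)
    return labels

labels = nearest_neighbor_clustering(np.random.rand(50, 2), t=0.2)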
Partitional Clustering Algorithms
• Clustering algorithms have been designed to handle very large datasets
• E.g. the BIRCH algorithm
  • Main idea: use an in-memory R-tree to store the points that are being clustered
  • Insert points one at a time into the R-tree, merging a new point with an existing cluster if it is less than some threshold distance ε away
  • If there are more leaf nodes than fit in memory, merge existing clusters that are close to each other
  • At the end of the first pass, we get a large number of clusters at the leaves of the R-tree
• Merge clusters to reduce the number of clusters
Partitional Clustering Algorithms
• The BIRCH algorithm
[R-tree diagram: internal entries R10, R11, R12 point to leaf entries R1–R9, which point to the data nodes containing points.]
Partitional Clustering Algorithms
• The BIRCH algorithm
[R-tree diagram after a merge: leaf entries R1 and R2 have been combined into {R1,R2} under the same internal entries R10, R11, R12.]
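scikit-learn ships a Birch implementation whose threshold parameter plays roughly the role of the ε merge distance described above (it builds a CF tree rather than the R-tree discussed here, but the idea of absorbing points into nearby subclusters is the same). A usage sketch, assuming scikit-learn is installed and using random stand-in data:

import numpy as np
from sklearn.cluster import Birch

X = np.random.rand(10_000, 2)            # stand-in for a large dataset

# threshold ~ the merge distance; n_clusters controls the final step that
# merges the many leaf subclusters down to a small number of clusters.
birch = Birch(threshold=0.05, branching_factor=50, n_clusters=10)
labels = birch.fit_predict(X)
print(len(birch.subcluster_centers_), 'subclusters before the final merge')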
How can we tell the right number of clusters?
In general, this is an unsolved problem. However, there are many approximate methods. In the next few slides we will see an example.
For our example, we will use the familiar katydid/grasshopper dataset. However, in this case we are imagining that we do NOT know the class labels. We are only clustering on the X and Y axis values.
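One common approximate method is the "elbow" heuristic: run k-means for several values of k, record the total within-cluster squared distance, and pick the k beyond which the improvement levels off. A small sketch (our own, using scikit-learn and random stand-in data rather than the actual katydid/grasshopper values):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)               # stand-in for the two clustering features

for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ = sum of squared distances of points to their nearest center;
    # look for the "elbow" where this value stops dropping sharply
    print(k, round(km.inertia_, 3))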