K-MEAN CLUSTER
BY CHENG ZHAN
HOUSTON MACHINE LEARNING MEETUP
1/7/2017
K means and dbscan
INTRODUCTION
• K-means (MacQueen, 1967) is one of the simplest
unsupervised learning algorithms for the well-known
clustering problem.
• The main idea is to define k centroids, one for each
cluster.
• Input
• M(set of points)
• k(number of clusters)
• Output
• μ_1 , …, μ_k (cluster centroids)
• k-Means partitions the points in M into k clusters by minimizing the
squared error function
J = ∑_{j=1..k} ∑_{x ∈ C_j} ‖x − μ_j‖², where C_j is the set of points assigned to centroid μ_j
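The squared error that k-means minimizes is the sum of squared Euclidean distances from each point to the centroid of its cluster. A minimal sketch of that computation follows; the function name `squared_error` and the toy data are illustrative assumptions, not part of the original slides.

```python
def squared_error(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Toy data (hypothetical): two pairs of points, each pair sharing one centroid.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]  # index of the centroid assigned to each point

print(squared_error(points, centroids, labels))  # 4.0
```

Each point lies at distance 1 from its centroid, so the total is 4.0. Assigning all four points to a single centroid at (5, 1) would raise the error to 104.0.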
HOW TO PICK K
K-MEAN ALGORITHM
• 0. Initialize cluster centers
• 1. Assign observations to closest
cluster center
• 2. Revise cluster centers as mean of
assigned observations
• 3. Repeat 1&2 until convergence
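The four steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not a production implementation: initial centers are sampled from the data points, convergence is taken to mean "assignments stopped changing", and the names `kmeans` and `dist2` are hypothetical.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 0: initialize cluster centers (assumption: pick k distinct data points).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each observation to the closest cluster center.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                      for p in points]
        # Step 3: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2: revise each center as the mean of its assigned observations.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous center
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random initialization.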
GOOD INITIAL POINTS
UNLUCKY
K-MEANS IN PRACTICE
• How to choose initial centroids
• select randomly among the data points
• generate completely randomly
• How to choose k
• study the data
• run k-Means for different k (measure squared error for each k)
• Run k-means many times!
• Get many choices of initial points
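A sketch of the practice described above: run k-means several times from different random initial points, keep the run with the lowest squared error, and repeat for several values of k. The helper names (`kmeans_once`, `best_of_restarts`), the number of restarts, and the toy data are assumptions made for illustration.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run; returns (squared_error, centroids, labels)."""
    centroids = rng.sample(points, k)  # initial centroids chosen among the data points
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    error = sum(dist2(p, centroids[label]) for p, label in zip(points, labels))
    return error, centroids, labels


def best_of_restarts(points, k, n_restarts=10, seed=0):
    """Run k-means n_restarts times and keep the run with the lowest squared error."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[0])


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k in range(1, 5):
    error, _, _ = best_of_restarts(points, k)
    print(k, round(error, 3))
```

The squared error generally keeps shrinking as k grows, so the value itself cannot pick k; what one looks for is the k after which the improvement becomes small.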
WHAT ABOUT THIS?
QUESTIONS
• Euclidean distance results in spherical clusters
• What cluster shape does the Manhattan distance give?
• Think of other distance measures. What cluster shapes
will those yield?
DBSCAN
DENSITY-BASED SPATIAL CLUSTERING OF APPLICATIONS
WITH NOISE
• DBSCAN is a Density-Based Clustering algorithm
• In density based clustering we partition points into dense regions separated
by not-so-dense regions.
• Important Questions:
• How do we measure density and what is a dense region?
• DBSCAN:
• Density at point p: number of points within a circle of radius Eps centered at p
• Dense Region: A circle of radius Eps that contains at least MinPts points
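The two definitions above can be sketched directly. The sketch assumes Euclidean distance, counts p itself when it belongs to the data set, and treats the boundary of the circle as inside; these conventions, and the function names, are assumptions rather than something the slides specify.

```python
import math


def neighbors(points, p, eps):
    """All points within distance eps of p (p itself is included if it is in points)."""
    return [q for q in points if math.dist(p, q) <= eps]


def density(points, p, eps):
    """Density at p: number of points within a circle of radius eps around p."""
    return len(neighbors(points, p, eps))


def is_dense(points, p, eps, min_pts):
    """True if the circle of radius eps around p holds at least min_pts points."""
    return density(points, p, eps) >= min_pts


# Toy data (hypothetical): three nearby points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (5, 5)]
print(density(points, (0, 0), eps=1.5))              # 3
print(is_dense(points, (0, 0), eps=1.5, min_pts=3))  # True
print(is_dense(points, (5, 5), eps=1.5, min_pts=3))  # False
```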
WHEN DBSCAN WORKS WELL
DBSCAN
DEFINITIONS
REACHABILITY AND CONNECTIVITY
DBSCAN ALGORITHM: EXAMPLE
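A minimal sketch of the commonly described DBSCAN procedure, assuming Euclidean distance and a brute-force neighbor search: start a cluster at any unvisited dense (core) point, grow it through the Eps-neighborhoods of the core points it reaches, and label whatever is left as noise. The names `region_query` and `NOISE` and the toy data are hypothetical.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region_query(points, i, eps)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = region_query(points, j, eps)
            if len(j_nbrs) >= min_pts:  # only core points extend the cluster
                queue.extend(j_nbrs)
    return labels


# Toy data (hypothetical): two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))
# [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not an input: it follows from Eps and MinPts, and the isolated point is reported as noise instead of being forced into a cluster.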
DETERMINING EPS & MINPTS
• Idea is that for points in a cluster, their kth nearest neighbors
are at roughly the same distance
• Noise points have the kth nearest neighbor at farther distance
• So, plot sorted distance of every point to its kth nearest
neighbor
• Find the distance d where there is a “knee” in the curve
• Eps = d, MinPts = k
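The sorted k-distance curve described above can be computed with a brute-force search. In this sketch a point is not counted as its own neighbor, which is an assumption; the function name `k_distances` and the toy data are also illustrative. Locating the knee is left to visual inspection of the printed or plotted values.

```python
import math


def k_distances(points, k):
    """Sorted distances from every point to its k-th nearest neighbor (self excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


# Toy data (hypothetical): two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
for d in k_distances(points, k=3):
    print(round(d, 3))
```

Here the eight points in the squares all have a 3rd-nearest-neighbor distance of about 1.414, while the isolated point jumps to about 6.403. The knee sits at that jump, which suggests an Eps slightly above 1.414 for k = 3.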
SENSITIVE TO PARAMETERS
APPLICATIONS
DISTANCE METRIC FOR DOCUMENTS
• Motivations
• Identical – easy
• Modified or related (Ex: DNA, Plagiarism, Authorship)
• Did Francis Bacon write Shakespeare’s plays?
DOCUMENT RETRIEVAL
CHALLENGES
• How do we measure similarity?
• How do we search over articles?
DOCUMENT REPRESENTATION
• Word count document representation
• Bag of words model
• Ignore order of words
• Count # of instances of each word in vocabulary
EXAMPLE
• Word: Sequence of alphanumeric characters. For example, the phrase “6.006
is fun” has 4 words.
• Word Frequencies: Word frequency D(w) of a given word w is the number of
times it occurs in a document D.
• For example, the words and word frequencies for the above phrase are as
below:
Word    6   The   Is   006   Easy   Fun
Count   1   0     1    1     0      1
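A small sketch of the word and word-frequency definitions above, using the standard library. Lowercasing the text before counting is an assumption made here so that "Is" and "is" count as the same word; the function name `word_frequencies` is illustrative.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Map each word (a run of alphanumeric characters) to its count in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(words)


freq = word_frequencies("6.006 is fun")
print(sum(freq.values()))  # 4 words in total
for word in ["6", "the", "is", "006", "easy", "fun"]:
    print(word, freq[word])  # a Counter returns 0 for words that do not occur
```

The period in "6.006" is not alphanumeric, so it splits the token into the two words "6" and "006", which is why the phrase has 4 words.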
METRIC
• d(x,x) = 0
• d(x,y) = d(y,x)
• d(x,y) + d(y,z) >= d(x,z)
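The three properties above can be spot-checked numerically for a candidate distance function. The sketch below tests them on a handful of sample points only, so a `True` result is evidence and not a proof; the function names and sample points are assumptions.

```python
import itertools
import math


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def satisfies_metric_properties(d, samples, tol=1e-9):
    """Check d(x,x) = 0, symmetry, and the triangle inequality on the samples."""
    for x in samples:
        if abs(d(x, x)) > tol:
            return False
    for x, y in itertools.product(samples, repeat=2):
        if abs(d(x, y) - d(y, x)) > tol:
            return False
    for x, y, z in itertools.product(samples, repeat=3):
        if d(x, y) + d(y, z) < d(x, z) - tol:
            return False
    return True


samples = [(0, 0), (1, 0), (2, 0), (0, 3)]
print(satisfies_metric_properties(euclidean, samples))          # True
print(satisfies_metric_properties(manhattan, samples))          # True
print(satisfies_metric_properties(squared_euclidean, samples))  # False
```

Squared Euclidean distance fails the triangle inequality: for the collinear points (0,0), (1,0), (2,0) it gives 1 + 1 < 4.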
METRIC
• Inner product: The inner product of the vectors D1 and D2 containing the word frequencies
for all words in the 2 documents. Up to scaling by the vector lengths, this is the projection of
vector D1 onto D2 or vice versa. Mathematically this is expressed as:
D1 · D2 = ∑_w D1(w) · D2(w)
• Angle Metric: The angle between the vectors D1 and D2 gives an
indication of overlap between the 2 documents. Mathematically this
angle is expressed as:
θ(D1, D2) = arccos( (D1 · D2) / (|D1| · |D2|) )
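A sketch of the inner product and angle between two documents, built on word-frequency vectors. It is an illustration under assumptions, not the code from the linked MIT example: words are lowercased, both documents must be non-empty (otherwise the norm is zero), and the cosine is clamped to [-1, 1] to guard against rounding error.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Map each word (a run of alphanumeric characters) to its count in text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def inner_product(d1, d2):
    """Sum over words w of D1(w) * D2(w); words missing from d2 contribute 0."""
    return sum(count * d2[word] for word, count in d1.items())


def angle(d1, d2):
    """Angle in radians between two word-frequency vectors."""
    norms = math.sqrt(inner_product(d1, d1) * inner_product(d2, d2))
    cosine = inner_product(d1, d2) / norms
    return math.acos(min(1.0, max(-1.0, cosine)))


a = word_frequencies("6.006 is fun")
b = word_frequencies("6.006 is easy")
c = word_frequencies("clustering with noise")

print(angle(a, a))  # 0.0: identical documents
print(angle(a, b))  # about 0.723: three of four words shared
print(angle(a, c))  # about 1.571 (pi / 2): no words in common
```

Because word counts are never negative, the angle always falls between 0 (same word distribution) and π/2 (no shared words).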
PYTHON EXAMPLE
• https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/docdist2.py
REFERENCE
• http://www.cs.haifa.ac.il/~rita/uml_course/lectures/kmeans.pdf
• https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf
• http://www.it.uu.se/edu/course/homepage/infoutv/ht09/a2t.pdf
• http://www.cse.buffalo.edu/~jing/cse601/fa12/materials/clustering_density.pdf
• Machine Learning Specialization by University of Washington in
Coursera
