MACHINE LEARNING Clustering
WHAT’S ON THE MENU - RECOMMENDATIONS
1. Why so popular
2. Supervised vs Unsupervised Learning
3. Topic2
4. Topic3
5. Topic4
6. Wrap-up
MACHINE LEARNING
http://videolectures.net/Top/Computer_Science/Machine_Learning/
WHY IS MACHINE LEARNING (CS 229) THE MOST POPULAR COURSE AT STANFORD? - ANDREW NG
WHAT CAN YOU TELL ME ABOUT X?
Supervised vs unsupervised learning
Supervised learning
Typical methods: regression and classification.
Given an object with an observed set of features X1, …, Xn and a response Y, the goal is to predict Y from X1, …, Xn.
Unsupervised learning
Typical methods: principal component analysis (PCA), expectation maximization (EM) and clustering (k-means and its variations).
Given an object with an observed set of features X1, …, Xn and no response, the goal is to discover relationships or groups among the variables or observations. Clustering algorithms try to find natural groupings in the data, so that similar observations end up in the same group.
APPLICATIONS
Market segmentation: given market research results, find the best customer segments.
Anomaly detection: find fraud, detect network attacks, or discover problems in servers or other sensor-equipped machinery. It is important to be able to find new types of anomalies that have never been seen before.
Healthcare: grouping areas by how accident-prone they are to guide hospital assignment; gene clustering.
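For the anomaly detection case, one common way to use clustering is to learn centroids from normal data and then flag any new observation that lies far from every centroid. The sketch below assumes the centroids have already been learned (they are hard-coded here) and that a distance threshold has been chosen by hand; `flag_anomalies`, the sample data and the threshold of 2.0 are all hypothetical illustrations, not part of the slides.

```python
import math

def distance_to_nearest(point, centroids):
    """Euclidean distance from a point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)

def flag_anomalies(points, centroids, threshold):
    """Return the points whose distance to every centroid exceeds threshold."""
    return [p for p in points if distance_to_nearest(p, centroids) > threshold]

# Hypothetical centroids, e.g. learned by k-means on normal traffic.
centroids = [(0.0, 0.0), (10.0, 10.0)]
observations = [(0.5, -0.2), (9.7, 10.4), (5.0, 5.0)]
print(flag_anomalies(observations, centroids, threshold=2.0))  # [(5.0, 5.0)]
```

Observations that sit close to a known cluster pass; the point halfway between the two clusters is flagged, even though no labeled example of that kind of anomaly was ever provided.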
GROUPING UNLABELED ITEMS USING K-MEANS CLUSTERING
SWAT
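Strengths:
Always converges (the cost never increases from one step to the next)
Scales well: each iteration is linear in the number of observations
Weaknesses:
Can converge to a local minimum instead of the global one
Can still be slow on very large datasets
Results depend on choosing the right k
Advantages:
Easy to implement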
SIMILARITY
There are several ways of measuring similarity between observations:
Manhattan distance
Euclidean distance
Cosine distance
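The three measures can be sketched in a few lines of plain Python (no third-party libraries assumed). Manhattan sums absolute coordinate differences, Euclidean is the straight-line distance, and cosine distance (1 minus cosine similarity) depends only on the angle between vectors, not their length.

```python
import math

def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """L2 distance: straight-line distance between the points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)

p, q = (0.0, 0.0), (3.0, 4.0)
print(manhattan(p, q))                          # 7.0
print(euclidean(p, q))                          # 5.0
print(cosine_distance((1.0, 0.0), (2.0, 0.0)))  # 0.0 (same direction)
```

Note that (1, 0) and (2, 0) are at Euclidean distance 1 but cosine distance 0, which is why the choice of measure changes which observations a clustering algorithm treats as similar.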
K-MEANS PSEUDO CODE
Randomly create k points as starting centroids
While any point has changed cluster assignment (repeat until convergence):
    Cluster assignment step
    For every data point:
        Calculate the distance between each centroid and the point
        Assign the point to the cluster with the lowest distance
    Move centroid step
    For every cluster:
        Calculate the mean of the points in that cluster
        Move the centroid to that mean
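A minimal runnable version of the pseudo code, in plain Python. One assumption beyond the slide: the k starting centroids are picked as k distinct random data points (a common choice); if a cluster ends up empty, its centroid is simply left where it was.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=None):
    """Return (centroids, assignment) after running k-means on a list of tuples."""
    rng = random.Random(seed)
    # Randomly pick k distinct data points as starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [None] * len(points)
    for _ in range(max_iter):
        changed = False
        # Cluster assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            if nearest != assignment[i]:
                assignment[i] = nearest
                changed = True
        if not changed:  # no point changed cluster: converged
            break
        # Move centroid step: each centroid moves to the mean of its points.
        for j in range(k):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:  # leave an empty cluster's centroid in place
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

data = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(data, k=2, seed=0)
print(labels)
```

The cluster numbers themselves are arbitrary (they depend on the random start), but on this well-separated toy data the first three and last three points end up in separate clusters.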
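COST FUNCTION & RANDOM INITIALIZATION
Here c(i) is the index of the cluster that observation x(i) is assigned to (i = 1 to n), and µ(k) is the position of centroid k (k = 1 to K). The cost function (distortion) is the average squared distance between each observation and the centroid of its cluster:
J(c(1 to n), µ(1 to K)) = (1/n) Σ_{i=1..n} ||x(i) − µ(c(i))||²
for run = 1 to 100 {
    randomly initialize k-means
    run k-means and get cluster assignments c(1 to n) and centroid positions µ(1 to K)
    compute cost function J(c(1 to n), µ(1 to K))
}
Pick the clustering that gave the lowest J(c(1 to n), µ(1 to K))
Cluster assignment step: minimize J with respect to c(1 to n) while holding µ(1 to K) fixed
Move centroid step: minimize J with respect to µ(1 to K) while holding c(1 to n) fixed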
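The random-restart loop can be sketched in plain Python as follows. It is self-contained, so it repeats a compact k-means run; `best_of_restarts` and the toy data are hypothetical names and values for illustration. On the four points below, a start with both centroids on the same side gets stuck in a local minimum with J = 25, while the best of several restarts finds the split with J = 0.25.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def run_kmeans(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns assignments c and centroids mu."""
    mu = [list(p) for p in rng.sample(points, k)]
    c = [None] * len(points)
    for _ in range(max_iter):
        # Cluster assignment step (mu fixed).
        new_c = [min(range(k), key=lambda j: sq_dist(p, mu[j])) for p in points]
        if new_c == c:
            break
        c = new_c
        # Move centroid step (c fixed).
        for j in range(k):
            members = [p for p, cj in zip(points, c) if cj == j]
            if members:
                mu[j] = [sum(col) / len(members) for col in zip(*members)]
    return c, mu

def cost(points, c, mu):
    """Distortion J: mean squared distance from each point to its assigned centroid."""
    return sum(sq_dist(p, mu[ci]) for p, ci in zip(points, c)) / len(points)

def best_of_restarts(points, k, restarts=100, seed=None):
    """Run k-means several times and keep the clustering with the lowest J."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        c, mu = run_kmeans(points, k, rng)
        j = cost(points, c, mu)
        if best is None or j < best[0]:
            best = (j, c, mu)
    return best

data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
j, c, mu = best_of_restarts(data, k=2, restarts=20, seed=1)
print(j)
```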
PERFORMANCE CONSIDERATION
K-means
K-means has a computational complexity of O(iKnm), where:
i is the number of iterations,
K is the number of clusters,
n is the number of observations,
m is the number of features.
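A quick worked example of what O(iKnm) means in practice: the assignment step computes K distances of m coordinates for each of the n observations, in each of the i iterations. The numbers below (20 iterations, 5 clusters, one million observations, 10 features) are hypothetical, chosen only to show that halving any single factor halves the work.

```python
def kmeans_work(i, K, n, m):
    """Rough count of coordinate operations in the assignment steps: i * K * n * m."""
    return i * K * n * m

base = kmeans_work(i=20, K=5, n=1_000_000, m=10)
print(base)                                              # 1000000000
print(kmeans_work(i=20, K=5, n=1_000_000, m=5) / base)   # 0.5 (half the features)
print(kmeans_work(i=10, K=5, n=1_000_000, m=10) / base)  # 0.5 (half the iterations)
```

This is why the improvements that follow target the number of iterations, the number of observations each machine handles, and the number of features.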
Improvements:
•Reducing the average number of iterations.
•Parallelizing K-means with Hadoop or Spark.
•Reducing the number of outliers (and possibly of features) by filtering noise with a smoothing algorithm.
•Reducing the dimensionality of the model (the number of features m).
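The reason K-means parallelizes well on Hadoop or Spark is that each point's assignment is independent, and the new centroids only need per-cluster sums and counts. The sketch below simulates one iteration in map/reduce style in plain Python: each "partition" stands in for data held by a separate worker. It is an illustration of the idea under that assumption, not the Hadoop or Spark API.

```python
from functools import reduce

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def map_partition(partition, centroids):
    """Map: assign local points, emit per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in partition:
        j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts

def combine(a, b):
    """Reduce: add the partial sums and counts of two partitions."""
    (sa, ca), (sb, cb) = a, b
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(sa, sb)]
    counts = [x + y for x, y in zip(ca, cb)]
    return sums, counts

def parallel_style_iteration(partitions, centroids):
    # Each map_partition call could run on a different worker.
    partials = [map_partition(part, centroids) for part in partitions]
    sums, counts = reduce(combine, partials)
    # Move centroid step: total sum / total count; keep the old centroid if empty.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(len(centroids))
    ]

partitions = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
print(parallel_style_iteration(partitions, [[1.0, 1.0], [9.0, 1.0]]))
# [[0.0, 1.0], [10.0, 1.0]]
```

Only K sums and counts per partition travel over the network, rather than the raw observations, which keeps communication small.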
FRAMEWORKS
Java: Weka, Mahout, Spark
Python: scikit-learn, PySpark, Pylearn2 (Theano)
C++: Shogun
.NET: Encog
https://github.com/josephmisiti/awesome-machine-learning
PLATFORMS - IBM BLUEMIX
PLATFORMS – MICROSOFT AZURE ML
REFERENCES
http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/
http://www-bcf.usc.edu/~gareth/ISL/
BOOKS

Machine learning hands on clustering