CLUSTERING TECHNIQUES: OVERVIEW AND APPLICATIONS
INDEX
Introduction
Clustering
Clustering Techniques
Pros and Cons of Clustering Techniques
Applications of Clustering Techniques
Conclusion
Future Work
INTRODUCTION
• Clustering was widely adopted in biology in the 1960s to classify species.
• In this data-driven era, effective data organization and analysis methods play a
major role in gaining insights from data.
• From marketing to social network analysis, clustering has evolved into an essential tool
for sorting and categorizing data, supporting pattern detection, data analysis, and
interpretation.
• Clustering is an unsupervised data analysis technique that groups a set of objects
such that objects in the same group (cluster) are more similar to each other than to
those in other groups.
Example: Clustering Grocery Items (figure showing eggs, bananas, milk, and bread)
TECHNIQUES
• Partitional Clustering – K-Means
• Hierarchical Clustering – BIRCH
• Density-Based Clustering – DBSCAN
• Grid-Based Clustering – STING
• Model-Based Clustering – Gaussian Mixture Model
Partitional Clustering – K-Means
• Partitional clustering divides a dataset into non-overlapping partitions or clusters, where
each data point belongs to exactly one cluster.
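• K-means groups an unlabeled dataset into a predefined number of clusters, k, by assigning
each data point to the cluster with the nearest centroid, so that similar data points are
grouped together to reveal underlying patterns.
Phases:
• Initialization: choose k initial centroids.
• Assignment and update: assign each point to its nearest centroid, then recompute each
centroid as the mean of its assigned points.
• Repeat until the assignments stop changing (or a maximum number of iterations is reached).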
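The three phases above correspond to the classic iterative (Lloyd-style) k-means loop. The following minimal Python sketch assumes points are tuples of floats and uses Euclidean distance; the function names `kmeans` and `mean`, the random-sample initialization, and the `max_iter`/`seed` parameters are illustrative choices, not part of the slide.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100, seed: int = 0):
    rng = random.Random(seed)
    # Initialization: use k distinct data points as the starting centroids
    centroids: List[Point] = rng.sample(list(points), k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment: each point joins the cluster of its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update: move each centroid to the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return labels, centroids
```

Because the result depends on the initial centroids, practical implementations often run several initializations (or use k-means++ seeding) and keep the best result.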
Hierarchical Clustering – BIRCH
Hierarchical clustering organizes elements into a hierarchical, tree-like structure.
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) can cluster a large
dataset with a single scan and then improve the clustering quality with a few additional scans.
BIRCH consists of two main stages, followed by an optional refinement:
• Building the CF (Clustering Feature) tree
• Global clustering
• Cluster refinement for accuracy
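At the heart of BIRCH is the clustering feature, a triple CF = (N, LS, SS) holding a subcluster's point count, the linear sum of its points, and the sum of their squared norms. Because CFs are additive, a subcluster's centroid and radius can be updated without revisiting its points. The sketch below illustrates this under simplifying assumptions: it keeps a flat list of leaf entries instead of a balanced CF tree, and the `CF` class, `build_leaf_entries` function, and `threshold` absorption rule are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CF:
    """Clustering feature of a subcluster: count N, linear sum LS, squared sum SS."""
    n: int
    ls: List[float]
    ss: float

    @staticmethod
    def from_point(p: Sequence[float]) -> "CF":
        return CF(1, [float(x) for x in p], sum(x * x for x in p))

    def __add__(self, other: "CF") -> "CF":
        # CF additivity: merging two subclusters just adds their components
        return CF(self.n + other.n, [a + b for a, b in zip(self.ls, other.ls)], self.ss + other.ss)

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        # R = sqrt(SS/N - ||LS/N||^2): root-mean-square distance of members to the centroid
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))


def build_leaf_entries(points, threshold: float) -> List[CF]:
    """Single scan: absorb each point into its nearest entry if the radius stays within threshold."""
    entries: List[CF] = []
    for p in points:
        cf_p = CF.from_point(p)
        if entries:
            best = min(range(len(entries)), key=lambda i: math.dist(entries[i].centroid(), p))
            merged = entries[best] + cf_p
            if merged.radius() <= threshold:
                entries[best] = merged
                continue
        entries.append(cf_p)  # too far from every entry: start a new subcluster
    return entries
```

In full BIRCH, these leaf entries live in a height-balanced tree whose internal nodes store the summed CFs of their children, and the global clustering stage then clusters the leaf entries rather than the raw points.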
Density Based Clustering – DBSCAN
• Density-based clustering methods form clusters from regions of high point density in the
feature space, separated by regions of low density.
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) builds clusters around
core points: points that have at least a minimum number of neighbors (MinPts) within a
specified radius (Eps).
• Steps in the DBSCAN algorithm:
• Classify the points as core, border, or noise points, and discard noise.
• Assign a new cluster to an unassigned core point.
• Assign all density-connected points, including border points, to the cluster of the core point they are reachable from.
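The steps above can be sketched in plain Python. This minimal version assumes points are numeric tuples, uses Euclidean distance and a brute-force O(n²) neighbor search, and counts a point among its own neighbors; `eps` and `min_pts` play the roles of Eps and MinPts. The function names are illustrative, and this is not a reference implementation.

```python
import math
from typing import List, Optional

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        # i is an unassigned core point: start a new cluster
        cluster_id += 1
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is core: keep expanding the cluster
    return labels
```

With real data, a spatial index (or an existing library implementation) would normally replace the quadratic neighbor search.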
Grid-Based Clustering – STING
• Grid-based clustering partitions the dataset into a grid structure,
organizing data points into cells for efficient clustering based on spatial
proximity.
• STING (STatistical INformation Grid) partitions the spatial area into a hierarchy of
rectangular cells, stores summary statistics (such as count, mean, and standard deviation)
for each cell, and investigates clusters at different levels of detail.
• The phases of STING are:
• Grid Construction & Cell Assignment
• Density Calculation & Cluster Identification
• Border Point Assignment & Noise Identification
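The following Python sketch shows the basic grid-based idea behind these phases under strong simplifications: a single-resolution grid of square 2-D cells, point count as the only density statistic, and clusters formed from adjacent dense cells, with points in sparse cells labeled noise (-1). The helper `parent_counts` hints at STING's hierarchy by aggregating child-cell counts into coarser 2×2 parent cells. All names and the `cell_size`/`density_threshold` parameters are hypothetical; full STING keeps richer per-cell statistics, and this sketch omits border-point refinement.

```python
import math
from collections import Counter
from typing import Dict, Tuple

Cell = Tuple[int, int]


def cell_of(p, cell_size: float) -> Cell:
    """Grid construction & cell assignment: map a 2-D point to its cell index."""
    return (math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))


def grid_cluster(points, cell_size: float, density_threshold: int):
    # Density calculation: count points per cell (the only statistic kept here)
    counts = Counter(cell_of(p, cell_size) for p in points)
    dense = {c for c, n in counts.items() if n >= density_threshold}
    # Cluster identification: flood-fill over 8-connected dense cells
    cell_label: Dict[Cell, int] = {}
    next_id = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_id
        stack = [start]
        while stack:
            cx, cy = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_id
                        stack.append(nb)
        next_id += 1
    # Noise identification: points in sparse cells get -1
    labels = [cell_label.get(cell_of(p, cell_size), -1) for p in points]
    return labels, counts


def parent_counts(counts):
    """Aggregate child-cell counts into a coarser level (each parent covers 2x2 children)."""
    parents = Counter()
    for (cx, cy), n in counts.items():
        parents[(cx // 2, cy // 2)] += n
    return parents
```

Because each cell only stores aggregates, coarser levels can be computed from finer ones without rescanning the data, which is what makes grid-based methods efficient.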
Model-Based Clustering – Gaussian Mixture
• Model-based clustering assigns data points to clusters based on
probabilistic models representing the data distribution.
• "Gaussian Mixture is a statistical model that identifies subgroups within
a population using a combination of Gaussian distributions."
• It iteratively optimizes its parameters using the expectation-maximization (EM)
algorithm, which estimates each component's mean, covariance, and mixture weight.
• The steps of a Gaussian mixture model are:
• Initialization
• Expectation Step (E-Step) & Maximization Step (M-Step)
• Convergence Check
• Iteration
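The EM loop above can be sketched for one-dimensional data in plain Python. This is a minimal illustration, assuming scalar inputs, user-supplied initial means, equal initial weights, the overall data variance as every component's starting variance, and a small variance floor (`min_var`) to avoid degenerate components; the function names and parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_em_1d(data: Sequence[float], init_means: Sequence[float], n_iter: int = 100,
              tol: float = 1e-6, min_var: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    n, k = len(data), len(init_means)
    # Initialization: given means, equal weights, overall data variance for every component
    means = list(init_means)
    mu = sum(data) / n
    var0 = max(sum((x - mu) ** 2 for x in data) / n, min_var)
    variances = [var0] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        ll = 0.0
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            ll += math.log(total)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, min_var)
        # Convergence check: stop when the log-likelihood (before this M-step) stops improving
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, variances
```

Unlike k-means, the E-step gives every point a soft responsibility for each component; assigning each point to its highest-responsibility component yields hard cluster labels.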
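PROS AND CONS OF CLUSTERING TECHNIQUES
Cons:
• Parameter subjectivity: results depend on user-chosen settings, such as the number of clusters k or the DBSCAN radius.
• High dimensions challenge: distance measures become less discriminative as the number of features grows.
• Evaluation difficulty: without ground-truth labels, cluster quality is hard to judge objectively.
• Shape assumptions: some methods, such as k-means, favor compact, roughly spherical clusters.
• Noise handling: outliers can distort clusters in methods without an explicit notion of noise.
Pros:
• Pattern finding: reveals natural groupings without labeled data.
• Exploration: gives a first overview of an unfamiliar dataset.
• Feature discovery: cluster memberships can serve as new features for downstream models.
• Data compression: each cluster can be summarized by a representative, such as its centroid.
• Scalability: methods such as BIRCH and STING are designed for large datasets.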
Applications of Clustering Techniques
• Customer Segmentation: Grouping customers into distinct segments based on attitudes
and behavior for targeted marketing strategies.
• Anomaly Detection: Identifying unusual patterns or outliers in datasets that deviate
significantly from normal behavior.
• Image Segmentation: Partitioning an image into regions with similar attributes, for object
recognition and image analysis tasks.
• Recommendation Systems: Grouping users or items into clusters based on preferences
or similarities to provide personalized recommendations in e-commerce or content
platforms.
• Document Clustering: Automatically grouping similar documents for efficient information
retrieval, text summarization, and content-based recommendation systems.
CONCLUSION
Clustering techniques offer a flexible approach to unsupervised learning, applicable
across diverse datasets and domains. By grouping similar data points, clustering
facilitates exploration and recognition of underlying patterns, leading to valuable
insights.
Clustering algorithms automate data grouping tasks, saving time and enabling
efficient analysis of large datasets. Clustering finds use in marketing, healthcare,
finance, and more, for tasks like customer segmentation and anomaly detection.
FUTURE RESEARCH
• Adaptability to diverse data types, including text, image, and graph data.
• Improving visualization of clustering results.
• Integration with machine learning for predictive modeling.
• Addressing privacy concerns with privacy-preserving techniques.
• Tailoring clustering methods for domain-specific applications.
References
• T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH: an efficient data clustering method
for very large databases" in ACM SIGMOD Record, ACM, vol. 25, no. 2, pp. 103–114, 1996.
• M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering
techniques" in Proceedings of the KDD Workshop on Text Mining, ACM, 2000.
• M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering
clusters in large spatial databases with noise" in Proceedings of the 2nd International
Conference on Knowledge Discovery and Data Mining (KDD-96), AAAI Press, 1996.
