REHANA RAJ
DFK1307
DEPT OF FISH PROCESSING TECHNOLOGY
COLLEGE OF FISHERIES
MANGALORE
CLUSTER ANALYSIS
• Cluster analysis is a multivariate statistical technique
in which a large data set is segregated into several
groups based on homogeneity or similarity measures.
• Cluster analysis makes a sensible and informative
classification of an initially unclassified set of data
with the desired accuracy, using the variable values
observed on each individual.
• It saves a lot of resources in terms of time, money, etc.
[Figure: the data before clustering and after clustering]
• To assign observations to groups (‘clusters’)
• To divide the observations into homogeneous and
distinct groups
• To reduce the complexity of data
• Generates several groups of data whose members are similar
• Each group is homogeneous internally and as heterogeneous
as possible with respect to the other groups
• Normally, the data consist of objects or persons
• Segregation is typically based on more than two
variables.
• Hierarchical clustering
• Centroid-based clustering
• Distribution-based clustering
• Density-based clustering
• Hierarchical clustering is a method of cluster analysis which
seeks to build a hierarchy of clusters.
• Two types:
• Agglomerative (bottom-up):
◦ Start with each observation being a single cluster.
◦ Eventually all observations belong to the same cluster.
• Divisive (top-down):
◦ Start with all observations belonging to the same cluster.
◦ Eventually each observation forms a cluster on its own.
• The number of clusters, k, need not be fixed in advance.
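The agglomerative (bottom-up) procedure can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the slides do not name a linkage rule, so single linkage (distance between the closest pair of points) is assumed here, and the function names and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b):
    """Distance between two clusters: the closest pair of points (assumed rule)."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, n_clusters=1):
    """Bottom-up clustering: start with singletons, repeatedly merge the closest pair.

    Returns the remaining clusters and the list of merges (the hierarchy).
    """
    clusters = [[p] for p in points]      # each observation is its own cluster
    merges = []                           # (cluster_a, cluster_b, distance) per merge
    while len(clusters) > n_clusters:
        # Find the two clusters that are closest under single linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        d = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical 2-D observations forming two visible groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9)]
clusters, merges = agglomerative(data, n_clusters=2)
```

Running the loop down to `n_clusters=1` records every merge, which is the information a dendrogram draws; stopping earlier cuts the hierarchy at the chosen number of groups.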
• Construction of a tree-based hierarchical diagram,
usually called a dendrogram. E.g., in the case of taxonomic
classification:
animal
  vertebrate: fish, reptile, amphib., mammal
  invertebrate: worm, insect, crustacean
• In centroid-based clustering, clusters are
represented by a central
vector, which may not
necessarily be a member of
the data set.
• Aims to partition n
observations into k clusters.
• Each observation belongs to
the cluster with the nearest
mean.
• Here, the number of clusters is
fixed to k (k-means clustering).
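As a rough sketch of the k-means idea described above, the following plain-Python version alternates between assigning each observation to the nearest mean and recomputing the means. It assumes Lloyd's iteration with caller-supplied starting centroids; the function names and the sample data are hypothetical.

```python
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, centroids, max_iter=100):
    """Lloyd's iteration; k is fixed by the number of initial centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster with the nearest mean.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        new_centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if new_centroids == centroids:    # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, groups


# Hypothetical observations: two square-shaped groups of four points each.
data = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, groups = k_means(data, centroids=[(1, 1), (9, 9)])
```

With this data the final centroids are the centres of the two squares, neither of which is an observation, which illustrates that the central vector need not be a member of the data set.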
• In distribution-based clustering, clusters are defined as
objects most likely belonging to the same distribution.
• It can capture correlation and dependence between attributes.
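One common way to realise distribution-based clustering is a Gaussian mixture fitted by expectation-maximisation. The slides do not name a specific model, so the sketch below is an assumption: a one-dimensional mixture of two normal distributions, with hypothetical function names, starting values and data.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and spread sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def fit_two_gaussians(xs, mu, sigma=(1.0, 1.0), weight=(0.5, 0.5), n_iter=50):
    """EM for a 1-D mixture of two normal distributions (illustrative only)."""
    mu, sigma, weight = list(mu), list(sigma), list(weight)
    for _ in range(n_iter):
        # E-step: how strongly each distribution "claims" each observation.
        resp = []
        for x in xs:
            dens = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and spread of each distribution.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(sqrt(var), 1e-3)   # floor avoids a collapsed component
    # Each observation is assigned to the distribution it most likely came from.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return mu, sigma, weight, labels


# Hypothetical measurements drawn around two different levels.
xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
mu, sigma, weight, labels = fit_two_gaussians(xs, mu=(0.0, 6.0))
```

Unlike k-means, the result includes a spread and a weight per cluster, so cluster membership is a matter of probability rather than of distance alone.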
• In density-based clustering, clusters are defined as areas of
higher density than the rest of the data set.
• Objects in the sparse areas that are required to separate
clusters are usually considered to be noise and border
points.
• The most popular density-based clustering method is
DBSCAN (density-based spatial clustering of applications
with noise).
• OPTICS (Ordering Points To Identify the Clustering
Structure) is a generalization of DBSCAN that handles
different densities much better.
[Figure: Density-based clustering with DBSCAN.]
[Figure: DBSCAN assumes clusters of similar density, and may have
problems separating nearby clusters.]
[Figure: OPTICS is a DBSCAN variant that handles different densities
much better.]
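The density idea behind DBSCAN can be illustrated with a compact plain-Python sketch. It follows the usual description of the algorithm (a point with at least `min_pts` neighbours within radius `eps` is a core point, and clusters grow outward from core points); the parameter names, the `NOISE` marker and the sample data are assumptions made for this example.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster (assumed convention)


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)          # None means "not visited yet"

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:           # sparse area: not a core point
            labels[i] = NOISE              # may later be relabelled as a border point
            continue
        cluster += 1                       # start a new cluster at this core point
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:         # reachable from a core point: border point
                labels[j] = cluster
            if labels[j] is not None:      # already assigned, nothing more to do
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:       # j is a core point too: keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical data: two dense squares and one isolated point between them.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5),
        (10, 10), (10, 11), (11, 10), (11, 11)]
labels = dbscan(data, eps=1.5, min_pts=3)
```

Because a single `eps` applies everywhere, this sketch shares the limitation noted in the captions: clusters of very different density are hard to capture with one setting.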
1. Forming the clusters from the given data set, resulting
in a new variable that identifies cluster members among
the cases (one-phase cluster)
2. Description of the clusters by re-crossing with the data
(two-phase cluster)
One-phase cluster (forming of clusters from the chosen data set):
VALUE ADDED PRODUCTS: FISH CUTLET, FISH FINGER, FISH BURGER
Two-phase cluster:
FISH CUTLET: Seer fish, Mackerel
Third-phase cluster:
Baked, Fried
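The two steps can be pictured as follows: the first phase yields a membership variable, and the second phase crosses that variable with the original data to describe each cluster. The sketch below assumes the first phase has already been run; the cases, the membership labels and the `describe` helper are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical cases, invented for illustration: (fish species, cooking method).
cases = [
    ("seer fish", "fried"),
    ("seer fish", "baked"),
    ("mackerel", "fried"),
    ("mackerel", "fried"),
    ("seer fish", "baked"),
    ("mackerel", "baked"),
]

# Phase 1 (assumed already done): a clustering step has produced a new
# variable holding the cluster number of each case.
membership = [0, 1, 0, 0, 1, 1]


def describe(cases, membership, column):
    """Phase 2: cross-tabulate cluster membership against one variable."""
    table = defaultdict(Counter)
    for case, cluster in zip(cases, membership):
        table[cluster][case[column]] += 1
    return {cluster: dict(counts) for cluster, counts in table.items()}


by_species = describe(cases, membership, column=0)
by_method = describe(cases, membership, column=1)
```

In this made-up data the clusters split cleanly by cooking method but not by species, which is the kind of description the second phase is meant to provide.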
Advantages:
• Cuts down the cost of preparing a sampling frame and
other administrative costs.
• No special scales of measurement are necessary.
• The visual graphic provides a clear understanding of the
clusters.
Disadvantages:
• The choice of cluster-forming variables is often not based
on theory but made at random.
• In some cases, the clusters are difficult to determine.
Marketing: Helps marketers discover distinct groups in their
customer bases, and then use this knowledge to develop targeted
marketing programs
Land use: Identification of areas of similar land use in an earth
observation database
Insurance: Identifying groups of motor insurance policy holders
with a high average claim cost
City planning: Identifying groups of houses according to their
house type, value, and geographical location
Earthquake studies: Observed earthquake epicenters should be
clustered along continental faults
Thank you for your kind attention!