CLIQUE
09mx
Crew Members:
  • K. Kanagaraj 14
  • S. Karthikeyan 17
  • S. Kathiresan 19
  • N. PadmaShree 28
  • M. RamKumar 33
  • S. Sowmya 45

GRID-BASED CLUSTERING METHOD
  • Uses a multi-resolution grid data structure.
  • Clustering complexity depends on the number of populated grid cells and not on the number of objects in the dataset.
  • Quantizes the space into a finite number of cells that form a grid structure, on which all of the clustering operations are performed.
  • Example: assume that we have a set of records and we want to cluster them with respect to two attributes. We divide the related space (a plane) into a grid structure and then find the clusters.

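The two-attribute example above can be sketched in a few lines of Python. This is only an illustration: the records, the grid origin and the cell sizes are made-up values, and the helper names `grid_cell` and `populated_cells` are hypothetical. It shows why the later steps work on populated cells only: empty cells are never stored.

```python
from collections import Counter

def grid_cell(point, origin, cell_size):
    """Map a point to the integer index of the grid cell that contains it."""
    return tuple(int((x - o) // s) for x, o, s in zip(point, origin, cell_size))

def populated_cells(points, origin, cell_size):
    """Count the points in each populated cell; empty cells never appear."""
    return Counter(grid_cell(p, origin, cell_size) for p in points)

# Hypothetical records: (age, salary in units of 10,000).
records = [(23, 2.5), (25, 2.9), (27, 2.1), (41, 6.2), (43, 6.8), (58, 1.0)]

# Assumed grid: age cells of width 10 starting at 20, salary cells of width 1 starting at 0.
cells = populated_cells(records, origin=(20, 0), cell_size=(10, 1))
for cell, count in sorted(cells.items()):
    print(cell, count)
```

Six records collapse into three populated cells here, and everything that follows operates on those cells rather than on the individual records.
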
[Figure: the "space" is the plane with Age (20 to 60) on the horizontal axis and Salary (10,000), from 0 to 8, on the vertical axis.]

Advantages of Grid-Based Clustering
  • Fast.
  • No distance computations.
  • Complexity usually depends on the number of populated grid cells and not on the number of objects.
  • Easy to determine which clusters are neighboring.
  • Caveat: cluster shapes are limited to unions of grid cells.

Techniques for Grid-Based Clustering
The following are some techniques that are used to perform grid-based clustering:
  • CLIQUE (CLustering In QUEst)
  • STING (STatistical Information Grid)
  • WaveCluster

CLIQUE
  • CLustering In QUEst, by Agrawal, Gehrke, Gunopulos and Raghavan, published at SIGMOD '98 (SIGMOD: Special Interest Group on Management of Data).
  • Clustering: grouping of a number of similar things according to characteristic or behavior.
  • Quest: to make a search (for).
  • Automatic subspace clustering of high-dimensional data.

Looking at CLIQUE as an Example
  • CLIQUE is used for the clustering of high-dimensional data present in large tables. By high-dimensional data we mean records that have many attributes.
  • CLIQUE identifies the dense units in the subspaces of the high-dimensional data space, and uses these subspaces to provide more efficient clustering.

Definitions That Need to Be Known
  • Unit: after forming a grid structure on the space, each rectangular cell is called a unit.
  • Dense: a unit is dense if the fraction of the total data points contained in the unit exceeds the input model parameter (the density threshold).
  • Cluster: a cluster is defined as a maximal set of connected dense units.

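As a minimal sketch of the "dense" definition, the following assumes each data point has already been assigned to a unit. The name `dense_units` and the parameter `tau` for the density threshold are chosen for illustration, and the unit labels are made-up data.

```python
from collections import Counter

def dense_units(unit_of_each_point, tau):
    """Return the units whose share of all data points exceeds tau."""
    counts = Counter(unit_of_each_point)
    total = len(unit_of_each_point)
    return {unit for unit, count in counts.items() if count / total > tau}

# Hypothetical assignment of six points to 2-D units.
units = [(0, 2), (0, 2), (0, 2), (2, 6), (2, 6), (3, 1)]

print(dense_units(units, tau=0.3))  # units holding more than 30% of the points
print(dense_units(units, tau=0.4))  # a stricter threshold keeps fewer units
```

The threshold is a fraction of the whole dataset, so the same unit can be dense under one setting and sparse under another.
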
How Does CLIQUE Work?
Let us say that we have a set of records that we would like to cluster in terms of n attributes. So, we are dealing with an n-dimensional space.
MAJOR STEPS:
  • CLIQUE partitions each dimension (each 1-dimensional subspace) into the same number of equal-length intervals.
  • Using this as a basis, it partitions the n-dimensional data space into non-overlapping rectangular units.

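The partitioning step can be illustrated as follows. This sketch assumes that the value range of every attribute is known in advance and that all values lie inside it. The names `interval_index` and `unit_of`, the parameter `xi` (number of intervals per dimension) and the attribute ranges are hypothetical.

```python
def interval_index(value, low, high, xi):
    """Index (0 .. xi-1) of the equal-length interval of [low, high] holding value."""
    if value >= high:  # place the upper boundary in the last interval
        return xi - 1
    return int((value - low) / (high - low) * xi)

def unit_of(record, bounds, xi):
    """Map an n-attribute record to the n-dimensional unit that contains it."""
    return tuple(interval_index(v, lo, hi, xi) for v, (lo, hi) in zip(record, bounds))

# Assumed attribute ranges: age, salary (10,000), vacation (weeks).
bounds = [(20, 60), (0, 8), (0, 4)]

print(unit_of((35, 5.0, 2), bounds, xi=4))
```

Because every dimension is cut into the same number of intervals, a unit is fully described by one interval index per attribute, and no two units overlap.
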
CLIQUE: Major Steps (Cont.)
Now CLIQUE's goal is to identify the dense n-dimensional units. It does this in the following way:
  • CLIQUE finds dense units of higher dimensionality by first finding the dense units in the subspaces.
  • So, for example, if we are dealing with a 3-dimensional space, CLIQUE finds the dense units in the 3 related planes (2-dimensional subspaces).
  • It then intersects the extensions of the subspaces representing the dense units to form a candidate search space in which dense units of higher dimensionality may exist.

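For the 3-dimensional case, finding the dense units in the three planes might look like the sketch below. It assumes the records have already been mapped to 3-D unit indices. The function name `dense_units_in_subspace`, the threshold `tau` and the data are illustrative only.

```python
from collections import Counter
from itertools import combinations

def dense_units_in_subspace(units, dims, tau):
    """Project every point's unit onto the chosen dimensions and keep the dense cells."""
    counts = Counter(tuple(u[d] for d in dims) for u in units)
    return {cell for cell, c in counts.items() if c / len(units) > tau}

# Hypothetical 3-D unit index of each of five records.
units = [(1, 2, 2), (1, 2, 2), (1, 2, 0), (1, 3, 2), (3, 0, 1)]

# One entry per 2-dimensional subspace: dimensions (0, 1), (0, 2) and (1, 2).
dense_by_plane = {
    dims: dense_units_in_subspace(units, dims, tau=0.3)
    for dims in combinations(range(3), 2)
}
for dims, dense in dense_by_plane.items():
    print(dims, dense)
```

Each plane is examined independently, which keeps the counting cheap compared with counting every 3-D unit directly.
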
CLIQUE: Major Steps (Cont.)
  • Each maximal set of connected dense units is considered a cluster.
  • Using this definition, the dense units in the subspaces are examined in order to find clusters in the subspaces.
  • The information from the subspaces is then used to find clusters in the n-dimensional space.
  • It must be noted that all cluster boundaries are axis-parallel (in a 2-D plot, either horizontal or vertical). This is due to the nature of the rectangular grid cells.

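Finding the maximal sets of connected dense units amounts to a connected-components search over the dense units. The sketch below assumes that two units are connected when they share a face, directly or through a chain of such units. The helper names and the sample data are hypothetical.

```python
from collections import deque

def neighbours(unit):
    """Yield the units that share a face with the given unit."""
    for d in range(len(unit)):
        for step in (-1, 1):
            yield unit[:d] + (unit[d] + step,) + unit[d + 1:]

def clusters(dense):
    """Group dense units into maximal connected sets using breadth-first search."""
    dense = set(dense)
    seen = set()
    result = []
    for start in sorted(dense):
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours(current):
                if nxt in dense and nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        result.append(component)
    return result

found = clusters({(0, 0), (0, 1), (1, 1), (3, 3)})
print(found)
```

The first three units form an L-shaped cluster whose boundary follows the grid lines, which illustrates why cluster boundaries are always axis-parallel.
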
Example for CLIQUE
Let us say that we want to cluster a set of records that have three attributes, namely salary, vacation and age. The data space for this data would be 3-dimensional.
[Figure: 3-d data space with axes vacation, age and salary.]

Example (Cont.)
After plotting the data objects, each dimension (i.e., salary, vacation and age) is split into intervals of equal length. Then we form a 3-dimensional grid on the space, each unit of which is a 3-D rectangle. Now, our goal is to find the dense 3-D rectangular units.

Example (Cont.)
To do this, we find the dense units of the subspaces of this 3-d space. So, we find the dense units with respect to salary and age. This means that we look at the salary-age plane and find all the 2-D rectangular units that are dense. We also find the dense 2-D rectangular units for the vacation-age plane.

Example (Cont.)
Now let us try to visualize the dense units of the two planes on the following 3-d figure:
[Figure: dense units of the two planes shown in the 3-d space.]

Example (Cont.)
We can extend the dense areas in the vacation-age plane inwards. We can extend the dense areas in the salary-age plane upwards. The intersection of these two spaces gives us a candidate search space in which 3-dimensional dense units may exist.
We then find the dense units in the salary-vacation plane and form an extension of the subspace that represents these dense units.

Example (Cont.)
Now, we intersect the candidate search space with the extension of the dense units of the salary-vacation plane. The 3-d units that remain are the only ones that can be dense, and a density check on the data confirms which of them are the 3-d dense units.
So, what was the main idea? We used the dense units in subspaces in order to find the dense units in the 3-dimensional space. After finding the dense units, it is very easy to find clusters.

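The intersection of extensions in this example can be written out directly. In this sketch a 3-D unit is indexed as (age, salary, vacation), the three input sets hold the dense 2-D units of each plane, and all names and values are made up for illustration. The result is a set of candidates; a real run would still need to check each candidate's density against the data.

```python
def candidate_3d_units(dense_age_salary, dense_age_vacation, dense_salary_vacation):
    """Intersect the extensions of the dense 2-D units of the three planes."""
    # Extending the salary-age and vacation-age dense areas and intersecting them
    # is a join on the shared age interval.
    search_space = {
        (a, s, v)
        for (a, s) in dense_age_salary
        for (a2, v) in dense_age_vacation
        if a == a2
    }
    # Intersect with the extension of the dense salary-vacation units.
    return {(a, s, v) for (a, s, v) in search_space if (s, v) in dense_salary_vacation}

# Hypothetical dense 2-D units, given as interval indices.
dense_age_salary = {(1, 2), (2, 3)}
dense_age_vacation = {(1, 2), (2, 0)}
dense_salary_vacation = {(2, 2)}

candidates = candidate_3d_units(dense_age_salary, dense_age_vacation, dense_salary_vacation)
print(candidates)
```

Here the first intersection leaves two 3-D units, and the salary-vacation plane rules out one of them.
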
Reflecting upon CLIQUE
Why does CLIQUE confine its search for dense units in high dimensions to the intersection of dense units in subspaces?
Because of the Apriori property, which uses prior knowledge of the items in the search space so that portions of the space can be pruned.
The property for CLIQUE says that if a k-dimensional unit is dense, then so are its projections in (k-1)-dimensional space. Conversely, if any (k-1)-dimensional projection is not dense, the k-dimensional unit cannot be dense and need not be examined.

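A minimal sketch of Apriori-style candidate generation follows. It assumes each unit is represented as a tuple of (dimension, interval) pairs sorted by dimension. This representation and the name `generate_candidates` are choices made for this illustration, not something taken from the slides.

```python
from itertools import combinations

def generate_candidates(dense_prev):
    """Build k-dimensional candidates from dense (k-1)-dimensional units."""
    if not dense_prev:
        return set()
    k = len(next(iter(dense_prev))) + 1
    candidates = set()
    for u, v in combinations(sorted(dense_prev), 2):
        merged = dict(u)
        # Units that disagree on a shared dimension cannot be joined.
        if any(merged.get(d, i) != i for d, i in v):
            continue
        merged.update(v)
        if len(merged) != k:
            continue
        cand = tuple(sorted(merged.items()))
        # Apriori pruning: every (k-1)-dimensional projection must be dense.
        if all(p in dense_prev for p in combinations(cand, k - 1)):
            candidates.add(cand)
    return candidates

# Dimensions: 0 = age, 1 = salary, 2 = vacation (hypothetical interval indices).
dense_2d = {
    ((0, 1), (1, 2)),
    ((0, 1), (2, 2)),
    ((1, 2), (2, 2)),
    ((0, 3), (1, 0)),
}
print(generate_candidates(dense_2d))
```

If the salary-vacation unit `((1, 2), (2, 2))` were not dense, the 3-D candidate would be pruned without ever being counted against the data.
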
Strength and Weakness of CLIQUE
Strength
  • It automatically finds subspaces of the highest dimensionality such that high-density clusters exist in those subspaces.
  • It is quite efficient.
  • It is insensitive to the order of records in the input and does not presume any canonical data distribution.
  • It scales linearly with the size of the input and has good scalability as the number of dimensions in the data increases.
Weakness
  • The accuracy of the clustering result may be degraded as the price of the method's simplicity.

Although the study of complete subgraphs goes back at least to the graph-theoretic reformulation of Ramsey theory by Erdős & Szekeres (1935),[1] the term "clique" comes from Luce & Perry (1949), who used complete subgraphs in social networks to model cliques of people; that is, groups of people all of whom know each other. Cliques have many other applications in the sciences and particularly in bioinformatics.
A maximal clique is a clique that cannot be extended by including one more adjacent vertex, that is, a clique which does not exist exclusively within the vertex set of a larger clique.
A maximum clique is a clique of the largest possible size in a given graph.
The clique number ω(G) of a graph G is the number of vertices in a maximum clique in G.
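The difference between a maximal and a maximum clique can be checked on a small graph. The sketch below stores the graph as a dictionary of adjacency sets and uses brute force, so it is only suitable for tiny examples. The function names and the sample graph are hypothetical.

```python
from itertools import combinations

def is_clique(graph, vertices):
    """True if every pair of the given vertices is adjacent."""
    return all(v in graph[u] for u, v in combinations(vertices, 2))

def is_maximal_clique(graph, vertices):
    """True if the vertices form a clique that no further vertex can extend."""
    vs = set(vertices)
    if not is_clique(graph, vs):
        return False
    return not any(is_clique(graph, vs | {w}) for w in graph if w not in vs)

def clique_number(graph):
    """Size of a maximum clique, found by trying the largest sizes first."""
    nodes = list(graph)
    for size in range(len(nodes), 0, -1):
        if any(is_clique(graph, c) for c in combinations(nodes, size)):
            return size
    return 0

# A triangle a-b-c with one extra edge c-d.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

print(is_maximal_clique(graph, {"c", "d"}))  # maximal, but not maximum
print(is_maximal_clique(graph, {"a", "b"}))  # can be extended by c
print(clique_number(graph))
```

The pair {c, d} is a maximal clique because neither a nor b is adjacent to d, yet the maximum clique is the triangle {a, b, c}.
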
