DATA MINING
PRESENTED BY: KINZA RAZZAQ (BSIT-13-F072)
AGENDA
• A brief introduction to Data Mining
• What can Data Mining do
• Supervised vs. Unsupervised Learning
“There are things that we know that we know (known knowns)…
There are things that we know that we don’t know (known unknowns)…
There are things that we don’t know we don’t know (unknown unknowns)…
There are things that we don’t know we know (unknown knowns)”
Data mining has relevance to the fourth point in
red.
It is the art of digging out exactly what we don’t
know but must know in our business.
The methodology is to first convert “unknown
unknowns” into “known unknowns” and then
finally to “known knowns”.
DATA WAREHOUSING VS. DATA MINING
• Data Warehousing provides the Enterprise with a memory.
• Data Mining provides the Enterprise with intelligence.
• Data Mining works with the Data Warehouse.
What is Data Mining?
• Knowledge Discovery in Databases (KDD).
• Data mining digs out valuable, non-trivial
information from large, multidimensional, apparently
unrelated databases.
• It’s the integration of business knowledge, people,
information, algorithms, statistics and computing
technology.
• Finding useful hidden patterns and relationships in
data.
Why Data Mining???
HUGE VOLUME: THERE IS WAY TOO MUCH DATA & GROWING!
• Bridging the gap
• Supply & Demand
• To minimize the volume
Example of growing DATA
• Data is collected much faster than it can be
processed or managed. NASA's Earth Observing
System (EOS) alone collected 15 petabytes by
2007 (15,000,000,000,000,000 bytes).
• Much of it won't be used, ever!
• Much of it won't be seen, ever!
• Why not?
• There's so much volume that the usefulness of some of
it will never be discovered.
Solution to the Problem of Growing Data
Reduce the volume and/or raise the information
content by structuring, querying, filtering,
summarizing, aggregating, and mining the data.
Claude Shannon's information theory: more volume, less information.
From the bottom level to the top:
• Raw Data: raw data has the maximum volume.
• Indexed Data: shortcuts have been found to reach desired points in the voluminous sea of data, rather than conventional scanning.
• Information: the next level is aggregated/summarized data.
• Knowledge: the next level is where the machine discovers and learns rules.
• Decision Support: the next level is where the machine supports the decision-making process by helping to select appropriate pre-defined rules.
The amount of digital data recorded and stored
exploded during the past decade,
BUT
the number of scientists, engineers, and analysts
available to analyze the data has not
grown correspondingly.
• Limitations of OLTP systems
• Massive data sets
• High dimensionality
• New data types
• Multiple heterogeneous data resources
Conventional systems couldn’t keep pace with the
ever-changing and growing data sets
• Data mining algorithms are built
How is Data Mining different?
• Data Warehouses (Data-driven exploration)
• Data Mining (Knowledge-driven exploration)
• Traditional Database (Transactions)
• Knowledge Discovery (KDD)
Data Mining Vs. Statistics
Formal statistical inference is assumption driven
i.e. a hypothesis is formed and validated against
the data.
Data mining is discovery driven, i.e. patterns and
hypotheses are automatically extracted from
data.
Knowledge extraction using statistics
[Chart: Inflation vs. stock index increase. Horizontal axis: Inflation (%), from 1.6 to 6. Vertical axis: Stock increase (%), from 0 to 40.]
Q: What will be the stock increase when inflation is 6%?
A: Model the non-linear relationship using a line y = mx + c.
Hence the answer is 13%.
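The line-fitting step can be sketched with ordinary least squares. This is only an illustration: the slide's data points did not survive, so the five (x, y) pairs below are hypothetical and were chosen to lie exactly on y = 2x + 1, which makes the prediction at x = 6 come out to 13. The function name fit_line is also an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Hypothetical points (NOT the slide's data); they lie on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

m, c = fit_line(xs, ys)
prediction = m * 6 + c  # predicted value at x = 6
print(m, c, prediction)
```

With real, non-linear data the fitted line is only an approximation, so a prediction made this way inherits that error.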
Failure of regression models
[Two charts: horizontal axis from 0 to 35; vertical axis from 0 to 70000 in the first chart and from -10000 to 70000 in the second. Fitted curve:]
y = -0.0127x^6 + 1.5029x^5 - 63.627x^4 + 1190.3x^3 - 9725.3x^2 + 31897x - 29263
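One way to probe a high-degree fit like this is to evaluate it at a few points. The sketch below uses Horner's rule on the coefficients printed on the slide; the names COEFFS and poly are assumptions. At x = 0 the polynomial returns its constant term, a negative value, while the first chart's vertical axis starts at 0. The slide does not state which failure it has in mind, so this is only one possible reading.

```python
# Coefficients from the slide, highest degree (x^6) first.
COEFFS = [-0.0127, 1.5029, -63.627, 1190.3, -9725.3, 31897.0, -29263.0]

def poly(x):
    """Evaluate the degree-6 polynomial at x using Horner's rule."""
    result = 0.0
    for a in COEFFS:
        result = result * x + a
    return result

at_zero = poly(0)  # equals the constant term
at_one = poly(1)   # equals the sum of all coefficients
print(at_zero, at_one)
```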
Data Mining is…
• Decision Trees
If. . . . .
Then. . .
• Rule Induction
• Clustering
• Genetic Algorithms
• Neural Networks
What can Data Mining Do
Classification
Estimation
Prediction
Market
Basket
Analysis
Clustering
Description
Market Basket Analysis: for example, 98% of people who purchased
items A and B also purchased item C.
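A percentage like the one above is usually called the confidence of the rule {A, B} -> C. The following is a minimal sketch of how support and confidence could be computed; the six transactions are hypothetical (so the result is 75%, not 98%), and the function names are assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets (illustration only).
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B"},
]

supp_ab = support(transactions, {"A", "B"})         # 4 of 6 baskets
conf = confidence(transactions, {"A", "B"}, {"C"})  # 3 of those 4 also hold C
print(supp_ab, conf)
```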
Clustering: segmenting a heterogeneous population into a
number of more homogeneous subgroups or clusters.
How many clusters?
How many clusters, now?
How many clusters, finally?
Description: knowing what is happening in our
databases is beneficial; rotate the data cube to
different angles to get to the information of
interest.
Comparing Methods
Accuracy
Speed
Robustness
Scalability
Interpretability
Where does Data Mining fit in?
Data mining: the core of the knowledge discovery process.
Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation
Data Structures in Data Mining
• Data matrix
– Table or database
– n records and m attributes
– n >> m

  C1,1  C1,2  C1,3  …  C1,m
  C2,1  C2,2  C2,3  …  C2,m
  C3,1  C3,2  C3,3  …  C3,m
   .     .     .        .
  Cn,1  Cn,2  Cn,3  …  Cn,m

• Similarity matrix
– Symmetric square matrix
– n x n or m x m

  1     S1,2  S1,3  …  S1,n
  S2,1  1     S2,3  …  S2,n
  S3,1  S3,2  1     …  S3,n
   .     .     .        .
  Sn,1  Sn,2  Sn,3  …  1
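As a rough illustration of how the two structures relate, the sketch below builds an n x n similarity matrix from the rows of a small data matrix. The slides do not say which similarity measure is used; 1 / (1 + Euclidean distance) is an assumption chosen because it gives 1 on the diagonal and a symmetric result. The data values are hypothetical.

```python
import math

def similarity_matrix(rows):
    """Return an n x n matrix with s[i][j] = 1 / (1 + Euclidean distance)."""
    n = len(rows)
    return [
        [1.0 / (1.0 + math.dist(rows[i], rows[j])) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical data matrix: n = 3 records, m = 3 attributes.
data = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 4.0],
    [6.0, 6.0, 3.0],
]

sim = similarity_matrix(data)
for row in sim:
    print([round(s, 3) for s in row])
```

Transposing the data matrix before calling the function would give the m x m attribute-to-attribute version instead.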
Main types of DATA MINING
Supervised (type and number of classes are known in advance)
• Bayesian Modeling
• Decision Trees
• Neural Networks
• Etc.
Unsupervised (type and number of classes are NOT known in advance)
• One-way Clustering
• Two-way Clustering
Clustering: Min-Max Distance
[Scatter plot: axes Age and Salary, ticks at 20, 40, 60; one point is marked as an outlier.]
• Inter-cluster distances are maximized
• Intra-cluster distances are minimized
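The two quantities can be made concrete with a small sketch. The definitions used here (mean distance of members to their centroid for intra-cluster, centroid-to-centroid distance for inter-cluster) are one common choice and an assumption, as are the function names and the hypothetical (age, salary in thousands) points.

```python
import math

def centroid(cluster):
    """Mean point of a cluster of equal-length tuples."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))

def intra_cluster_distance(cluster):
    """Average distance from each member to the cluster centroid."""
    c = centroid(cluster)
    return sum(math.dist(p, c) for p in cluster) / len(cluster)

def inter_cluster_distance(a, b):
    """Distance between the centroids of two clusters."""
    return math.dist(centroid(a), centroid(b))

# Hypothetical (age, salary in thousands) points, for illustration only.
young = [(20, 30), (22, 30), (24, 30)]
older = [(58, 80), (60, 80), (62, 80)]

intra_young = intra_cluster_distance(young)
inter = inter_cluster_distance(young, older)
print(intra_young, inter)
```

A good clustering keeps the first number small relative to the second.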
One-way clustering example
[Figure: INPUT and OUTPUT matrices. Black spots are noise; white spots are missing data.]
Data Mining Agriculture data
[Figure: INPUT and clustered OUTPUT, showing the clusters.]
Created a similarity matrix using farm area, cotton variety
and pesticide used.
How does Classification work?
Inputs (Unseen Data) -> Classifier (model) -> Classification Output: Which class?, with a Confidence Level (accuracy)
Classification: Model Construction
Training Data (observations, measurements, etc.) -> Classification Algorithms -> Classifier (Model)

Training Data:
NAME   Time  Items  Gender
Moin   10    2      M
Munir  16    3      M
Meher  15    1      F
Javed  5     1      M
Mahin  20    1      F
Akram  20    4      M

Classifier (Model): IF time/items >= 6 THEN gender = ‘F’
Relationship between shopping time and items bought
Classification: Use in Prediction
Testing Data -> Classifier
Unseen Data (Addan, Time = 15, Items = 1) -> Classifier -> Gender?

Testing Data:
NAME    Time  Items  Gender
Tahir   20    1      M
Younas  11    2      M
Yasin   3     1      M
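The rule and both tables can be checked with a few lines of code. This is a sketch, not the presenter's implementation: the slide only gives the 'F' branch of the rule, so labelling every other case 'M' is an assumption, and the function names are hypothetical.

```python
def classify(time, items):
    """Rule from the slide: IF time/items >= 6 THEN gender = 'F'."""
    # Assumption: any row that does not match the rule is labelled 'M'.
    return "F" if time / items >= 6 else "M"

def accuracy(rows):
    """Fraction of (name, time, items, gender) rows the rule labels correctly."""
    hits = sum(1 for _, t, i, g in rows if classify(t, i) == g)
    return hits / len(rows)

training = [
    ("Moin", 10, 2, "M"), ("Munir", 16, 3, "M"), ("Meher", 15, 1, "F"),
    ("Javed", 5, 1, "M"), ("Mahin", 20, 1, "F"), ("Akram", 20, 4, "M"),
]
testing = [("Tahir", 20, 1, "M"), ("Younas", 11, 2, "M"), ("Yasin", 3, 1, "M")]

train_acc = accuracy(training)  # rule fits every training row
test_acc = accuracy(testing)    # Tahir (20/1 = 20) is labelled 'F' but is 'M'
addan = classify(15, 1)         # the unseen record: 15/1 = 15 >= 6
print(train_acc, test_acc, addan)
```

The rule is perfect on the training data but misses one of the three testing rows, which is why accuracy is measured on data the model has not seen.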
Clustering vs. Cluster Detection
• In one-way clustering, reordering of rows (or
columns) assembles clusters.
• If the clusters are NOT assembled, they are very
difficult to detect
First you cluster your data and then detect
clusters in the clustered data
Example
[Figure: two panels, A and B.]
The K-Means Clustering
k-means clustering aims to partition ‘n’ observations
into ‘k’ clusters in which each observation belongs to
the cluster with the nearest mean.
The k-means algorithm is implemented in 4 steps.
[Steps 1 to 3: figures.]
Step 4: Go back to Step 2; stop when there are no more
new assignments.
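A minimal sketch of the assign-and-update loop follows. The wording of the slide's first three steps is not available, so this uses the textbook form of k-means (seed k centroids, assign each point to the nearest mean, recompute the means, repeat until assignments stop changing). Seeding with the first k points is an assumption made to keep the run deterministic, and the six points are hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    # Assumption: seed the centroids with the first k points.
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign each point to the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop when no point changes cluster (no more new assignments).
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels

# Hypothetical points forming two well-separated groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The result depends on the seeds; production implementations typically try several random initialisations and keep the best.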
Example
[Figure: four scatter-plot panels labelled A, B, C and D; both axes run from 0 to 10.]
Data Mining is FRUITFUL..!!