DATA MINING
What is Data Mining?
•New buzzword, old idea.
•“The process of semiautomatically analyzing large
databases to find useful patterns” (Silberschatz)
•KDD – “Knowledge Discovery in Databases”
•Inferring new information from already collected data.
•Areas of use:
Internet – Discover the needs of customers
Economics – Predict stock prices
Science – Predict environmental change
Medicine – Match patients with similar problems to find a cure
Data Mining – Main Components
Wikipedia definition: “Data mining is the entire process of
applying computer-based methodology, including new
techniques for knowledge discovery, from data.”
Knowledge Discovery
Concrete information gleaned from known data. Data you may
not have known, but which is supported by recorded facts.
Knowledge Prediction
Uses known data to forecast future trends, events, etc.
Wikipedia note: “some data mining systems such as neural
networks are inherently geared towards prediction and pattern
recognition, rather than knowledge discovery.” These include
applications in AI and symbol analysis.
Data Warehouse:
“is a repository (or archive) of information gathered from
multiple sources, stored under a unified schema, at a single
site.” (Silberschatz)
Collect data → Store in a single repository
Allows for easier query development, since only a single
repository needs to be queried.
Data Mining:
Analyzing databases or data warehouses to discover patterns
in the data and so gain knowledge.
Data Mining & Data Warehousing
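A minimal sketch of the “collect data → store in a single repository” idea, assuming Python's built-in sqlite3 module as a stand-in for a real warehouse. Two hypothetical sources with different layouts (ONLINE_ORDERS stores amounts in cents, STORE_RECEIPTS in dollars) are transformed into one unified schema, after which a single query answers a question spanning both. All table, column and variable names are illustrative.

```python
import sqlite3

# Hypothetical source 1: online orders, with amounts stored in cents.
ONLINE_ORDERS = [
    {"cust": "alice", "amount_cents": 1250},
    {"cust": "bob", "amount_cents": 800},
]

# Hypothetical source 2: in-store receipts as (customer, dollars) tuples.
STORE_RECEIPTS = [("alice", 20.0), ("carol", 5.5)]


def build_warehouse() -> sqlite3.Connection:
    """Collect both sources into one table that uses a unified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, source TEXT)")
    # Transform source 1 into the unified schema (amounts in dollars).
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, 'online')",
        [(o["cust"], o["amount_cents"] / 100) for o in ONLINE_ORDERS],
    )
    # Source 2 already matches the unified schema apart from the source tag.
    conn.executemany("INSERT INTO sales VALUES (?, ?, 'store')", STORE_RECEIPTS)
    conn.commit()
    return conn


def total_per_customer(conn: sqlite3.Connection) -> dict[str, float]:
    """A single query against the single repository covers every source."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    print(total_per_customer(build_warehouse()))
    # {'alice': 32.5, 'bob': 8.0, 'carol': 5.5}
```

Without the warehouse, the same question would need one query per source plus code to reconcile cents with dollars.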
Data Mining Techniques
•Classification
•Clustering
•Regression
•Association Rules
Classification
•Classification: given a set of classes and past instances (training
instances) labelled with their associated class, classification is the
process of predicting the class of a new item.
•In other words, the goal is to identify which class a new item
belongs to.
•Example:
A bank wants to classify its home loan customers into groups
according to their response to bank advertisements. The bank might
use the classes “Responds Rarely”, “Responds Sometimes” and
“Responds Frequently”.
The bank will then attempt to find rules describing the customers who
respond Frequently and Sometimes.
These rules could be used to predict the needs of potential customers.
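Learning rules from labelled training instances can be illustrated with OneR (“one rule”), one of the simplest rule-learning classifiers. It is swapped in here purely for illustration and is not implied by the slide. The customer attributes (age_band, online_banking) and the training data are hypothetical, loosely modelled on the bank example.

```python
from collections import Counter, defaultdict

Record = dict[str, str]

# Hypothetical training instances: (customer attributes, response class).
TRAINING: list[tuple[Record, str]] = [
    ({"age_band": "young", "online_banking": "yes"}, "Frequently"),
    ({"age_band": "middle", "online_banking": "yes"}, "Frequently"),
    ({"age_band": "senior", "online_banking": "yes"}, "Frequently"),
    ({"age_band": "young", "online_banking": "no"}, "Rarely"),
    ({"age_band": "middle", "online_banking": "no"}, "Rarely"),
    ({"age_band": "senior", "online_banking": "no"}, "Sometimes"),
]


def one_r(training: list[tuple[Record, str]]) -> tuple[str, dict[str, str], str]:
    """Learn one rule table on the single attribute with the fewest errors."""
    # Fallback class for attribute values never seen during training.
    default = Counter(label for _, label in training).most_common(1)[0][0]
    best = None
    for attr in training[0][0]:
        # Count the class labels observed for each value of this attribute.
        counts = defaultdict(Counter)
        for record, label in training:
            counts[record[attr]][label] += 1
        # Each value predicts its majority class; other labels count as errors.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    _, attr, rules = best
    return attr, rules, default


def classify(model: tuple[str, dict[str, str], str], record: Record) -> str:
    """Predict the class of a new item with the learned rule table."""
    attr, rules, default = model
    return rules.get(record.get(attr, ""), default)


if __name__ == "__main__":
    model = one_r(TRAINING)
    print(model)
    print(classify(model, {"age_band": "young", "online_banking": "yes"}))
```

OneR keeps the single attribute whose value-to-majority-class table makes the fewest training errors (here online_banking, with one error against three for age_band). Practical systems usually rely on richer learners such as decision trees, but the output is still a set of readable rules.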
Clustering
“Clustering algorithms find groups of items that are similar. …
It divides a data set so that records with similar content are in
the same group, and groups are as different as possible from
each other.”
Example:
An insurance company could use clustering to group
clients by their age, location and types of
insurance purchased.
The groups are not specified in advance; this is referred to as
‘unsupervised learning’.
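A sketch of clustering with k-means (Lloyd's algorithm), one common clustering method, chosen here as an assumption. The client data are hypothetical and reduced to two numeric features, age and number of policies held. No class labels are supplied, which is what makes this unsupervised.

```python
import math

Point = tuple[float, float]

# Hypothetical clients as (age, number of policies held); no labels are given.
CLIENTS: list[Point] = [(22, 1), (25, 1), (24, 2), (61, 4), (65, 3), (58, 4)]


def kmeans(points: list[Point], k: int, max_iter: int = 100) -> list[int]:
    """Lloyd's k-means: return a cluster index for every point."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


if __name__ == "__main__":
    print(kmeans(CLIENTS, k=2))  # [0, 0, 0, 1, 1, 1]
```

In practice, features on different scales (age vs. policy count) are usually standardised first, and categorical attributes such as location need a numeric encoding before Euclidean distance is meaningful. The result also depends on the initial centroids; this sketch simply uses the first k points.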
Regression
“Regression deals with the prediction of a value, rather than a
class.”
Example:
Find out whether there is a relationship between smoking and
cancer-related illness in patients.
Given values: X1, X2, …, Xn
Objective: predict the variable Y
One way is to estimate coefficients a0, a1, …, an such that
Y = a0 + a1X1 + a2X2 + … + anXn
Linear Regression
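For the single-predictor case (n = 1), the coefficients can be estimated with ordinary least squares, which gives the line of best fit: a1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a0 = ȳ − a1·x̄. The sketch below uses hypothetical (x, y) pairs rather than real patient data. With several predictors the same least-squares idea applies, but it requires solving the normal equations (for example with numpy.linalg.lstsq).

```python
# Hypothetical observations (x, y) scattered around a straight line.
XS = [1.0, 2.0, 3.0, 4.0, 5.0]
YS = [2.9, 5.1, 7.0, 9.2, 10.8]


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for Y = a0 + a1*X1; returns (a0, a1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # a1 = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    a0 = mean_y - a1 * mean_x
    return a0, a1


def predict(a0: float, a1: float, x: float) -> float:
    """Evaluate the fitted line at x."""
    return a0 + a1 * x


if __name__ == "__main__":
    a0, a1 = fit_line(XS, YS)
    print(f"Y = {a0:.2f} + {a1:.2f} * X1")  # Y = 1.03 + 1.99 * X1
```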
Regression
[Figure: example graphs showing a line of best fit and curve fitting]
Association Rules
“An association algorithm creates rules that describe how often
events have occurred together.”
Example: When a customer buys a hammer, then 90% of
the time they also buy nails.
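Association rules are usually scored by support (the fraction of transactions containing both sides of the rule) and confidence (the fraction of transactions containing the left-hand side that also contain the right-hand side); the “90% of the time” in the hammer example is a confidence of 0.9. Below is a brute-force sketch over single-item rules, with hypothetical transactions and thresholds.

```python
from itertools import permutations

# Hypothetical market-basket transactions (each one is a set of items bought).
TRANSACTIONS = [
    {"hammer", "nails"},
    {"hammer", "nails", "glue"},
    {"hammer", "nails"},
    {"hammer", "tape"},
    {"nails", "saw"},
    {"nails"},
]


def count(itemset: set[str], transactions: list[set[str]]) -> int:
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def mine_pair_rules(transactions, min_support, min_confidence):
    """Brute-force search for rules 'A -> B' between single items."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        both = count({a, b}, transactions)
        support = both / len(transactions)  # how often A and B occur together
        if support < min_support:
            continue
        confidence = both / count({a}, transactions)  # share of A-baskets with B
        if confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


if __name__ == "__main__":
    for a, b, s, c in mine_pair_rules(TRANSACTIONS, 0.4, 0.7):
        print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
    # hammer -> nails: support=0.50, confidence=0.75
```

Note that the rule is directional: nails → hammer has the same support but only 0.6 confidence, so it is rejected. Brute force is fine for a handful of items; algorithms such as Apriori or FP-growth prune the search when itemsets grow.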
Uses of Data Mining
AI/Machine Learning
Combinatorial/Game Data Mining
Useful for analyzing winning strategies in games, and thus for
developing intelligent AI opponents (e.g., chess).
Business Strategies
Market Basket Analysis
Identify customer demographics, preferences, and purchasing
patterns.
Risk Analysis
Product Defect Analysis
Analyze product defect rates for given plants and predict
possible complications (read: lawsuits) down the line.
Uses of Data Mining (Cont.)
Sales/Marketing
Diversify target market
Identify clients’ needs to increase response rates
Fraud Detection
Identify people misusing the system, e.g., people who have two
Social Security Numbers
Customer Care
Identify customers likely to change providers
Identify customer needs
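The two-Social-Security-numbers check from the fraud detection item can be sketched as a grouping query, assuming that an identity can be matched on name plus date of birth (a simplification; real record matching is usually fuzzier). The records and field names are hypothetical, and the SSN values are placeholders.

```python
from collections import defaultdict

# Hypothetical records; the SSN values are placeholders, not real numbers.
RECORDS = [
    {"name": "J. Smith", "dob": "1980-02-14", "ssn": "000-00-0001"},
    {"name": "J. Smith", "dob": "1980-02-14", "ssn": "000-00-0002"},
    {"name": "A. Jones", "dob": "1975-07-01", "ssn": "000-00-0003"},
    {"name": "A. Jones", "dob": "1975-07-01", "ssn": "000-00-0003"},
]


def people_with_multiple_ssns(records):
    """Flag identities (name + date of birth) linked to more than one SSN."""
    ssns_by_person = defaultdict(set)
    for r in records:
        # A set ignores repeated records that carry the same SSN.
        ssns_by_person[(r["name"], r["dob"])].add(r["ssn"])
    return {
        person: sorted(ssns)
        for person, ssns in ssns_by_person.items()
        if len(ssns) > 1
    }


if __name__ == "__main__":
    print(people_with_multiple_ssns(RECORDS))
```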
Sources of Data for Mining
•Databases
•Text Documents
•Computer Simulations
•Social Networks
Privacy Concerns
•Effective Data Mining requires large sources of data
•To achieve a wide spectrum of data, link multiple data
sources
•Linking sources can be problematic for privacy, for
example:
If the following histories of a customer were linked:
•Shopping History
•Credit History
•Bank History
•Employment History
•The user’s life story could be painted from the collected data
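The privacy risk comes from the join itself: each history alone says little, but once the sources share a key they can be merged into one profile. A minimal sketch, assuming hypothetical sources keyed by a shared customer ID (real linkage may instead match on names, addresses or other quasi-identifiers):

```python
# Hypothetical, separately collected histories that share a customer ID.
SHOPPING = {"c42": ["baby formula", "diapers"]}
CREDIT = {"c42": {"credit_score": 640}}
BANK = {"c42": {"overdrafts_last_year": 3}}
EMPLOYMENT = {"c42": {"employer": "Acme Corp", "status": "laid off"}}


def link_histories(customer_id: str, **sources: dict) -> dict:
    """Join every source that mentions customer_id into one profile."""
    return {
        name: source[customer_id]
        for name, source in sources.items()
        if customer_id in source
    }


if __name__ == "__main__":
    profile = link_histories(
        "c42", shopping=SHOPPING, credit=CREDIT, bank=BANK, employment=EMPLOYMENT
    )
    for name, facts in profile.items():
        print(name, facts)
```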
THANK YOU
