DATA MINING
SUBMITTED BY :
SHUBHAM GUPTA, SUMAN CHATTERJEE,
SIDDHARTH TIU
SUBMITTED TO :
Dr. A.C.S. Rao
1. What is Data Mining
Data mining is the process of discovering interesting patterns (or knowledge)
from large amounts of data.
The data sources can include databases, data warehouses, the Web, other
information repositories, or data that are streamed into the system dynamically.
Why Data Mining
• Credit ratings/targeted marketing:
  - Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
  - Identify likely responders to sales promotions
• Fraud detection:
  - Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
• Customer relationship management:
  - Which of my customers are likely to be the most loyal, and which are most likely to leave for a competitor?
Data mining
• Process of semi-automatically analyzing large databases to find patterns that are:
  - valid: hold on new data with some certainty
  - novel: non-obvious to the system
  - useful: should be possible to act on the item
  - understandable: humans should be able to interpret the pattern
• Also known as Knowledge Discovery in Databases (KDD)
Applications
• Banking: loan/credit card approval
  - predict good customers based on old customers
• Customer relationship management:
  - identify those who are likely to leave for a competitor
• Targeted marketing:
  - identify likely responders to promotions
• Fraud detection: telecommunications, financial transactions
  - identify fraudulent events from an online stream of events
• Manufacturing and production:
  - automatically adjust knobs when process parameters change
Applications (continued)
• Medicine: disease outcome, effectiveness of treatments
  - analyze patient disease history: find relationship between diseases
• Molecular/Pharmaceutical: identify new drugs
• Scientific data analysis:
  - identify new galaxies by searching for sub clusters
• Web site/store design and promotion:
  - find affinity of visitor to pages and modify layout
Data Mining Techniques
• Classification
• Clustering
• Regression
• Association Rules
Classification Models
• Neural networks
• Statistical models – linear/quadratic discriminants
• Decision trees
• Genetic models
Decision Trees
Technique for Classification
• Decision-Tree Classifiers
[Figure: decision tree that splits first on Job (Carpenter, Engineer, Doctor) and then on Income, with leaves labeled Bad or Good. Income thresholds shown: <30K, <40K, <50K, >50K, >90K, >100K.]
Predicting credit risk of a person with the jobs specified.
Decision trees
• Tree where internal nodes are simple decision rules on one or more attributes and leaf nodes are predicted class labels.
[Figure: example tree with decision nodes Salary < 1 M, Prof = teacher, and Age < 30, and leaves labeled Good or Bad.]
Decision Trees
• A decision tree T encodes d (a classifier or regression function) in the form of a tree.
• A node t in T without children is called a leaf node. Otherwise, t is called an internal node.
Internal Nodes
• Each internal node has an associated splitting predicate. Binary predicates are the most common.
Example predicates:
• Age <= 20
• Profession in {student, teacher}
• 5000*Age + 3*Salary - 10000 > 0
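The example predicates above can be written as small functions that map one record to True or False. This is only a sketch: the record layout (a dict with the keys age, profession, and salary) and the sample values are assumptions made for illustration, not part of the original slides.

```python
# Each splitting predicate takes one record and returns True or False.
# The dict keys and the sample record below are illustrative assumptions.

def age_at_most_20(record):
    return record["age"] <= 20

def profession_is_student_or_teacher(record):
    return record["profession"] in {"student", "teacher"}

def linear_combination_positive(record):
    # A split on a linear combination of two numeric attributes.
    return 5000 * record["age"] + 3 * record["salary"] - 10000 > 0

person = {"age": 25, "profession": "teacher", "salary": 40000}
predicates = (
    age_at_most_20,
    profession_is_student_or_teacher,
    linear_combination_positive,
)
results = [predicate(person) for predicate in predicates]
print(results)  # [False, True, True]
```

A binary predicate like these sends a record to one of two child nodes, which is why binary splits are the common case.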
Leaf Nodes
Consider leaf node t:
• Classification problem: Node t is labeled with one class label c in dom(C)
• Regression problem: Two choices
  - Piecewise constant model: t is labeled with a constant y in dom(Y).
  - Piecewise linear model: t is labeled with a linear model Y = y_t + Σ_i a_i X_i
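The two regression leaf models can be sketched as follows. The function names and the numeric values are hypothetical; the only thing taken from the slide is the form of the two models.

```python
def constant_leaf(y):
    """Piecewise constant model: the leaf always predicts the constant y."""
    return lambda x: y

def linear_leaf(y_t, coefficients):
    """Piecewise linear model: Y = y_t + sum of a_i * X_i over the inputs."""
    return lambda x: y_t + sum(a * x_i for a, x_i in zip(coefficients, x))

# Illustrative values only.
leaf_a = constant_leaf(12.0)
leaf_b = linear_leaf(1.0, [2.0, 0.5])

print(leaf_a([3.0, 4.0]))  # 12.0
print(leaf_b([3.0, 4.0]))  # 1.0 + 2.0*3.0 + 0.5*4.0 = 9.0
```

In both cases the tree first routes a record to a leaf, and only then does the leaf's own model produce the predicted value.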
Example
Encoded classifier:
If (age < 30 and carType = Minivan) Then YES
If (age < 30 and (carType = Sports or carType = Truck)) Then NO
If (age >= 30) Then YES
[Figure: tree with root Age. The branch >=30 leads to YES; the branch <30 leads to Car Type, where Minivan leads to YES and Sports, Truck leads to NO.]
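The encoded classifier above translates almost line for line into code. The function name and the choice to raise an error for car types the rules do not cover are assumptions of this sketch.

```python
def classify(age, car_type):
    """Apply the three rules of the example tree and return "YES" or "NO"."""
    if age >= 30:
        return "YES"
    # From here on, age < 30, so the decision depends on the car type.
    if car_type == "Minivan":
        return "YES"
    if car_type in ("Sports", "Truck"):
        return "NO"
    # The rules list only three car types; anything else is left undefined.
    raise ValueError(f"no rule covers car type {car_type!r}")

print(classify(25, "Minivan"))  # YES
print(classify(25, "Truck"))    # NO
print(classify(45, "Sports"))   # YES
```

Each path from the root to a leaf corresponds to one rule, which is what makes a tree easy to convert into classification rules.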
Why Decision Tree Model?
• Relatively fast compared to other classification models
• Obtains similar, and sometimes better, accuracy compared to other models
• Simple and easy to understand
• Can be converted into simple, easy-to-understand classification rules
Pros and Cons of decision trees
· Cons
- Cannot handle complicated relationships between features
- Simple decision boundaries
- Problems with lots of missing data
· Pros
+ Reasonable training time
+ Fast application
+ Easy to interpret
+ Easy to implement
+ Can handle a large number of features
Consumer Profiling
Businesses need to effectively leverage
available data to improve customer
acquisition and retention. We will explore
how analytics tools such as decision
trees can help with customer
acquisition.
EXAMPLE
A manufacturer of home improvement equipment wants to identify which existing customers are the best candidates for a new product it is developing. A decision tree such as the one in the figure can identify them.
[Figure: Customer Profiling using Data Mining]
Clustering
• Group data into clusters
  - Similar data is grouped in the same cluster
  - Dissimilar data is grouped in different clusters
• How is this achieved?
  - K-Nearest Neighbor
    A classification method that classifies a point by calculating the distances between the point and the points in the training data set. It then assigns the point to the class that is most common among its k nearest neighbors (where k is an integer). (2)
  - Hierarchical
    Group data into trees (a hierarchy of nested clusters)
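The k-nearest-neighbor procedure described above can be sketched in a few lines. Euclidean distance, the function name, and the toy training points are all assumptions of this example; the slide does not specify a distance measure.

```python
import math
from collections import Counter

def knn_classify(training, query, k):
    """Return the most common label among the k training points nearest to query.

    training is a list of (point, label) pairs; points are tuples of numbers.
    Euclidean distance is assumed here.
    """
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups labeled "A" and "B".
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]

print(knn_classify(training, (2.0, 2.0), k=3))  # A
print(knn_classify(training, (6.0, 5.0), k=3))  # B
```

An odd k avoids tied votes when there are two classes; with more classes, ties are still possible and need a tie-breaking rule.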
Regression
• “Regression deals with the prediction of a value, rather than a class.” (1, P747)
  - Example: Find out if there is a relationship between smoking and cancer-related illness in patients.
• Given values: X1, X2, ..., Xn
• Objective: predict variable Y
• One way is to estimate coefficients a0, a1, ..., an
  - Y = a0 + a1X1 + a2X2 + … + anXn
  - Linear Regression
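For a single input variable, the coefficients a0 and a1 can be estimated with ordinary least squares. The sketch below uses the standard closed-form solution; the function name and the sample data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a0, a1) for the model Y = a0 + a1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a1 = s_xy / s_xx            # slope
    a0 = mean_y - a1 * mean_x   # intercept
    return a0, a1

# Hypothetical data generated from y = 1 + 2x, so the fit should recover it.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a0, a1 = fit_line(xs, ys)
print(a0, a1)  # 1.0 2.0

# Predict Y for a new X with the fitted coefficients.
print(a0 + a1 * 5.0)  # 11.0
```

With several input variables the same idea applies, but the coefficients are usually found by solving a linear system rather than with this one-variable formula.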
Association Rules
• “An association algorithm creates rules that describe how often events have occurred together.” (2)
  - Example: When a customer buys a hammer, then 90% of the time they will buy nails.
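How often events occur together is usually measured with two numbers: support (the share of all transactions that contain the items) and confidence (the share of transactions containing the first item that also contain the second). The sketch below computes both for a hammer and nails rule. The transactions are made-up toy data and do not reproduce the 90% figure from the slide.

```python
# Hypothetical toy baskets; each transaction is the set of items bought together.
transactions = [
    {"hammer", "nails"},
    {"hammer", "nails", "glue"},
    {"hammer", "saw"},
    {"nails"},
    {"hammer", "nails", "saw"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions with the antecedent, the fraction that also have the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

print(support({"hammer", "nails"}, transactions))       # 0.6
print(confidence({"hammer"}, {"nails"}, transactions))  # 0.75
```

Here three of the four hammer purchases also include nails, so the rule "hammer implies nails" has a confidence of 75% on this toy data.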
Advantages of Data Mining
• Provides new knowledge from existing data
  - Public databases
  - Government sources
  - Company databases
• Old data can be used to develop new knowledge
• New knowledge can be used to improve services or products
• Improvements lead to:
  - Bigger profits
  - More efficient service
Uses of Data Mining
• Sales/Marketing
  - Diversify target market
  - Identify clients' needs to increase response rates
• Risk Assessment
  - Identify customers that pose high credit risk
• Fraud Detection
  - Identify people misusing the system, e.g. people who have two Social Security Numbers
• Customer Care
  - Identify customers likely to change providers
  - Identify customer needs
Relationship with other fields
• Overlaps with machine learning, statistics, artificial intelligence, databases, and visualization, but with more stress on:
  - scalability in the number of features and instances
  - algorithms and architectures, whereas the foundations of methods and formulations are provided by statistics and machine learning
  - automation for handling large, heterogeneous data
THANK YOU
