Er. Nawaraj Bhandari
Data Warehouse/Data Mining
Classification and Prediction
Chapter 8
Introduction
Two forms of data analysis can be used to extract models that describe important classes or predict future data trends:
• Classification
• Prediction
Classification models predict categorical class labels, whereas prediction models predict continuous-valued functions.
• For example, we can build a classification model to categorize bank loan applications as either safe or risky.
• We can build a prediction model to predict how much potential customers will spend, in dollars, on computer equipment given their income and occupation.
What is classification?
The following are examples of cases where the data analysis task is classification:
• A bank loan officer wants to analyze the data to learn which customers (loan applicants) are risky and which are safe.
• A marketing manager at a company needs to predict whether a customer with a given profile will buy a new computer.
In both examples, a model or classifier is constructed to predict categorical labels. These labels are "risky" or "safe" for the loan application data and "yes" or "no" for the marketing data.
What is prediction?
The following is an example of a case where the data analysis task is prediction:
• Suppose the marketing manager needs to predict how much a given customer will spend during a sale at the company.
In this example, the task is to predict a numeric value, so the data analysis task is numeric prediction. A model, or predictor, is constructed that predicts a continuous-valued function or ordered value.
• Regression analysis is the statistical methodology most often used for numeric prediction.
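As a rough illustration of numeric prediction by regression, the sketch below fits a straight line by ordinary least squares and uses it to predict spending from income. The function names `fit_line` and `predict` and the data values are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Return the predicted continuous value for input x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training data: income (thousands of dollars) and
# amount spent during a sale (dollars).
incomes = [20, 30, 40, 50]
spend = [200, 300, 400, 500]

model = fit_line(incomes, spend)
print(predict(model, 60))
```

The output is a number rather than a class label, which is what separates prediction from classification in the sense used here.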
How Does Classification Work?
Using the bank loan application discussed above, let us look at how classification works. The data classification process has two steps:
• Building the classifier or model
• Using the classifier for classification
Building the Classifier or Model
• This step is the learning step, or learning phase.
• In this step, a classification algorithm builds the classifier.
• The classifier is built from a training set made up of database tuples and their associated class labels.
• Each tuple in the training set is assumed to belong to a predefined category or class. These tuples are also called samples, objects, or data points.
Using the Classifier for Classification
• In this step, the classifier is used for classification.
• Test data is used to estimate the accuracy of the classification rules.
• If the accuracy is considered acceptable, the classification rules can be applied to new data tuples.
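The two steps can be sketched as follows. This is a minimal illustration, not the method from the slides: the "classifier" is a deliberately simple rule table that maps one attribute value to its most common class label, and the attribute values, data, and the 0.7 acceptance threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def build_classifier(training_set):
    """Learning step: map each attribute value to its most common label."""
    counts = defaultdict(Counter)
    for value, label in training_set:
        counts[value][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def classify(rules, value, default="risky"):
    """Classification step: look up the label for an attribute value."""
    return rules.get(value, default)


def accuracy(rules, test_set):
    """Fraction of test tuples whose predicted label matches the true label."""
    correct = sum(1 for value, label in test_set
                  if classify(rules, value) == label)
    return correct / len(test_set)


# Hypothetical loan data: (income level, class label).
training_set = [("low", "risky"), ("low", "risky"), ("low", "safe"),
                ("high", "safe"), ("high", "safe"), ("medium", "safe")]
test_set = [("low", "risky"), ("high", "safe"),
            ("medium", "risky"), ("high", "safe")]

rules = build_classifier(training_set)
acc = accuracy(rules, test_set)
print(acc)

# Apply the rules to a new tuple only if the estimated accuracy is acceptable.
ACCEPTABLE = 0.7  # hypothetical threshold
if acc >= ACCEPTABLE:
    print(classify(rules, "low"))
```

Keeping the test tuples separate from the training tuples matters: accuracy measured on the training set itself would tend to be optimistic.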
Classification by Decision Tree Induction
Decision tree induction is the learning of decision trees from class-labeled training tuples.
A decision tree is a flowchart-like tree structure in which:
• each internal (non-leaf) node denotes a test on an attribute,
• each branch represents an outcome of the test,
• each leaf (terminal) node holds a class label, and
• the root node is the topmost node.
Example
RID  age    income  student  credit-rating  Class
1    youth  high    no       fair           ?

1. Test on age: youth
2. Test on student: no
3. Reach a leaf node
Class "no": the customer is unlikely to buy a computer.
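The traversal in this example can be sketched in code. Only the path age = youth, student = no, class "no" comes from the example; the other branches of the tree below (the `middle_aged` and `senior` branches and the `student = yes` leaf) are assumptions added so that the structure is complete.

```python
# Internal nodes are dicts that test one attribute; leaves are class labels.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"no": "no", "yes": "yes"},  # "yes" leaf is assumed
        },
        "middle_aged": "yes",  # assumed branch
        "senior": {            # assumed branch
            "attribute": "credit-rating",
            "branches": {"fair": "yes", "excellent": "no"},
        },
    },
}


def classify(node, record):
    """Follow the branch matching each tested attribute until a leaf."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node


record = {"age": "youth", "income": "high",
          "student": "no", "credit-rating": "fair"}
print(classify(tree, record))  # no
```

Note that `income` and `credit-rating` are never consulted for this tuple, because only the attributes along the path from the root to the leaf are tested.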
Algorithm for Constructing Decision Trees
Decision tree construction uses a greedy algorithm. The tree is built in a top-down, recursive, divide-and-conquer manner.
1. At the start, all the training tuples are at the root.
2. Tuples are partitioned recursively based on selected attributes.
3. If all samples at a given node belong to the same class, label the node with that class.
4. If there are no remaining attributes for further partitioning, use majority voting to label the leaf.
5. If there are no samples left for a branch, label the leaf (typically with the majority class of the parent node) and terminate.
6. Otherwise, go to step 2.
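The steps above can be sketched as a short recursive function. The slides do not say how the splitting attribute is selected, so this sketch assumes an entropy-based choice (the attribute whose partition has the lowest weighted entropy, in the style of ID3). The function names and the training data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def majority(labels):
    """Majority voting: the most common class label."""
    return Counter(labels).most_common(1)[0][0]


def best_attribute(rows, labels, attributes):
    """Pick the attribute whose partition has the lowest weighted entropy."""
    def weighted_entropy(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return min(attributes, key=weighted_entropy)


def build_tree(rows, labels, attributes):
    if len(set(labels)) == 1:   # all samples in one class: label the leaf
        return labels[0]
    if not attributes:          # no attributes left: majority voting
        return majority(labels)
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    # Only values that occur in the data are split on, so the
    # "no samples left" case cannot arise in this simplified sketch.
    for value in {row[attr] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        branches[value] = build_tree([r for r, _ in pairs],
                                     [l for _, l in pairs], remaining)
    return {"attribute": attr, "branches": branches}


# Hypothetical training tuples and class labels (buys a computer: yes/no).
rows = [
    {"age": "youth", "student": "no"},
    {"age": "youth", "student": "yes"},
    {"age": "middle_aged", "student": "no"},
    {"age": "middle_aged", "student": "yes"},
    {"age": "middle_aged", "student": "no"},
]
labels = ["no", "yes", "yes", "yes", "yes"]

tree = build_tree(rows, labels, ["age", "student"])
print(tree)
```

The algorithm is greedy because each split is chosen once, using only the tuples at the current node, and is never revisited.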
K-Means Algorithm
1. Choose an initial mean value for each cluster.
2. Assign each value to the cluster with the nearest mean.
3. Recompute each cluster's mean, then repeat step 2 until the means stop changing.
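A minimal one-dimensional sketch of these steps is shown below. The function name `k_means_1d`, the data values, and the initial means are hypothetical, and the `max_iter` cap is an added safeguard that is not part of the steps above.

```python
def k_means_1d(values, means, max_iter=100):
    """Cluster numbers around the given initial means."""
    for _ in range(max_iter):
        # Step 2: put each value in the cluster with the nearest mean.
        clusters = [[] for _ in means]
        for v in values:
            nearest = min(range(len(means)), key=lambda i: abs(v - means[i]))
            clusters[nearest].append(v)
        # Step 3: recompute each mean; an empty cluster keeps its old mean.
        new_means = [sum(c) / len(c) if c else m
                     for c, m in zip(clusters, means)]
        if new_means == means:  # means stopped changing
            break
        means = new_means
    return means, clusters


values = [1, 2, 3, 10, 11, 12]
means, clusters = k_means_1d(values, [1.0, 2.0])
print(means)     # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```

With these starting means, the loop stops after the third pass, when the recomputed means match the previous ones. Different initial means can lead to different final clusters.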