Module 3
• Introduction to Classification and Prediction
• Issues regarding classification and prediction
• Decision Tree: ID3, C4.5
• Naive Bayes Classifier
• Classification and prediction are two forms of data analysis that extract models describing important data classes or predicting future data trends.
• Classification models (classifiers) predict categorical labels (discrete, unordered).
• Prediction models (predictors) model continuous-valued functions.
• For example, we can build a classification model to categorize bank loan applications as either safe or risky, while a prediction model may be built to predict the expenditures of potential customers on computer equipment given their income and occupation.
Classification and prediction have numerous applications, including:
• fraud detection
• performance prediction
• medical diagnosis
What Is Classification?
• Data classification is a two-step process, consisting of a
learning step (where a classification model is constructed) and
a classification step (where the model is used to predict class
labels for given data).
• In the first step, a classifier is built describing a predetermined
set of data classes or concepts. This is the learning step (or
training phase), where a classification algorithm builds the
classifier by analyzing or “learning from” a training set made
up of database tuples and their associated class labels.
• A tuple, X, is represented by an n-dimensional attribute vector, X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n database attributes, A1, A2, ..., An, respectively.
• Each tuple, X, is assumed to belong to a predefined
class as determined by another database attribute
called the class label attribute.
• The class label attribute is discrete-valued and
unordered. It is categorical (or nominal) in that each
value serves as a category or class.
• The individual tuples making up the training set are
referred to as training tuples and are randomly
sampled from the database under analysis.
• Data tuples can be referred to as samples,
examples, instances, data points, or objects.
• Because the class label of each training tuple is
provided, this step is also known as supervised
learning (i.e., the learning of the classifier is
“supervised” in that it is told to which class each
training tuple belongs).
• This first step of the classification process can also be viewed as
the learning of a mapping or function, y = f (X), that can predict
the associated class label y of a given tuple X.
• Typically, this mapping is represented in the form of
classification rules, decision trees, or mathematical formulae.
• In our example, the mapping is represented as classification rules that identify loan applications as being either safe or risky.
• The rules can be used to categorize future data tuples, as well
as provide deeper insight into the data contents. They also
provide a compressed data representation.
“What about classification accuracy?”
• In the second step the model is used for classification.
• First, the predictive accuracy of the classifier is estimated.
• A test set is used, made up of test tuples and their associated
class labels.
• They are independent of the training tuples, meaning that they were not used to construct the classifier.
• The accuracy of a classifier on a given test set is the percentage of test set tuples that are correctly classified by the classifier (a sketch of this computation follows below).
• The associated class label of each test tuple is compared with
the learned classifier’s class prediction for that tuple.
• If the accuracy of the classifier is considered acceptable, the
classifier can be used to classify future data tuples for which the
class label is not known.
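A minimal sketch of this accuracy estimate (the names classifier and test_set are illustrative assumptions, not from the slides):

```python
# Minimal sketch: estimating classifier accuracy on an independent test set.
# `classifier` is any callable mapping an attribute tuple to a class label;
# `test_set` is a list of (tuple, true_label) pairs.
def accuracy(classifier, test_set):
    correct = sum(1 for x, label in test_set if classifier(x) == label)
    return 100.0 * correct / len(test_set)   # percentage correctly classified
```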
What Is Prediction?
• Data prediction is a two-step process, similar to that of data classification.
– First, construct a model
– Second, use model to predict unknown value
• The attribute for which values are being predicted is continuous-valued (ordered) rather than categorical (discrete-valued and unordered). This attribute can be referred to simply as the predicted attribute.
• Suppose that, in our example, we instead wanted to
predict the amount that would be “safe” for the bank
to loan an applicant.
Issues regarding classification and prediction
1. Preparing data for classification and prediction
• Data cleaning
– Preprocess data in order to reduce noise and handle
missing values
• Relevance analysis (feature selection)
– Remove the irrelevant or redundant attributes
• Data transformation
– Generalize and/or normalize data
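As one hedged illustration of the transformation step, a minimal sketch of min-max normalization (the income values are hypothetical):

```python
# Minimal sketch of min-max normalization, one common data transformation
# used to prepare numeric attributes for classification or prediction.
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

incomes = [30000, 40000, 75000, 120000]   # hypothetical attribute values
print(min_max_normalize(incomes))         # [0.0, 0.111..., 0.5, 1.0]
```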
• Prediction can also be viewed as a mapping or function, y = f(X), where X is the input (e.g., a tuple describing a loan applicant) and the output y is a continuous or ordered value (such as the predicted amount that the bank can safely loan the applicant).
• As with classification, the training set used to build a predictor
should not be used to assess its accuracy.
• An independent test set should be used instead.
• The accuracy of a predictor is estimated by computing an error
based on the difference between the predicted value and the
actual known value of y for each of the test tuples, X.
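A minimal sketch of such an error estimate, using mean absolute error as one common choice of error measure (names are illustrative):

```python
# Minimal sketch: estimating a predictor's error on an independent test set
# as the mean absolute difference between predicted and actual y values
# (squared error is another common choice).
def mean_absolute_error(predictor, test_set):
    return sum(abs(predictor(x) - y) for x, y in test_set) / len(test_set)
```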
2. Comparing Classification and Prediction Methods
• Accuracy: This refers to the ability of the model to correctly classify or predict the class label of new or previously unseen data.
• Speed: This refers to the computational costs involved in generating and using the model.
• Robustness: This is the ability of the model to make correct predictions given noisy data or data with missing values.
• Scalability: This refers to the ability to construct the model efficiently given large amounts of data.
• Interpretability: This refers to the level of understanding and insight that is provided by the model.
3. Issues in Classification
• Missing data: Missing data values cause problems during both the training phase and the classification process itself.
• There are many approaches to handling missing data:
– Ignore the missing data.
– Assume a value for the missing data.
– Assume a special value for the missing data.
DECISION TREE-BASED ALGORITHMS
• The decision tree approach is most useful in
classification problems.
• With this technique, a tree is constructed to model
the classification process.
• Once the tree is built, it is applied to each tuple in
the database and results in a classification for that
tuple.
• There are two basic steps in the technique: building
the tree and applying the tree to the database.
• Definition: Given a database D = {t1, ..., tn}, where ti = (ti1, ..., tih), and a database schema containing the attributes {A1, A2, ..., Ah}. Also given is a set of classes C = {C1, ..., Cm}.
• A decision tree (DT) or classification tree is a tree associated with D that has the following properties:
• Each internal node is labeled with an attribute, Ai.
• Each arc is labeled with a predicate that can be applied to the attribute associated with its parent.
• Each leaf node is labeled with a class, Cj.
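A minimal sketch of this structure (assuming categorical attributes, so each arc predicate is an equality test on the parent's attribute, and tuples represented as attribute-to-value dictionaries):

```python
# Minimal sketch of the decision tree structure defined above.
from dataclasses import dataclass, field

@dataclass
class Leaf:
    label: str                                    # class Cj

@dataclass
class Node:
    attribute: str                                # splitting attribute Ai
    children: dict = field(default_factory=dict)  # arc predicate value -> subtree

def classify(tree, tuple_):
    """Apply the DT to a tuple: follow the arc whose predicate holds."""
    while isinstance(tree, Node):
        tree = tree.children[tuple_[tree.attribute]]
    return tree.label
```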
• Solving the classification problem using decision
trees is a two-step process:
1. Decision tree induction: Construct a DT using
training data.
2. For each ti ∈ D, apply the DT to determine its class.
Advantages
• DTs certainly are easy to use and efficient.
• Rules can be generated that are easy to interpret
and understand.
• They scale well for large databases because the
tree size is independent of the database size.
Disadvantages
• Do not easily handle continuous data.
• Handling missing data is difficult because correct branches in the tree cannot be taken.
• Since the DT is constructed from the training data,
overfitting may occur. This can be overcome via tree
pruning.
• Finally, correlations among attributes in the database
are ignored by the DT process.
Algorithm
• This recursive algorithm builds the tree in a top-down
fashion by examining the training data.
• Using the initial training data, the "best" splitting
attribute is chosen first.
• Algorithms differ in how they determine the "best
attribute" and its "best predicates" to use for splitting.
• Once this has been determined, the node and its arcs
are created and added to the created tree.
• The algorithm continues recursively by adding new
subtrees to each branching arc.
• The algorithm terminates when some "stopping criterion" is reached.
• Again, each algorithm determines when to stop the
tree differently.
• One simple approach would be to stop when the
tuples in the reduced training set all belong to the
same class.
• This class is then used to label the leaf node
created.
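A minimal sketch of this generic top-down procedure, reusing the Leaf and Node classes sketched earlier (the choose_best_attribute parameter is left abstract, since algorithms differ in how they pick the "best" split):

```python
# Minimal sketch of generic top-down decision tree induction.
from collections import Counter

def build_tree(training, attributes, choose_best_attribute):
    labels = [label for _, label in training]
    # Stopping criterion: all tuples in the reduced training set belong to the
    # same class (or no attributes remain); label the leaf with the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Leaf(Counter(labels).most_common(1)[0][0])
    best = choose_best_attribute(training, attributes)
    node = Node(best)
    remaining = [a for a in attributes if a != best]
    # Create one arc, and recursively one subtree, per value of the split attribute.
    for value in {x[best] for x, _ in training}:
        subset = [(x, y) for x, y in training if x[best] == value]
        node.children[value] = build_tree(subset, remaining, choose_best_attribute)
    return node
```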
The following issues are faced by most DT algorithms:
• Choosing splitting attributes
• Ordering of splitting attributes
• Splits
• Tree structure
• Training data
• Stopping criteria
• Pruning
ID3
• The ID3 technique for building a decision tree is based on information theory and attempts to minimize the expected number of comparisons.
• The basic strategy used by ID3 is to choose splitting attributes with the highest
information gain first.
• The concept used to quantify information is called entropy.
• Entropy is used to measure the amount of uncertainty or surprise or randomness
in a set of data.
• Certainly, when all data in a set belong to a single class, there is no uncertainty. In
this case the entropy is zero.
• The objective of decision tree classification is to iteratively partition the given
data set into subsets where all elements in each final subset belong to the same
class.
DEFINITION
Given probabilities p1, p2, ..., ps, where $\sum_{i=1}^{s} p_i = 1$, entropy is defined as
$H(p_1, p_2, \ldots, p_s) = \sum_{i=1}^{s} p_i \log(1/p_i)$
• Each step in ID3 chooses the state that orders splitting the
most.
• A database state is completely ordered if all tuples in it are
in the same class.
• ID3 chooses the splitting attribute with the highest gain in
information, where gain is defined as the difference
between how much information is needed to make a
correct classification before the split versus how much
information is needed after the split.
• This is calculated by determining the differences
between the entropies of the original dataset and the
weighted sum of the entropies from each of the
subdivided datasets.
• The ID3 algorithm calculates the gain of a particular split S, which divides D into subsets D1, ..., Ds, by the following formula:
$Gain(D, S) = H(D) - \sum_{i=1}^{s} P(D_i) H(D_i)$
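A minimal sketch of these two quantities (training tuples are assumed to be (attribute-dict, class-label) pairs; base-2 logarithms are used, as is common for ID3):

```python
# Minimal sketch of entropy H(D) and information gain for a categorical split.
import math
from collections import Counter

def entropy(tuples):
    counts = Counter(label for _, label in tuples)
    total = len(tuples)
    # sum of p * log2(1/p) over the class probabilities p
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def gain(tuples, attribute):
    """H(D) minus the weighted sum of the entropies of the split's subsets."""
    subsets = {}
    for x, y in tuples:
        subsets.setdefault(x[attribute], []).append((x, y))
    weighted = sum(len(s) / len(tuples) * entropy(s) for s in subsets.values())
    return entropy(tuples) - weighted
```

Plugged into the earlier build_tree sketch, ID3's heuristic is simply the attribute with the largest gain.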
Naive Bayes Classifier
• Bayesian classifiers predict class membership probabilities,
such as the probability that a given tuple belongs to a
particular class.
• Bayesian classification is based on Bayes’ theorem.
• The naive Bayesian classifier is a simple Bayesian classifier.
• Bayesian classifiers have also exhibited high accuracy and
speed when applied to large databases.
Bayes’ Theorem
• Let X be a data tuple.
• In Bayesian terms, X is considered “evidence.”
• It is described by measurements made on a set of n attributes.
• Let H be some hypothesis, such as that the data tuple X
belongs to a specified class C.
For classification problems:
• Determine P(H|X), the probability that the hypothesis H holds
given the “evidence” or observed data tuple X.
• In other words, it is the probability that tuple X belongs to
class C, given the attribute description of X.
Bayes’ Theorem
• P(H|X) is the posterior probability, or a posteriori
probability, of H conditioned on X.
• E.g., suppose our data tuples are confined to customers described by the attributes age and income, and that X is a 35-year-old customer with an income of $40,000. Suppose that H is the hypothesis that our customer will buy a computer.
• Then P(H|X) reflects the probability that customer X
will buy a computer given that we know the
customer’s age and income.
• P(H) is the prior probability, or a priori probability, of H.
• E.g., this is the probability that any given customer will buy a computer, regardless of age, income, or any other information, for that matter.
• The posterior probability, P(H|X), is based on more information (e.g., customer information) than the prior probability, P(H), which is independent of X.
• P(X|H) is the posterior probability of X conditioned
on H.
• E.g., it is the probability that a customer, X, is 35 years old and earns $40,000, given that we know the customer will buy a computer.
• P(X) is the prior probability of X.
• Using our example, it is the probability that a person from our set of customers is 35 years old and earns $40,000.
• “How are these probabilities estimated?” P(H), P(X|H), and P(X) may be estimated from the given data.
• Bayes’ theorem is useful in that it provides a way of calculating the posterior probability, P(H|X), from P(H), P(X|H), and P(X):
$P(H|X) = \frac{P(X|H) P(H)}{P(X)}$
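A tiny worked instance with hypothetical numbers (none of these figures come from the slides): suppose P(H) = 0.5, P(X|H) = 0.2, and P(X) = 0.15.

```python
# Hypothetical illustration of Bayes' theorem: half of all customers buy a
# computer, 20% of buyers match X's profile, and 15% of all customers do.
p_h, p_x_given_h, p_x = 0.5, 0.2, 0.15
p_h_given_x = p_x_given_h * p_h / p_x
print(round(p_h_given_x, 3))   # 0.667
```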
Naive Bayes Classifier
• As P(X) is constant for all classes, only P(X|Ci)P(Ci)
need be maximized.
• Note that the class prior probabilities may be estimated by P(Ci) = |Ci,D| / |D|, where |Ci,D| is the number of training tuples of class Ci in D.
• Given data sets with many attributes, it would be extremely computationally expensive to compute P(X|Ci). In order to reduce computation in evaluating P(X|Ci), the naive assumption of class-conditional independence is made: the values of the attributes are presumed conditionally independent of one another, given the class label of the tuple. Thus,
$P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i)$
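A minimal sketch of the resulting classifier for categorical attributes, using the estimates above, P(Ci) = |Ci,D|/|D| and per-attribute relative frequencies for P(xk|Ci) (the helper names are illustrative):

```python
# Minimal sketch of a naive Bayes classifier over categorical attributes.
from collections import Counter, defaultdict

def train_naive_bayes(training):
    class_counts = Counter(label for _, label in training)
    value_counts = defaultdict(Counter)   # (class, attribute) -> value counts
    for x, label in training:
        for attr, value in x.items():
            value_counts[(label, attr)][value] += 1
    priors = {c: n / len(training) for c, n in class_counts.items()}
    return priors, value_counts, class_counts

def predict(model, x):
    priors, value_counts, class_counts = model
    def score(c):
        # P(X) is constant across classes, so maximize P(X|Ci) * P(Ci).
        p = priors[c]
        for attr, value in x.items():
            p *= value_counts[(c, attr)][value] / class_counts[c]
        return p
    return max(priors, key=score)
```

In practice a Laplacian correction (adding one to each count) is applied so that a single zero count does not zero out the whole product.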