Classification
Instructor:
Alex Mirugwe
Victoria University
Classification Modelling
In this topic we study approaches for predicting qualitative responses, a process known as classification.
Classification
Predicting a qualitative response for an observation can be referred to as
classifying that observation since it involves assigning the observation to a
category, or class.
At the same time, the methods used for classification often first predict the probability that the observation belongs to each category of a qualitative variable, and then use those probabilities as the basis for making the classification.
Methods
There are many possible classification techniques, or classifiers, that one might use to predict a qualitative response, including:
• Logistic Regression
• Linear Discriminant Analysis (LDA)
• K-Nearest Neighbors (KNN)
• Trees, Random Forests, and Boosting
• Support Vector Machines (SVM)
• Neural Networks
An Overview of Classification
Classification problems occur often, perhaps even more so than regression
problems. Some examples include:
1. A person arrives at the emergency room with a set of symptoms that
could possibly be attributed to one of three medical conditions. Which of
the three conditions does the individual have?
2. An online banking service must be able to determine whether or not a
transaction being performed on the site is fraudulent, on the basis of the
user’s IP address, past transaction history, and so forth.
3. On the basis of DNA sequence data for a number of patients with and
without a given disease, a biologist would like to figure out which DNA
mutations are deleterious (disease-causing) and which are not.
Example
Consider a Default data set, where the response default falls
into one of two categories, Yes or No. Rather than modeling
this response Y directly, logistic regression models the
probability that Y belongs to a particular category.
Example
Dataset: Simulated Default dataset (n=10,000):
• default: A factor with levels No and Yes indicating whether
the customer defaulted on their debt.
• student: A factor with levels No and Yes indicating whether
the customer is a student
• balance: The average balance that the customer has
remaining on their credit card after making their monthly
payment
• income: Income of the customer
Problem: Predicting whether an individual will default on his or
her credit card payment, on the basis of annual income and
monthly credit card balance.
Some Probability Basics to Remember
A number of useful theorems follow directly from Kolmogorov's axioms.
[Slides 10–16: a worked example, general probability concepts, conditional probability, the multiplication rule, and independent events; the equations on these slides are presented as images and are not reproduced in this extract.]
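For reference, the standard forms of these definitions (stated here from first principles; the slides' own notation is not shown above) are:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \qquad \text{(conditional probability)}$$

$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \qquad \text{(multiplication rule)}$$

$$P(A \cap B) = P(A)\,P(B) \iff A \text{ and } B \text{ are independent}$$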
Logistic Regression
For the Default data, logistic regression models the probability of default. For
example, the probability of default given balance can be written as
Pr(default = Yes|balance).
The values of Pr(default = Yes|balance), which we abbreviate p(balance), will
range between 0 and 1. Then for any given value of balance, a prediction can
be made for default. For example, one might predict default = Yes for any
individual for whom p(balance) > 0.5. Alternatively, if a company wishes to be
conservative in predicting individuals who are at risk for default, then they
may choose to use a lower threshold, such as p(balance) > 0.1.
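A minimal sketch of this thresholding step (the probabilities and threshold values below are purely illustrative):

```python
# Illustrative predicted probabilities p(balance) for a few individuals.
probs = [0.03, 0.12, 0.48, 0.71]

# Default classification rule: predict Yes when p(balance) > 0.5.
print(["Yes" if p > 0.5 else "No" for p in probs])   # ['No', 'No', 'No', 'Yes']

# A more conservative rule flags anyone with p(balance) > 0.1.
print(["Yes" if p > 0.1 else "No" for p in probs])   # ['No', 'Yes', 'Yes', 'Yes']
```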
Binary response variables
Consider a binary classification problem: an observation is classified as belonging to one of two classes.
Define a binary response variable
$$Y = \begin{cases} 1 & \text{if the outcome is ``class A''} \\ 0 & \text{if the outcome is ``class B''} \end{cases}$$
Can we use the same linear model structure for classification problems,
$$Y = f(X) = \beta_0 + \beta_1 X,$$
say, for one predictor?
Logistic Regression: Motivation
Cont’d
If we use this approach to predict default=Yes using balance, then we obtain
the model shown in the left-hand panel of Figure 4.2. Here we see the
problem with this approach: for balances close to zero we predict a negative
probability of default; if we were to predict for very large balances, we would
get values bigger than 1. These predictions are not sensible, since of course
the true probability of default, regardless of the credit card balance, must fall
between 0 and 1. This problem is not unique to the credit default data. Any
time a straight line is fit to a binary response that is coded as 0 or 1, in
principle we can always predict p(X) < 0 for some values of X and p(X) > 1 for others (unless the range of X is limited).
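A small sketch of this issue on simulated stand-in data (not the actual Default data set): fitting an ordinary least-squares line to a 0/1 response yields fitted "probabilities" that fall outside [0, 1].

```python
import numpy as np

# Simulated stand-in for (balance, default): one predictor, a 0/1 response.
rng = np.random.default_rng(1)
x = rng.uniform(0, 2500, size=200)
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-(0.005 * x - 8)))).astype(float)

# Ordinary least-squares fit of y on x (a "linear probability model").
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x

# Fitted values near x = 0 dip below 0, which is not a valid probability.
print(fitted.min(), fitted.max())
```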
Cont’d
To avoid this problem, we must model p(X) using a function that gives outputs between 0 and 1 for all values of X. Many functions meet this description. In logistic regression, we use the logistic function,
$$p(X) = f(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}.$$
For more than one predictor,
$$p(X) = f(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p}}.$$
The logistic function is a simple sigmoidal (S-shaped) curve that asymptotes to 0 and 1.
Logistic Regression: Construction
We employ a sigmoidal transform of the linear model equation to ensure our
model maps to a set of probabilities.
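A minimal sketch of that transform (the coefficients here are illustrative placeholders, not fitted values):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Linear part of the model with illustrative coefficients beta0, beta1.
beta0, beta1 = -2.0, 0.8
x = np.linspace(-5, 10, 7)

# Sigmoidal transform of the linear predictor gives valid probabilities.
p = sigmoid(beta0 + beta1 * x)
print(p)   # every value lies strictly between 0 and 1
```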
Logistic Regression: Interpretation
The logistic function can be interpreted in terms of the "odds" of one event vs. another:
$$\text{odds} = \frac{p(X)}{1 - p(X)} = \frac{P(Y = 1 \mid X = x)}{P(Y = 0 \mid X = x)} = e^{\beta_0 + \beta_1 X}$$
This quantity is referred to as the odds that Y = 1 vs. Y = 0 given X; the odds can take on any value between 0 and ∞ (values near 0 indicate a very low probability, and large values a very high probability, that Y = 1).
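Taking the natural logarithm of both sides gives the log-odds, or logit, which is linear in X (this is simply a rearrangement of the logistic function above):
$$\log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$$
So increasing X by one unit changes the log-odds by $\beta_1$, or equivalently multiplies the odds by $e^{\beta_1}$.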
Making Predictions
Once the coefficients have been estimated, it is a simple matter to compute the probability of default for any given credit card balance. For example, using the coefficient estimates $\hat\beta_0$ and $\hat\beta_1$ from the fitted model (given in the table on this slide), the predicted default probability for an individual with a balance of $1,000 is
$$\hat p(1000) = \frac{e^{\hat\beta_0 + \hat\beta_1 \times 1000}}{1 + e^{\hat\beta_0 + \hat\beta_1 \times 1000}}.$$
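A minimal sketch of this calculation; the coefficient values below are approximately those reported for the Default data in the ISLR text and should be treated as illustrative rather than as the values in the slide's table:

```python
import math

# Illustrative coefficient estimates (roughly those reported for the Default
# data in ISLR); substitute the values from your own fitted model.
beta0, beta1 = -10.6513, 0.0055

def predicted_prob(balance):
    """Logistic-regression probability of default for a given balance."""
    z = beta0 + beta1 * balance
    return math.exp(z) / (1 + math.exp(z))

print(predicted_prob(1000))   # ~0.006: below a 0.5 threshold, so predict default = No
print(predicted_prob(2000))   # ~0.59:  above 0.5, so predict default = Yes
```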
Dummy Variable
One can use qualitative predictors with the logistic regression model by means of dummy variables. As an example, the Default data set contains the qualitative variable student. To fit the model we simply create a dummy variable that takes on a value of 1 for students and 0 for non-students. The logistic regression model that results from predicting the probability of default from student status can be seen in Table 4.2. The coefficient associated with the dummy variable is positive, indicating that students tend to have a higher probability of default than non-students.
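A hedged sketch of fitting this model in Python, assuming the Default data is available as a CSV with the columns described earlier (the file name and exact column names are assumptions):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical path and column names for the Default data; adjust to your copy.
df = pd.read_csv("Default.csv")

# Encode the qualitative variables as 0/1 dummy variables.
df["default_yes"] = (df["default"] == "Yes").astype(int)
df["student_yes"] = (df["student"] == "Yes").astype(int)

# Fit default ~ student using the single dummy predictor.
# Note: scikit-learn applies a mild L2 penalty by default, so the estimates
# differ slightly from an unpenalized maximum-likelihood fit.
model = LogisticRegression()
model.fit(df[["student_yes"]], df["default_yes"])

print(model.intercept_, model.coef_)  # the coefficient on student_yes is positive
```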
Model Assessment
Classification Error
Due to the discrete nature of classification problems, one has to take note of the
kinds of classification errors that may arise.
A binary classifier can make one of two types of error:
• False Positive: classify an observation as positive (Ŷ = 1) when it is actually negative (Y = 0).
• False Negative: classify an observation as negative (Ŷ = 0) when it is actually positive (Y = 1).
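These two error types are usually tabulated in a confusion matrix; a minimal sketch with made-up labels and predictions:

```python
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions (1 = positive class, 0 = negative class).
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```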
Performance Measure
In turn, these quantities define the:
• True positive rate (TPR), or sensitivity: the proportion of positive cases correctly identified as positive.
• False positive rate (FPR), equal to 1 − specificity: the proportion of negative cases incorrectly identified as positive.
As we vary the decision threshold from 0 to 1, these rates vary. Plotting the TPR against the FPR over all thresholds produces the receiver operating characteristic (ROC) curve.
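A self-contained sketch of computing the ROC curve with scikit-learn, using simulated data in place of the Default set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

# Toy data standing in for (balance, default); replace with the real Default columns.
rng = np.random.default_rng(0)
balance = rng.uniform(0, 2500, size=500).reshape(-1, 1)
default = (rng.uniform(size=500) < 1 / (1 + np.exp(-(0.005 * balance[:, 0] - 8)))).astype(int)

model = LogisticRegression().fit(balance, default)
probs = model.predict_proba(balance)[:, 1]        # P(default = 1) for each observation

fpr, tpr, thresholds = roc_curve(default, probs)  # TPR/FPR at every decision threshold
print(roc_auc_score(default, probs))              # area under the ROC curve (AUC)
```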
Go to page 101 of the Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow textbook and read about ROC curves and other evaluation metrics.