Classification
Instructor:
Alex Mirugwe
Victoria University
Classification Modelling
In this topic we study approaches for predicting qualitative responses, a process known as classification.
Classification
Predicting a qualitative response for an observation can be referred to as
classifying that observation since it involves assigning the observation to a
category, or class.
At the same time, the methods used for classification often first predict the probability that the observation belongs to each category of a qualitative variable, and then use those probabilities as the basis for making the classification.
Methods
There are many possible classification techniques, or classifiers, that one might use to predict a qualitative response, including:
• Logistic Regression
• Linear Discriminant Analysis (LDA)
• K-Nearest Neighbors (KNN)
• Trees, Random Forests, and Boosting
• Support Vector Machines (SVM)
• Neural Networks
An Overview of Classification
Classification problems occur often, perhaps even more so than regression
problems. Some examples include:
1. A person arrives at the emergency room with a set of symptoms that
could possibly be attributed to one of three medical conditions. Which of
the three conditions does the individual have?
2. An online banking service must be able to determine whether or not a
transaction being performed on the site is fraudulent, on the basis of the
user’s IP address, past transaction history, and so forth.
3. On the basis of DNA sequence data for a number of patients with and
without a given disease, a biologist would like to figure out which DNA
mutations are deleterious (disease-causing) and which are not.
Example
Consider a Default data set, where the response default falls
into one of two categories, Yes or No. Rather than modeling
this response Y directly, logistic regression models the
probability that Y belongs to a particular category.
Example
Dataset: Simulated Default dataset (n=10,000):
• default: A factor with levels No and Yes indicating whether
the customer defaulted on their debt.
• student: A factor with levels No and Yes indicating whether
the customer is a student
• balance: The average balance that the customer has
remaining on their credit card after making their monthly
payment
• income: Income of the customer
Problem: Predicting whether an individual will default on his or
her credit card payment, on the basis of annual income and
monthly credit card balance.
Some Probability Basics to Remember
A number of useful theorems follow directly from Kolmogorov's axioms.
[Slides 10–16: a worked example, general probability concepts, conditional probability, the multiplication rule, and independent events; the equations on these slides are presented as images and are not reproduced in this extract.]
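For reference, the standard forms of these definitions (stated here from first principles; the slides' own notation is not shown above) are:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \qquad \text{(conditional probability)}$$

$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \qquad \text{(multiplication rule)}$$

$$P(A \cap B) = P(A)\,P(B) \iff A \text{ and } B \text{ are independent}$$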
Logistic Regression
For the Default data, logistic regression models the probability of default. For
example, the probability of default given balance can be written as
Pr(default = Yes|balance).
The values of Pr(default = Yes|balance), which we abbreviate p(balance), will
range between 0 and 1. Then for any given value of balance, a prediction can
be made for default. For example, one might predict default = Yes for any
individual for whom p(balance) > 0.5. Alternatively, if a company wishes to be
conservative in predicting individuals who are at risk for default, then they
may choose to use a lower threshold, such as p(balance) > 0.1.
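A minimal sketch of this thresholding step (the probabilities and threshold values below are purely illustrative):

```python
# Illustrative predicted probabilities p(balance) for a few individuals.
probs = [0.03, 0.12, 0.48, 0.71]

# Default classification rule: predict Yes when p(balance) > 0.5.
print(["Yes" if p > 0.5 else "No" for p in probs])   # ['No', 'No', 'No', 'Yes']

# A more conservative rule flags anyone with p(balance) > 0.1.
print(["Yes" if p > 0.1 else "No" for p in probs])   # ['No', 'Yes', 'Yes', 'Yes']
```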
Binary response variables
Consider a binary classification problem: an observation is classified as belonging to one of two classes.
Define a binary response variable
$$Y = \begin{cases} 1 & \text{if the outcome is ``class A''} \\ 0 & \text{if the outcome is ``class B''} \end{cases}$$
Can we use the same linear model structure for classification problems,
$$Y = f(X) = \beta_0 + \beta_1 X,$$
say, for one predictor?
Logistic Regression: Motivation
Cont’d
If we use this approach to predict default=Yes using balance, then we obtain
the model shown in the left-hand panel of Figure 4.2. Here we see the
problem with this approach: for balances close to zero we predict a negative
probability of default; if we were to predict for very large balances, we would
get values bigger than 1. These predictions are not sensible, since of course
the true probability of default, regardless of the credit card balance, must fall
between 0 and 1. This problem is not unique to the credit default data. Any
time a straight line is fit to a binary response that is coded as 0 or 1, in
principle we can always predict p(X) < 0 for some values of X and p(X) > 1 for others (unless the range of X is limited).
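A small sketch of this issue on simulated stand-in data (not the actual Default data set): fitting an ordinary least-squares line to a 0/1 response yields fitted "probabilities" that fall outside [0, 1].

```python
import numpy as np

# Simulated stand-in for (balance, default): one predictor, a 0/1 response.
rng = np.random.default_rng(1)
x = rng.uniform(0, 2500, size=200)
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-(0.005 * x - 8)))).astype(float)

# Ordinary least-squares fit of y on x (a "linear probability model").
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x

# Fitted values near x = 0 dip below 0, which is not a valid probability.
print(fitted.min(), fitted.max())
```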
Cont’d
To avoid this problem, we must model p(X) using a function that gives outputs between 0 and 1 for all values of X. Many functions meet this description. In logistic regression, we use the logistic function,
$$p(X) = f(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}.$$
For more than one predictor,
$$p(X) = f(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p}}.$$
The logistic function is a simple sigmoidal (S-shaped) curve that asymptotes to 0 and 1.
Logistic Regression: Construction
We employ a sigmoidal transform of the linear model equation to ensure our
model maps to a set of probabilities.
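A minimal sketch of that transform (the coefficients here are illustrative placeholders, not fitted values):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Linear part of the model with illustrative coefficients beta0, beta1.
beta0, beta1 = -2.0, 0.8
x = np.linspace(-5, 10, 7)

# Sigmoidal transform of the linear predictor gives valid probabilities.
p = sigmoid(beta0 + beta1 * x)
print(p)   # every value lies strictly between 0 and 1
```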
Logistic Regression: Interpretation
The logistic function can be interpreted in terms of the "odds" of one event vs. another:
$$\text{odds} = \frac{p(X)}{1 - p(X)} = \frac{P(Y = 1 \mid X = x)}{P(Y = 0 \mid X = x)} = e^{\beta_0 + \beta_1 X}$$
This quantity is referred to as the odds that Y = 1 vs. Y = 0 given X; the odds can take on any value between 0 and ∞ (values near 0 indicate a very low probability, and large values a very high probability, that Y = 1).
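Taking the natural logarithm of both sides gives the log-odds, or logit, which is linear in X (this is simply a rearrangement of the logistic function above):
$$\log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$$
So increasing X by one unit changes the log-odds by $\beta_1$, or equivalently multiplies the odds by $e^{\beta_1}$.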
Making Predictions
Once the coefficients have been estimated, it is a simple matter to compute the probability of default for any given credit card balance. For example, using the coefficient estimates $\hat\beta_0$ and $\hat\beta_1$ from the fitted model (given in the table on this slide), the predicted default probability for an individual with a balance of $1,000 is
$$\hat p(1000) = \frac{e^{\hat\beta_0 + \hat\beta_1 \times 1000}}{1 + e^{\hat\beta_0 + \hat\beta_1 \times 1000}}.$$
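A minimal sketch of this calculation; the coefficient values below are approximately those reported for the Default data in the ISLR text and should be treated as illustrative rather than as the values in the slide's table:

```python
import math

# Illustrative coefficient estimates (roughly those reported for the Default
# data in ISLR); substitute the values from your own fitted model.
beta0, beta1 = -10.6513, 0.0055

def predicted_prob(balance):
    """Logistic-regression probability of default for a given balance."""
    z = beta0 + beta1 * balance
    return math.exp(z) / (1 + math.exp(z))

print(predicted_prob(1000))   # ~0.006: below a 0.5 threshold, so predict default = No
print(predicted_prob(2000))   # ~0.59:  above 0.5, so predict default = Yes
```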
Dummy Variable
One can use qualitative predictors with the logistic regression model by means of dummy variables. As an example, the Default data set contains the qualitative variable student. To fit the model we simply create a dummy variable that takes on a value of 1 for students and 0 for non-students. The logistic regression model that results from predicting the probability of default from student status can be seen in Table 4.2. The coefficient associated with the dummy variable is positive, indicating that students tend to have a higher probability of default than non-students.
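A hedged sketch of fitting this model in Python, assuming the Default data is available as a CSV with the columns described earlier (the file name and exact column names are assumptions):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical path and column names for the Default data; adjust to your copy.
df = pd.read_csv("Default.csv")

# Encode the qualitative variables as 0/1 dummy variables.
df["default_yes"] = (df["default"] == "Yes").astype(int)
df["student_yes"] = (df["student"] == "Yes").astype(int)

# Fit default ~ student using the single dummy predictor.
# Note: scikit-learn applies a mild L2 penalty by default, so the estimates
# differ slightly from an unpenalized maximum-likelihood fit.
model = LogisticRegression()
model.fit(df[["student_yes"]], df["default_yes"])

print(model.intercept_, model.coef_)  # the coefficient on student_yes is positive
```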
Model Assessment
Classification Error
Due to the discrete nature of classification problems, one has to take note of the
kinds of classification errors that may arise.
A binary classifier can make one of two types of error:
• False Positive: classify an observation as positive (Ŷ = 1) when it is actually negative (Y = 0).
• False Negative: classify an observation as negative (Ŷ = 0) when it is actually positive (Y = 1).
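These two error types are usually tabulated in a confusion matrix; a minimal sketch with made-up labels and predictions:

```python
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions (1 = positive class, 0 = negative class).
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```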
Performance Measure
In turn, these quantities define the:
• True positive rate (TPR), or sensitivity: the proportion of positive cases correctly identified as positive.
• False positive rate (FPR), equal to 1 − specificity: the proportion of negative cases incorrectly identified as positive.
As we vary the decision threshold from 0 to 1, these rates vary. Plotting the TPR against the FPR over all thresholds produces the receiver operating characteristic (ROC) curve.
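A self-contained sketch of computing the ROC curve with scikit-learn, using simulated data in place of the Default set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

# Toy data standing in for (balance, default); replace with the real Default columns.
rng = np.random.default_rng(0)
balance = rng.uniform(0, 2500, size=500).reshape(-1, 1)
default = (rng.uniform(size=500) < 1 / (1 + np.exp(-(0.005 * balance[:, 0] - 8)))).astype(int)

model = LogisticRegression().fit(balance, default)
probs = model.predict_proba(balance)[:, 1]        # P(default = 1) for each observation

fpr, tpr, thresholds = roc_curve(default, probs)  # TPR/FPR at every decision threshold
print(roc_auc_score(default, probs))              # area under the ROC curve (AUC)
```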
Go to page 101 of the Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow textbook and read about ROC curves and other evaluation metrics.