Machine Learning
Logistic Regression
Agenda
• Logistic Regression

• Generalisation, Over-fitting & Regularisation

• Donut Problem

• XOR Problem
What is Logistic Regression?
• Learning

• A supervised algorithm that learns to separate training samples into two categories.

• Each training sample has one or more input values and a single target value of
either 0 or 1.

• The algorithm learns the line, plane or hyper-plane that best divides the training
samples with targets of 0 from those with targets of 1.

• Prediction

• Uses the learned line, plane or hyper-plane to predict whether an input sample
results in a target of 0 or 1.
Logistic Regression
Logistic Regression
• Each training sample has an x made
up of multiple input values and a
corresponding t with a single value. 

• The inputs can be represented as an X matrix in which each row is a sample
and each column is a dimension.

• The outputs can be represented as a T matrix in which each row is a sample
with a value of either 0 or 1.
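
As a concrete sketch of these shapes (the sample count and dimensionality here are purely illustrative), the data could be held in NumPy arrays like this:

import numpy as np

N, D = 100, 2                          # hypothetical: 100 samples, 2 input dimensions
X = np.random.randn(N, D)              # each row is a sample, each column a dimension
T = np.random.randint(0, 2, (N, 1))    # each row is that sample's target, 0 or 1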
Logistic Regression
• Our predicted T values are
calculated by multiplying our X
values by a weight vector and
applying the sigmoid function to the
result.
Logistic Regression
• The sigmoid function is:

• And has a graph like this:

• By applying this function we end up
with predictions that are between
zero and one.
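
The formula image does not survive this export; the standard logistic sigmoid the slide refers to is σ(a) = 1 / (1 + e⁻ᵃ), and the predictions are ŷ = σ(Xw). A minimal NumPy version:

import numpy as np

def sigmoid(a):
    # the logistic sigmoid: squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # ~[0.007, 0.5, 0.993]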
Logistic Regression
• We use an error function known as
the cross-entropy error function:

• Where t is the actual target value (0
or 1) and t circumflex is the
predicted target value for a sample.

• If the actual target is 0, the left-hand
term is 0, leaving the red line:

• If the actual target is 1, the right-hand
term is 0, leaving the blue line:
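
The error-function image is also missing from this export; the cross-entropy error described here is, in its usual form, E = −Σ [ t ln ŷ + (1 − t) ln(1 − ŷ) ], summed over the samples. A small sketch (assuming T and Y_hat are NumPy arrays of targets and predictions):

import numpy as np

def cross_entropy(T, Y_hat):
    # E = -sum( t*ln(y_hat) + (1 - t)*ln(1 - y_hat) )
    # when t = 0 the first term vanishes; when t = 1 the second term vanishes
    return -np.sum(T * np.log(Y_hat) + (1 - T) * np.log(1 - Y_hat))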
Logistic Regression
• We use the chain rule to partially
differentiate E with respect to wᵢ to find
the gradient to use for this weight in
gradient descent:

• Where:
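
The formulas themselves are not included in this export; the chain-rule decomposition being applied is the standard one, ∂E/∂wᵢ = (∂E/∂ŷ)(∂ŷ/∂a)(∂a/∂wᵢ), where a = Σⱼ wⱼxⱼ is the weighted sum fed into the sigmoid and ŷ = σ(a).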
Logistic Regression
• Taking the first term:

• Taking the third term:
Logistic Regression
• Taking the second term:
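
Reconstructing the standard results in place of the missing formula images: the first term is ∂E/∂ŷ = −t/ŷ + (1 − t)/(1 − ŷ), the third term is ∂a/∂wᵢ = xᵢ, and the second term is the sigmoid derivative ∂ŷ/∂a = ŷ(1 − ŷ).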
Logistic Regression
• Multiplying the three
derivatives and simplifying
ends up with:

• In matrix form, for all weights:

• In code we use this with
gradient descent to derive the
weights that minimise the
error.
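
The simplified result is the familiar one, ∂E/∂w = Xᵀ(ŷ − t) in matrix form. A minimal gradient-descent sketch built on that expression (the learning rate and step count below are illustrative choices, not values from the slides):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(X, T, learning_rate=0.01, steps=1000):
    # X: (N, D) inputs, T: (N,) targets of 0 or 1
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        Y_hat = sigmoid(X @ w)        # current predictions
        grad = X.T @ (Y_hat - T)      # dE/dw = X^T (y_hat - t)
        w -= learning_rate * grad     # step against the gradient
    return w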
Logistic Regression
Logistic Regression
Generalisation, Over-fitting &
Regularisation
Generalisation & Over-fitting
• As we train our model with more and more data it may start to fit the training data more and
more accurately, but become worse at handling test data that we feed to it later.

• This is known as “over-fitting” and results in an increased generalisation error.

• To minimise the generalisation error we should:

• Collect as much sample data as possible. 

• Use a random subset of our sample data for training.

• Use the remaining sample data to test how well our model copes with data it was not trained
with.

• Also, experiment with adding higher-degree polynomial features (X², X³, etc.) as this can reduce
overfitting.
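
A minimal sketch of the random train/test split described above (the 80/20 ratio is an illustrative choice):

import numpy as np

def train_test_split(X, T, train_fraction=0.8, seed=0):
    # use a random subset of the samples for training and keep the rest for testing
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_fraction * len(X))
    return X[idx[:n_train]], T[idx[:n_train]], X[idx[n_train:]], T[idx[n_train:]]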
L1 Regularisation (Lasso)
• In L1 regularisation we add a penalty to
the error function: 

• Expanding this we get: 

• Take the derivative with respect to w to
find our gradient:

• Where sign(w) is -1 if w < 0, 0 if w = 0
and +1 if w > 0

• Note that because sign(w) has no
inverse function we cannot solve for w
and so must use gradient descent.
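
The penalty formulas are missing from this export; assuming the usual form, the L1-regularised error is E + λ Σⱼ |wⱼ| and its gradient is Xᵀ(ŷ − t) + λ·sign(w). As a sketch (lam is an illustrative name for the regularisation strength):

import numpy as np

def l1_gradient(X, T, w, lam):
    # cross-entropy gradient plus the L1 penalty term: X^T (y_hat - t) + lambda * sign(w)
    Y_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (Y_hat - T) + lam * np.sign(w)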
L1 Regularisation (Lasso)
L2 Regularisation (Ridge)
• In L2 regularisation we add the sum of
the squares of the weights to the
error function.

• Expanding this we get: 

• Take the derivative with respect to
w to find our gradient:
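
Assuming the usual form again, the L2-regularised error is E + (λ/2) wᵀw and its gradient is Xᵀ(ŷ − t) + λw. A matching sketch:

import numpy as np

def l2_gradient(X, T, w, lam):
    # cross-entropy gradient plus the L2 penalty term: X^T (y_hat - t) + lambda * w
    Y_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (Y_hat - T) + lam * w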
L2 Regularisation (Ridge)
Donut Problem
Donut Problem
• Sometimes data will be distributed like
this:

• In these cases it would appear that logistic
regression cannot be used to classify the
red and blue points because there is no
single line that separates them.

• However, one way to work around this
problem is to add a bias column of ones
and a column whose value is the distance
of each sample from the centre of these
circles.
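
A sketch of that feature augmentation (assuming, for simplicity, that the circles are centred on the origin):

import numpy as np

def add_donut_features(X):
    # X: (N, 2) points; append a bias column of ones and each point's
    # distance from the centre (taken here to be the origin)
    radius = np.sqrt(np.sum(X ** 2, axis=1, keepdims=True))
    ones = np.ones((len(X), 1))
    return np.hstack([ones, X, radius])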
XOR Problem
XOR Problem
• Another tricky situation is where the input
samples are as below, because in this
case there isn’t a single line that can
separate the purple points from the
yellow.

• One way to work around this problem is to
add a bias column of ones and a column
whose value is the product of the two
dimensions (X₁ and X₂) of each sample.

• This has the effect of “pushing” the top
right purple point back in the Z
dimension. Once this has been done, a
plane can separate the purple and yellow
points.
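
A sketch of the corresponding augmentation for the XOR layout, appending a bias column and the product X₁·X₂:

import numpy as np

def add_xor_feature(X):
    # X: (N, 2) points; append a bias column of ones and the product x1 * x2,
    # which pushes one class away along a new axis so a plane can separate them
    product = (X[:, 0] * X[:, 1]).reshape(-1, 1)
    ones = np.ones((len(X), 1))
    return np.hstack([ones, X, product])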
Summary
• Logistic Regression

• Generalisation, Over-fitting & Regularisation

• Donut Problem

• XOR Problem
