www.edureka.co/data-science | Edureka’s Data Science Certification Training
Logistic Regression
What Will You Learn Today?
• What is Regression?
• The 5 Questions asked in Data Science
• Logistic Regression – What and Why?
• How does Logistic Regression work?
• Demo In R: Diabetes Use Case
• Logistic Regression – Use Cases
The 5 Questions Asked In Data Science
Data science problems broadly fall into 5 kinds, each answering a different question:
Q1. Is this A or B? – Classification algorithms
Q2. Is this weird? – Anomaly detection algorithms
Q3. How much or how many? – Regression algorithms
Q4. How is this organized? – Clustering algorithms
Q5. What should I do next? – Reinforcement learning
What Is Regression?
➢ Regression analysis is a predictive modelling technique.
➢ It estimates the relationship between a dependent variable (the target) and one or more independent variables (the predictors).
[Figure: regression line plotted on X and Y axes; an input value of 7.00 gives a predicted outcome of 123.9]
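As a minimal sketch of this idea, the Python snippet below fits a straight line to a few points by ordinary least squares and then predicts the outcome for a new input. The data points and the function name `fit_line` are made up for illustration; they are not the values behind the figure.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on the line y = 3 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_line(xs, ys)
prediction = intercept + slope * 7.0  # predicted outcome for input value 7.0
print(intercept, slope, prediction)
```

The independent variable goes in `xs` and the dependent variable in `ys`; the fitted line is what turns a new input into a predicted outcome.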
Types Of Regression
Linear Regression
• When there is a linear relationship between the independent and dependent variables.
Logistic Regression
• When the dependent variable is categorical (0/1, True/False, Yes/No, A/B/C) in nature.
Polynomial Regression
• When the power of the independent variable is more than 1.
Why Logistic Regression?
Whenever the outcome of the dependent variable (Y) is discrete, like 0 or 1, Yes or No, A, B or C, we use
logistic regression.
Why Not Linear Regression?
Since the value of Y must lie between 0 and 1, a straight regression line has to be clipped at 0 and 1.
[Figure: a straight line on X and Y axes, clipped at Y = 0 and Y = 1]
The resulting clipped curve cannot be expressed as a single formula, so we need a different way to solve this kind of problem. Hence, logistic regression.
[Figure: Logistic Regression Curve (the S curve); X-axis from 0 to 10, Y-axis from 0 to 1.2]
Why Logistic Regression?
Equation for a straight line:
$$Y = C + B_1X_1 + B_2X_2 + \dots$$
The range of Y is from −infinity to infinity.
Let’s try to deduce the logistic regression equation from this equation. In logistic regression, Y can only be between 0 and 1.
Now, to get a range between 0 and infinity, let’s transform Y:
$$\frac{Y}{1 - Y}$$
At Y = 0 this is 0, and as Y approaches 1 it grows to infinity, so we now have the range between 0 and infinity.
Let us transform it further, to get the range between −infinity and infinity:
$$\log\left(\frac{Y}{1 - Y}\right) = C + B_1X_1 + B_2X_2 + \dots$$
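The two transformations above can be checked numerically. The sketch below is written in Python (an assumption, since the demo itself is in R), and the function names `odds`, `logit` and `inverse_logit` are chosen here for illustration. It maps a probability to odds, then to log-odds, and back again.

```python
import math


def odds(p):
    """Map a probability in [0, 1) to odds in [0, infinity)."""
    return p / (1.0 - p)


def logit(p):
    """Map a probability in (0, 1) to log-odds in (-infinity, infinity)."""
    return math.log(odds(p))


def inverse_logit(z):
    """Map any real number back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


for p in (0.1, 0.5, 0.9):
    z = logit(p)
    # The last column recovers the original probability.
    print(p, odds(p), z, inverse_logit(z))
```

Because the log-odds can take any real value, it can be set equal to the straight-line expression, and `inverse_logit` is what produces the S curve.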
What Is Logistic Regression?
Logistic regression (also called logit regression or the logit model) is a regression model in which the dependent variable (DV) is categorical.
Categorical: variables that can have only fixed values, such as A, B or C, or Yes or No.
Dependent: Y = f(X), i.e. Y is dependent on X.
Therefore, whenever the outcome of the dependent variable (Y) is categorical, like 0 or 1, Yes or No, or A, B or C, we use logistic regression.
How Does Logistic Regression Work?
Let us take an example to understand this. A model sorts the input values into two groups:
Selected: 147, 120, 121, 128, 110, 119, 133
Not Selected: 107, 89, 92, 106, 104, 114
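To make the Selected / Not Selected example concrete, here is a rough Python sketch that fits a one-variable logistic regression to those thirteen values. The fitting method (plain gradient ascent on the log-likelihood), the learning rate and the iteration count are assumptions made for illustration; the slides do not say how this model was fitted.

```python
import math

selected = [147, 120, 121, 128, 110, 119, 133]
not_selected = [107, 89, 92, 106, 104, 114]

xs = selected + not_selected
ys = [1] * len(selected) + [0] * len(not_selected)

# Standardize the input so that plain gradient ascent behaves well.
mean_x = sum(xs) / len(xs)
std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / len(xs))
zs = [(x - mean_x) / std_x for x in xs]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


b0, b1 = 0.0, 0.0
learning_rate = 0.1  # assumed step size
for _ in range(5000):  # assumed number of iterations
    errors = [y - sigmoid(b0 + b1 * z) for y, z in zip(ys, zs)]
    # Gradient of the log-likelihood with respect to b0 and b1.
    b0 += learning_rate * sum(errors)
    b1 += learning_rate * sum(e * z for e, z in zip(errors, zs))


def predict_probability(x):
    """Estimated probability that a value x ends up in the Selected group."""
    return sigmoid(b0 + b1 * (x - mean_x) / std_x)


print(predict_probability(147), predict_probability(89))
```

High values get a probability close to 1 and low values a probability close to 0, with the two groups overlapping around 110 to 114.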
Let’s take a sample dataset that ships with R, called mtcars. Our aim is to predict whether a car will have a V-engine or a straight engine based on our inputs.
Key:
Mpg – Miles/US Gallon
Cyl – Number of cylinders
Disp – Displacement (cu. in.)
Hp – Gross horsepower
Drat – Rear axle ratio
Wt – Weight (lb/1000)
Qsec – 1/4 mile time
Vs – Engine shape (0 = V-shaped, 1 = straight, per R's documentation of mtcars)
Am – Transmission type (0 = automatic, 1 = manual)
Gear – Number of forward gears
Carb – Number of carburetors
For now, let’s take disp and wt as our primary independent variables. Why? We’ll discuss that in the next section.
Since our aim is to predict the engine type, the outcome is binary, i.e. either 1 or 0. Therefore, our dependent variable Y is vs.
Before creating the model, we divide our dataset into a training dataset (80%) and a testing dataset (20%). We use the training dataset to create our model and the testing dataset to validate it.
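An 80/20 split can be sketched in a few lines of Python. The helper name `train_test_split`, the fixed seed and the use of row indices as a stand-in for the 32 rows of mtcars are all assumptions for illustration.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows reproducibly and cut them into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(32))  # stand-in for the 32 rows of mtcars
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before cutting matters: taking the first 80% of an ordered dataset could put all cars of one kind into the training set.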
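Once the model is created, we get the following outputs, the coefficients $\beta_0$, $\beta_1$ and $\beta_2$, which are calculated using MLE*.
*Maximum Likelihood Estimation (MLE) is a method of estimating the parameters by choosing the values that make the observed data most probable.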
Estimated Regression Equation:
$$\text{Logit}(Y) = \log\left(\frac{Y}{1 - Y}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
$$P(Y) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}$$
Here,
$\beta_0$ = Constant coefficient
$\beta_1$ = Coefficient of $x_1$
$\beta_2$ = Coefficient of $x_2$
$x_1$ = Independent variable
$x_2$ = Independent variable
$e$ = Euler’s number
$P(Y)$ = Probability that Y equals 1
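The estimated equation translates directly into code. In the Python sketch below, the function names and the coefficient values are hypothetical and chosen only to show the mechanics; they are not the coefficients of the mtcars model. Treating a probability exactly equal to the threshold as class 1 is also an assumption.

```python
import math


def probability(beta0, beta1, beta2, x1, x2):
    """P(Y): probability that Y equals 1 for the inputs x1 and x2."""
    z = beta0 + beta1 * x1 + beta2 * x2
    return math.exp(z) / (1.0 + math.exp(z))


def classify(p, threshold=0.5):
    """Turn a probability into a class label using a threshold."""
    return 1 if p >= threshold else 0


# Hypothetical coefficients and inputs, for illustration only.
p = probability(beta0=-1.0, beta1=0.5, beta2=-0.25, x1=4.0, x2=2.0)
print(p, classify(p))
```

Here the linear part is −1.0 + 0.5 × 4.0 − 0.25 × 2.0 = 0.5, so the probability is e^0.5 / (1 + e^0.5), which is above the 0.5 threshold.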
Substituting Values
Let’s take a value from the test dataset:
$\beta_0 = 1.83010$, $\beta_1 = 1.09428$, $\beta_2 = -0.02529$, $e = 2.7183$, $x_1 = 120.3$, $x_2 = 2.140$
$$P(Y) = \frac{0.9849}{1.9849} = 0.4962$$
So the probability of ‘vs’ being ‘1’ is 0.4962. We will assume the threshold to be 0.5. Since 0.4962 is below the threshold, the model predicts vs = 0 for this car. In R's documentation of mtcars, vs = 0 denotes a V-shaped engine and vs = 1 a straight engine.
Logistic Regression Demo In R
Our aim is to predict whether a patient is diabetic or not based on the following values.
Key:
Npreg – number of pregnancies
Glu – plasma glucose concentration
Bp – diastolic blood pressure
Skin – triceps skin fold thickness
Bmi – body mass index
Ped – diabetes pedigree function
Age – age in years
Type – whether the patient is diabetic (1 for yes, 0 for no)
First, we read the data from our CSV file.
Then, we split our dataset into training and testing datasets in the ratio 8:2.
After that, we create our model using the training dataset.
The summary of the model lists each coefficient together with a significance code:
*** – 99.9% confident (p < 0.001)
** – 99% confident (p < 0.01)
* – 95% confident (p < 0.05)
. – 90% confident (p < 0.1)
• This is the summary that we get after improving our model.
• The insignificant field is skin.
Null deviance shows how well the response variable is predicted by a model that includes only the intercept (grand mean).
Residual deviance shows how well the response variable is predicted once the independent variables are included. The lower the residual deviance is relative to the null deviance, the more the independent variables improve the fit.
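For a 0/1 response, both deviances are −2 times a log-likelihood, which makes them easy to compute by hand. The Python sketch below uses made-up outcomes and fitted probabilities (not the diabetes data), and the helper name `deviance` is chosen here for illustration.

```python
import math


def deviance(ys, probs):
    """Return -2 times the log-likelihood of 0/1 outcomes under the given probabilities."""
    log_likelihood = sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(ys, probs)
    )
    return -2.0 * log_likelihood


# Hypothetical outcomes and fitted probabilities, for illustration only.
ys = [1, 0, 1, 1, 0]
fitted = [0.9, 0.2, 0.7, 0.6, 0.3]

# The intercept-only model predicts the overall share of 1s for every row.
base_rate = sum(ys) / len(ys)
null_dev = deviance(ys, [base_rate] * len(ys))
residual_dev = deviance(ys, fitted)
print(null_dev, residual_dev)
```

In this toy case the residual deviance is lower than the null deviance, which is what we expect when the independent variables carry useful information.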
We will then predict the values for the test dataset and categorize them according to the threshold, which is 0.5.
Next, we create the confusion matrix for the training dataset, and then find the accuracy.
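The thresholding, confusion matrix and accuracy steps can be sketched in Python as follows. The labels and predicted probabilities are invented for illustration and the helper names are not taken from the demo; a prediction counts as 1 only when its probability is strictly above the threshold, which is an assumption.

```python
def confusion_matrix(actual, probabilities, threshold=0.5):
    """Count true/false positives and negatives at the given threshold."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for y, p in zip(actual, probabilities):
        predicted = 1 if p > threshold else 0
        if predicted == 1 and y == 1:
            counts["TP"] += 1
        elif predicted == 0 and y == 0:
            counts["TN"] += 1
        elif predicted == 1 and y == 0:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


def accuracy(counts):
    """Share of predictions that match the actual label."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Hypothetical labels and predicted probabilities, for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
probabilities = [0.81, 0.35, 0.42, 0.67, 0.58, 0.12, 0.93, 0.29]

at_half = confusion_matrix(actual, probabilities, threshold=0.5)
at_third = confusion_matrix(actual, probabilities, threshold=0.3)
print(at_half, accuracy(at_half))
print(at_third, accuracy(at_third))
```

Lowering the threshold moves predictions from 0 to 1, so false negatives go down while false positives go up.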
How To Find The Threshold?
Let us take a sample for our significant fields and see whether our patient is diabetic or not based on the model that we created.
• Store the predicted values for the training dataset in the ‘res’ variable.
• Load the ROCR package.
• Define the ‘ROCRPred’ and ‘ROCRPerf’ variables.
• Plot the graph!
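The graph in question is a ROC curve: it traces the true positive rate against the false positive rate as the threshold varies, which is what helps in picking a threshold. The following is a rough pure-Python sketch of the numbers behind such a plot, using made-up labels and probabilities; it does not use the ROCR package, and the function name `roc_points` is an assumption.

```python
def roc_points(actual, probabilities, thresholds):
    """Return (threshold, false positive rate, true positive rate) for each threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for y, p in zip(actual, probabilities) if p > t and y == 1)
        fp = sum(1 for y, p in zip(actual, probabilities) if p > t and y == 0)
        points.append((t, fp / negatives, tp / positives))
    return points


# Hypothetical labels and predicted probabilities, for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
res = [0.81, 0.35, 0.42, 0.67, 0.58, 0.12, 0.93, 0.29]

points = roc_points(actual, res, [0.0, 0.3, 0.5, 0.7, 1.0])
for threshold, fpr, tpr in points:
    print(threshold, fpr, tpr)
```

A good threshold keeps the true positive rate high without letting the false positive rate grow too much; in a medical setting a lower threshold is often preferred, because missing a diabetic patient costs more than a false alarm.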
Confusion matrix results:
Test dataset, 0.5 threshold: Accuracy = 73.8%, TrueNegatives = 9
Test dataset, 0.3 threshold: Accuracy = 67.8%, TrueNegatives = 5
Training dataset, 0.3 threshold: Accuracy = 79.4%
Logistic Regression: Use Cases
• Logistic regression was used in conjunction with a Geographic Information System (GIS) in 2005 to predict malaria breeding grounds in Africa.
• Logistic regression was used to approximate the areas where malaria patients would exist, based on geographical inputs.
• Logit analysis is a statistical technique used
by marketers to assess the scope of customer
acceptance of a product, particularly a new product.
• It attempts to determine the intensity or magnitude of
customers' purchase intentions and translates that
into a measure of actual buying behavior.
• Many e-commerce websites assess this behavior using
this model.
Session In A Minute
• The 5 Questions In Data Science
• What Is Regression?
• Logistic Regression – What & Why?
• Logistic Regression Working?
• Demo
• Use Cases


Logistic Regression in R | Machine Learning Algorithms | Data Science Training | Edureka

  • 1. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Logistic Regression
  • 2. www.edureka.co/data-scienceEdureka’s Data Science Certification Training What Will You Learn Today? What is Regression? The 5 Questions asked in Data Science Logistic Regression – What and Why? How does Logistic Regression work? Demo In R: Diabetes Use Case 1 2 3 4 65 Logistic Regression – Use Cases
  • 3. www.edureka.co/data-scienceEdureka’s Data Science Certification Training The 5 Questions Asked In Data Science
  • 4. www.edureka.co/data-scienceEdureka’s Data Science Certification Training The 5 Questions Asked In Data Science In data science, basically we have 5 kind of problems. Classification Algorithm Anomaly Detection Algorithm Regression Algorithms Clustering Algorithms Reinforcement Learning Q1. Q2. Q4. Q3. Q5. Is this A or B? Is this weird? How much or how many? How is this organized? What should I do next?
  • 5. www.edureka.co/data-scienceEdureka’s Data Science Certification Training The 5 Questions Asked In Data Science In data science, basically we have 5 kind of problems. Q1. Q2. Q4. Q3. Q5. Is this A or B? Is this weird? How is this organized? What should I do next? Classification Algorithm Anomaly Detection Algorithm Regression Algorithms Clustering Algorithms Reinforcement Learning How much or how many? Is this A or B?
  • 6. www.edureka.co/data-scienceEdureka’s Data Science Certification Training What Is Regression?
  • 7. www.edureka.co/data-scienceEdureka’s Data Science Certification Training What Is Regression? ➢ Regression analysis is a predictive modelling technique. ➢ It estimates the relationship between a dependent (target) and an independent variable (predictor). Input value = 7.00 Predicted outcome = 123.9 X-axis Y-axis
  • 8. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Types Of Regression
  • 9. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Types Of Regression Linear Regression • When there is a linear relationship between independent and dependent variables. • When the dependent variable is categorical (0/ 1, True/ False, Yes/ No, A/B/C) in nature. Logistic Regression Polynomial Regression • When the power of independent variable is more than 1. X Y X Y
  • 10. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Types Of Regression Linear Regression • When there is a linear relationship between independent and dependent variables. • When the dependent variable is categorical (0/ 1, True/ False, Yes/ No, A/B/C) in nature. Logistic Regression Polynomial Regression • When the power of independent variable is more than 1. X Y X Y
  • 11. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Why Logistic Regression?
  • 12. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Why Logistic Regression? Whenever the outcome of the dependent variable (Y) is discrete, like 0 or 1, Yes or No, A, B or C, we use logistic regression.
  • 13. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Why can’t we use Linear Regression? Why Not Linear Regression? Whenever the outcome of the dependent variable (Y) is discrete, like 0 or 1, Yes or No, A, B or C, we use logistic regression.
  • 14. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Y-axis X-axis 0 1 Now since our value of Y will be between 0 and 1, the linear line has to be clipped at 0 and 1. Why Not Linear Regression?
  • 15. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Y-axis X-axis 0 1 With this, our resulting curve cannot be formulated into a single formula. We needed a new way to solve this kind of problem. Hence, we came up with Logistic Regression! Why Not Linear Regression?
  • 16. www.edureka.co/data-scienceEdureka’s Data Science Certification Training 0 0.2 0.4 0.6 0.8 1 1.2 0 1 2 3 4 5 6 7 8 9 10 LOGISTIC REGRESSION The S Curve Logistic Regression Curve
  • 17. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Why Logistic Regression? Equation for a straight line Y = C + B1X1 + B2X2 + …. Range of Y is from – (infinity) to infinity
  • 18. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Why Logistic Regression? Equation for a straight line Y = C + B1X1 + B2X2 + …. Range of Y is from – (infinity) to infinity Let’s try to reduce the Logistic Regression Equation from this equation Y = C + B1X1 + B2X2 + …. In Logistic Regression Y can only be between 0 and 1.
  • 19. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Why Logistic Regression? Equation for a straight line Y = C + B1X1 + B2X2 + …. Range of Y is from – (infinity) to infinity Let’s try to reduce the Logistic Regression Equation from this equation Y = C + B1X1 + B2X2 + …. In Logistic Equation Y can only be between 0 and 1. Now, to get the range of Y between 0 and infinity, let’s transform Y Y 1 − Y Y=0 | 0 Y=1 | infinity Now, we have the range between 0 and infinity
  • 20. www.edureka.co/data-scienceEdureka’s Data Science Certification Training Why Logistic Regression? Equation for a straight line Y = C + B1X1 + B2X2 + …. Range of Y is from – (infinity) to infinity Let’s try to reduce the Logistic Regression Equation from this equation Y = C + B1X1 + B2X2 + …. In Logistic Equation Y can only be between 0 and 1. Now, to get the range of Y between 0 and infinity, let’s transform Y Y 1 − Y Y=0 | 0 Y=1 | infinity Now, we have the range between 0 and infinity Let us transform it further, to get the range between –( infinity ) and infinity Y 1 − Y log 𝐘 𝟏 − 𝐘 log = C + B1X1 + B2X2 + ….
  • 21. What Is Logistic Regression?
  • 22. What Is Logistic Regression? Logistic regression (also called logit regression or the logit model) is a regression model in which the dependent variable (DV) is categorical. Dependent: Y = f(X), i.e. Y depends on X. Categorical: a variable that can take only fixed values, such as A, B or C, or Yes or No.
  • 23. What Is Logistic Regression? Therefore, whenever the outcome of the dependent variable (Y) is categorical, such as 0 or 1, Yes or No, or A, B or C, we use logistic regression.
  • 24. How Does Logistic Regression Work?
  • 25. How Does Logistic Regression Work? Let us take an example to understand this. A model labels each input value as Selected or Not Selected. Selected: 147, 120, 121, 128, 110, 119, 133. Not Selected: 107, 89, 92, 106, 104, 114.
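As a hedged illustration, the sketch below fits a logistic regression to the thirteen values of this example in R. The variable names `score` and `selected` are hypothetical (the slides do not name the measured quantity), and the three new inputs are made-up values.

```r
# Values from the example slide; "score" is a hypothetical name for them.
score <- c(147, 120, 121, 128, 110, 119, 133,   # Selected
           107, 89, 92, 106, 104, 114)          # Not Selected
selected <- c(rep(1, 7), rep(0, 6))             # 1 = Selected, 0 = Not Selected

# Fit a logistic regression of the outcome on the input value.
fit <- glm(selected ~ score, family = binomial)

# Predicted probability of "Selected" for three made-up new inputs.
new_scores <- data.frame(score = c(100, 115, 130))
p <- predict(fit, newdata = new_scores, type = "response")
print(round(p, 3))
```

Because the two groups overlap slightly (110 is selected while 114 is not), the model returns probabilities rather than a hard cut-off, and the probability rises with the input value.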
  • 27. How Does Logistic Regression Work? Let's take a sample dataset in R called mtcars. Our aim is to predict whether a car has a V-shaped engine or a straight engine based on our inputs. Key: mpg: miles per US gallon; cyl: number of cylinders; disp: displacement (cu. in.); hp: gross horsepower; drat: rear axle ratio; wt: weight (1000 lbs); qsec: 1/4 mile time; vs: engine shape (0 = V-shaped, 1 = straight); am: transmission type; gear: number of forward gears; carb: number of carburetors.
  • 28. How Does Logistic Regression Work? For now, let's take disp and wt as our independent variables. We will discuss why in the next section.
  • 29. How Does Logistic Regression Work? Since the engine is either V-shaped or straight, the outcome is either 0 or 1. Therefore, our dependent variable Y is vs.
  • 30. How Does Logistic Regression Work? Before creating the model, we divide our dataset into a training dataset (80%) and a testing dataset (20%).
  • 31. How Does Logistic Regression Work? We use the training dataset (80%) to create our model and the testing dataset to validate it.
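One way to make such an 80/20 split in base R is sketched below. The seed value and the use of `sample()` are assumptions made for illustration; the slides do not show which splitting function was used.

```r
# Assumed seed, only so that the random split is reproducible.
set.seed(123)

n <- nrow(mtcars)

# Randomly pick 80% of the row indices for training.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))

train <- mtcars[train_idx, ]   # used to create the model
test  <- mtcars[-train_idx, ]  # held back to validate it

cat("training rows:", nrow(train), " testing rows:", nrow(test), "\n")
```

With the 32 rows of mtcars this yields 25 training rows and 7 testing rows.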
  • 32. How Does Logistic Regression Work? Once the model is created, we get the coefficients $\beta_0$, $\beta_1$ and $\beta_2$, which are calculated using maximum likelihood estimation (MLE), a method of estimating the parameters of a model.
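In R, `glm()` with `family = binomial` estimates these coefficients by maximum likelihood. The sketch below fits the model on the full mtcars dataset for brevity, which is an assumption: the slides fit on a training subset, so the numbers printed here will differ from those on the slides.

```r
# Logistic regression of vs on disp and wt (full dataset, for brevity).
fit <- glm(vs ~ disp + wt, data = mtcars, family = binomial)

# The three estimated coefficients: intercept, disp and wt.
betas <- coef(fit)
print(betas)
```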
  • 33. Estimated Regression Equation. $\operatorname{logit}(Y) = \log\left(\frac{P(Y)}{1 - P(Y)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, which gives $P(Y) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}$. Here, $\beta_0$ = constant coefficient (intercept); $\beta_1$ = coefficient of $x_1$; $\beta_2$ = coefficient of $x_2$; $x_1$, $x_2$ = independent variables; $e$ = Euler's number; $P(Y)$ = probability that Y equals 1.
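The step from the log-odds form to the probability form can be spelled out as follows, writing $z$ as shorthand (our notation, not the slides') for the linear term.

```latex
\begin{align*}
z &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\log\left(\frac{P(Y)}{1 - P(Y)}\right) &= z \\
\frac{P(Y)}{1 - P(Y)} &= e^{z} \\
P(Y) &= e^{z}\,\bigl(1 - P(Y)\bigr) \\
P(Y)\,\bigl(1 + e^{z}\bigr) &= e^{z} \\
P(Y) &= \frac{e^{z}}{1 + e^{z}}
\end{align*}
```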
  • 34. How Does Logistic Regression Work? Substituting Values. Let's take a value from the test dataset: $x_1 = 120.3$, $x_2 = 2.140$. With $\beta_0 = 1.83010$, $\beta_1 = 1.09428$, $\beta_2 = -0.02529$ and $e = 2.7183$, the probability of vs being 1 is $P(Y) = \frac{0.9849}{1.9849} = 0.4962$. We assume a threshold of 0.5. Since 0.4962 is below the threshold, the predicted class is vs = 0. In mtcars, vs = 0 denotes a V-shaped engine and vs = 1 a straight engine, so this car is predicted to have a V-shaped engine.
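The final division and the threshold comparison can be reproduced in a few lines of R. The value 0.9849 is taken from the slide as given; the variable names are ours.

```r
# Exponential term as given on the slide.
exp_term <- 0.9849

# Probability that vs equals 1.
p <- exp_term / (1 + exp_term)

# Classify with the assumed threshold of 0.5.
threshold <- 0.5
predicted_class <- as.integer(p >= threshold)

print(round(p, 4))        # 0.4962
print(predicted_class)    # 0
```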
  • 36. Logistic Regression Demo In R
  • 37. Logistic Regression Demo In R. Our aim is to predict whether a patient is diabetic or not based on the following values. Key: npreg: number of pregnancies; glu: plasma glucose concentration; bp: diastolic blood pressure; skin: triceps skin fold thickness; bmi: body mass index; ped: diabetes pedigree function; age: age in years; type: 1 for yes and 0 for no (diabetic).
  • 38. Logistic Regression Demo In R. First, we read the data from our CSV file.
  • 39. Logistic Regression Demo In R. Then, we split our dataset into training and testing sets in the ratio 8:2.
  • 40. Logistic Regression Demo In R. After that, we create our model using the training dataset.
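The commands for these three steps did not survive in the transcript, so the following is only a sketch under stated assumptions. It uses `Pima.tr` from the MASS package, which has the same column names as the key, as stand-in data, and writes it to a temporary CSV so that `read.csv()` can be shown. The file path, seed and variable names are hypothetical.

```r
library(MASS)  # ships with R; Pima.tr is used here as stand-in data

# Stand-in for the CSV file: write Pima.tr to a temporary file.
csv_path <- tempfile(fileext = ".csv")
write.csv(Pima.tr, csv_path, row.names = FALSE)

# Step 1: read the data from the CSV file.
data <- read.csv(csv_path)
data$type <- ifelse(data$type == "Yes", 1, 0)  # 1 = diabetic, 0 = not

# Step 2: split into training and testing sets in the ratio 8:2.
set.seed(42)  # assumed seed, for reproducibility only
train_idx <- sample(seq_len(nrow(data)), size = floor(0.8 * nrow(data)))
training <- data[train_idx, ]
testing  <- data[-train_idx, ]

# Step 3: create the model from the training dataset, using all predictors.
model <- glm(type ~ ., data = training, family = binomial)
print(coef(model))
```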
  • 41. Logistic Regression Demo In R. The summary of the model lists the coefficient estimates and their significance.
  • 42. Logistic Regression Demo In R. Significance codes in the summary: *** = 99.9% confident; ** = 99% confident; * = 95% confident; . = 90% confident.
  • 43. Logistic Regression Demo In R. This is the summary we get after improving our model; the insignificant field is skin. Null deviance shows how well the response variable is predicted by a model that includes only the intercept (grand mean). Residual deviance shows how well the response variable is predicted when the independent variables are included.
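A hedged sketch of inspecting the summary and the two deviances in R follows. It again assumes `Pima.tr` from MASS as stand-in data, and it assumes that "improving" the model means refitting without skin; the slides do not show the actual command.

```r
library(MASS)  # Pima.tr as stand-in data (assumption)

# Full model with all predictors.
full <- glm(type ~ ., data = Pima.tr, family = binomial)
print(summary(full))  # coefficient table with significance stars

# Assumed improvement step: drop the skin predictor and refit.
reduced <- update(full, . ~ . - skin)

# Null deviance (intercept only) against residual deviance (with predictors).
deviances <- c(null = full$null.deviance,
               full = full$deviance,
               reduced = reduced$deviance)
print(round(deviances, 2))
```

A residual deviance well below the null deviance indicates that the predictors improve on the intercept-only model. Dropping a predictor can never lower the residual deviance, so a small increase after removing skin suggests little was lost.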
  • 44. Logistic Regression Demo In R. We will then predict the values for the test dataset and categorize them according to the threshold, which is 0.5.
  • 45. Logistic Regression Demo In R. Create the confusion matrix for the training dataset.
  • 46. Logistic Regression Demo In R. Then find the accuracy.
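Prediction, thresholding, the confusion matrix and the accuracy can be sketched as below. This assumes `Pima.tr` and `Pima.te` from MASS as stand-in training and testing sets rather than the 8:2 split of the slides, and it builds the matrix for the testing set; the variable names are ours.

```r
library(MASS)  # Pima.tr / Pima.te as stand-in data (assumption)

model <- glm(type ~ ., data = Pima.tr, family = binomial)

# Predicted probability of being diabetic for each test patient.
prob <- predict(model, newdata = Pima.te, type = "response")

# Categorize according to the 0.5 threshold.
pred <- factor(ifelse(prob > 0.5, "Yes", "No"), levels = c("No", "Yes"))

# Confusion matrix: rows are actual classes, columns are predicted classes.
cm <- table(Actual = Pima.te$type, Predicted = pred)
print(cm)

# Accuracy: correctly classified cases (the diagonal) over all cases.
accuracy <- sum(diag(cm)) / sum(cm)
cat("accuracy:", round(accuracy, 3), "\n")
```

Fixing the factor levels keeps the table 2 x 2 even if one class is never predicted.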
  • 47. How To Find The Threshold?
  • 48. Logistic Regression Demo In R. Let us take a sample for our significant fields and see whether our patient is diabetic or not based on the model that we created. Store the predicted values for the training dataset in the 'res' variable.
  • 49. Logistic Regression Demo In R. Load the ROCR package.
  • 50. Logistic Regression Demo In R. Define the 'ROCRPred' and 'ROCRPerf' variables.
  • 51. Logistic Regression Demo In R. Plot the graph!
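These four steps might look like the following sketch. The variable names `res`, `ROCRPred` and `ROCRPerf` come from the slides; the data (`Pima.tr` from MASS), the plotted measures and the plot options are assumptions, and the ROCR package must be installed separately.

```r
library(MASS)  # Pima.tr as stand-in training data (assumption)
library(ROCR)  # install.packages("ROCR") if it is not available

model <- glm(type ~ ., data = Pima.tr, family = binomial)

# Store the predicted values for the training dataset in 'res'.
res <- predict(model, newdata = Pima.tr, type = "response")

# Define the 'ROCRPred' and 'ROCRPerf' variables.
ROCRPred <- prediction(res, Pima.tr$type)
ROCRPerf <- performance(ROCRPred, "tpr", "fpr")

# Plot the ROC curve, labelling candidate thresholds along it.
plot(ROCRPerf, colorize = TRUE, print.cutoffs.at = seq(0.1, 0.9, by = 0.1))
```

Each labelled point on the curve shows the true positive rate and false positive rate obtained at that threshold, which helps in choosing a cut-off other than 0.5.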
  • 52. Logistic Regression Demo In R
  • 53. Logistic Regression Demo In R. Confusion matrix for the test dataset with a 0.5 threshold: accuracy = 73.8%, true negatives = 9. Confusion matrix for the test dataset with a 0.3 threshold: accuracy = 67.8%, true negatives = 5.
  • 54. Logistic Regression Demo In R. Confusion matrix for the training dataset with a 0.3 threshold: accuracy = 79.4%.
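The effect of moving the threshold from 0.5 to 0.3 can be explored with a small helper. This sketch assumes `Pima.tr` and `Pima.te` from MASS as stand-in data, so its numbers will not match the figures on the slides; `confusion_at` is a hypothetical helper name.

```r
library(MASS)  # Pima.tr / Pima.te as stand-in data (assumption)

model <- glm(type ~ ., data = Pima.tr, family = binomial)
prob <- predict(model, newdata = Pima.te, type = "response")

# Hypothetical helper: confusion matrix for the test set at a given threshold.
confusion_at <- function(threshold) {
  pred <- factor(ifelse(prob > threshold, "Yes", "No"), levels = c("No", "Yes"))
  table(Actual = Pima.te$type, Predicted = pred)
}

# Compare accuracy and missed diabetic cases at the two thresholds.
for (t in c(0.5, 0.3)) {
  cm <- confusion_at(t)
  cat("threshold:", t,
      " accuracy:", round(sum(diag(cm)) / sum(cm), 3),
      " diabetic cases missed:", cm["Yes", "No"], "\n")
}
```

Lowering the threshold can only keep or reduce the number of diabetic patients classified as non-diabetic, at the cost of more false alarms, so overall accuracy may drop even though fewer cases are missed.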
  • 55. Logistic Regression: Use Cases
  • 56. Logistic Regression – Use Case • In 2005, logistic regression was used in conjunction with a Geographic Information System (GIS) to predict malaria breeding grounds in Africa. • Logistic regression was used to approximate the areas where malaria patients would be found, based on geographical inputs.
  • 57. Logistic Regression – Use Case • Logit analysis is a statistical technique used by marketers to assess the scope of customer acceptance of a product, particularly a new product. • It attempts to determine the intensity or magnitude of customers' purchase intentions and translates that into a measure of actual buying behavior. • Many e-commerce websites assess this behavior using this model.
  • 58. Session In A Minute: The 5 Questions In Data Science; What Is Regression?; Logistic Regression – What & Why?; How Does Logistic Regression Work?; Demo; Use Cases.