Regression and Classification
3/31/2020 Shivani Saluja(G D Goenka University ) 1
Regression
• It is a predictive modelling technique that investigates the relationship between a dependent variable and one or more independent variables.
• It involves fitting a line to a set of data points so that it closely follows the overall shape of the data.
• The dependent variable is plotted on the y-axis and the independent variable on the x-axis.
Applications
• Determining the strength of predictors (the impact of an independent variable on a dependent variable, e.g. the relationship between marketing and sales)
• Forecasting the effect
• Trend forecasting
Classification
• A classification problem is one in which the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”.
• A classification model attempts to draw some conclusion from observed values.
• Given one or more inputs, a classification model will try to predict the value of one or more outcomes. For example, emails are filtered as “spam” or “not spam”, and transactions are labelled “fraudulent” or “authorized”.
• Classification predicts categorical class labels: it constructs a model from the training set and the values (class labels) of the classifying attribute, and then uses that model to classify new data.
• There are a number of classification models, including logistic regression, decision tree, random forest, gradient-boosted tree, multilayer perceptron, one-vs-rest, and Naive Bayes.
Examples
• Predicting the gender of a person from his/her handwriting style (C: classification)
• Predicting house price based on area (R: regression)
• Predicting whether the monsoon will be normal next year (C)
• Predicting the number of copies of a music album that will be sold next month (R)
Linear vs Logistic Regression
Linear Regression
1. Data is modelled using a straight line
2. Used with a continuous dependent variable
3. Output/prediction is the value of the variable
4. Fit is measured by the method of least squares (the sum of squared errors)
Logistic Regression
1. The log-odds of some event is represented as a linear combination of the predictor variables
2. Used with a categorical dependent variable
3. Output/prediction is the probability of occurrence of an event
4. Accuracy is measured in terms of precision and recall
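Precision and recall can be illustrated with a small sketch. The function name `precision_recall` and the label lists below are hypothetical and are not part of the original slides; the code only assumes that labels are coded as 1 (event) and 0 (no event).

```python
def precision_recall(actual, predicted, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == positive and p == positive)
    false_pos = sum(1 for a, p in pairs if a != positive and p == positive)
    false_neg = sum(1 for a, p in pairs if a == positive and p != positive)
    # Precision: of everything predicted positive, how much was really positive.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: of everything really positive, how much was found.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical labels, made up for illustration only.
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 0]
print(precision_recall(actual, predicted))
```

Here two of the three predicted positives are correct and two of the three actual positives are found, so both measures come out at 2/3.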
Linear (Univariate) Regression
• Linear regression may be defined as the statistical model that analyzes the linear relationship between a dependent variable and a given set of independent variables.
• A linear relationship between variables means that when the value of one or more independent variables changes (increases or decreases), the value of the dependent variable also changes accordingly (increases or decreases).
• Mathematically, the relationship can be represented by the following equation:
Y = mX + b
Y is the dependent variable we are trying to predict.
X is the independent variable we are using to make predictions.
m is the slope of the regression line, which represents the effect X has on Y.
b is a constant, known as the Y-intercept. If X = 0, Y would be equal to b.
• Univariate linear regression relates one predictor and one response.
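As a minimal sketch of how the line equation is used for prediction, the snippet below evaluates m*X + b. The function name `predict` and the values of the slope and intercept are assumptions chosen for illustration, not values from the slides.

```python
def predict(x, m, b):
    """Return the value of Y given by the line m*X + b."""
    return m * x + b


# Hypothetical slope and intercept, chosen only for illustration.
m, b = 2.0, 1.0

# At X = 0 the prediction equals the intercept b.
print(predict(0, m, b))

# A one-unit increase in X changes the prediction by the slope m.
print(predict(3, m, b) - predict(2, m, b))
```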
Positive vs Negative Linear Relationship
A linear relationship is called positive if the dependent variable increases as the independent variable increases.
A linear relationship is called negative if the dependent variable decreases as the independent variable increases.
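One way to check the direction of a relationship numerically is to look at the sign of the summed products of deviations from the means, which is also the sign of the least-squares slope. The sketch below assumes small hypothetical data lists; the function name `relationship_direction` is illustrative.

```python
def relationship_direction(xs, ys):
    """Classify a linear relationship by the sign of sum((x - mean_x) * (y - mean_y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_variation > 0:
        return "positive"
    if co_variation < 0:
        return "negative"
    return "none"


# Hypothetical data: y rises with x in the first call and falls in the second.
print(relationship_direction([1, 2, 3, 4], [2, 4, 6, 8]))
print(relationship_direction([1, 2, 3, 4], [8, 6, 4, 2]))
```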
Assumptions
The following are some assumptions that the linear regression model makes about the dataset:
• Multi-collinearity: the linear regression model assumes that there is very little or no multi-collinearity in the data. Multi-collinearity occurs when the independent variables (features) depend on one another.
• Auto-correlation: the linear regression model also assumes that there is very little or no auto-correlation in the data. Auto-correlation occurs when there is dependency between residual errors.
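Both assumptions can be checked roughly with simple statistics. The sketch below is one possible approach, not something the slides prescribe: it uses the Pearson correlation between two features as a sign of multi-collinearity and the Durbin-Watson statistic on the residuals as a sign of first-order auto-correlation. All data values are hypothetical.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical features: feature_b is exactly twice feature_a, so they are collinear.
feature_a = [1, 2, 3, 4, 5]
feature_b = [2, 4, 6, 8, 10]
print(pearson_correlation(feature_a, feature_b))

# Hypothetical residuals from some fitted model.
print(durbin_watson([0.5, -0.4, 0.3, -0.6, 0.2]))
```

A correlation close to +1 or -1 between two features suggests multi-collinearity. A Durbin-Watson value near 2 suggests little first-order auto-correlation, while values toward 0 or 4 suggest positive or negative auto-correlation respectively.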
Types of Linear Regression
• Simple Linear Regression
• Multiple Linear Regression
Principle of Least Squares
• The least squares regression line is the line that makes the vertical distances from the data points to the regression line as small as possible. It is called “least squares” because the best-fit line is the one that minimizes the sum of the squares of the errors.
• The predicted Y must be the best possible estimate of the real data.
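The principle can be written out as follows for a line with slope m and intercept b. The index i and the bar notation for means are notational assumptions added here; the closed-form expressions are the standard solution for a single predictor.

```latex
% Quantity minimized by the least squares line (sum of squared vertical errors)
S(m, b) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2

% Values of m and b that minimize S
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m \bar{x}
```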
Numerical
• The values of the dependent (y) and independent (x) variables are given in the table below. Find the least squares regression line y = mx + c and estimate the value of y when x is 6.
X Y
1 3
2 4
3 2
4 4
5 5
x    y    (x-x’)   (y-y’)   (x-x’)^2   (x-x’)(y-y’)
1    3    -2       -0.6     4          1.2
2    4    -1       0.4      1          -0.4
3    2    0        -1.6     0          0
4    4    1        0.4      1          0.4
5    5    2        1.4      4          2.8
Mean of x = 3, mean of y = 3.6 (x’ and y’ denote the means)
Σ(x-x’)^2 = 10, Σ(x-x’)(y-y’) = 4
Y = mx + c
Slope: m = Σ(x-x’)(y-y’) / Σ(x-x’)^2 = 4/10 = 0.4
Substitute the mean values of x and y into the equation to find c:
3.6 = 0.4*3 + c
c = 3.6 - 1.2 = 2.4
When x is 6:
Y = 0.4*6 + 2.4 = 4.8
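The hand calculation can be reproduced with a short script. This is a sketch using only the data from the table; the function name `fit_line` is an illustrative choice.

```python
def fit_line(xs, ys):
    """Return (m, c) of the least squares line y = m*x + c for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    # The fitted line passes through the point of means (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


xs = [1, 2, 3, 4, 5]
ys = [3, 4, 2, 4, 5]
m, c = fit_line(xs, ys)

# Rounded to two decimals to hide floating-point noise.
print(round(m, 2), round(c, 2), round(m * 6 + c, 2))  # prints 0.4 2.4 4.8
```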
Multiple Regression
It relates more than one predictor and one response.
Multivariate Regression
It is a method used to measure the degree to which more than one independent variable (predictors) and more than one dependent variable (responses) are linearly related.
Logistic Regression
Definition
• Logistic regression is used when the dependent variable (target) is categorical.
For example,
• To predict whether an email is spam (1) or not (0)
• To predict whether a tumor is malignant (1) or not (0)
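A minimal sketch of how a logistic regression model turns inputs into a 1/0 prediction: a linear score is passed through the logistic (sigmoid) function to get a probability, which is then compared with a threshold. The weights, bias, feature values and the 0.5 threshold below are hypothetical, not fitted values from the slides.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias, threshold=0.5):
    """Return (probability of class 1, predicted label 1 or 0)."""
    # Linear score: bias plus the weighted sum of the features.
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


# Hypothetical single feature with a made-up weight and bias.
probability, label = predict_label([2.0], [1.5], -1.0)
print(round(probability, 3), label)
```

The sketch does not guard against overflow for very large negative scores, which a production implementation would need to handle.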
Types of Logistic Regression
1. Binary Logistic Regression
The categorical response has only two possible outcomes. Example: spam or not spam
2. Multinomial Logistic Regression
Three or more categories without ordering. Example: predicting which food is preferred more (Veg, Non-Veg, Vegan)
3. Ordinal Logistic Regression
Three or more categories with ordering. Example: movie rating from 1 to 5
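For the multinomial case, a common formulation converts one score per category into probabilities with the softmax function and picks the most probable category. The sketch below covers only that step; the scores are hypothetical, and the ordinal case needs a different model that is not shown here.

```python
import math


def softmax(scores):
    """Convert a list of real scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the largest score keeps the exponentials from overflowing.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


categories = ["Veg", "Non-Veg", "Vegan"]
scores = [1.2, 0.3, -0.5]  # hypothetical scores, one per category

probabilities = softmax(scores)
predicted = categories[probabilities.index(max(probabilities))]
print(predicted)
```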
