Correlation and Regression in R
Hamid Reza Bolhasani
PhD, Data Scientist
Jan 2020
Table of contents
- Covariance
- Correlation
- Examples
- Regression
- Case Study in R
- Conclusion
Covariance
$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
$$S_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Variance
Gives information on the spread of a single variable.
Covariance
Gives information on the degree to which two variables vary together.
- When X and Y increase together: cov(x,y) is positive.
- When X increases as Y decreases: cov(x,y) is negative.
- When there is no consistent relationship: cov(x,y) = 0
Covariance Example
           High variance data                Low variance data
Subject      x     y   x error * y error       x     y   x error * y error
1          101   100        2500              54    53          9
2           81    80         900              53    52          4
3           61    60         100              52    51          1
4           51    50           0              51    50          0
5           41    40         100              50    49          1
6           21    20         900              49    48          4
7            1     0        2500              48    47          9
Mean        51    50                          51    50
Sum of x error * y error: 7000 (high variance data), 28 (low variance data)
Covariance: 1166.67 (high variance data), 4.67 (low variance data)
Here "error" is the deviation of a value from its mean, and each covariance is the sum divided by n - 1 = 6.
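The table's arithmetic can be reproduced in a few lines of base R. This is a minimal sketch: the vector names and the helper `cov_by_hand` are illustrative and not part of the original slides.

```r
# Data copied from the covariance example table
x_high <- c(101, 81, 61, 51, 41, 21, 1)
y_high <- c(100, 80, 60, 50, 40, 20, 0)
x_low  <- c(54, 53, 52, 51, 50, 49, 48)
y_low  <- c(53, 52, 51, 50, 49, 48, 47)

# Hypothetical helper: sample covariance straight from the definition,
# i.e. the sum of products of deviations divided by n - 1
cov_by_hand <- function(x, y) {
  sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)
}

cov_high <- cov_by_hand(x_high, y_high)
cov_low  <- cov_by_hand(x_low, y_low)

# Rounded to two decimals for comparison with the slide
print(round(c(high = cov_high, low = cov_low), 2))

# Base R's cov() also divides by n - 1, so it should agree
print(all.equal(cov_high, cov(x_high, y_high)))
print(all.equal(cov_low, cov(x_low, y_low)))
```

Both data sets follow the same straight line (y = x - 1), so the large gap between the two covariances reflects only the spread of the values, not the strength of the relationship.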
Correlation & Regression
Correlation
- Is there any relationship between 2 variables (x,y)?
- X is independent (Explanatory) and Y is dependent (Response)
- Correlation ≠ Causation
Regression
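- How well does a given independent variable predict the dependent variable?
$$r_{xy} = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[N \sum X^2 - \left(\sum X\right)^2\right]\left[N \sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$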
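The two expressions for r are algebraically equivalent, which a short base R check can illustrate. This sketch reuses the high- and low-variance data from the covariance example; the function names `r_computational` and `r_from_cov` are illustrative assumptions.

```r
# Data from the covariance example (high and low variance sets)
x_high <- c(101, 81, 61, 51, 41, 21, 1)
y_high <- c(100, 80, 60, 50, 40, 20, 0)
x_low  <- c(54, 53, 52, 51, 50, 49, 48)
y_low  <- c(53, 52, 51, 50, 49, 48, 47)

# Hypothetical helper: the raw-sums (computational) formula for r
r_computational <- function(x, y) {
  n <- length(x)
  num <- n * sum(x * y) - sum(x) * sum(y)
  den <- sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))
  num / den
}

# Hypothetical helper: covariance scaled by both standard deviations
r_from_cov <- function(x, y) {
  cov(x, y) / (sd(x) * sd(y))
}

r_high <- r_computational(x_high, y_high)
r_low  <- r_computational(x_low, y_low)

print(c(high = r_high, low = r_low))
print(all.equal(r_high, r_from_cov(x_high, y_high)))
print(all.equal(r_high, cor(x_high, y_high)))
```

Unlike covariance, r does not depend on the scale of the data: both sets are perfectly linear, so both give r = 1 even though their covariances differ by a factor of 250.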
Correlation in Scatter Diagrams
[Figure: four scatter plots of y against x]
- Strong negative correlation: r = -0.91
- Weak positive correlation: r = 0.42
- Strong positive correlation: r = 0.88
- Nonlinear correlation: r = 0.07
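The nonlinear panel shows that r measures only linear association. As a small illustration (the data here are made up for the purpose and are not from the slides), a perfect quadratic relationship on symmetric x values gives a correlation of zero:

```r
# Illustrative data: y is completely determined by x, but not linearly
x <- -5:5
y <- x^2

# Pearson correlation only captures the linear component
r_nonlinear <- cor(x, y)
print(r_nonlinear)

# For contrast, a perfectly linear relationship on the same x values
r_linear <- cor(x, 2 * x + 1)
print(r_linear)
```

A value of r near zero therefore does not by itself mean the variables are unrelated; a scatter plot is still worth inspecting.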
Regression Example
Smoking vs Lung Capacity
N    Cigarettes (X)    Lung Capacity (Y)
1          0                  45
2          5                  42
3         10                  33
4         15                  31
5         20                  29
Example Analysis
Smoking vs Lung Capacity
[Figure: scatter plot of Lung Capacity (Y) against Smoking (yrs)]
$$S_{xy} = \frac{1}{4}(-215) = -53.75$$
$$r_{xy} = -0.96$$
When smoking is above its group mean, lung capacity tends to be below its group mean.
Greater smoking exposure implies a greater likelihood of lung damage.
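The figures for this example can be checked in base R. The following is a minimal sketch using the five rows of the smoking table; the variable names are illustrative.

```r
# Data from the smoking vs lung capacity table
cigarettes    <- c(0, 5, 10, 15, 20)
lung_capacity <- c(45, 42, 33, 31, 29)

# Sum of products of deviations from the two means
sum_products <- sum((cigarettes - mean(cigarettes)) *
                    (lung_capacity - mean(lung_capacity)))

# Sample covariance: divide by n - 1 = 4
s_xy <- sum_products / (length(cigarettes) - 1)

# Correlation: covariance scaled by both standard deviations
r_xy <- s_xy / (sd(cigarettes) * sd(lung_capacity))

print(sum_products)
print(s_xy)
print(round(r_xy, 2))
```

The negative sign of both statistics matches the downward trend in the scatter plot.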
Regression
- The process of predicting variable Y using variable X.
- Tells us how values of Y change as a function of changes in X.
- Calculates the "best-fit" line for a given set of data.
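[Figure: observed points scattered around the fitted line, with vertical distances d1, d2, d3 between each observed y-value and its predicted y-value]
ŷ = ax + b, where a is the slope and b is the intercept
y_i = true (observed) value
ŷ = predicted value
ε = residual error, the difference between y_i and ŷ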
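For a single predictor, the least-squares line has a closed form: the slope is the covariance divided by the variance of x, and the intercept makes the line pass through the two means. The sketch below applies this to the smoking data from the earlier example and compares the result with `lm()`; the variable names are illustrative, and the slides themselves do not show this calculation.

```r
# Smoking example data from the earlier slide
x <- c(0, 5, 10, 15, 20)     # cigarettes
y <- c(45, 42, 33, 31, 29)   # lung capacity

# Least-squares slope and intercept in closed form
a <- cov(x, y) / var(x)       # slope
b <- mean(y) - a * mean(x)    # intercept

# Predicted values and residual errors (observed minus predicted)
y_hat     <- a * x + b
residuals <- y - y_hat

print(c(slope = a, intercept = b))
print(residuals)

# lm() fits the same line; coef() returns (intercept, slope)
fit <- lm(y ~ x)
print(all.equal(unname(coef(fit)), c(b, a)))
```

The residuals of a least-squares fit with an intercept sum to zero, which is a quick sanity check on the arithmetic.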
Regression: Case Study in R
library(ggplot2)
# Scatter plot of fuel economy against weight
ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_point()
- Data: mtcars
- wt: weight (1000 lbs)
- mpg: miles per gallon
- S(x,y) = -5.12
- r(x,y) = -0.87
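The covariance and correlation quoted for `mtcars` can be obtained with base R's `cov()` and `cor()`. A minimal sketch (the variable names are illustrative; `mtcars` ships with R in the `datasets` package):

```r
# Sample covariance and Pearson correlation of weight and fuel economy
s_wt_mpg <- cov(mtcars$wt, mtcars$mpg)
r_wt_mpg <- cor(mtcars$wt, mtcars$mpg)

print(round(s_wt_mpg, 2))
print(round(r_wt_mpg, 2))

# The correlation is the covariance rescaled by the two standard deviations
print(all.equal(r_wt_mpg, s_wt_mpg / (sd(mtcars$wt) * sd(mtcars$mpg))))
```

Both values are negative: heavier cars in this data set tend to have lower fuel economy.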
Regression: Case Study in R
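# Scatter plot with the fitted least-squares line (no confidence band)
ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_point() + geom_smooth(method = "lm", se = FALSE)
# Fit the linear model mpg = intercept + slope * wt
lm(mpg ~ wt, data = mtcars)
- Intercept ≈ 37.3
- Slope ≈ -5.3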
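Once the model is fitted, its coefficients can be extracted and used for prediction. The sketch below uses only base R; the object names and the example weight of 3 (that is, 3000 lbs) are illustrative choices, not values from the slides.

```r
# Fit the simple linear regression of mpg on wt
fit <- lm(mpg ~ wt, data = mtcars)

# Named coefficient vector: "(Intercept)" and "wt"
coefs     <- coef(fit)
intercept <- unname(coefs["(Intercept)"])
slope     <- unname(coefs["wt"])
print(round(c(intercept = intercept, slope = slope), 2))

# Predict mpg for a hypothetical car weighing 3000 lbs (wt = 3)
new_car  <- data.frame(wt = 3)
pred_mpg <- unname(predict(fit, newdata = new_car))
print(pred_mpg)

# The prediction is just the fitted line evaluated at wt = 3
print(all.equal(pred_mpg, intercept + slope * 3))

# With a single predictor, R-squared equals the squared correlation
r_squared <- summary(fit)$r.squared
print(all.equal(r_squared, cor(mtcars$wt, mtcars$mpg)^2))
```

The slope can be read as the expected change in mpg for each additional 1000 lbs of weight, within the range of weights present in the data.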
Thanks!
Hamid Reza Bolhasani
bolhasani@gmail.com
Jan 2020

Machine Learning in R - Part 1: Correlation and Regression (Basics)
