Correlation

 FF2613
Correlation

• 2 Continuous Variables
     • linear relationship


• analysis of the relationship between two
  quantitative outcomes, e.g., height and
  weight.
Correlation & Linear Regression
Bivariate Correlation




[Figure: two scatterplots, one positive and linear, one negative and linear]
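How to calculate r?
• $r = \dfrac{a}{\sqrt{b \times c}}$, where
  $a = \sum xy - (\sum x)(\sum y)/n$
  $b = \sum x^2 - (\sum x)^2/n$
  $c = \sum y^2 - (\sum y)^2/n$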
Example


• $\sum x = 4631$, $\sum x^2 = 688837$
• $\sum y = 2863$, $\sum y^2 = 264527$
• $\sum xy = 424780$, $n = 32$

• $a = 424780 - (4631 \times 2863)/32 = 10450.22$
• $b = 688837 - 4631^2/32 = 18644.47$
• $c = 264527 - 2863^2/32 = 8377.969$
• $r = a/\sqrt{b \times c} = 10450.22/\sqrt{18644.47 \times 8377.969} = 0.836144$

• $t = r\sqrt{(n-2)/(1-r^2)} = 0.836144 \times \sqrt{(32-2)/(1-0.836144^2)} = 8.349436$
• d.f. $= n - 2 = 30$, so $p < 0.001$
Please refer to Table A3, using df = 30.
The critical value of t at p = 0.001 is 3.65.

Since t = 8.3494 > 3.65, p < 0.001.
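The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using only the summary sums given on the slide; the function names `pearson_r_from_sums` and `t_statistic` are illustrative and do not come from the original deck.

```python
import math

def pearson_r_from_sums(sum_x, sum_y, sum_x2, sum_y2, sum_xy, n):
    """Pearson's r from summary sums, using the a, b, c terms of the slide."""
    a = sum_xy - sum_x * sum_y / n   # corrected sum of cross-products
    b = sum_x2 - sum_x ** 2 / n      # corrected sum of squares of x
    c = sum_y2 - sum_y ** 2 / n      # corrected sum of squares of y
    return a / math.sqrt(b * c)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Summary sums from the worked example (n = 32).
r = pearson_r_from_sums(4631, 2863, 688837, 264527, 424780, 32)
t = t_statistic(r, 32)
print(f"r = {r:.6f}, t = {t:.4f}, df = {32 - 2}")
```

The p value itself still has to be read from a t table (or a statistics package), since the standard library has no t distribution.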
How to interpret the value of r?
• r lies between -1 and 1. Values near 0
  mean no (linear) correlation and values
  near ± 1 mean a very strong correlation.
• A negative sign means that the two
  variables are inversely related, that is, as
  one variable increases the other variable
  decreases.
How to interpret the value of r?
Pearson’s r
• r = 0.9 is a strong positive association (as one
  variable rises, so does the other)
• r = -0.9 is a strong negative association
  (as one variable rises, the other falls)

  r = 0.9 has nothing to do with 90%;
  r is the correlation coefficient, not a percentage.
Coefficient of Determination (Defined)
• Pearson’s r can be squared, $r^2$, to derive
  a coefficient of determination.

• Coefficient of determination: the proportion
  of variability in one of the variables that
  can be accounted for by variability in the
  second variable
Coefficient of Determination
• Pearson’s r can be squared, $r^2$, to derive a
  coefficient of determination.

• Example of depression and CGPA
  – Pearson’s r shows negative correlation, $r = -0.5$
  – $r^2 = 0.25$
  – In this example we can say that 1/4 or 0.25 of the
    variability in CGPA scores can be accounted for by
    depression (the remaining 75% of variability is due to
    other factors: habits, ability, motivation, courses
    studied, etc.)
Coefficient of Determination and Pearson’s r
• Pearson’s r can be squared, $r^2$

• If $r = 0.5$, then $r^2 = 0.25$
• If $r = 0.7$, then $r^2 = 0.49$

• Thus while r = 0.5 versus 0.7 might not look so
  different in terms of strength, $r^2$ tells us that
  r = 0.7 accounts for about twice the variability
  relative to r = 0.5
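The comparison can be made concrete with a short Python sketch. The helper name `coefficient_of_determination` is illustrative only; the r values are the ones quoted on the slides.

```python
def coefficient_of_determination(r):
    """Square Pearson's r to get the proportion of shared variability."""
    return r ** 2

# r values quoted on the slides; the sign disappears after squaring.
for r_value in (0.5, 0.7, -0.5):
    r_squared = coefficient_of_determination(r_value)
    print(f"r = {r_value:+.1f} -> r^2 = {r_squared:.2f}")

# How much more variability r = 0.7 accounts for, relative to r = 0.5.
ratio = coefficient_of_determination(0.7) / coefficient_of_determination(0.5)
print(f"ratio = {ratio:.2f}")
```

The ratio is 0.49 / 0.25 = 1.96, which is the "about twice" stated above.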
Causal Silence:
Correlation Does Not Imply
Causality


     Causality – must demonstrate that variance
      in one variable can only be due to
      influence of the other variable

     • Directionality of Effect Problem


     • Third Variable Problem
Correlation In SPSS

• For this exercise, we will
  be using the data from
  the CD, under Chapter 8,
  korelasi.sav
• This data is a subset of a
  case-control study on
  factors affecting SGA in
  Kelantan.
• Open the data and select
  Analyse > Correlate > Bivariate…
Correlation in SPSS
• We want to see whether
  there is any association
  between the mothers’
  weight and the
  babies’ weight. So select
  the variables (weight2 &
  birthwgt) into ‘Variables’.
• Select ‘Pearson’
  Correlation Coefficients.
• Click the ‘OK’ button.
Correlation Results

Correlations
                                  WEIGHT2   BIRTHWGT
WEIGHT2   Pearson Correlation     1         .431*
          Sig. (2-tailed)         .         .017
          N                       30        30
BIRTHWGT  Pearson Correlation     .431*     1
          Sig. (2-tailed)         .017      .
          N                       30        30
*. Correlation is significant at the 0.05 level (2-tailed).




• r = 0.431, and the correlation is statistically
  significant (p = 0.017).
• The r value indicates a fair, positive
  linear relationship.
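Outside SPSS, the same coefficient can be computed directly from the paired observations. The sketch below is a plain-Python illustration, not the SPSS procedure; the function name `pearson_r` is illustrative and the five data pairs are hypothetical toy values, NOT the korelasi.sav data.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from paired raw observations (deviation-from-mean form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data for illustration only (not from korelasi.sav).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r_toy = pearson_r(x_values, y_values)
print(f"r = {r_toy:.4f}, r^2 = {r_toy ** 2:.4f}")
```

With the real dataset, `x_values` and `y_values` would hold the weight2 and birthwgt columns.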
Scatter Diagram

[Figure: scatter plot of BIRTHWEIGHT (y-axis, 0.0 to 3.6) against MOTHERS' WEIGHT (x-axis, 0 to 100); Rsq = 0.1861]

• If the correlation is significant, it is best
  to include the scatter diagram.
• The r square indicates that mothers’ weight
  accounts for 19% of the variability in the
  babies’ weight.
Linear Regression
Linear Regression
 • Build a linear regression model to predict a
   continuous outcome from a continuous risk
   factor, e.g. predict BP from age. This is
   usually the next step after a correlation is
   found to be strongly significant.
 • $y = a + bx$
    – e.g. BP = constant (a) + regression coefficient (b) × age

 • $b = \dfrac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n}$
Regression Line
• In a scatterplot showing the association
  between 2 variables, the regression line is
  the “best-fit” line and has the formula
  $y = a + bx$
• a = the intercept, where the line crosses the Y axis
• b = the slope of the line (rise/run)
• Thus, given a value of X, we can predict a
  value of Y
Regression Line (Defined)
The regression line is the line for which the sum
 of the squared vertical distances between the
 points on the scatterplot and the line is a minimum
 (relative to all other possible lines)




[Figure: two scatterplots with fitted lines, one positive and linear, one negative and linear]
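The "minimum sum" idea can be illustrated numerically. The sketch below assumes the usual least-squares criterion (sum of squared vertical distances); the function names and the toy data are hypothetical, chosen only to make the comparison easy to follow.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a_fit, b_fit = fit_line(xs, ys)
best = sum_squared_residuals(a_fit, b_fit, xs, ys)

# Any other line, e.g. a shifted intercept or a tilted slope, does worse.
shifted = sum_squared_residuals(a_fit + 0.5, b_fit, xs, ys)
tilted = sum_squared_residuals(a_fit, b_fit + 0.1, xs, ys)
print(f"a = {a_fit:.2f}, b = {b_fit:.2f}")
print(f"fitted: {best:.2f}, shifted: {shifted:.2f}, tilted: {tilted:.2f}")
```

Both perturbed lines give a larger sum than the fitted line, which is what "best fit" means here.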
Example

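• $\sum x = 6426$, $\sum x^2 = 1338088$
• $\sum y = 4631$, $\sum xy = 929701$, $n = 32$
• $b = \dfrac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n} = \dfrac{929701 - (6426 \times 4631)/32}{1338088 - 6426^2/32} = -0.00549$
• $\bar{x} = 6426/32 = 200.8125$, $\bar{y} = 4631/32 = 144.71875$
• $y = a + bx$, so $a = \bar{y} - b\bar{x}$ (substitute the values of $\bar{x}$, $\bar{y}$ and $b$)
• $a = 144.71875 + (0.00549 \times 200.8125) = 145.8212106$
• Systolic BP $= 145.8212 - 0.00549 \times$ chol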
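The slope and intercept of this example can be reproduced from the summary sums. This is a minimal Python sketch; the function name `regression_from_sums` is illustrative, and the cholesterol value of 200 used for the prediction is a hypothetical input, not a figure from the slides.

```python
def regression_from_sums(sum_x, sum_y, sum_x2, sum_xy, n):
    """Intercept a and slope b of y = a + bx from summary sums."""
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * (sum_x / n)   # a = mean(y) - b * mean(x)
    return a, b

# Summary sums from the worked example (x = cholesterol, y = systolic BP).
a, b = regression_from_sums(6426, 4631, 1338088, 929701, 32)
print(f"b = {b:.5f}, a = {a:.4f}")

# Hypothetical cholesterol value, used only to show a prediction.
chol = 200
predicted_bp = a + b * chol
print(f"Predicted systolic BP at chol = {chol}: {predicted_bp:.2f}")
```

Because the slope is so close to zero, the prediction barely moves away from the mean systolic BP whatever cholesterol value is supplied.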
