Linear Regression
https://serc.carleton.edu/eddie/
Learning Objectives
After completing this vignette, a student should be able to:
• Demonstrate the ability to identify the independent and dependent
variables.
• Investigate the relationship between two datasets and use the extent
to which they vary together to predict future values.
• Recognize that not all data relationships can be described using
linear regression.
StatVignette02-Regression.pptx
A first look at some data
What is being plotted here?
Which is the independent variable and which is the dependent variable? How do you know?
A first look at some data
What can we glean from these two plots?
How could we investigate further?
Plotting sales vs temperature
Fitting lines (linear regression)
Equation of a line: y = mx + b
x = x coordinate
y = y coordinate
m = slope of the line
b = y-intercept
Line fit to Rita’s data:
How many ice cream cones are
sold for every degree increase in
temperature?
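
As a rough illustration of how a line can be fit to paired data, here is a minimal least-squares sketch. The function name `fit_line` and the temperature and sales numbers are assumptions invented for this example; they are not Rita's data and not necessarily the method used to produce the slide.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (x with y) and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx            # slope: change in y per unit change in x
    b = mean_y - m * mean_x  # intercept: the line passes through the means
    return m, b

# Hypothetical data, invented for illustration only
temps = [10, 15, 20, 25, 30]
cones = [12, 33, 48, 70, 87]
m, b = fit_line(temps, cones)
print(f"cones = {m:.2f} * temperature + {b:.2f}")
```

The slope `m` is the quantity the question above asks about: the average change in cones sold per degree of temperature.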
R²: The coefficient of determination
The R² metric tells us how well the data fit the model,
and it varies from 0 to 1.
In other words: it’s the proportion of
variation that’s explained by the model.
R² = 0 → our data don’t fit the model at all
R² = 1 → our data fit the model perfectly
Example plots: r² = 0.23, r² = 0.64, r² = 0.99
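
The "proportion of variation explained" idea can be sketched as one minus the ratio of leftover (residual) variation to total variation. The function name `r_squared` and the numbers below are assumptions made up for this example, not values from the slides.

```python
def r_squared(observed, predicted):
    """Return 1 - SS_res / SS_tot for observed values and model predictions."""
    mean_obs = sum(observed) / len(observed)
    # Variation left unexplained by the model
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation of the observations around their mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical observations, invented for illustration only
observed = [2, 4, 6, 8]
perfect_model = [2, 4, 6, 8]  # predictions match the data exactly
mean_only = [5, 5, 5, 5]      # model that always predicts the mean

print(r_squared(observed, perfect_model))
print(r_squared(observed, mean_only))
```

A model that reproduces the data exactly leaves no residual variation, while a model that only ever predicts the mean explains none of it; these are the two ends of the 0 to 1 scale described above.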
R²: Rita’s data
Take Home Messages!
When there is a general linear relationship between two variables, it is
possible to use the equation of a line to predict a Y value for any known X
value.
The line drawn in a regression can be used to predict values that fall
between observed data points, because it assigns one predicted Y value
to each X value.
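
Using a fitted line for prediction can be sketched as follows. The slope of 3.8 cones per degree is the figure quoted in the instructor notes; the intercept, the observed temperature range, and the helper name `predict` are assumptions chosen only for illustration.

```python
def predict(x, m, b, x_min, x_max):
    """Predict y = m*x + b and report whether x lies inside the observed range."""
    y = m * x + b
    inside = x_min <= x <= x_max
    return y, inside

# Slope from the instructor notes; intercept and range are hypothetical
m, b = 3.8, -20.0

y, inside = predict(25, m, b, x_min=10, x_max=35)
print(f"predicted cones at 25 degrees: {y:.1f} (inside observed range: {inside})")

# A temperature far outside the observed range still yields a number,
# but the flag signals that the line is being extrapolated.
y_far, inside_far = predict(60, m, b, x_min=10, x_max=35)
print(f"predicted cones at 60 degrees: {y_far:.1f} (inside observed range: {inside_far})")
```

Returning the range flag alongside the prediction is a design choice for this sketch: the line gives an answer for any X, but predictions between observed points are generally easier to defend than those far outside them.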



Editor's Notes

  • #3: Instructor notes: Be thoughtful about getting students to say what they see. Have students recognize that not all data relationships can be described using the correlation coefficient (CC). Correlation is strictly about linear relationships, so "saying what they see" can also mean noticing a relationship that is not well described by CC. Explain that r and CC are interchangeable. We use regression to make predictions and to define the slope of a best-fit line, which shows the rate of change of ice cream sales with temperature.
  • #4: Instructor notes: Paint a rich picture for the students that gives Rita a backstory. Why does she care about ice cream sales? Have your students make predictions: what do you think the relationship is between ice cream sales and temperature (and humidity, rainfall)? Note that humidity and temperature are continuous variables measured at discrete intervals based on the sampling rates of the instruments providing the measurements.
  • #6: Instructor notes: Have the students write a caption for each plot. Describe the axes, describe the major trends and inflection points, and discuss any outliers. What is the general pattern? What else would you need to know to make this make sense? Northern or Southern Hemisphere? How would these data look different if Rita lived in the Southern Hemisphere? Inflection points vs. local maxima and minima: Inflection point = a point on a curve where the curvature changes sign (from concave up to concave down, or vice versa). Local maximum = a point that is higher than the points immediately around it. Local minimum = a point that is lower than the points immediately around it. What relationship can you see? How could we quantify this on a single illustration? The answer is regression.
  • #7: Instructor notes: Here’s another way to look at the data, so we can see how sales and temperature might be related. It looks like they are related! But how can we describe that in a quantitative way? Two curvy graphs produced this straight pattern. Use the phrase “vary together”. Describe the axes and the trends. Are there inflection points? Outliers? Perhaps add a slide that discusses outliers, inflection points, and local minima and maxima. Why this plot and not the reverse? Both would make just as much sense. What is cause and what is effect? It may also be interesting to plot the other tracked variables; there may be a relationship with another variable that is not appropriate for CC. This is a great point to discuss how not all relationships can be described using linear regression. You can easily see a line forming for this dataset but not necessarily for the last dataset (slide 5). What is the reason for the difference? Answer: in this plot, ice cream sales increase as temperature increases and decrease as temperature decreases.
  • #8: Instructor notes: Show the basic equation of a line, and maybe also the equation of this particular line of best fit. Possible questions for the class: What are examples of correlation without causation? For every degree of temperature increase, how many more ice cream cones does she sell? Emphasize that the regression line is a mean. You cannot really sell 3.8 cones. The equation indicates that for every degree increase in temperature, an average of 3.8 more ice cream cones are sold.
  • #9: Instructor notes: Warn that R² = 1 is suspicious, and if you see it you should look more closely. The R² coefficient shows us whether the model we’ve chosen explains what’s going on in the data. With an R² of 0.23, the model doesn’t tell us much. With an R² of 0.99, our model explains pretty much entirely what’s going on. Question: What is causing the differences in r²?
  • #10: Instructor notes: Show the R² of Rita’s data. Have students write a figure caption that defines the axes, describes the general trends, looks for inflection points and outliers, and discusses the overall significance.
  • #11: Instructor notes: Discuss the topic of correlation and causation. In this case there may be some causation. What would allow you to claim causation? A mechanism that explains how one variable drives the other. In what situation would you use a regression analysis? A linear regression describes how the Y variable changes with the X variable; here, a plausible mechanism is that an increase in temperature causes more people to crave cool ice cream. In regression, an R² of 0.64 means that 64% of the variability in ice cream sales can be explained by temperature (loosely, ice cream sales are 64% dependent on temperature). Does it make sense to postulate the opposite cause? Does it get hot in response to high ice cream sales? No. If the variables were reversed, the fitted slope and intercept would change and the causal reading would not make sense, for it would imply that hotter weather is CAUSED by greater ice cream sales. Note that in a simple linear regression R² is the square of the correlation coefficient, so, like the correlation coefficient, it has the same value in either direction; the statistic alone cannot tell you which variable is the cause.
  • #12: Citation: Gravetter, Frederick J. and Wallnau, Larry B. Statistics for the Behavioral Sciences. Cengage Learning, 2015. 10th Edition. Chapter 6, 165-169. Print.