FSE 200
Adkins
Simple Linear Regression
Correlation only measures the strength and direction of the
linear relationship between two quantitative variables. If the
relationship is linear, then we would like to try to model that
relationship with the equation of a line. We will use a
regression line to describe the relationship between an
explanatory variable and a response variable.
A regression line is a straight line that describes how a response
variable y changes as an explanatory variable x changes. We
often use a regression line to predict the value of y for a given
value of x.
Ex. It has been suggested that there is a relationship between
sleep deprivation of employees and the ability to complete
simple tasks. To evaluate this hypothesis, 12 people were asked
to solve simple tasks after having been without sleep for 15, 18,
21, and 24 hours. The sample data are shown below.
Subject   Hours without sleep, x   Tasks completed, y
1         15                       13
2         15                       9
3         15                       15
4         18                       8
5         18                       12
6         18                       10
7         21                       5
8         21                       8
9         21                       7
10        24                       3
11        24                       5
12        24                       4
Draw a scatterplot and describe the relationship. Lay a straight-
edge on top of the plot and move it around until you find what
you think might be a “line of best fit.” Then try to predict the
number of tasks completed for someone having been without
sleep 16 hours.
Was your line the same as that of the classmate sitting next to
you? Probably not. We need a method that we can use to find
the “best” regression line to use for prediction. The method we
will use is called least-squares. No line will pass exactly
through all the points in the scatterplot. When we use the line to
predict a y for a given x value, if there is a data point with that
same x value, we can compute the error (residual):
error = observed y - predicted y = $y - \hat{y}$
Our goal is going to be to make the vertical distances from the
line as small as possible. The most commonly used method for
doing this is the least-squares method.
The least-squares regression line of y on x is the line that makes
the sum of the squares of the vertical distances of the data
points from the line as small as possible.
Equation of the Least-Squares Regression Line
· Least-Squares Regression Line: $\hat{y} = a + bx$
· Slope of the Regression Line: $b = r \dfrac{s_y}{s_x}$
· Intercept of the Regression Line: $a = \bar{y} - b\bar{x}$
Generally, regression is performed using statistical software.
Clearly, given the appropriate information, the above formulas
are simple to use.
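As a rough illustration, the Python sketch below applies the usual
forms of these formulas, slope b = r * s_y / s_x and intercept
a = y-bar - b * x-bar, to the sleep deprivation data. The symbols a
and b and all variable names are choices made for this sketch; in
practice statistical software does this step for you.

```python
from math import sqrt
from statistics import mean, stdev

# Sleep deprivation data from the example above
hours = [15, 15, 15, 18, 18, 18, 21, 21, 21, 24, 24, 24]   # x
tasks = [13, 9, 15, 8, 12, 10, 5, 8, 7, 3, 5, 4]           # y

x_bar, y_bar = mean(hours), mean(tasks)
s_x, s_y = stdev(hours), stdev(tasks)   # sample standard deviations

# Correlation coefficient r from sums of squares and cross products
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, tasks))
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in tasks)
r = s_xy / sqrt(s_xx * s_yy)

b = r * s_y / s_x        # slope of the least-squares line
a = y_bar - b * x_bar    # intercept of the least-squares line

print(f"r = {r:.4f}, slope b = {b:.4f}, intercept a = {a:.4f}")
```

The slope comes out negative, which agrees with the downward trend
in the scatterplot.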
Once we have the regression line, how do we interpret it, and
what can we do with it?
The slope of a regression line is the rate of change, the amount
of change in $\hat{y}$ when x increases by 1.
The intercept of the regression line is the value of $\hat{y}$ when
x = 0. It is statistically meaningful only when x can take on
values that are close to zero.
To make a prediction, just substitute an x-value into the
equation and find $\hat{y}$.
To plot the line on a scatterplot, just find a couple of points on
the regression line, one near each end of the range of x in the
data. Plot the points and connect them with a line. Again, this is
something that can be done using statistical software.
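A minimal sketch of both steps, prediction and finding two points
for plotting, using the sleep deprivation data. The helper names
least_squares and predict are hypothetical and exist only for this
illustration.

```python
from statistics import mean

hours = [15, 15, 15, 18, 18, 18, 21, 21, 21, 24, 24, 24]   # x
tasks = [13, 9, 15, 8, 12, 10, 5, 8, 7, 3, 5, 4]           # y

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

a, b = least_squares(hours, tasks)

def predict(x):
    """Substitute x into the regression equation y-hat = a + b*x."""
    return a + b * x

# Prediction for someone who has been without sleep for 16 hours
y_hat_16 = predict(16)

# Two points on the line, one at each end of the range of x in the data
left = (min(hours), predict(min(hours)))
right = (max(hours), predict(max(hours)))

print(f"predicted tasks at 16 hours: {y_hat_16:.2f}")
print(f"points for plotting the line: {left} and {right}")
```

Connecting the two printed points on the scatterplot draws the
regression line over the range of the data.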
Ex. Use Excel to find the equation of the least-squares
regression line for the sleep deprivation data in the previous
example.
· Click Data -> Data Analysis -> Regression -> OK
· Input the cells of the response variable y in the Input Y Range
box.
· Input the cells of the explanatory variable x in the Input X
Range box.
· If you included variable names in the Input X and Y Range
boxes, check the Labels box.
· Input the cells you would like to display the output in the
Output Range box.
· Click OK.
a. State the equation of the least-squares regression line.
b. Identify and interpret the slope.
c. Identify and interpret the intercept.
d. Use the least-squares regression equation to predict the
number of tasks completed for an employee that has been
without sleep for 16 hours.
Facts About Least-Squares Regression
· In regression, the distinction between explanatory and
response variables is very important. When we computed the
correlation coefficient r, it did not matter which variable was x
and which was y, r would be the same. However, if you perform
a regression analysis on a data set and then swap x and y and
perform another regression analysis, the results will not be the
same.
· There is a connection between the correlation coefficient r and
the slope of the least-squares line: $b = r \dfrac{s_y}{s_x}$.
A change of one standard deviation in x corresponds to a change
of r standard deviations in y.
· The least squares regression line always passes through the
point $(\bar{x}, \bar{y})$ on the graph of y versus x.
· The square of the correlation, $r^2$, known as the coefficient of
determination, is the proportion of variation in y that can be
explained by the least-squares regression of y on x.
Note that $0 \le r^2 \le 1$. The closer $r^2$ is to 1, the better your
regression line is at modeling the relationship between x and y.
We usually state $r^2$ as a percentage.
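The facts above can be checked numerically. The sketch below, with
variable names chosen only for this illustration, fits the sleep
deprivation data both ways (y on x, then x on y), confirms that the
line passes through the point of means, and computes the
coefficient of determination.

```python
from math import sqrt
from statistics import mean

hours = [15, 15, 15, 18, 18, 18, 21, 21, 21, 24, 24, 24]   # x
tasks = [13, 9, 15, 8, 12, 10, 5, 8, 7, 3, 5, 4]           # y

x_bar, y_bar = mean(hours), mean(tasks)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, tasks))
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in tasks)

# Regression of y on x
b_yx = s_xy / s_xx
a_yx = y_bar - b_yx * x_bar

# Regression of x on y (roles swapped): a different line
b_xy = s_xy / s_yy
a_xy = x_bar - b_xy * y_bar

# The correlation is the same whichever variable is called x
r = s_xy / sqrt(s_xx * s_yy)
r_squared = r ** 2

# The y-on-x line evaluated at x-bar gives y-bar
at_mean = a_yx + b_yx * x_bar

print(f"y on x: y-hat = {a_yx:.3f} + ({b_yx:.3f}) x")
print(f"x on y: x-hat = {a_xy:.3f} + ({b_xy:.3f}) y")
print(f"r = {r:.4f}, r^2 = {r_squared:.1%}")
print(f"line at x-bar: {at_mean:.2f}, y-bar: {y_bar:.2f}")
```

Solving the x-on-y equation for y gives a slope of 1 / b_xy, which
differs from b_yx unless r is exactly 1 or -1, so the two
regressions describe different lines.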
Note:
· A statistical analysis package can find $r^2$ for you; if only r is
given on the output, square it.
· If $r^2$ is given on the output and you want to find r, take the
square root of $r^2$ and look at the slope of the regression line to
determine the sign; r and b will have the same sign.
Ex. Refer to the Excel output from the sleep deprivation data.
Find $r^2$ and interpret it. Then find r.
We know that in practical applications, we are not going to be
so lucky as to have all of our data points falling exactly on a
line.
A residual is the difference between an observed value of the
response variable and the value predicted by the regression line.
That is, residual = observed y - predicted y = $y - \hat{y}$.
Ex. Find the residual for Subject 1 in our sleep deprivation data.
We could compute a residual for each observation in the data.
Note that the mean of the least-squares residuals is always zero.
It is a good idea to examine the residuals because they can tell
us something about how appropriate our linear model is.
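As a sketch of the residual calculation, the following Python code
computes residual = observed y - predicted y for every subject in
the sleep deprivation data and checks that the residuals average to
zero. Variable names are illustrative only.

```python
from statistics import mean

hours = [15, 15, 15, 18, 18, 18, 21, 21, 21, 24, 24, 24]   # x
tasks = [13, 9, 15, 8, 12, 10, 5, 8, 7, 3, 5, 4]           # y

# Least-squares slope and intercept
x_bar, y_bar = mean(hours), mean(tasks)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, tasks))
     / sum((x - x_bar) ** 2 for x in hours))
a = y_bar - b * x_bar

# One residual per observation: observed y minus predicted y
residuals = [y - (a + b * x) for x, y in zip(hours, tasks)]

for subject, (x, res) in enumerate(zip(hours, residuals), start=1):
    print(f"subject {subject:2d}  x = {x}  residual = {res: .3f}")

# Apart from floating-point rounding, the mean residual is zero
print(f"mean residual = {mean(residuals):.2e}")
```

Plotting these residuals against hours without sleep gives the
residual plot described next.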
A residual plot is a scatterplot of the regression residuals versus
the explanatory variable, x. Residual plots help us to assess the
fit of a regression line.
· If the regression line does a good job describing the overall
relationship between x and y, the residuals should have no
systematic pattern.
When you examine a residual plot, here are some things you
should consider:
· Generally, a horizontal line is drawn at zero.
· A curved pattern tells you that the relationship is not linear;
therefore, linear regression is not an appropriate method of
analysis.
· Increasing or decreasing spread about the line (at zero) as x
increases may indicate that the prediction of y for certain values
of x will be less accurate.
· Individual points with large residuals or outliers in the y
direction can greatly affect your analysis. (Check data entry,
etc.)
· Individual points that are extreme in the x direction may not
have large residuals, but they may still have quite an impact on
the analysis.
The last two points above lead us to a discussion of outliers and
influential points.
An outlier is an observation that lies outside the overall pattern
of the other observations.
An observation is influential for a statistical calculation if
removing it would significantly change the result of the
calculation. Points that are outliers in the x direction are often
influential for the least-squares regression line.
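To see how much one point can matter, the sketch below refits the
sleep deprivation line after adding a single made-up observation
that is extreme in the x direction. The added point (40 hours, 15
tasks) is hypothetical and is not part of the data; it is used only
to illustrate influence.

```python
from statistics import mean

hours = [15, 15, 15, 18, 18, 18, 21, 21, 21, 24, 24, 24]   # x
tasks = [13, 9, 15, 8, 12, 10, 5, 8, 7, 3, 5, 4]           # y

def slope(xs, ys):
    """Slope of the least-squares regression line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

slope_original = slope(hours, tasks)

# Hypothetical extra observation, far to the right of the other x values
slope_with_point = slope(hours + [40], tasks + [15])

print(f"slope without the extra point: {slope_original:.3f}")
print(f"slope with the extra point:    {slope_with_point:.3f}")
```

Removing that one point changes the slope from nearly flat back to
clearly negative, which is exactly what it means for an observation
to be influential.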
Ex. Use Excel to obtain the residual plot for the sleep
deprivation data. Analyze the output.
· In the Regression dialog box, check the Residual Plots box.
Cautions about Correlation and Regression
Some things to remember:
· Always plot your data first!
· The correlation and regression we have been studying should
be used only to describe linear relationships.
· The correlation coefficient r and least-squares regression are
not resistant; just one influential observation can greatly affect
your analysis.
Let’s discuss a few more things of which you should be aware
with regard to correlation and regression.
Extrapolation is the use of a regression line for prediction far
outside the range of values of the explanatory variable x that
you used to obtain the line. Such predictions may not be
accurate!
Extrapolation can be dangerous!
Ex. Refer to the sleep-deprivation example. Do you think it
would be appropriate to use our least-squares regression
equation to make predictions for a person who has gone without
sleep for 40 hours?
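One way to explore this question is simply to try it. The sketch
below, with illustrative variable names, substitutes x = 40 into
the line fitted to the sleep deprivation data and checks whether 40
lies inside the range of x values used to obtain the line.

```python
from statistics import mean

hours = [15, 15, 15, 18, 18, 18, 21, 21, 21, 24, 24, 24]   # x
tasks = [13, 9, 15, 8, 12, 10, 5, 8, 7, 3, 5, 4]           # y

x_bar, y_bar = mean(hours), mean(tasks)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, tasks))
     / sum((x - x_bar) ** 2 for x in hours))
a = y_bar - b * x_bar

x_new = 40
y_hat_40 = a + b * x_new

# Is the new x value inside the range of the observed x values?
within_range = min(hours) <= x_new <= max(hours)

print(f"observed x range: {min(hours)} to {max(hours)} hours")
print(f"predicted tasks at {x_new} hours: {y_hat_40:.2f}")
print(f"inside observed range: {within_range}")
```

The line predicts a negative number of completed tasks, which is
impossible, so the linear pattern seen between 15 and 24 hours
cannot be trusted that far outside the data.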
Generally two variables won’t exist by themselves in a vacuum,
so to speak. Often we may be interested in more than two
variables. Sometimes there are variables floating around in the
background that are influencing the variables of interest, but we
may not even have considered these background variables.
A lurking variable (or extraneous variable) is a variable that has
an important effect on the relationship among the variables in a
study but is not included among the variables studied.
A lurking variable could make it falsely appear that two other
variables have a strong relationship. A lurking variable could
also mask or hide a relationship that is really there.
Ex. Suppose that someone notices that as the number of
churches in town increases, the liquor sales also go up. Is there
a lurking variable that might explain this relationship?
ASSOCIATION DOES NOT IMPLY CAUSATION!
An association between an explanatory variable x and a
response variable y, even if it is very strong, is not by itself
good evidence that changes in x actually cause changes in y.
While our goal is often to show that changes in the explanatory
variable cause changes in the response variable, an observed
association is only sometimes due to cause and effect; many
times it is not. There may be a lurking variable that is causing
a common response in x and y, or maybe both the lurking variable
and x are causing changes in y so that their effects are
confounded.
Ex. Does having more cars make you live longer? A serious
study once found that people with two cars live longer than
people who own only one car. Owning three cars is even better,
and so on. There is a substantial positive correlation between
number of cars x and length of life y.
A basic meaning of causation is that by changing x we can bring
about a change in y. Could we lengthen our lives by buying
more cars?
How can we tell, then, if we have a cause-and-effect
relationship? A well-designed experiment is the best way to
determine causation; we will discuss experiments later. Many
times it is not possible to do an experiment. In the absence of an
experiment, what should we examine to determine causation?
· The association is strong.
· The association is consistent.
· Higher doses are associated with stronger responses.
· The alleged cause precedes the effect in time.
· The alleged cause is plausible.
Ex. Does smoking cause lung cancer? Despite the difficulties, it
is sometimes possible to build a strong case for causation in the
absence of experiments. The evidence that smoking causes lung
cancer is about as strong as nonexperimental evidence can be.
Doctors have long observed that most lung cancer patients were
smokers. Comparison of smokers and “similar” nonsmokers
showed a very strong association between smoking and death
from lung cancer. Could the association be explained by lurking
variables? Might there be, for example, a genetic factor that
predisposes people to both nicotine addiction and to lung
cancer? Smoking and lung cancer would then be positively
associated even if smoking had no direct effect on the lungs.
How do we overcome these objections?
· The association is strong. The association between smoking
and lung cancer is very strong.
· The association is consistent. Many studies of different kinds
of people in many countries link smoking to lung cancer. That
reduces the chances that a lurking variable specific to one group
or one study explains the association.
· Higher doses are associated with stronger responses. People
who smoke more cigarettes per day or who smoke over a longer
period of time get lung cancer more often. People who stop
smoking reduce their risk.
· The alleged cause precedes the effect in time. Lung cancer
develops after years of smoking. The number of men dying of
lung cancer rose as smoking became more common, with a lag
of about 30 years. Lung cancer kills more men than any other
form of cancer. Lung cancer was rare among women until
women began to smoke. Lung cancer in women rose along with
smoking again with a lag of about 30 years.
· The alleged cause is plausible. Experiments with animals show
that tars from cigarette smoke do cause cancer.
The evidence for causation is overwhelming, but it is still not as
strong as the evidence provided by well-designed experiments.
[Figure: Hours without sleep, x Residual Plot. Horizontal axis: Hours without sleep, x. Vertical axis: Residuals.]
CJ102: Criminology
Unit 8 Worksheet
Student Name: _______________________________________________________
After completing the readings, answer the following questions:
PART I
1. What is a turning point?
2. What are the characteristics of low self-control or
impulsivity?
3. Define and differentiate adolescent limited and life course
persistent criminals.
Part II: Sex Crimes
1. What are the 7 goals of a primary interview with the rape
victim?
2. What method does the FBI use to determine the profile of the
offender in a sex crime?
3. What is the importance of the profile in helping solve the
crime?
Part III: Burglary
1. What are the common methods in which burglars gain entry
into a residence or building?
2. Describe the primary characteristics of suspect(s) in burglary
cases.
3. How are burglaries and sex crimes related?
© Kaplan University
Scatterplots and Correlation
So far we have been examining one variable at a time. In
practice, we often want to look at several variables at once. In
this chapter, we will specifically consider how to analyze two
quantitative variables.
A response variable measures an outcome of a study.
An explanatory variable may explain or influence changes in a
response variable.
Ex. Suppose that individuals are given different amounts of
alcohol, and then reaction times for a particular activity are
measured.
Often explanatory variables are called independent variables,
and response variables are called dependent variables.
Note that a cause-and-effect relationship may or may not exist,
but we cannot determine causality.
Two variables measured on the same individual are associated if
some values of one variable tend to occur with some values of
the second variable more than with other values of that variable.
Ex.
Displaying Relationships: Scatterplots
A scatterplot shows the relationship between two quantitative
variables measured on the same individuals. The values of one
variable appear on the horizontal or x axis, and the values of the
other variable appear on the vertical or y axis. Each individual
in the data appears as a point in the plot.
If there is an explanatory variable and a response variable, the
explanatory variable goes on the horizontal axis and the
response variable on the vertical axis. If such a distinction
cannot be made, then either variable can go on either axis.
To interpret a scatterplot, look for the overall pattern and for
striking deviations from that pattern. To describe the overall
pattern, look at the (1) form, (2) direction, and (3) strength of
the relationship. Also look for any outliers.
Two variables are positively associated when above-average
values of one tend to accompany above-average values of the
other, and below-average values also tend to occur together.
Ex. In a large group of people, there will be a positive
association between height and weight.
Two variables are negatively associated when above-average
values of one tend to accompany below-average values of the
other, and vice versa.
Ex. In a large group of people, there will be a negative
association between packs of cigarettes smoked and length of
life.
Ex. Create two scatterplots: one showing the relationship between
yearly average temperature and number of fires, and one showing
the relationship between yearly average temperature and area
burned.
Year   Average temperature (°F)   Number of fires   Acres burned (in millions)
2000   54.52                      92250             7.39
2001   52.19                      84079             3.57
2002   53.74                      73457             7.18
2003   53.1                       63629             3.96
2004   53.6                       65461             8.1
2005   53.08                      66753             8.69
2006   54.38                      96385             9.87
2007   53.43                      85705             9.33
2008   53.04                      78979             5.29
2009   52.83                      78792             5.92
2010   52.06                      71971             3.42
2011   52.82                      74126             8.71
Source: fire data from http://wildland-fires.sciencedaily.com/#
temperature data from http://www.ncdc.noaa.gov/temp-and-precip/time-series/index.php?parameter=tmp&month=5&year=2000&filter=12&state=110&div=0
In Excel, highlight the two variables of interest. Click Insert ->
Scatter and select the appropriate chart type.
Ex. Fuel used vs. Speed
How does the fuel consumption of a car change as its speed
increases?
Speed vs. fuel consumption per 100 km travelled for British
Ford Escort
Describe the form of the relationship. Explain why the form
makes sense.
Does it make sense to describe the variables as either positively
or negatively associated? Why?
Measuring Linear Association: Correlation
We will look at one numerical measure of association, the
correlation coefficient. Technically, correlation only makes
sense when both variables are quantitative.
The correlation describes the direction and strength of a linear
relationship between two quantitative variables. The correlation
coefficient is usually written as r, the Pearson product-moment
correlation coefficient.
Now let’s learn how to calculate r. We will compute r based
upon n observations on variables x and y: $x_1, x_2, \ldots, x_n$
and $y_1, y_2, \ldots, y_n$. We denote this $r_{XY}$, the
correlation between X and Y.
Each observation is an ordered pair $(x_i, y_i)$. For example,
$x_1$ and $y_1$ might be my age and my number of college hours
earned.
Calculating the correlation coefficient
1. List the two values for each individual.
2. Compute the sum of X values, and compute the sum of Y
values.
3. Square the X values.
4. Square the Y values.
5. Find the sum of the XY products.
6. Plug these values into the formula.
Ex. Calculate $r_{XY}$ by hand.
X     Y     X^2   Y^2   XY
4     6
7     4
10    2
3     8
2     9
Σ
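The six steps can be sketched in Python for the five (X, Y) pairs
in the table. The handout's own formula did not survive in this
copy, so the sketch assumes step 6 refers to the usual
computational form of the Pearson correlation,
r = [nΣXY - ΣXΣY] / sqrt([nΣX^2 - (ΣX)^2][nΣY^2 - (ΣY)^2]).
Treat it as a way to check hand work, not as the handout's
official solution.

```python
from math import sqrt

# Step 1: list the two values for each individual
xs = [4, 7, 10, 3, 2]
ys = [6, 4, 2, 8, 9]
n = len(xs)

# Step 2: sums of the X values and of the Y values
sum_x = sum(xs)
sum_y = sum(ys)

# Steps 3 and 4: square the X values and the Y values, then total them
sum_x2 = sum(x ** 2 for x in xs)
sum_y2 = sum(y ** 2 for y in ys)

# Step 5: sum of the XY products
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Step 6: plug the totals into the computational formula (assumed form)
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"sums: X={sum_x}, Y={sum_y}, X^2={sum_x2}, Y^2={sum_y2}, XY={sum_xy}")
print(f"r = {r:.4f}")
```

The printed sums correspond to the Σ row of the table, so each
column total can be compared with the hand calculation before
looking at r itself.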
Ex. Using Excel, find r for yearly average temperature and
number of fires, and then for yearly average temperature and area
burned.
Note: The columns for the variables of interest must be next to
each other.
· Using the CORREL function: In the cell you want to display
the correlation coefficient, type = CORREL(array1, array2).
· array1 contains data for the X variable
· array2 contains data for the Y variable
· Using the Analysis ToolPak:
· click Data -> Data Analysis -> Correlation
· In the Input Range box, input the cells that contain data for
both variables.
· Make sure the Grouped By: Columns option is selected if your
data are grouped in columns.
· If you included the variable names in the first row, check the
box next to Labels in first row.
· In the Output Range box, input the cell you wish to display the
output.
· Click OK.
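For readers without Excel, the same two correlations can be
sketched in Python. The function name pearson_r is a hypothetical
helper written for this illustration; it plays the role of CORREL
for two equal-length lists.

```python
from math import sqrt
from statistics import mean

# Data from the temperature and wildfire table above (2000 to 2011)
temperature = [54.52, 52.19, 53.74, 53.1, 53.6, 53.08,
               54.38, 53.43, 53.04, 52.83, 52.06, 52.82]
fires = [92250, 84079, 73457, 63629, 65461, 66753,
         96385, 85705, 78979, 78792, 71971, 74126]
acres_millions = [7.39, 3.57, 7.18, 3.96, 8.1, 8.69,
                  9.87, 9.33, 5.29, 5.92, 3.42, 8.71]

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

r_fires = pearson_r(temperature, fires)
r_acres = pearson_r(temperature, acres_millions)

print(f"temperature vs. number of fires: r = {r_fires:.3f}")
print(f"temperature vs. acres burned:    r = {r_acres:.3f}")
```

Both correlations are positive, so warmer years in this table tend
to go with more fires and more acres burned; the scatterplots
should still be examined before reading much into either value.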
Facts about r:
1. Positive r indicates positive correlation between the
variables, and negative r indicates negative correlation.
2. The correlation coefficient r always falls between -1 and 1,
that is, $-1 \le r \le 1$.
3. The extreme values r = -1 and r = 1 indicate perfect straight-
line (linear) association.
4. The correlation between x and y does not change when we
change the units of measurement of x, y, or both; r has no units.
5. Correlation ignores the distinction between explanatory and
response variables.
6. Correlation measures the strength of only linear association
between two variables.
7. Like the mean and standard deviation, r is strongly affected
by a few outliers; in other words, r is not resistant.
8. Correlation only makes sense for quantitative variables. We
can talk about the relationship or association between gender of
voters and political party, but not of the correlation between
these variables.
9. Note that correlation is not a complete description of
bivariate (two-variable) data. State the means and standard
deviations of both x and y along with the correlation.
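Facts 4 and 5 are easy to check numerically. The sketch below
reuses the temperature and fire counts from the earlier table,
converts the temperatures from °F to °C, and swaps the roles of the
two variables. The helper pearson_r is hypothetical and written
only for this illustration.

```python
from math import sqrt
from statistics import mean

temp_f = [54.52, 52.19, 53.74, 53.1, 53.6, 53.08,
          54.38, 53.43, 53.04, 52.83, 52.06, 52.82]
fires = [92250, 84079, 73457, 63629, 65461, 66753,
         96385, 85705, 78979, 78792, 71971, 74126]

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Fact 4: changing units (°F to °C) leaves r unchanged
temp_c = [(f - 32) * 5 / 9 for f in temp_f]
r_fahrenheit = pearson_r(temp_f, fires)
r_celsius = pearson_r(temp_c, fires)

# Fact 5: r ignores which variable is explanatory and which is response
r_swapped = pearson_r(fires, temp_f)

print(f"r with °F: {r_fahrenheit:.6f}")
print(f"r with °C: {r_celsius:.6f}")
print(f"r with x and y swapped: {r_swapped:.6f}")
```

All three printed values agree up to floating-point rounding.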
Interpreting Correlation Coefficients
Size of the Correlation Coefficient   Interpretation
.8 to 1.0                             Very strong relationship
.6 to .8                              Strong relationship
.4 to .6                              Moderate relationship
.2 to .4                              Weak relationship
.0 to .2                              Weak or no relationship
Types of Correlation and Relationships
What Happens to Variable X | What Happens to Variable Y | Type of Correlation | Value | Example
X increases in value | Y increases in value | Direct or positive | Positive, ranging from 0 to +1 | The more time you spend studying, the higher your test score will be.
X decreases in value | Y decreases in value | Direct or positive | Positive, ranging from 0 to +1 | The less money you put in the bank, the less interest you will earn.
X increases in value | Y decreases in value | Indirect or negative | Negative, ranging from -1 to 0 | The more you exercise, the less you will weigh.
X decreases in value | Y increases in value | Indirect or negative | Negative, ranging from -1 to 0 | The less time you take to complete a test, the more you’ll get wrong.
Types of Measurement of Correlation
Variable X | Variable Y | Type of Correlation coefficient | Correlation being computed
Nominal (voting preference, such as Democrat or Republican) | Nominal (sex, such as male or female) | Phi coefficient | The correlation between voting preference and sex.
Nominal (social class, such as high, medium, or low) | Ordinal (rank in high school graduating class) | Rank biserial coefficient | The correlation between social class and rank in high school.
Nominal (family configuration, such as intact or single parent) | Interval (grade point average) | Point biserial | The correlation between family configuration and grade point average.
Ordinal (height converted to rank) | Ordinal (weight converted to rank) | Spearman rank coefficient | The correlation between height and weight.
Interval (number of problems solved) | Interval (age in years) | Pearson product-moment correlation coefficient | The correlation between number of problems solved and age in years.
Remember:
· Plot your data first.
· Look at each variable separately first, then study relationships
between variables.
FSE 200
Homework Assignment 4
25 Points
Directions: Complete all questions below. Print out and submit
this assignment in HARD COPY on the due date listed in the
syllabus.
Using the following data, determine whether or not the square
footage of a particular fire station has an effect on the turnout
time for the firefighters. (This data is recreated from an EFO
paper by Michael E. Dell'Orfano.)
Completion of Chart (4 points)
Station   Square Footage   Turnout Time (in minutes)   Area Squared   Time Squared   Area * Time
31        3842             1.82
32        11572            1.86
33        6802             2.13
34        18500            2.37
35        19232            1.92
36        4500             2.33
37        6700             2
38        3093             2.02
39        9229             2.22
40        6094             2.42
41        15130            2.44
42        7523             2.22
43        15000            3.18
44        9000             2.18
45        9647             2.35
46        15130            2.5
Sum
Mean (1 point)
Correlation Coefficient (1 point)
Coefficient of Determination (1 point)
SD (1 point)
· What is the Dependent Variable (1 point)?
· What is the Independent Variable (1 point)?
· Place a scatterplot of the data below (2 points). Show the
trendline as well. Remember to include titles and labels.
· What does this data tell you about the relationship (2 points)?
· State the equation of the least-squares regression line (1
point).
· Identify and interpret the slope (2 points).
· Identify and interpret the intercept (2 points).
· If appropriate, use the least-squares regression equation to
predict the turnout time for a 12,000 square foot fire station (2
points).
· If appropriate, use the least-squares regression equation to
predict the turnout time for a 25,000 square foot fire station (2
points).
· Obtain the residual plot and analyze the output (2 points).
FSE 200
Homework Assignment 4
25 Points
Directions: Complete all questions below. Save the file
containing your solutions and submit electronically under
Assignments -> Assignment 4 in Blackboard.
Using the following data, determine whether or not the square
footage of a particular fire station has an effect on the turnout
time for the firefighters. (This data is recreated from an EFO
paper by Michael E. Dell'Orfano.)
Completion of Chart (4 points)
Station   Square Footage   Turnout Time (in minutes)   Area Squared   Time Squared   Area * Time
31        3842             1.82                        14760964       3.3124         6992.44
32        11572            1.86                        133911184      3.4596         21523.92
33        6802             2.13                        46267204       4.5369         14488.26
34        18500            2.37                        342250000      5.6169         43845
35        19232            1.92                        369869824      3.6864         36925.44
36        4500             2.33                        20250000       5.4289         10485
37        6700             2                           44890000       4              13400
38        3093             2.02                        9566649        4.0804         6247.86
39        9229             2.22                        85174441       4.9284         20488.38
40        6094             2.42                        37136836       5.8564         14747.48
41        15130            2.44                        228916900      5.9536         36917.2
42        7523             2.22                        56595529       4.9284         16701.06
43        15000            3.18                        225000000      10.1124        47700
44        9000             2.18                        81000000       4.7524         19620
45        9647             2.35                        93064609       5.5225         22670.45
46        15130            2.5                         228916900      6.25           37825
Sum       160994           35.96                       2017571040     82.4256        370577.49
Mean (1 point): 10062.13 (square footage), 2.25 (turnout time)
SD (1 point): 5148.65 (square footage), 0.33 (turnout time)
Correlation Coefficient (1 point): 0.346
Coefficient of Determination (1 point): 0.120
· What is the Dependent Variable (1 point)?
· What is the Independent Variable (1 point)?
· Place a scatterplot of the relationship of turnout time and
square footage below (2 points). Show the trendline as well.
Remember to include titles and labels.
· What does this data tell you about the relationship (2 points)?
· State the equation of the least-squares regression line (1
point).
· Identify and interpret the slope (2 points).
· Identify and interpret the intercept (2 points).
· If appropriate, use the least-squares regression equation to
predict the turnout time for a 12,000 square foot fire station (2
points).
· If appropriate, use the least-squares regression equation to
predict the turnout time for a 25,000 square foot fire station (2
points).
· Obtain the residual plot and analyze the output (2 points).
RESIDUAL OUTPUT
Observation   Predicted Turnout Time (in minutes)   Residuals
1             2.1107255995                          -0.2907255995
2             2.2807006588                          -0.4207006588
3             2.1758130737                          -0.0458130737
4             2.4330405308                          -0.0630405308
5             2.4491364873                          -0.5291364873
6             2.1251943691                          0.2048056309
7             2.1735701946                          -0.1735701946
8             2.0942558299                          -0.0742558299
9             2.2291804048                          -0.0091804048
10            2.1602448536                          0.2597551464
11            2.3589375619                          0.0810624381
12            2.1916671511                          0.0283328489
13            2.3560789904                          0.8239210096
14            2.2241449211                          -0.0441449211
15            2.2383718116                          0.1116281884
16            2.3589375619                          0.1410624381
Turnout time (in minutes) for the firefighters is the dependent
variable.
[Figure: Scatter Plot. Horizontal axis: Area (Square footage). Vertical axis: Turnout time.]
[Figure: Square Footage Residual Plot. Horizontal axis: Square Footage. Vertical axis: Residuals.]
Area (square footage) of a fire station is the independent
variable.
This data tells us that there is a weak positive relationship
between square footage and turnout time for firefighters. As
square footage increases, turnout time also tends to increase,
but only slightly.
The equation of the least-squares regression line is given below.
Turnout time = 2.026 + 0.000022*Square footage
The slope of the regression line is 0.000022.
The slope indicates that for each one square foot increase in the
area of a fire station, the predicted turnout time for
firefighters increases by about 0.000022 minutes (about 0.022
minutes per 1,000 square feet).
The intercept of the regression line is 2.026.
The intercept indicates that when the area of a fire station is 0
square feet, the predicted turnout time for firefighters is
approximately 2.026 minutes. Because no station has an area near
zero, this value has little practical meaning.
Using the least-squares regression equation:
Turnout time = 2.026 + 0.000022*12,000 = 2.29 minutes
Hence the predicted turnout time for a 12,000 square foot fire
station is 2.29 minutes.
The least-squares regression equation should not be used to
predict the turnout time for a 25,000 square foot fire station,
because 25,000 square feet lies well outside the range of areas
in the data (3,093 to 19,232 square feet). Such a prediction
would be an extrapolation.
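The answers above can be cross-checked with a short Python sketch
that fits the line to the sixteen stations in the chart and tests
whether a requested area falls inside the observed range. The
helper name in_observed_range is hypothetical and used only for
this illustration.

```python
from statistics import mean

# Square footage and turnout time (minutes) for stations 31 to 46
area = [3842, 11572, 6802, 18500, 19232, 4500, 6700, 3093,
        9229, 6094, 15130, 7523, 15000, 9000, 9647, 15130]
turnout = [1.82, 1.86, 2.13, 2.37, 1.92, 2.33, 2.0, 2.02,
           2.22, 2.42, 2.44, 2.22, 3.18, 2.18, 2.35, 2.5]

# Least-squares slope and intercept
x_bar, y_bar = mean(area), mean(turnout)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(area, turnout))
     / sum((x - x_bar) ** 2 for x in area))
a = y_bar - b * x_bar

def in_observed_range(x):
    """True if x lies between the smallest and largest observed areas."""
    return min(area) <= x <= max(area)

print(f"turnout time = {a:.3f} + {b:.6f} * square footage")

for square_feet in (12_000, 25_000):
    if in_observed_range(square_feet):
        print(f"{square_feet} sq ft: predicted {a + b * square_feet:.2f} minutes")
    else:
        print(f"{square_feet} sq ft: outside the data range, not predicted")

pred_12000 = a + b * 12_000
```

Guarding predictions with a range check like this is one simple
way to avoid extrapolating by accident.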
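The residual plot shows the residuals scattered randomly about
zero with no systematic pattern, so a linear model appears
appropriate for these data. One station (15,000 square feet,
residual of about 0.82 minutes) has a noticeably large residual
and is worth checking.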

PPTX
master seminar digital applications in india
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PDF
01-Introduction-to-Information-Management.pdf
PPTX
Cell Types and Its function , kingdom of life
PDF
VCE English Exam - Section C Student Revision Booklet
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PPTX
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
PDF
Classroom Observation Tools for Teachers
PPTX
Pharma ospi slides which help in ospi learning
PPTX
GDM (1) (1).pptx small presentation for students
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
102 student loan defaulters named and shamed – Is someone you know on the list?
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Computing-Curriculum for Schools in Ghana
O7-L3 Supply Chain Operations - ICLT Program
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
STATICS OF THE RIGID BODIES Hibbelers.pdf
Pre independence Education in Inndia.pdf
master seminar digital applications in india
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
01-Introduction-to-Information-Management.pdf
Cell Types and Its function , kingdom of life
VCE English Exam - Section C Student Revision Booklet
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
Classroom Observation Tools for Teachers
Pharma ospi slides which help in ospi learning
GDM (1) (1).pptx small presentation for students
Module 4: Burden of Disease Tutorial Slides S2 2025
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
3rd Neelam Sanjeevareddy Memorial Lecture.pdf

FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx

  • 1. FSE 200 Adkins Page 1 of 10 Simple Linear Regression Correlation only measures the strength and direction of the linear relationship between two quantitative variables. If the relationship is linear, then we would like to try to model that relationship with the equation of a line. We will use a regression line to describe the relationship between an explanatory variable and a response variable. A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x. Ex. It has been suggested that there is a relationship between sleep deprivation of employees and the ability to complete simple tasks. To evaluate this hypothesis, 12 people were asked to solve simple tasks after having been without sleep for 15, 18, 21, and 24 hours. The sample data are shown below. Subject Hours without sleep, x Tasks completed, y 1 15 13 2 15 9 3 15 15 4 18 8
  • 2. 5 18 12 6 18 10 7 21 5 8 21 8 9 21 7 10 24 3 11 24 5 12 24 4 Draw a scatterplot and describe the relationship. Lay a straight- edge on top of the plot and move it around until you find what you think might be a “line of best fit.” Then try to predict the number of tasks completed for someone having been without sleep 16 hours.
  • 3. Was your line the same as that of the classmate sitting next to you? Probably not. We need a method that we can use to find the “best” regression line to use for prediction. The method we will use is called least-squares. No line will pass exactly through all the points in the scatterplot. When we use the line to predict a y for a given x value, if there is a data point with that same x value, we can compute the error (residual): error = observed y − predicted y = $y - \hat{y}$. Our goal is going to be to make the vertical distances from the line as small as possible. The most commonly used method for doing this is the least-squares method. The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible. Equation of the Least-Squares Regression Line · Least-Squares Regression Line: $\hat{y} = a + bx$ · Slope of the Regression Line: $b = r \, s_y / s_x$ · Intercept of the Regression Line: $a = \bar{y} - b\bar{x}$ Generally, regression is performed using statistical software. Given the appropriate information (r, the two means, and the two standard deviations), the above formulas are simple to use. Once we have the regression line, how do we interpret it, and what can we do with it? The slope of a regression line is the rate of change, the amount of change in $\hat{y}$ when x increases by 1. The intercept of the regression line is the value of $\hat{y}$ when x = 0. It is statistically meaningful only when x can take on values that are close to zero. To make a prediction, just substitute an x-value into the equation and find $\hat{y}$. To plot the line on a scatterplot, just find a couple of points on the regression line, one near each end of the range of x in the data. Plot the points and connect them with a line. Again, this is something that can be done using statistical software.
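As a rough illustration of the least-squares fit described above, the following Python sketch fits the line to the sleep deprivation data from the earlier example. The handout itself uses Excel; the function and variable names here are illustrative assumptions. The slope is computed as the sum of cross-deviations divided by the sum of squared x-deviations, which is algebraically the same as r times the ratio of the standard deviations.

```python
# Sleep deprivation data from the example: hours without sleep (x), tasks completed (y)
hours = [15, 15, 15, 18, 18, 18, 21, 21, 21, 24, 24, 24]
tasks = [13, 9, 15, 8, 12, 10, 5, 8, 7, 3, 5, 4]


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations in x, and sum of cross-deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope


intercept, slope = least_squares(hours, tasks)
predicted_16 = intercept + slope * 16  # prediction for 16 hours without sleep

print(f"tasks = {intercept:.3f} + ({slope:.3f}) * hours")
print(f"predicted tasks at 16 hours: {predicted_16:.2f}")
```

The slope is negative here: each additional hour without sleep is associated with a little under one fewer task completed.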
  • 4. Ex. Use Excel to find the equation of the least-squares regression line for the sleep deprivation data in the previous example. · Click Data -> Data Analysis -> Regression -> OK · Input the cells of the response variable y in the Input Y Range box. · Input the cells of the explanatory variable x in the Input X Range box. · If you included variable names in the Input X and Y Range boxes, check the Labels box. · Input the cells you would like to display the output in the Output Range box. · Click OK. a. State the equation of the least-squares regression line. b. Identify and interpret the slope. c. Identify and interpret the intercept. d. Use the least-squares regression equation to predict the number of tasks completed for an employee that has been without sleep for 16 hours. Facts About Least-Squares Regression · In regression, the distinction between explanatory and
  • 5. response variables is very important. When we computed the correlation coefficient r, it did not matter which variable was x and which was y; r would be the same. However, if you perform a regression analysis on a data set and then swap x and y and perform another regression analysis, the results will not be the same. · There is a connection between the correlation coefficient r and the slope of the least-squares line: A change of one standard deviation in x corresponds to a change of r standard deviations in y. · The least-squares regression line always passes through the point $(\bar{x}, \bar{y})$ on the graph of y versus x. · The square of the correlation, $r^2$, known as the coefficient of determination, is the proportion of variation in y that can be explained by the least-squares regression of y on x. Note that $0 \le r^2 \le 1$. The closer $r^2$ is to 1, the better your regression line is at modeling the relationship between x and y. We usually state $r^2$ as a percentage. Note: · A statistical analysis package can find $r^2$ for you; if only r is given on the output, square it. · If $r^2$ is given on the output and you want to find r, take the square root of $r^2$ and look at the slope of the regression line to determine the sign; r and b will have the same sign. Ex. Refer to the Excel output from the sleep deprivation data. Find $r^2$ and interpret it. Then find r. We know that in practical applications, we are not going to be so lucky as to have all of our data points falling exactly on a line. A residual is the difference between an observed value of the response variable and the value predicted by the regression line.
  • 6. That is, residual = observed y − predicted y = $y - \hat{y}$. Ex. Find the residual for Subject 1 in our sleep deprivation data. We could compute a residual for each observation in the data. Note that the mean of the least-squares residuals is always zero. It is a good idea to examine the residuals because they can tell us something about how appropriate our linear model is. A residual plot is a scatterplot of the regression residuals versus the explanatory variable, x. Residual plots help us to assess the fit of a regression line. · If the regression line does a good job describing the overall relationship between x and y, the residuals should have no systematic pattern. When you examine a residual plot, here are some things you should consider: · Generally, a horizontal line is drawn at zero. · A curved pattern tells you that the relationship is not linear; therefore, linear regression is not an appropriate method of analysis. · Increasing or decreasing spread about the line (at zero) as x increases may indicate that the prediction of y for certain values of x will be less accurate. · Individual points with large residuals or outliers in the y direction can greatly affect your analysis. (Check data entry, etc.) · Individual points that are extreme in the x direction may not have large residuals, but they may still have quite an impact on the analysis. The last two points above lead us to a discussion of outliers and influential points. An outlier is an observation that lies outside the overall pattern
  • 7. of the other observations. An observation is influential for a statistical calculation if removing it would significantly change the result of the calculation. Points that are outliers in the x direction are often influential for the least-squares regression line. Ex. Use Excel to obtain the residual plot for the sleep deprivation data. Analyze the output. · In the Regression dialog box, check the Residual Plots box. Cautions about Correlation and Regression Some things to remember: · Always plot your data first! · The correlation and regression we have been studying should be used only to describe linear relationships. · The correlation coefficient r and least-squares regression are not resistant; just one influential observation can greatly affect your analysis. Let’s discuss a few more things of which you should be aware with regard to correlation and regression. Extrapolation is the use of a regression line for prediction far outside the range of values of the explanatory variable x that you used to obtain the line. Such predictions may not be accurate! Extrapolation can be dangerous!
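The residuals behind the residual plot mentioned above can also be computed directly. The sketch below is an illustration in Python rather than the Excel procedure the handout describes, and its names are assumptions. It refits the least-squares line to the sleep deprivation data, lists observed minus predicted values, and confirms that the residuals average to zero.

```python
# Sleep deprivation data: hours without sleep (x), tasks completed (y)
hours = [15, 15, 15, 18, 18, 18, 21, 21, 21, 24, 24, 24]
tasks = [13, 9, 15, 8, 12, 10, 5, 8, 7, 3, 5, 4]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(tasks) / n

# Least-squares slope and intercept
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, tasks))
         / sum((x - x_bar) ** 2 for x in hours))
intercept = y_bar - slope * x_bar

# Residual = observed y minus the y predicted by the line
residuals = [y - (intercept + slope * x) for x, y in zip(hours, tasks)]
mean_residual = sum(residuals) / n

for subject, (x, res) in enumerate(zip(hours, residuals), start=1):
    print(f"subject {subject:2d}  hours = {x}  residual = {res:+.3f}")
print(f"mean residual = {mean_residual:.6f}")
```

Plotting these residuals against hours, with a horizontal line at zero, gives the residual plot; here the points scatter on both sides of zero at every x value.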
  • 8. Ex. Refer to the sleep-deprivation example. Do you think it would be appropriate to use our least-squares regression equation to make predictions for a person who has gone without sleep for 40 hours? Generally two variables won’t exist by themselves in a vacuum, so to speak. Often we may be interested in more than two variables. Sometimes there are variables floating around in the background that are influencing the variables of interest, but we may not even have considered these background variables. A lurking variable (or extraneous variable) is a variable that has an important effect on the relationship among the variables in a study but is not included among the variables studied. A lurking variable could make it falsely appear that two other variables have a strong relationship. A lurking variable could also mask or hide a relationship that is really there. Ex. Suppose that someone notices that as the number of churches in town increases, the liquor sales also go up. Is there a lurking variable that might explain this relationship? ASSOCIATION DOES NOT IMPLY CAUSATION! An association between an explanatory variable x and a response variable y, even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y. While our goal may often be to show that changes in the explanatory variable cause changes in the response variable, sometimes an observed association really is due to cause and effect, but many times it is not. There may be a lurking variable that is causing a common response in x and y, or maybe both the lurking variable and x are causing changes in y so that their
  • 9. effects are confounded. Ex. Does having more cars make you live longer? A serious study once found that people with two cars live longer than people who own only one car. Owning three cars is even better, and so on. There is a substantial positive correlation between number of cars x and length of life y. A basic meaning of causation is that by changing x we can bring about a change in y. Could we lengthen our lives by buying more cars? How can we tell, then, if we have a cause-and-effect relationship? A well-designed experiment is the best way to determine causation; we will discuss experiments later. Many times it is not possible to do an experiment. In the absence of an experiment, what should we examine to determine causation? · The association is strong. · The association is consistent. · Higher doses are associated with stronger responses. · The alleged cause precedes the effect in time. · The alleged cause is plausible. Ex. Does smoking cause lung cancer? Despite the difficulties, it is sometimes possible to build a strong case for causation in the absence of experiments. The evidence that smoking causes lung cancer is about as strong as nonexperimental evidence can be. Doctors have long observed that most lung cancer patients were smokers. Comparison of smokers and “similar” nonsmokers showed a very strong association between smoking and death from lung cancer. Could the association be explained by lurking variables? Might there be, for example, a genetic factor that predisposes people to both nicotine addiction and to lung cancer? Smoking and lung cancer would then be positively associated even if smoking had no direct effect on the lungs. How do we overcome these objections?
  • 10. · The association is strong. The association between smoking and lung cancer is very strong. · The association is consistent. Many studies of different kinds of people in many countries link smoking to lung cancer. That reduces the chances that a lurking variable specific to one group or one study explains the association. · Higher doses are associated with stronger responses. People who smoke more cigarettes per day or who smoke over a longer period of time get lung cancer more often. People who stop smoking reduce their risk. · The alleged cause precedes the effect in time. Lung cancer develops after years of smoking. The number of men dying of lung cancer rose as smoking became more common, with a lag of about 30 years. Lung cancer kills more men than any other form of cancer. Lung cancer was rare among women until women began to smoke. Lung cancer in women rose along with smoking, again with a lag of about 30 years. · The alleged cause is plausible. Experiments with animals show that tars from cigarette smoke do cause cancer. The evidence for causation is overwhelming, but it is still not as strong as the evidence provided by well-designed experiments. [Figure: "Hours without sleep, x Residual Plot" for the sleep deprivation data; horizontal axis: Hours without sleep, x; vertical axis: Residuals] CJ102: Criminology Unit 8 Worksheet
  • 11. Student Name: _____________________________________________________ __ After completing the readings, answer the following questions: PART I 1. What is a turning point? 2. What are the characteristics of low self-control or impulsivity? 3. Define and differentiate adolescent limited and life course persistent criminals. Part II: Sex Crimes 1. What are the 7 goals of a primary interview with the rape victim? 2. What method does the FBI use to determine the profile of the offender in a sex crime? 3. What is the importance of the profile in helping solve the crime? Part III: Burglary 1. What are the common methods in which burglars gain entry into a residence or building? 2. Describe the primary characteristics of suspect(s) in burglary cases. 3. How are burglaries and sex crimes related? © Kaplan University FSE 200 Adkins Page 1 of 8 Scatterplots and Correlation So far we have been examining one variable at a time. In practice, we often want to look at several variables at once. In this chapter, we will specifically consider how to analyze two
  • 12. quantitative variables. A response variable measures an outcome of a study. An explanatory variable may explain or influence changes in a response variable. Ex. Suppose that individuals are given different amounts of alcohol, and then reaction times for a particular activity are measured. Often explanatory variables are called independent variables, and response variables are called dependent variables. Note that a cause-and-effect relationship may or may not exist, but we cannot determine causality. Two variables measured on the same individual are associated if some values of one variable tend to occur with some values of the second variable more than with other values of that variable. Ex. Displaying Relationships: Scatterplots A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal or x axis, and the values of the other variable appear on the vertical or y axis. Each individual in the data appears as a point in the plot. If there is an explanatory variable and a response variable, the explanatory variable goes on the horizontal axis and the response variable on the vertical axis. If such a distinction cannot be made, then either variable can go on either axis. To interpret a scatterplot, look for the overall pattern and for striking deviations from that pattern. To describe the overall pattern, look at the (1) form, (2) direction, and (3) strength of the relationship. Also look for any outliers. Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together.
  • 13. Ex. In a large group of people, there will be a positive association between height and weight. Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other, and vice versa. Ex. In a large group of people, there will be a negative association between packs of cigarettes smoked and length of life. Ex. Create a scatterplot to show the relationship between yearly average temperature and number of fires and yearly average temperature and area burned. year average temperature (oF) number of fires acres burned (in millions) 2000 54.52 92250 7.39 2001 52.19 84079 3.57 2002 53.74 73457 7.18
  • 15. Source: fire data from http://wildland-fires.sciencedaily.com/# temperature data from http://www.ncdc.noaa.gov/temp-and-precip/time- series/index.php?parameter=tmp&month=5&year=2000&filter= 12&state=110&div=0 In Excel, highlight the two variables of interest. Click Insert -> Scatter and select the appropriate chart type. Ex. Fuel used vs. Speed How does the fuel consumption of a car change as its speed increases? Speed vs. fuel consumption per 100 km travelled for British Ford Escort Describe the form of the relationship. Explain why the form makes sense. Does it make sense to describe the variables as either positively or negatively associated? Why? Measuring Linear Association: Correlation We will look at one numerical measure of association, the correlation coefficient. Technically, correlation only makes sense when both variables are quantitative. The correlation describes the direction and strength of a linear relationship between two quantitative variables. The correlation coefficient is usually written as r, the Pearson product-moment correlation coefficient. Now let’s learn how to calculate r. We will compute r based upon n observations on variables x and y: and . We denote this rXY, the correlation between X and Y. Each observation is an ordered pair (). For example, and might
  • 16. be my age and my number of college hours earned. Calculating the correlation coefficient 1. List the two values for each individual. 2. Compute the sum of X values, and compute the sum of Y values. 3. Square the X values. 4. Square the Y values. 5. Find the sum of the XY products. 6. Plug these values into the formula. Ex. Calculate rXY by hand. X Y X2 Y2 XY 4 6 7 4 10 2
  • 18. Ex. Using Excel, find r for yearly average temperature and number of fires and yearly average temperature and area burned. Note: The columns for the variables of interest must be next to each other. · Using the CORREL function: In the cell you want to display the correlation coefficient, type = CORREL(array1, array2). · array1 contains data for the X variable · array2 contains data for the Y variable · Using the Analysis ToolPak: · click Data -> Data Analysis -> Correlation · In the Input Range box, input the cells that contain data for both variables. · Make sure the Grouped By: Columns option is selected if your data are grouped in columns. · If you include the variable name in the first column, check the box next to Labels in first row. · In the Output Range box, input the cell you wish to display the output. · Click OK. Facts about r:
  • 19.
    1. Positive r indicates positive correlation between the variables, and negative r indicates negative correlation.
    2. The correlation coefficient r always falls between -1 and 1, that is, $-1 \le r \le 1$.
    3. The extreme values r = -1 and r = 1 indicate perfect straight-line (linear) association.
    4. The correlation between x and y does not change when we change the units of measurement of x, y, or both; r has no units.
    5. Correlation ignores the distinction between explanatory and response variables.
    6. Correlation measures the strength of only linear association between two variables.
    7. Like the mean and standard deviation, r is strongly affected by a few outliers; in other words, r is not resistant.
    8. Correlation only makes sense for quantitative variables. We can talk about the relationship or association between gender of voters and political party, but not of the correlation between these variables.
    9. Note that correlation is not a complete description of bivariate (two-variable) data. State the means and standard deviations of both x and y along with the correlation.
    Interpreting Correlation Coefficients
    Size of the Correlation Coefficient    Interpretation
    .8 to 1.0                              Very strong relationship
    .6 to .8                               Strong relationship
    .4 to .6                               Moderate relationship
    .2 to .4                               Weak relationship
    .0 to .2                               Weak or no relationship
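The hand-calculation steps listed earlier (sum the X values, the Y values, their squares, and the XY products, then plug into the formula) can be sketched in Python. The formula itself did not survive in this transcript, so the standard computational form of the Pearson coefficient is assumed here; the data are the sleep deprivation values from the regression handout, and the function name is illustrative.

```python
import math

# Sleep deprivation data: hours without sleep (X), tasks completed (Y)
X = [15, 15, 15, 18, 18, 18, 21, 21, 21, 24, 24, 24]
Y = [13, 9, 15, 8, 12, 10, 5, 8, 7, 3, 5, 4]


def pearson_r(x, y):
    """Pearson r from column sums (assumed standard computational formula)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(v * v for v in x)             # sum of squared X values
    sum_y2 = sum(v * v for v in y)             # sum of squared Y values
    sum_xy = sum(a * b for a, b in zip(x, y))  # sum of XY products
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(X, Y)
print(f"r = {r:.4f}, r squared = {r * r:.4f}")
```

Swapping the two arguments gives the same value, which matches fact 5 above: correlation ignores the distinction between explanatory and response variables.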
  • 20. Types of Correlation and Relationships What Happens to Variable X What Happens to Variable Y Type of Correlation Value Example X increases in value Y increases in value Direct or positive Positive, ranging from 0 to +1 The more time you spend studying, the higher your test score will be X decreases in value Y decreases in value Direct or positive Positive, ranging from 0 to +1 The less money you put in the bank, the less interest you will earn. X increases in value Y decreases in value Indirect or negative Negative, ranging from -1 to 0 The more you exercise, the less you will weigh. X decreases in value Y increases in value Indirect or negative Negative, ranging from -1 to 0 The less time you take to complete a test, the more you’ll get wrong. Types of Measurement of Correlation Variable X Variable Y Type of Correlation coefficient Correlation being computed Nominal (voting preference, such as Democrat or Republican)
  • 21. Nominal (sex, such as male or female) Phi coefficient The correlation between voting preference and sex. Nominal (social class, such as high, medium, or low) Ordinal (rank in high school graduating class) Rank biserial coefficient The correlation between social class and rank in high school. Nominal (family configuration, such as intact or single parent) Interval (grade point average) Point biserial The correlation between family configuration and grade point average. Ordinal (height converted to rank) Ordinal (weight converted to rank) Spearman rank coefficient The correlation between height and weight. Interval (number of problems solved) Interval (age in years) Pearson product-moment correlation coefficient The correlation between a number of problems solved and age in years. Remember: · Plot your data first. · Look at each variable separately first, then study relationships between variables. Sheet1FSE 200Homework Assignment 425 PointsDirections:Complete all questions below. Print out and submit this assignment in HARD COPY on the due date listed in the syllabus.Using the following data, determine whether or not the square footage of a particular fire station has an effect on
  • 22. the turnout time for the firefighters.(This data is recreated from an EFO paper by Michael E. Dell'Orfano.)Completion of Chart (4 points)StationSquare FootageTurnout Time (in minutes)Area SquaredTime SquaredArea * Time3138421.8232115721.863368022.1334185002.3735192321. 923645002.3337670023830932.023992292.224060942.4241151 302.444275232.2243150003.184490002.184596472.3546151302 .5SumMean (1 point)Correlation Coefficient (1 point)Coefficient of Determination (1 point)SD (1 point)What is the Dependent Variable (1 point)?What is the Independent Variable (1 point)?Place a scatterplot of the data below (2 points). Show the trendline as well. Remember to include titles and labels.What does this data tell you about the relationship (2 points)?State the equation of the least-squares regression line (1 point).Identify and interpret the slope (2 points).Identify and interpret the intercept (2 points).If appropriate, use the least-squares regression equation to predict the turnout time for a 12,000 square foot fire station (2 points).If appropriate, use the least-squares regression equation to predict the turnout time for a 25,000 square foot fire station (2 points).Obtain the residual plot and analyze the output (2 points). Sheet1FSE 200Homework Assignment 425 PointsDirections:Complete all questions below. Save the file containing your solutions andsubmit electronically under Assignments -> Assignment 4 in Blackboard.Using the following data, determine whether or not the square footage of a particular fire station has an effect on the turnout time for the firefighters.(This data is recreated from an EFO paper by Michael E. Dell'Orfano.)Completion of Chart (4 points)StationSquare FootageTurnout Time (in minutes)Area SquaredTime SquaredArea * Time3138421.82147609643.31246992.4432115721.8613391118 43.459621523.923368022.13462672044.536914488.2634185002 .373422500005.61694384535192321.923698698243.686436925.
  • 23. 443645002.33202500005.4289104853767002448900004134003 830932.0295666494.08046247.863992292.22851744414.928420 488.384060942.42371368365.856414747.4841151302.44228916 9005.953636917.24275232.22565955294.928416701.064315000 3.1822500000010.1124477004490002.18810000004.752419620 4596472.35930646095.522522670.4546151302.52289169006.25 37825Sum16099435.96201757104082.4256370577.49Mean (1 point)10062.132.25Correlation Coefficient (1 point)Coefficient of Determination (1 point)SD (1 point)5148.650.330.3460.120What is the Dependent Variable (1 point)?What is the Independent Variable (1 point)?Place a scatterplot of the relationship of turnout time and square footage below (2 points). Show the trendline as well. Remember to include titles and labels.What does this data tell you about the relationship (2 points)?State the equation of the least-squares regression line (1 point).Identify and interpret the slope (2 points).Identify and interpret the intercept (2 points).If appropriate, use the least-squares regression equation to predict the turnout time for a 12,000 square foot fire station (2 points).If appropriate, use the least-squares regression equation to predict the turnout time for a 25,000 square foot fire station (2 points).Obtain the residual plot and analyze the output (2 points).RESIDUAL OUTPUTObservationPredicted Turnout Time (in minutes)Residuals12.1107255995- 0.290725599522.2807006588-0.420700658832.1758130737- 0.045813073742.4330405308-0.063040530852.4491364873- 0.529136487362.12519436910.204805630972.1735701946- 0.173570194682.0942558299-0.074255829992.2291804048- 0.0091804048102.16024485360.2597551464112.35893756190.0 810624381122.19166715110.0283328489132.35607899040.823 9210096142.2241449211- 0.0441449211152.23837181160.1116281884162.35893756190.1 410624381 Turnout time (in minutes) for the firefighters is dependent variable.
  • 24. [Figure: Scatter Plot; horizontal axis: Area (Square footage); vertical axis: Turnout time] [Figure: Square Footage Residual Plot; horizontal axis: Square Footage; vertical axis: Residuals] Area (square footage) of a fire station is the independent variable. This data tells us that there is a weak positive relationship between square footage and turnout time for firefighters. Hence as square footage increases, the turnout time for firefighters tends to increase slightly. The equation of the least-squares regression line is given below. Turnout time = 2.026 + 0.000021*Square footage The slope of the regression line is 0.000021. The slope indicates that for each one square foot increase in the area of a fire station, the predicted turnout time for firefighters increases by 0.000021 minutes. The intercept of the regression line is 2.026.
  • 25. The intercept indicates that when the area of a fire station is 0 square feet, the predicted turnout time for firefighters is approximately 2.026 minutes. The least-squares regression equation is given below. Turnout time = 2.026 + 0.000021*Square footage Turnout time = 2.026 + 0.000021*12,000 = 2.29 minutes Hence the predicted turnout time for a 12,000 square foot fire station is 2.29 minutes. The least-squares regression equation should not be used to predict the turnout time for a 25,000 square foot fire station, because x = 25,000 lies outside the range of areas in the data (3,093 to 19,232 square feet) and the prediction would be an extrapolation. The residual plot shows the residuals scattered about zero with no systematic pattern. Hence a linear model appears appropriate for these data.
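As a check on the worked answers above, the following Python sketch refits the line from the sixteen station values listed in the worksheet. It is an illustration only; the assignment itself is done in Excel, and the variable names are assumptions.

```python
import math

# Fire station data from the worksheet: square footage (x), turnout time in minutes (y)
area = [3842, 11572, 6802, 18500, 19232, 4500, 6700, 3093,
        9229, 6094, 15130, 7523, 15000, 9000, 9647, 15130]
turnout = [1.82, 1.86, 2.13, 2.37, 1.92, 2.33, 2.00, 2.02,
           2.22, 2.42, 2.44, 2.22, 3.18, 2.18, 2.35, 2.50]

n = len(area)
x_bar = sum(area) / n
y_bar = sum(turnout) / n

# Sums of squared deviations and cross-deviations
sxx = sum((x - x_bar) ** 2 for x in area)
syy = sum((y - y_bar) ** 2 for y in turnout)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(area, turnout))

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r = sxy / math.sqrt(sxx * syy)

# 12,000 square feet lies inside the observed range, so predicting there is reasonable
pred_12000 = intercept + slope * 12000

print(f"turnout = {intercept:.3f} + {slope:.6f} * area")
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
print(f"predicted turnout at 12,000 sq ft: {pred_12000:.2f} minutes")
print(f"observed area range: {min(area)} to {max(area)} sq ft")
```

The printed range makes the extrapolation concern concrete: 25,000 square feet is well above the largest station in the data.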