Function lm() in R and its Basic Parameters
Jiong Xun
Learning Objectives
- Understand linear regression
- Understand the purpose of the lm() function
- Use lm() to fit regression models
- Interpret the output of the lm() function
Ever Wondered How…
- Maps are able to estimate your travelling time
- Surcharge pricing is determined to meet demand for taxis
- HDB resale prices are forecasted
What is Linear Regression?
- Interested in the relationship between a dependent variable (y) and one or more independent variables (x)
- Models relationships between variables
- Simple Linear Regression (1 independent variable)
- Multiple Linear Regression (2 or more independent variables)
- Finds the best-fit line that minimises the distance between observed data values and predicted values
How do we obtain the best-fit line?
Ordinary Least Squares
- Find the best-fit line ⇒ want the line to be as close to the data points as possible
- Minimise the vertical distance between each point and the line
- Residual Sum of Squares (RSS) ⇒ sum of the squared residuals over all data points
- Residuals are squared so that positive and negative residuals do not “cancel off” one another
Minimise RSS ⇒ minimum total distance between line and points ⇒ best-fit line
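The least-squares idea can be checked by hand. The sketch below is illustrative only: it assumes R's built-in trees data (the example dataset used later in the deck) with Girth as x and Volume as y, and the helper name rss() is made up for this example. It computes the closed-form estimates for a single predictor and shows that nudging the line away from them increases the RSS.

```r
# Built-in example data: girth (inches) and volume (cubic ft) of 31 trees
x <- trees$Girth
y <- trees$Volume

# Residual sum of squares for a candidate line y = intercept + slope * x
# (rss() is a hypothetical helper written for this sketch)
rss <- function(intercept, slope) {
  res <- y - (intercept + slope * x)
  sum(res^2)
}

# Closed-form least-squares estimates for one predictor
slope_hat     <- cov(x, y) / var(x)
intercept_hat <- mean(y) - slope_hat * mean(x)

rss_best <- rss(intercept_hat, slope_hat)

# Moving the line away from the least-squares estimates increases the RSS
rss_shifted <- rss(intercept_hat + 1, slope_hat)
rss_tilted  <- rss(intercept_hat, slope_hat + 0.1)

c(best = rss_best, shifted = rss_shifted, tilted = rss_tilted)
```

The two hand-computed estimates should agree with what lm() reports for the same variables.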
Visualised on a Simple Plot
[Figure: plot showing the actual data points, the residuals, and the regression line]
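A plot of this kind can be drawn with base R graphics. This is a sketch, not the figure from the original slide: it assumes the built-in trees data with Girth as x and Volume as y, and the colours and labels are arbitrary choices.

```r
# Assumption: Girth as the single predictor, chosen for illustration
fit <- lm(Volume ~ Girth, data = trees)

# Actual data points
plot(trees$Girth, trees$Volume,
     xlab = "Girth (inches)", ylab = "Volume (cubic ft)",
     main = "Simple linear regression on trees")

# Regression line
abline(fit, col = "blue", lwd = 2)

# Residuals: vertical segments from each observed point to the line
segments(x0 = trees$Girth, y0 = trees$Volume,
         x1 = trees$Girth, y1 = fitted(fit),
         col = "red", lty = 2)
```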
How Do We Use R to Fit a Linear Regression Model?
Syntax of lm()
Function lm() stands for “linear model”
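The syntax shown on the original slide did not survive extraction, so the following is only a sketch of the two basic parameters of lm(), formula and data. The data frame toy and its values are made up for illustration.

```r
# Hypothetical toy data, made up purely to illustrate the call
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

# Basic form: lm(formula, data)
#   formula: response ~ predictor(s), e.g. y ~ x or y ~ x1 + x2
#   data:    data frame holding the variables named in the formula
model <- lm(y ~ x, data = toy)

class(model)   # the fitted model is an object of class "lm"
coef(model)    # estimated intercept and slope
```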
Example Dataset “trees”
Girth in inches, Height in ft, Volume in cubic ft
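The dataset ships with base R, so it can be inspected directly. A minimal look at its shape and columns (the variable name n_obs is just for this sketch):

```r
# trees is part of the "datasets" package that loads with R by default
str(trees)       # column names and types
head(trees)      # first six rows
summary(trees)   # per-column summary statistics

n_obs <- nrow(trees)   # number of trees measured
n_obs
```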
Parts of the lm() call:
- Y variable (response)
- X variable (explanatory)
- Name of the data frame that the model is using
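Put together, the three parts form a call like the one below. The exact call on the original slide is not recoverable, so Volume as the response and Girth as the single explanatory variable are assumptions made for illustration.

```r
# Volume is the response (y), Girth is the explanatory variable (x),
# and trees is the data frame that holds both columns.
# Assumption: Girth as the single predictor is an illustrative choice.
fit <- lm(Volume ~ Girth, data = trees)

print(fit)                   # the call and the estimated coefficients

fit_summary <- summary(fit)  # residuals, coefficient table, fit statistics
print(fit_summary)
```

The printed summary is the output that the following slides walk through block by block.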
Residuals: difference between observed and predicted values
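As a sketch of where the residual figures come from, assuming the same illustrative model of Volume on Girth, the residuals can be recomputed by hand and summarised:

```r
# Assumption: Girth as the single predictor, chosen for illustration
fit <- lm(Volume ~ Girth, data = trees)

# Residual = observed value - predicted (fitted) value
res_manual <- trees$Volume - fitted(fit)
res <- residuals(fit)

# Min, 1Q, Median, 3Q and Max of the residuals
quantile(res)

# With an intercept in the model, the residuals average to (numerically) zero
mean(res)
```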
Estimate: used to predict the value of the response variable
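A short sketch of how the estimates turn into a prediction, again assuming Volume is modelled on Girth. The girth of 12 inches is a hypothetical value picked for the example.

```r
# Assumption: Girth as the single predictor, chosen for illustration
fit <- lm(Volume ~ Girth, data = trees)

b <- coef(fit)   # named vector: "(Intercept)" and "Girth"

# Prediction by hand for a hypothetical tree with a girth of 12 inches
girth_new <- 12
pred_manual <- b[["(Intercept)"]] + b[["Girth"]] * girth_new

# The same prediction through predict()
pred <- predict(fit, newdata = data.frame(Girth = girth_new))

c(manual = pred_manual, predict = unname(pred))
```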
Std. Error: average amount that the estimate varies from the actual value
t value = Estimate / Std. Error
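The relationship between the Estimate, Std. Error and t value columns can be verified directly from the coefficient table. This sketch assumes the illustrative model of Volume on Girth.

```r
# Assumption: Girth as the single predictor, chosen for illustration
fit <- lm(Volume ~ Girth, data = trees)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit))

# Standard errors are the square roots of the diagonal of the
# coefficient variance-covariance matrix
se_manual <- sqrt(diag(vcov(fit)))

# t value recomputed by hand for each coefficient
t_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

cbind(reported = coef_table[, "t value"], manual = t_manual)
```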
Pr(>|t|): p-value for the t-test that determines whether the coefficient is significant
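The p-value can be reproduced from the t value and the residual degrees of freedom. A sketch under the same assumption (Volume modelled on Girth):

```r
# Assumption: Girth as the single predictor, chosen for illustration
fit <- lm(Volume ~ Girth, data = trees)
coef_table <- coef(summary(fit))

t_values <- coef_table[, "t value"]
df_res <- df.residual(fit)   # residual degrees of freedom

# Two-sided p-value: probability of seeing a |t| at least this large
# if the true coefficient were 0
p_manual <- 2 * pt(abs(t_values), df = df_res, lower.tail = FALSE)

cbind(reported = coef_table[, "Pr(>|t|)"], manual = p_manual)
```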
Residual standard error: standard deviation of the residuals
Degrees of freedom: number of data points that went into the estimation minus the number of estimated coefficients
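Both numbers on this line of the output can be recomputed from the residuals. The sketch assumes the illustrative one-predictor model, so two coefficients (intercept and slope) are estimated.

```r
# Assumption: Girth as the single predictor, chosen for illustration
fit <- lm(Volume ~ Girth, data = trees)

n <- nrow(trees)          # number of data points
k <- length(coef(fit))    # estimated coefficients: intercept + slope
df_res <- n - k           # residual degrees of freedom

# Residual standard error: square root of RSS per degree of freedom
rss <- sum(residuals(fit)^2)
rse_manual <- sqrt(rss / df_res)

c(manual = rse_manual, reported = summary(fit)$sigma, df = df_res)
```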
Multiple R-squared: measures what % of the variance in the response can be explained by the regression
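R-squared follows from two sums of squares. A sketch with the same illustrative model (Volume on Girth); the names rss and tss are chosen for this example.

```r
# Assumption: Girth as the single predictor, chosen for illustration
fit <- lm(Volume ~ Girth, data = trees)
y <- trees$Volume

rss <- sum(residuals(fit)^2)   # variation left unexplained by the model
tss <- sum((y - mean(y))^2)    # total variation in the response

# Share of the total variation that the regression explains
r2_manual <- 1 - rss / tss

c(manual = r2_manual, reported = summary(fit)$r.squared)
```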
F-statistic: indicates whether the model as a whole is statistically significant
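The F-statistic and its p-value are stored in the summary object. A sketch, assuming the illustrative model of Volume on Girth, of how to extract them and recompute the p-value:

```r
# Assumption: Girth as the single predictor, chosen for illustration
fit <- lm(Volume ~ Girth, data = trees)

# Named vector: value (the F-statistic), numdf and dendf (its degrees of freedom)
f <- summary(fit)$fstatistic

# p-value of the overall F-test: upper tail of the F distribution
p_model <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

c(F = f[["value"]], p_value = p_model)
```

With a single predictor, the F-statistic equals the square of that predictor's t value, so the two tests agree.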
Multiple linear regression: can we predict the Volume of a tree from its Girth and Height?
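A minimal sketch of the multiple regression, adding Height as a second explanatory variable. The new tree's measurements are hypothetical values chosen for illustration.

```r
# Two explanatory variables are joined with "+" in the formula
fit_multi <- lm(Volume ~ Girth + Height, data = trees)
summary(fit_multi)

# Predict the volume of a hypothetical tree (values made up for this example)
new_tree <- data.frame(Girth = 12, Height = 75)
pred_multi <- predict(fit_multi, newdata = new_tree)
pred_multi
```

The summary has the same layout as before, with one extra row in the coefficient table and one fewer residual degree of freedom.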

Editor's Notes

  • #10: Simple linear regression (SLR) (black cherry trees)
  • #13: Min ⇒ the data point furthest below the regression line. 1Q ⇒ 25% of the residuals are less than this number. 3Q ⇒ 25% of the residuals are greater than this number. Max ⇒ the data point furthest above the regression line. The mean of the residuals is not shown as it will always be 0 for a model with an intercept. What can we infer from the residuals alone about whether this is a good linear regression model? The median of the residuals should ideally be as close to 0 as possible (how close is hard to define, as it is relative to your data); this would imply that our model is not skewed one way or another. The residuals should also be symmetrically distributed ⇒ we want Min and Max to have about the same magnitude, as well as 1Q and 3Q.
  • #15: Moving on to coefficients: the intercept is given by default, followed by one coefficient for each x variable that we provided ⇒ y = mx + c, where c is the intercept and m is the slope.
  • #17: Standard error ⇒ determines the width of the confidence interval. Ideally, the standard error should be small relative to the estimate.
  • #21: We use 0.05 as a benchmark: a p-value below 0.05 means it is unlikely that the relationship between the y variable and the x variable is due to chance ⇒ statistically significant. The stars on the right show the significance levels, as listed under “Signif. codes”.
  • #23: Standard deviation of residuals ⇒ average distance of the points from the regression line, adjusted for the number of points. Degrees of freedom ⇒ number of observations (31) minus number of estimated coefficients (2: the intercept and one slope) = 29.
  • #25: Multiple R-squared ⇒ never decreases when more x variables are added, so it generally looks better with more predictors. Adjusted R-squared ⇒ penalises adding useless predictor (x) variables. If the multiple R-squared is much higher than the adjusted R-squared, your model might be overfitted.
  • #27: F-statistic ⇒ we want a value much bigger than 1 for the model to be statistically significant. Alternatively, refer to its p-value: if it is below 0.05, the model is significant.
  • #28: Multiple linear regression
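
Two of the points in these notes, standard error driving the width of the confidence interval (#17) and multiple versus adjusted R-squared (#25), can be checked with a short sketch. It assumes the two illustrative models used for the trees data (Volume on Girth, and Volume on Girth plus Height); the object names are made up for this example.

```r
fit_simple <- lm(Volume ~ Girth, data = trees)
fit_multi  <- lm(Volume ~ Girth + Height, data = trees)

# 95% confidence intervals for the coefficients of the simple model
ci <- confint(fit_simple, level = 0.95)

# Interval width = 2 * t quantile * standard error,
# so a larger standard error gives a wider interval
se <- coef(summary(fit_simple))[, "Std. Error"]
ci_manual_width <- 2 * qt(0.975, df.residual(fit_simple)) * se

# Multiple and adjusted R-squared for both models, side by side
r2 <- sapply(list(simple = fit_simple, multi = fit_multi), function(m) {
  s <- summary(m)
  c(multiple = s$r.squared, adjusted = s$adj.r.squared)
})
r2
```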