Data Science using R
P. Rajesh,
Assistant Professor,
PG Department of Computer Science
C.Mutlur, Chidambaram
Linear Regression
Regression analysis is a widely used statistical tool to establish a
relationship model between two variables.
One of these variables is called the predictor variable, whose value is
gathered through experiments.
The other variable is called the response variable, whose value is derived
from the predictor variable.
Mathematically a linear relationship represents a straight line when plotted
as a graph.
A non-linear relationship, where the exponent of a variable is not equal to
1, creates a curve.
The general mathematical equation for a linear regression is −
y = ax + b
Following is the description of the parameters used −
y is the response variable.
x is the predictor variable.
a and b are constants which are called the coefficients.
How much money should you allocate for gas?
You approach this problem with a science-oriented mindset, thinking that there must be a
way to estimate the amount of money needed, based on the distance you're travelling.
At this point these are just numbers. It's not
very easy to get any valuable information from
this spreadsheet.
"If I drive for 1200 miles, how much will I pay for gas?"
The coefficients are found by least squares from the n observed pairs (xi, yi):

a = (n·Σ xi·yi − Σ xi · Σ yi) / (n·Σ xi² − (Σ xi)²)

b = (Σ yi − a·Σ xi) / n
y = ax + b
Sl.No. Total Miles (x) Total Paid (y) x*x x*y
1 390 36.66 152100 14297.4
2 403 37.05 162409 14931.15
3 396.5 34.71 157212.25 13762.52
4 383.5 32.5 147072.25 12463.75
5 321.1 32.63 103105.21 10477.49
6 391.3 34.45 153115.69 13480.29
7 386.1 36.79 149073.21 14204.62
8 371.8 37.44 138235.24 13920.19
9 404.3 38.09 163458.49 15399.79
10 392.6 38.09 154134.76 14954.13
11 386.49 38.74 149374.5201 14972.62
12 395.2 39 156183.04 15412.8
13 385.5 40 148610.25 15420
14 372 36.21 138384 13470.12
15 397 34.05 157609 13517.85
16 407 41.79 165649 17008.53
17 372.33 30.25 138629.6289 11262.98
18 375.6 38.83 141075.36 14584.55
19 399 39.66 159201 15824.34
Sum 7330.32 696.94 2834631.899 269365.1
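The slope and intercept can be computed straight from the column sums in the table. A minimal R sketch of the least-squares formulas (the variable names are ours, chosen for illustration):

```r
# Column sums taken from the table above (n = 19 fill-ups).
n      <- 19
sum_x  <- 7330.32       # sum of miles (x)
sum_y  <- 696.94        # sum of dollars paid (y)
sum_xx <- 2834631.899   # sum of x*x
sum_xy <- 269365.1      # sum of x*y

# Least-squares slope a and intercept b.
a <- (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x^2)
b <- (sum_y - a * sum_x) / n

# "If I drive for 1200 miles, how much will I pay for gas?"
cost <- a * 1200 + b
print(round(c(slope = a, intercept = b, cost = cost), 4))
```

With these sums the slope comes out to roughly 0.07 dollars per mile, which puts the 1200-mile trip in the mid-90s of dollars.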
Visualize the Regression Graphically
# Create the predictor (height) and response (weight) variables.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)
# Give the chart file a name.
png(file = "linearregression.png")
# Plot the chart with the fitted regression line.
plot(x, y, col = "blue", main = "Height & Weight Regression",
abline(relation), cex = 1.3, pch = 16,
xlab = "Height in cm", ylab = "Weight in kg")
# Save the file.
dev.off()
Steps to Establish a Regression
A simple example of regression is predicting the weight of a person when
the height is known. To do this we need the relationship between the height
and weight of a person.
The steps to create the relationship are:
Carry out the experiment of gathering a sample of observed values of
height and corresponding weight.
Create a relationship model using the lm() function in R.
Find the coefficients from the model created and write the mathematical
equation using them.
Get a summary of the relationship model to know the average error in
prediction, also called the residuals.
To predict the weight of new persons, use the predict() function in R.
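These steps can be sketched end to end in a few lines; summary() is the one call not shown elsewhere in these slides, and it reports the residuals mentioned in the fourth step:

```r
# Step 1: sample of observed heights (x, cm) and weights (y, kg).
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Step 2: create the relationship model.
relation <- lm(y ~ x)
# Step 3: the intercept b and slope a of y = ax + b.
print(coef(relation))
# Step 4: summary, including the residuals (prediction errors).
print(summary(relation))
# Step 5: predict the weight of a new person of height 170 cm.
print(predict(relation, data.frame(x = 170)))
```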
Input Data
The sample data representing the observations are the height (x) and
weight (y) vectors used in the code that follows −
lm() Function
This function creates the relationship model between the predictor and the
response variable.
Syntax
The basic syntax for lm() function in linear regression is −
lm(formula,data)
Create Relationship Model & get the Coefficients
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
print(relation)
Output
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
-38.4551 0.6746
predict() Function
Syntax
The basic syntax for predict() in linear regression is −
predict(object, newdata)
Following is the description of the parameters used −
object is the model already created using the lm() function.
newdata is a data frame containing the new values for the predictor variable.
Predict the weight of new persons
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function
relation <- lm(y~x)
# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)
When we execute the above code, it produces the following result −
1 76.22869
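As a sanity check, the predict() result matches plugging 170 into the fitted equation by hand, using the unrounded coefficients:

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# predict() for a height of 170 cm...
by_predict <- predict(relation, data.frame(x = 170))
# ...equals intercept + slope * 170 computed manually.
by_hand <- coef(relation)[1] + coef(relation)[2] * 170
print(c(by_predict, by_hand))  # both about 76.23
```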
In order to answer this question, you'll use the data you've been
collecting so far, and use it to predict how much you are going to
spend. The idea is that you can make estimated guesses about
the future — your trip to Vegas — based on data from the past.
You end up with a mathematical model that describes the
relationship between miles driven and money spent to fill the
tank.
Once that model is defined, you can provide it with new
information — how many miles you're driving from San
Francisco to Las Vegas.
The model will predict how much money you're going to need.
Multiple Regression
Multiple regression is an extension of linear regression to relationships
among more than two variables.
In simple linear regression we have one predictor and one response variable.
In multiple regression we have more than one predictor variable and
one response variable.
The general mathematical equation for multiple regression is −
y = a + b1x1 + b2x2 + ... + bnxn
Following is the description of the parameters used −
y is the response variable.
a, b1, b2...bn are the coefficients.
x1, x2, ...xn are the predictor variables.
We create the regression model using the lm() function in R. The model
determines the value of the coefficients using the input data.
Next we can predict the value of the response variable for a given set of
predictor variables using these coefficients.
lm() Function
This function creates the relationship model between the predictor
and the response variable.
Syntax
The basic syntax for lm() function in multiple regression is −
lm(y ~ x1+x2+x3...,data)
Following is the description of the parameters used −
• formula is a symbol presenting the relation between the response
variable and the predictor variables.
• data is the data frame on which the formula will be applied.
Unemployment Dataset
R Script Multiple Regression
# Capture the data in R format
Year <-
c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,
2016,2016,2016,2016,2016,2016,2016,2016,2016)
Month <- c(12, 11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1)
Interest_Rate <-
c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,
1.75,1.75,1.75)
Unemployment_Rate <-
c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
Stock_Index_Price <-
c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,
949,884,866,876,822,704,719)
# Check the linearity between each predictor and the response
plot(x=Interest_Rate, y=Stock_Index_Price)
plot(x=Unemployment_Rate, y=Stock_Index_Price)
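The script above only plots the two relationships; a sketch of the actual fit would follow the same lm()-then-predict() pattern (data repeated here so the snippet is self-contained, and the new month's values are illustrative):

```r
# The unemployment dataset, as captured above.
Interest_Rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,
                   1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75)
Unemployment_Rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,
                       5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1)
Stock_Index_Price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,
                       1075,1047,965,943,958,971,949,884,866,876,822,704,719)

# Fit the multiple regression model with two predictors.
model <- lm(Stock_Index_Price ~ Interest_Rate + Unemployment_Rate)
print(coef(model))

# Predict the index for a hypothetical month.
new_month <- data.frame(Interest_Rate = 2.75, Unemployment_Rate = 5.3)
print(predict(model, new_month))
```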
# Capture the data in R format
student <- c(1,2,3,4,5,6,7,8,9,10)
testscore <- c(100,95,92,90,85,80,78,75,72,65)
IQ <- c(125,104,110,105,100,100,95,95,85,90)
studyhrs <- c(30,40,25,20,20,20,15,10,0,5)
# Check the linearity between the variables
plot(x=testscore, y=IQ)
plot(x=IQ, y=studyhrs)
#==================================================
# Predict test score using IQ and study hours
relation <- lm(testscore ~ IQ + studyhrs)
a <- data.frame(IQ=120,studyhrs=40)
result <- predict(relation,a)
print(result)
#==================================================
# Predict IQ using test score and study hours
relation <- lm(IQ ~ testscore + studyhrs)
a <- data.frame(testscore=50,studyhrs=25)
result <- predict(relation,a)
print(result)
#==================================================
# Predict study hours using IQ and test score
relation <- lm(studyhrs ~ IQ + testscore )
a <- data.frame(IQ=140, testscore=90)
result <- predict(relation,a)
print(result)
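As with simple regression, a fitted model deserves a look at its residuals; a sketch using summary() on the first of the three models above:

```r
# The student dataset, as captured above.
testscore <- c(100,95,92,90,85,80,78,75,72,65)
IQ <- c(125,104,110,105,100,100,95,95,85,90)
studyhrs <- c(30,40,25,20,20,20,15,10,0,5)

relation <- lm(testscore ~ IQ + studyhrs)
# R-squared: share of test-score variance explained by IQ and study hours.
print(summary(relation)$r.squared)
# Residuals: the per-student prediction errors.
print(residuals(relation))
```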