Regression using R
Dipika Patra
7 August 2016
Regression analysis
Data Description
The dataset provides measurements of the girth, height and volume of timber in 31 felled black cherry trees.
Note that girth here is the diameter of the tree measured at 4 ft 6 in above the ground. Girth is recorded in inches, Height in feet and Volume in cubic feet.
The dataset is given below:
trees
## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
## 7 11.0 66 15.6
## 8 11.0 75 18.2
## 9 11.1 80 22.6
## 10 11.2 75 19.9
## 11 11.3 79 24.2
## 12 11.4 76 21.0
## 13 11.4 76 21.4
## 14 11.7 69 21.3
## 15 12.0 75 19.1
## 16 12.9 74 22.2
## 17 12.9 85 33.8
## 18 13.3 86 27.4
## 19 13.7 71 25.7
## 20 13.8 64 24.9
## 21 14.0 78 34.5
## 22 14.2 80 31.7
## 23 14.5 74 36.3
## 24 16.0 72 38.3
## 25 16.3 77 42.6
## 26 17.3 81 55.4
## 27 17.5 82 55.7
## 28 17.9 80 58.3
## 29 18.0 80 51.5
## 30 18.0 80 51.0
## 31 20.6 87 77.0
Correlation
cor(trees)
## Girth Height Volume
## Girth 1.0000000 0.5192801 0.9671194
## Height 0.5192801 1.0000000 0.5982497
## Volume 0.9671194 0.5982497 1.0000000
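The matrix above gives point estimates only. As a minimal sketch (not part of the original analysis), `cor.test` from base R attaches a confidence interval to one of these correlations, and `cor(log(trees))` shows the correlations on the log scale that the regressions below work on. The object name `ct` is illustrative.

```r
# Pearson correlation between Girth and Volume with a 95% confidence interval
ct <- cor.test(trees$Girth, trees$Volume)
ct$estimate   # sample correlation, the same value as cor(trees)["Girth", "Volume"]
ct$conf.int   # 95% confidence interval for the correlation

# Correlations after taking logs of all three variables
cor(log(trees))
```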
Graphical Display:
To show linked two-dimensional scatter plots we use the “pairs” command with the “panel.smooth”
argument. It draws a scatter plot, with a smoothed trend line, for each pair of variables.
pairs(trees, panel = panel.smooth, main = "trees data")
[Figure: scatter-plot matrix of Girth, Height and Volume, titled "trees data"]
As we know, for a cylinder whose diameter is the girth,
Volume = (π/4) * (Girth)^2 * Height, i.e. log(Volume) = constant + 2 log(Girth) + log(Height)
To plot the data on logarithmic axes we use the “plot” command with the “log” argument.
plot(Volume ~ Girth, data = trees, log = "xy")
[Figure: Volume against Girth, both axes on a log scale]
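The log-linear form of the cylinder relation can be checked numerically. The sketch below assumes the units given in R's documentation of `trees` (Girth in inches, Height in feet, Volume in cubic feet) and uses a made-up tree; `girth_in`, `height_ft` and the other names are hypothetical.

```r
# Hypothetical tree, not a row of the dataset
girth_in  <- 12   # diameter in inches
height_ft <- 75   # height in feet

# Cylinder volume in cubic feet; dividing by 12 converts inches to feet
vol_cyl <- (pi / 4) * (girth_in / 12)^2 * height_ft

# The same quantity on the log scale: constant + 2*log(Girth) + 1*log(Height)
log_const <- log(pi / (4 * 144))
log_vol   <- log_const + 2 * log(girth_in) + log(height_ft)

# Both routes should agree up to floating-point error
all.equal(log(vol_cyl), log_vol)
```

Under these assumptions the constant absorbs both π/4 and the inch-to-foot conversion, which is why only the two slopes (2 and 1) are of direct interest in the regressions.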
Regression Steps:
Dependent variable Volume:
To regress “Volume” on the independent variables “Girth” and “Height”, it is reasonable, given the relation above,
to fit a linear regression to the logarithms of the data.
summary(fm1 <- lm(log(Volume) ~ log(Girth)+ log(Height), data = trees))
##
## Call:
## lm(formula = log(Volume) ~ log(Girth) + log(Height), data = trees)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.168561 -0.048488 0.002431 0.063637 0.129223
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.63162 0.79979 -8.292 5.06e-09 ***
## log(Girth) 1.98265 0.07501 26.432 < 2e-16 ***
## log(Height) 1.11712 0.20444 5.464 7.81e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08139 on 28 degrees of freedom
## Multiple R-squared: 0.9777, Adjusted R-squared: 0.9761
## F-statistic: 613.2 on 2 and 28 DF, p-value: < 2.2e-16
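One way to relate this fit to the cylinder relation is to ask whether the slopes are compatible with the exponents 2 (Girth) and 1 (Height). The following is a sketch using base R's `confint`; the comparison itself is an addition, not something the original report carries out.

```r
fm1 <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)

# 95% confidence intervals for the intercept and the two slopes
ci <- confint(fm1, level = 0.95)
ci

# Does each interval contain the exponent suggested by the cylinder formula?
girth_ok  <- ci["log(Girth)", 1]  <= 2 && 2 <= ci["log(Girth)", 2]
height_ok <- ci["log(Height)", 1] <= 1 && 1 <= ci["log(Height)", 2]
c(girth = girth_ok, height = height_ok)
```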
library(ggplot2)
qplot(Volume, exp(predict(fm1)), data = trees, xlab = "Observed Value of Volume", ylab = "Predicted value of Volume")
[Figure: predicted against observed Volume]
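The plot compares observed volumes with `exp(predict(fm1))`, that is, fitted values transformed back from the log scale. As a rough numeric companion to the picture, the sketch below computes the root-mean-square error and the correlation on the original scale; `pred_volume` and `rmse` are illustrative names.

```r
fm1 <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)

# Back-transform the fitted values from log(cubic feet) to cubic feet
pred_volume <- exp(predict(fm1))

# Typical size of the prediction error on the original scale
rmse <- sqrt(mean((trees$Volume - pred_volume)^2))
rmse

# Agreement between observed and back-transformed predicted volumes
obs_pred_cor <- cor(trees$Volume, pred_volume)
obs_pred_cor
```

Note that exponentiating a prediction of log(Volume) estimates something closer to the median than the mean of Volume, so these figures describe the fit only approximately.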
R square and adjusted R square with interpretation:
R-squared is the proportion of the variation in the response variable that is explained by the linear model.
Mathematically, R-squared = Explained variation / Total variation
For a least-squares fit with an intercept, R-squared is always between 0 and 1:
0 indicates that the model explains none of the variability of the response data around its mean. 1 indicates
that the model explains all the variability of the response data around its mean.
In general, the higher the R-squared, the more closely the model fits the data it was fitted to.
In the above model R square = 0.9777, i.e. the model explains 97.77% of the variability of log(Volume).
The adjusted R-squared is a version of R-squared that is penalised for the number of predictors in the model, so
that it can be used to compare the explanatory power of regression models that contain different numbers of predictors.
In the above model adjusted R square = 0.9761
Fitted Model: log(Volume) = -6.63162 + 1.98265 log(Girth) + 1.11712 log(Height)
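The two statistics can be reproduced by hand from the residuals, which makes the definitions concrete. This is a minimal sketch; `rss`, `tss`, `n` and `p` are illustrative names, and the adjusted form used is the standard 1 - (1 - R²)(n - 1)/(n - p - 1).

```r
fm1 <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)

y   <- log(trees$Volume)        # response on the scale the model was fitted on
rss <- sum(residuals(fm1)^2)    # unexplained (residual) variation
tss <- sum((y - mean(y))^2)     # total variation around the mean

n <- nrow(trees)                # 31 observations
p <- 2                          # two predictors: log(Girth), log(Height)

r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Compare with the values reported by summary()
c(manual = r2,     summary = summary(fm1)$r.squared)
c(manual = adj_r2, summary = summary(fm1)$adj.r.squared)
```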
Dependent variable Height:
To regress “Height” on the independent variables “Volume” and “Girth”, we again
fit a linear regression to the logarithms of the data.
summary(fm2 <- lm(log(Height) ~ log(Volume)+log(Girth), data = trees))
##
## Call:
## lm(formula = log(Height) ~ log(Volume) + log(Girth), data = trees)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.08628 -0.03084 -0.00146 0.02622 0.13465
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.91689 0.22497 21.856 < 2e-16 ***
## log(Volume) 0.46196 0.08454 5.464 7.81e-06 ***
## log(Girth) -0.82177 0.19043 -4.315 0.000179 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05234 on 28 degrees of freedom
## Multiple R-squared: 0.6521, Adjusted R-squared: 0.6273
## F-statistic: 26.24 on 2 and 28 DF, p-value: 3.804e-07
qplot(Height, exp(predict(fm2)), data = trees, xlab = "Observed Value of Height", ylab = "Predicted value of Height")
[Figure: predicted against observed Height]
R-Square Interpretation:
In this model R square = 0.6521 and adjusted R square = 0.6273, i.e. the model explains 65.21% of the variability of
log(Height).
Fitted Model: log(Height) = 4.91689 + 0.46196 log(Volume) - 0.82177 log(Girth)
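The Height model uses the same three variables as the Volume model, only with the roles swapped, so the two fits are closely related. The sketch below illustrates one such link: the t statistic for log(Volume) in the Height model equals the t statistic for log(Height) in the Volume model, because both test the same partial association given log(Girth). The names `t_fm1` and `t_fm2` are illustrative.

```r
fm1 <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)
fm2 <- lm(log(Height) ~ log(Volume) + log(Girth), data = trees)

# t statistic of log(Height) when Volume is the response
t_fm1 <- coef(summary(fm1))["log(Height)", "t value"]

# t statistic of log(Volume) when Height is the response
t_fm2 <- coef(summary(fm2))["log(Volume)", "t value"]

c(fm1 = t_fm1, fm2 = t_fm2)
all.equal(t_fm1, t_fm2)
```

The slopes themselves differ, since regressing Height on Volume is not the algebraic inverse of regressing Volume on Height.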
Dependent variable Girth:
Next we regress “Girth” on the independent variables “Volume” and “Height”, again
fitting a linear regression to the logarithms of the data.
summary(fm3 <- lm(log(Girth) ~ log(Volume)+log(Height), data = trees))
##
## Call:
## lm(formula = log(Girth) ~ log(Volume) + log(Height), data = trees)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.070123 -0.027064 0.000029 0.022232 0.079494
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.07353 0.45083 6.817 2.09e-07 ***
## log(Volume) 0.48494 0.01835 26.432 < 2e-16 ***
## log(Height) -0.48606 0.11263 -4.315 0.000179 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04025 on 28 degrees of freedom
## Multiple R-squared: 0.9723, Adjusted R-squared: 0.9703
## F-statistic: 491.4 on 2 and 28 DF, p-value: < 2.2e-16
qplot(Girth, exp(predict(fm3)), data = trees, xlab = "Observed Value of Girth", ylab = "Predicted value of Girth")
[Figure: predicted against observed Girth]
R-Square Interpretation:
In this model R square = 0.9723 and adjusted R square = 0.9703, i.e. the model explains 97.23% of the variability of
log(Girth).
Fitted Model: log(Girth) = 3.07353 + 0.48494 log(Volume) - 0.48606 log(Height)
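To see the three fits side by side, the R-squared and adjusted R-squared values can be collected into one small table. This is a sketch; `models` and `r2_table` are illustrative names, and the models are refitted here so the block runs on its own.

```r
# Refit the three log-log models, one per choice of response
models <- list(
  Volume = lm(log(Volume) ~ log(Girth)  + log(Height), data = trees),
  Height = lm(log(Height) ~ log(Volume) + log(Girth),  data = trees),
  Girth  = lm(log(Girth)  ~ log(Volume) + log(Height), data = trees)
)

# One row per model, columns for R-squared and adjusted R-squared
r2_table <- t(sapply(models, function(m) {
  s <- summary(m)
  c(r.squared = s$r.squared, adj.r.squared = s$adj.r.squared)
}))

round(r2_table, 4)
```

Because each response is on its own log scale, these values describe how well each variable is explained by the other two rather than ranking the models against one another.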