The World of Linear Regression
What is regression analysis? Regression analysis is a technique for measuring the relationship between two interval- or ratio-level variables. The regression framework is at the heart of empirical social and political science research. Regression analysis acts as a statistical surrogate for controlled experiments and, when its assumptions hold, can be used to support causal inferences.
Regression models Researchers translate verbal theories, hypotheses, even hunches into models. A model shows how and under what conditions two (or more) variables are related. A regression model with a dependent variable and one independent variable is known as a bivariate regression model. A regression model with a dependent variable and two or more independent variables and/or control variables is known as a multivariate regression model.
Scatterplots A scatterplot graphs the sample observations by placing them on the X and Y axes. The X axis generally represents the values of the independent variable, and the Y axis usually represents the values of the dependent variable. X is the horizontal axis; Y is the vertical axis.
Scatterplots Scatterplots allow you to study the flow of the dots, or the relationship between the two variables. Scatterplots allow political scientists to identify: (1) positive or negative relationships; (2) monotonic or linear relationships.
Scatterplot
 
Regression Equation The linear equation is specified as follows: Y = a + bX, where Y = dependent variable; X = independent variable; a = constant (value of Y when X = 0); b = slope of the regression line.
Regression Equation Y = a + bX. a can be positive or negative. In high school algebra, you may have referred to a as the intercept. This is because a is the point at which the regression line crosses the Y axis. b (the slope coefficient) can be positive or negative. A positive coefficient denotes a positive relationship and a negative coefficient denotes a negative relationship. The substantive interpretation of the slope coefficient depends on the variables involved, how they are coded, and the scale of the variables. Larger coefficients may indicate a stronger relationship, but not necessarily.
The Regression Model The goal of regression analysis is to find an equation which "best fits" the data. In regression, the equation is chosen so that its graph is the line that minimizes the sum of the squared vertical distances between the data points and the line.
d1 and d2 represent the vertical distances of observed data points from an estimated regression line. Regression analysis uses a mathematical procedure that finds the single line that minimizes the sum of the squared distances from the line.
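As a minimal sketch of what "minimizes the squared distances" means, the least-squares intercept and slope for a bivariate regression can be computed from the sample means. The function names `fit_line` and `sum_squared_distances` and the five data points are illustrative assumptions, not taken from the slides.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


def sum_squared_distances(xs, ys, a, b):
    """Sum of squared vertical distances between the points and Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit_line(xs, ys)
best = sum_squared_distances(xs, ys, a, b)

# Any other line, such as one with a slightly different slope, does worse.
worse = sum_squared_distances(xs, ys, a, b + 0.1)
print(a, b, best, worse)
```

Moving the slope or the intercept away from the fitted values can only increase the sum of squared distances, which is what makes the fitted line the "best fit" in this sense.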
Regression Equation The standard regression equation is the same as the linear equation with one exception: the error term. Y = α + βX + ε, where Y = dependent variable; α = constant term; β = slope or regression coefficient; X = independent variable; ε = error term.
Regression Equation This regression procedure is known as ordinary least squares (OLS). α (the constant term) is interpreted the same as before. β (the regression coefficient) tells how much Y is predicted to change when X increases by one unit. The regression coefficient indicates the direction and strength of the relationship between the two quantitative variables.
Regression Equation The error (ε) indicates that observed data do not follow a neat pattern that can be summarized with a straight line. An observation's score on Y can be broken into two parts: α + βX is due to the independent variable; ε is due to error. Observed value = Predicted value (α + βX) + error (ε).
Regression Equation The error is the difference between the observed value of Y and the predicted value of Y. This difference is known as the residual.
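The decomposition "observed value = predicted value + error" can be checked numerically. The sketch below assumes a line has already been fitted; the coefficients, the data, and the helper name `residuals` are hypothetical, and the residual is taken as observed minus predicted, which is the usual sign convention.

```python
def residuals(xs, ys, alpha, beta):
    """Return observed minus predicted Y for each observation."""
    return [y - (alpha + beta * x) for x, y in zip(xs, ys)]


# Hypothetical data and the least-squares coefficients that fit it.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
alpha, beta = 2.2, 0.6

predicted = [alpha + beta * x for x in xs]
errors = residuals(xs, ys, alpha, beta)

# Each observed value is its predicted value plus its residual.
for y, y_hat, e in zip(ys, predicted, errors):
    print(f"observed={y:.1f}  predicted={y_hat:.1f}  residual={e:+.1f}")
```

Points above the line have positive residuals and points below it have negative ones; for a least-squares line with a constant term, the residuals sum to zero.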
 
 
Regression Interpretation For the data on the scatterplot: Y (depvar) = telephone lines per 1,000 people; X (indvar) = infant mortality (deaths per 1,000 live births). We can use regression analysis to examine the relationship between communication capacity (measured here as telephone lines per 1,000 people) and infant mortality.
Regression Interpretation In this analysis, the intercept and regression coefficient are as follows: α (or constant) = 121, which means that when X (infant deaths) is 0, there are a predicted 121 phone lines per 1,000 population. β = -1.25, which means that when X (deaths) increases by 1, there is a predicted or estimated decrease of 1.25 phone lines per 1,000 population.
Regression Interpretation
Regression Interpretation These calculations are useful because they allow you to make predictions from the data. An increase from 1 to 10 deaths per 1,000 live births is associated with a decline of 119.75 - 108.5 = 11.25 telephone lines per 1,000 people. Interpreting the meaning of a coefficient can be tricky. What does a coefficient of -1.25 mean? First, it means there is a negative relationship between infant mortality and phone lines. Second, it means that for every additional infant death per 1,000 live births there is a predicted decrease of 1.25 phone lines per 1,000 people. This information is useful, but is there a measure that tells us how good a job we do predicting the observed values?
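The prediction arithmetic above can be reproduced with a few lines of code. The intercept (121) and slope (-1.25) come from the slides; the function name `predict_phone_lines` is an illustrative assumption.

```python
ALPHA = 121.0   # constant: predicted phone lines per 1,000 people when X = 0
BETA = -1.25    # change in predicted phone lines per one-unit increase in X


def predict_phone_lines(infant_mortality):
    """Predicted phone lines per 1,000 people for a given infant mortality rate."""
    return ALPHA + BETA * infant_mortality


at_1 = predict_phone_lines(1)     # 119.75
at_10 = predict_phone_lines(10)   # 108.5
decline = at_1 - at_10            # 11.25

print(at_1, at_10, decline)
```

The decline equals the slope times the change in X (1.25 × 9 = 11.25), so the intercept plays no part in the difference between two predictions.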
Scatterplot
R-squared Yes, the measure is known as R-squared (or R²). As stated earlier, there are two component parts of the total deviation from the mean, which is usually measured as the total sum of squares (or total variation). The first component is the difference between the mean and the predicted value of Y. This is the explained part of the deviation (Regression Sum of Squares). The second component is the residual sum of squares (Residual Sum of Squares), which measures prediction errors. This is the unexplained part of the deviation.
R-squared Total SS = Regression SS + Residual SS. In other words, the total sum of squares is the sum of the regression sum of squares and the residual sum of squares. R² = Regression SS / Total SS. The more variance the regression model explains, the higher the R².
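The sum-of-squares decomposition can be sketched on a small example. The data and the function name `r_squared_parts` are hypothetical. The identity Total SS = Regression SS + Residual SS holds here because the line is fitted by least squares with a constant term.

```python
def r_squared_parts(xs, ys):
    """Fit Y = a + bX by least squares and return (total_ss, regression_ss, residual_ss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    predicted = [a + b * x for x in xs]

    total_ss = sum((y - mean_y) ** 2 for y in ys)                      # total deviation from the mean
    regression_ss = sum((p - mean_y) ** 2 for p in predicted)          # explained part
    residual_ss = sum((y - p) ** 2 for y, p in zip(ys, predicted))     # unexplained part
    return total_ss, regression_ss, residual_ss


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

total_ss, regression_ss, residual_ss = r_squared_parts(xs, ys)
r_squared = regression_ss / total_ss
print(total_ss, regression_ss, residual_ss, r_squared)
```

In this toy example the regression sum of squares is 3.6 out of a total of 6.0, so the line accounts for about 60 percent of the variation in Y.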
 
 
