Terry Chaney
PA 5033 Multivariate Techniques
1. Terminating employees based solely on age is illegal under the Age Discrimination in
Employment Act. However, in age discrimination cases it is often difficult to separate the
effects of an employee’s job performance from the effects of that person’s age. To determine
whether employees were wrongfully terminated because of age, we ran a series of binary logistic
regressions to test whether age alone factored into the firing decisions. A limited dependent
variable model was used because the dependent variable (job termination) is a dummy variable
with only two values (0 = not fired, 1 = fired). The model was first used to estimate job
termination as a function of age alone, and then as a function of age and performance.
• N = 319; 53 were fired (P = .166)
• Fired = f(+age): $\widehat{Fired} = -8.958 + .146\,age$
  $\Delta P = \hat{\beta}\,P(1-P) = .146(.166)(.834) = .020 = 2\%$
  $R^2_p = .85$; Wald = 37.998
• Fired = f(+age, -perf): $\widehat{Fired} = -5.843 + .130\,age - .635\,perf$
  $R^2_p = .868$; Wald = 28.679 (age); 6.432 (perf)
  Age: $\Delta P = \hat{\beta}\,P(1-P) = .130(.166)(.834) = .018 = 1.8\%$
  Perf: $\Delta P = \hat{\beta}\,P(1-P) = -.635(.166)(.834) = -.088 = -8.8\%$
Omitted variable bias occurs when there is a relevant independent variable omitted from
the specification of the equation. This will cause bias in the estimates of the coefficients on the
variables that are included in the equation. Omitted variable bias is shown in the difference
between our first two limited dependent variable models. The first model is simply a regression
of termination as a function of age. The second model adds performance as an additional
explanatory variable. The addition of performance results in a reduced value for the coefficient
on age (from a 2% change per year to a 1.8% change). This is because age and performance
share explanatory “space” and without performance in the equation, the coefficient on age
absorbs some of the impact of performance. By including performance in the second model we
are eliminating some of the omitted variable bias in the original equation.
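The mechanism can be illustrated with a small simulation. This is only a sketch under loud assumptions: it uses a linear model rather than a logit, and every number in it (the coefficients `B_X`, `B_Z`, `DELTA`, the sample size, the age range) is hypothetical and not taken from the termination data. Here `x` plays the role of age and `z` the role of performance, which falls with `x` and lowers the outcome.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
N = 5000
B_X, B_Z, DELTA = 0.5, -2.0, -0.3   # hypothetical true coefficients

x = [random.uniform(20, 65) for _ in range(N)]          # included regressor
z = [10 + DELTA * a + random.gauss(0, 1) for a in x]    # omitted, correlated with x
y = [B_X * a + B_Z * c + random.gauss(0, 1) for a, c in zip(x, z)]

short_slope = ols_slope(x, y)   # regression that leaves z out
expected = B_X + B_Z * DELTA    # true effect of x plus the part absorbed from z
print(round(short_slope, 2), round(expected, 2))
```

With `z` left out, the slope on `x` lands near `B_X + B_Z * DELTA` rather than `B_X`, which is the same direction of bias described above: the coefficient on the included variable absorbs part of the omitted variable's effect.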
Age and performance are also statistically linked. In a regression of performance as a
function of age, the relationship between age and performance was statistically significant at the
1% level. The high F-statistic suggests that we are creating a good instrument.
• $\widehat{perf} = 6.552 - .055\,age$
  adjusted $R^2$ = .112; t = -6.414; F-statistic = 41.14
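As a consistency check, the fit statistics can be recomputed from the sums of squares reported in the ANOVA output in the appendix. The sketch below only redoes that arithmetic; the variable names are ours.

```python
# Sums of squares and degrees of freedom as reported in the ANOVA table.
SS_REG, SS_RES = 71.559, 551.349
DF_REG, DF_RES = 1, 317

ss_total = SS_REG + SS_RES
f_stat = (SS_REG / DF_REG) / (SS_RES / DF_RES)      # mean square ratio
r_squared = SS_REG / ss_total
adj_r_squared = 1 - (1 - r_squared) * (DF_REG + DF_RES) / DF_RES
t_squared = (-6.414) ** 2   # with a single regressor, t squared equals F

print(f"F={f_stat:.2f} R2={r_squared:.3f} adjR2={adj_r_squared:.3f} t2={t_squared:.2f}")
# F=41.14 R2=0.115 adjR2=0.112 t2=41.14
```

Although the relationship is clearly significant, age explains only about 11% of the variation in performance, so most of the variation in performance is unrelated to age.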
This result means that, in a model of termination as a function of age and performance, it
will be difficult to separate the effect of increasing age from the effect of performance. To
separate them, we need to “cleanse” the performance variable of the effects of age. Using the
residuals of the above equation, we created a new variable representing performance without the
effect of age, or real performance, and estimated termination as a function of age and real
performance with a binomial logit model. We chose this model because a linear probability
model estimated by OLS can produce predicted values outside the 0 to 1 bounds of a
probability, which are nonsensical. The binomial logit model keeps predictions within the
probability interval and also avoids the heteroskedasticity inherent in the linear probability
model. The most significant disadvantages of the binomial logit model are that it requires a
large sample size (the 319 observations in our sample may not be enough) and that it requires
substantial representation of both groups (those fired and those not fired) in the sample.
Weighing these issues, we chose a two-stage binomial logit model as our best model to explain
the effect of age and real performance on the termination of an employee. The model we used is
as follows:
Fired = f(age, real_perf)
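The first stage, regressing performance on age and keeping the residuals, can be sketched in plain Python. The ages and scores below are hypothetical values made up for illustration, not the 319 observations used in the analysis, and `fit_line` is our own helper rather than the SPSS procedure that produced RES_1.

```python
def fit_line(x, y):
    """Return (intercept, slope) of a least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical ages and performance scores, for illustration only.
age = [28, 35, 41, 47, 52, 58, 63]
perf = [5.4, 4.9, 4.6, 3.8, 4.1, 3.0, 3.3]

intercept, slope = fit_line(age, perf)
# Residual = observed performance minus the part predicted by age.
real_perf = [p - (intercept + slope * a) for a, p in zip(age, perf)]

# By construction, the residuals have no linear relationship with age.
_, resid_slope = fit_line(age, real_perf)
print(slope < 0, abs(resid_slope) < 1e-9)
```

The residuals average zero and are uncorrelated with age in the sample, which is what allows them to stand in for performance "without the effect of age" in the second-stage logit.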
2. Running the model above yields the following results:
• Fired = f(age, real_perf): $\widehat{Fired} = -10.002 + 0.165\,age - 0.635\,real\_perf$
  SE: 0.026 (age); 0.250 (real_perf)
  N = 319; $R^2_p$ = 0.868
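One way to read these estimates: real_perf is the residual from the first-stage equation perf = 6.552 - .055 age, so substituting perf = 6.552 - .055 age + real_perf into the age-and-performance model and collecting terms should reproduce them. The sketch below is our own arithmetic check using the coefficients reported in the appendix, not a step from the original analysis.

```python
# Model 2 (age and raw performance), coefficients from the appendix.
B0, B_AGE, B_PERF = -5.843, 0.130, -0.635
# First-stage OLS: perf = A0 + A_AGE * age + residual.
A0, A_AGE = 6.552, -0.055

# Substitute perf = A0 + A_AGE * age + real_perf into model 2 and collect terms.
const_implied = B0 + B_PERF * A0
age_implied = B_AGE + B_PERF * A_AGE
real_perf_implied = B_PERF

print(f"{const_implied:.3f} {age_implied:.3f} {real_perf_implied:.3f}")
```

The implied values (about -10.00, 0.165 and -0.635) match the reported coefficients up to rounding, which is consistent with the two models producing the same classification table. Under this reading, the larger age coefficient (0.165 versus 0.130) reflects age's direct association with termination plus the part that runs through age-related declines in performance.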
Both age and real performance (performance cleansed of the effects of age) are highly significant
in predicting whether an individual is fired. Calculating the change in probability for each (as
$\hat{\beta}\,P(1-P)$) yields an increase in the probability of being fired of about 2.3
percentage points for each additional year of age (roughly 23 points per ten years), and a
decrease of 8.8 percentage points for each one-unit increase in real performance.
Overall, this model accurately predicts whether a worker will be fired 86.8% of the time,
compared to a random sample accuracy of 83.4% (that is, the prediction that no individual will
be fired, giving no consideration to age or performance, is correct 83.4% of the time). The
model, however, continues to do a poor job of predicting correctly for those that were fired and
does a much better job of predicting those that were not fired.
Based on the strength of these results, it is clear that both age and performance are
important determinants of whether an individual is fired. What is not clear is whether
discrimination is occurring. The improvement in prediction of only 3.4 percentage points
(86.8% versus 83.4%) suggests that there may still be other considerations that are not
captured here – for example, union membership. In addition, this model accurately predicts
firings (as opposed to all outcomes) only 30% of the time – 37 of the 53 terminations were not
predicted by the model. On the other hand, the fact that age is a highly significant predictor
even after controlling for workers’ performance strongly suggests that age itself is
influencing termination decisions.
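The accuracy figures quoted above can be recomputed from the classification table for this model in the appendix. The sketch below is a plain arithmetic check; the labels (`TN`, `FP`, `FN`, `TP`, sensitivity, specificity) are standard terms that we are applying to the SPSS counts.

```python
# Counts from the classification table (cut value 0.5), age + real_perf model.
TN, FP = 261, 5    # not fired: predicted not fired / predicted fired
FN, TP = 37, 16    # fired:     predicted not fired / predicted fired

total = TN + FP + FN + TP
accuracy = (TN + TP) / total      # share of all workers classified correctly
baseline = (TN + FP) / total      # accuracy of always predicting "not fired"
sensitivity = TP / (TP + FN)      # share of actual firings that were predicted
specificity = TN / (TN + FP)      # share of retained workers that were predicted

print(f"{accuracy:.3f} {baseline:.3f} {sensitivity:.3f} {specificity:.3f}")
# 0.868 0.834 0.302 0.981
```

Because only about one worker in six was fired, overall accuracy is dominated by the retained group, which is why a high overall figure can coexist with a low hit rate on actual firings.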
The law specifies that age discrimination protections apply to workers aged 40 and over.
Replacing the age variable with a dummy indicating whether a worker is over age 40 does not
provide very satisfying results – neither performance nor being over 40 is statistically
significant, and the model’s predictions are no better than the random sample. The very large
standard error on the over-40 dummy (4887.791) most likely reflects the fact that no worker
under 40 was fired, so its coefficient cannot be estimated reliably. However, changing the
variable to represent whether a worker is over 50 gives age much more significance and reduces
the significance of performance to below the 95% level of certainty. While it is tempting to
believe that this means workers over 50 are being discriminated against, it is important to
note that this model also predicts firings no better than a random sample.
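To make the over-50 estimates more concrete, the coefficients reported in the appendix can be converted into an odds ratio and into predicted probabilities. This is a sketch under one stated assumption: real performance is held at 0, that is, a worker performing exactly as expected for their age. The helper `logistic` is our own.

```python
import math

def logistic(z):
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Coefficients from the over-50 model in the appendix.
CONST, B_OVER50 = -2.597, 1.773

odds_ratio = math.exp(B_OVER50)         # reported by SPSS as Exp(B)
p_not_over = logistic(CONST)            # not over 50, real_perf = 0
p_over = logistic(CONST + B_OVER50)     # over 50, real_perf = 0

print(f"{odds_ratio:.2f} {p_not_over:.3f} {p_over:.3f}")
```

Under that assumption the predicted probability of termination rises from roughly 7% to roughly 30% for workers over 50. Since even the higher figure is below the 0.5 cut value, this is consistent with the model classifying no one as fired and therefore matching the 83.4% baseline.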
Based solely on the predictions of the model, then, it is difficult to be certain whether age
discrimination has occurred. The 86.8% accuracy of the best model considered is not enough, on
its own, to support definite conclusions. However, there are a few additional considerations
that push us towards the determination that discrimination has occurred.
The first of these is the fact that no workers under age 40 were terminated. This would
not be a concern if the young workers were all high performers. However, 54 of the 66 workers
younger than 40 had performance scores of 5.1 or below – 5.1 being the highest score of any
worker who was fired. If performance is indeed a negative function of age, these young
workers’ performance could only be expected to decline in the future – but they were kept on
despite their poor performance.
This fact, in combination with the predictions of the model that increasing age plays a
significant role in determining firings, leads us to conclude that age discrimination was indeed a
factor in termination decisions. While workers’ performance was also clearly an important
factor, it appears that once the poorly-performing workers were identified, the preference was to
dismiss the older workers and retain younger ones.
Appendix – Statistical Output

1. Binary Logistic Regression of Job Termination on Age

Classification Table (a)
                         Predicted fired
Observed fired           .00      1.00     Percentage Correct
.00                      264      2        99.2
1.00                     46       7        13.2
Overall Percentage                         85.0
a. The cut value is .500

Variables in the Equation (Step 1)
             B         S.E.      Wald      df   Sig.   Exp(B)
age          .146      .024      37.998    1    .000   1.158
Constant     -8.958    1.263     50.336    1    .000   .000
Variable(s) entered on step 1: age.

2. Binary Logistic Regression of Job Termination on Age and Performance

Classification Table (a)
                         Predicted fired
Observed fired           .00      1.00     Percentage Correct
.00                      261      5        98.1
1.00                     37       16       30.2
Overall Percentage                         86.8
a. The cut value is .500

Variables in the Equation (Step 1)
             B         S.E.      Wald      df   Sig.   Exp(B)
age          .130      .024      28.679    1    .000   1.139
perf         -.635     .250      6.432     1    .011   .530
Constant     -5.843    1.653     12.492    1    .000   .003
Variable(s) entered on step 1: age, perf.

3. OLS Regression: Regression of performance on age

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .339 (a)   .115       .112                1.31881
a. Predictors: (Constant), age

ANOVA (b)
Model 1        Sum of Squares   df    Mean Square   F        Sig.
Regression     71.559           1     71.559        41.143   .000 (a)
Residual       551.349          317   1.739
Total          622.908          318
a. Predictors: (Constant), age
b. Dependent Variable: perf

Coefficients (a)
               Unstandardized Coefficients   Standardized Coefficients
Model 1        B         Std. Error          Beta                        t         Sig.
(Constant)     6.552     .411                                            15.924    .000
age            -.055     .009                -.339                       -6.414    .000
a. Dependent Variable: perf

4. Binary Logistic Regression of Job Termination on age and real performance

Classification Table (a)
                         Predicted fired
Observed fired           .00      1.00     Percentage Correct
.00                      261      5        98.1
1.00                     37       16       30.2
Overall Percentage                         86.8
a. The cut value is .500

Variables in the Equation (Step 1)
             B         S.E.      Wald      df   Sig.   Exp(B)
age          .165      .026      39.787    1    .000   1.180
RES_1        -.635     .250      6.432     1    .011   .530
Constant     -10.002   1.404     50.714    1    .000   .000
Variable(s) entered on step 1: age, RES_1.

5. Binary Logistic Regression of Job Termination on age over 40 and real performance

Classification Table (a)
                         Predicted fired
Observed fired           .00      1.00     Percentage Correct
.00                      266      0        100.0
1.00                     53       0        .0
Overall Percentage                         83.4
a. The cut value is .500

Variables in the Equation (Step 1)
             B         S.E.       Wald     df   Sig.   Exp(B)
RES_1        -.238     .193       1.527    1    .217   .788
over40       19.871    4887.791   .000     1    .997   4.3E+08
Constant     -21.227   4887.791   .000     1    .997   .000
Variable(s) entered on step 1: RES_1, over40.

6. Binary Logistic Regression of Job Termination on age over 50 and real performance

Classification Table (a)
                         Predicted fired
Observed fired           .00      1.00     Percentage Correct
.00                      266      0        100.0
1.00                     53       0        .0
Overall Percentage                         83.4
a. The cut value is .500

Variables in the Equation (Step 1)
             B         S.E.      Wald      df   Sig.   Exp(B)
RES_1        -.376     .214      3.103     1    .078   .686
over50       1.773     .350      25.668    1    .000   5.888
Constant     -2.597    .292      79.337    1    .000   .074
Variable(s) entered on step 1: RES_1, over50.
