Caleb Engelbourg
Stat 525
IE Project
The Gender Pay Gap: A Statistical Analysis
1. Introduction
1.1 Background
In the early 1960s, a large feminist movement in the United States worked to
promote women in the workplace. Legislation was passed making it illegal to
discriminate on the basis of gender. Despite the laws passed by Congress, there seems to
have been a discernible pay gap between men and women in the same position. We
aim to determine whether wage differences are attributable to gender alone or whether
other characteristics play a role in the pay discrepancy.
The Equal Pay Act of 1963 requires that organizations pay men and women the
same amount for the same work. Under the law, an organization cannot evade this
requirement by creating different job titles for the same occupation. However, a pay
discrepancy with a legitimate basis, such as merit or seniority, does not violate the
law (DeNisi and Griffin 2014).
There have been many cases dealing with Equal Pay Act discrimination; one
interesting case is Stanley v. University of Southern California. In 1999, Marianne
Stanley, the coach of the women’s basketball team at USC, refused to take a salary
lower than that of the men’s coach. When the university would not match the salary, she
sued the school under the Equal Pay Act of 1963. In its ruling, the court
determined that the women’s head coaching job was not substantially the same as the
men’s head coaching job. The court determined that the men’s head coach had better
skills, qualifications, and experience. In addition, the men’s coach had greater
responsibility for speaking and fundraising engagements (Sharp, Moorman and Claussen,
2007). The court ruled in favor of the university, and Stanley did not remain as the coach
of the women’s team. There have been similar cases in the United States in which women
have filed lawsuits over compensation in positions where they held the same status
as men.
1.2 Data
To determine whether wage differences are attributable to gender alone or whether other
characteristics play a role in the pay discrepancy, we use data from the 1985
Current Population Survey (CPS), a random sample of 534 people.
Information was collected on wages, sex, years of education, years of work experience,
occupational category, region of residence, marital status, union membership, age,
race, and sector. We use these data to analyze the determining factors in pay
discrepancies.
1.3 Goals and Hypotheses
Our main goal is to determine whether there is a pay difference between
men and women. We also want to see whether any other variables influence the
wage of an employee. We predict that there will be a significant gender pay gap,
and that gender will be the biggest factor in determining wages.
1.4 Model
We expect that our final model will be linear in the form shown below.
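A minimal sketch of such a linear form, in generic notation. The symbols Y, X, beta, and epsilon are assumed here for illustration; the specific predictors and the log-transformed response are only settled in Section 2.

```latex
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)
```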
2. Methods / Results
2.1 Dependent Variable Selection
When looking at the data for wages, we noticed strong right skewness.
Therefore, we decided that a natural log transformation would be appropriate for
the data. As seen in the summary charts below (Figure 1), the data have a more
centered distribution after the transformation, although there are still long tails in the
distribution. Before the transformation there are many outliers, with one very extreme
outlier on the high end. After the transformation there is one outlier on the high end and
one on the low end, but the data are much more symmetric. Additionally, the log
transformation allows the regression coefficients to be interpreted as approximate
percent changes in wages.
Figure 1
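The effect of the log transformation on skewness can be illustrated with a short sketch. The values below are synthetic lognormal draws, not the CPS wages, and the `skewness` helper and the seed are assumptions made only for this example.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic right-skewed "wages" (NOT the CPS data), same sample size as the study.
rng = random.Random(525)
wages = [rng.lognormvariate(2.0, 0.5) for _ in range(534)]
log_wages = [math.log(w) for w in wages]

skew_raw = skewness(wages)
skew_log = skewness(log_wages)
print(f"skewness of wages:     {skew_raw:.2f}")
print(f"skewness of ln(wages): {skew_log:.2f}")
```

A skewness near zero after the transformation is consistent with the more symmetric histogram described above.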
2.2 Independent Variable Selection
After reviewing the data from the 1985 CPS, we decided to include
seven independent variables in our initial model: education, experience, sex, union,
race, south, and occupation. To select these variables, we used the best subset method, as
described below.
Since the goal of our project is to measure the effect of gender on wages, we
made sure to include gender in every subset of variables we considered. Gender did
not always appear in the best subset outputs, but our final model includes it as an
independent variable. Race did appear in our best subset, but we later dropped it
because we found it was not significant, as explained in Section 2.3.
When selecting our best subset, we looked for a high adjusted R-squared, a
C(p) close to p, and minimal SBC and AIC values. When comparing best subset outputs, we
had to keep in mind that each categorical variable is represented by several
indicator variables; therefore, to select the occupation variable, we
needed to put all five occupation indicators into the model. The top
subsets ranked by adjusted R-squared, C(p), SBC, and AIC included
only some of the occupation indicators, so we could not simply choose the models with
the best values of these criteria. The results for our best subsets are
shown below, with our model in the first row (Table 1).
Table 1
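The selection logic described above can be sketched as follows. This is not the SAS procedure used in the project: the term names, the `candidate_models` and `criteria` helpers, and the column labels are hypothetical, and the criterion formulas are the standard textbook definitions based on the error sum of squares, which are assumed to match the ones reported in Table 1.

```python
import math
from itertools import combinations

# Hypothetical term names; each term maps to the design-matrix columns it owns.
TERMS = {
    "education": ["education"],
    "experience": ["experience"],
    "union": ["union"],
    "south": ["south"],
    "race": ["white", "hispanic"],
    "occupation": ["management", "sales", "clerical", "service", "professional"],
}
FORCED = ["sex"]  # always kept, since gender is the variable of interest


def candidate_models():
    """Yield the column list for every subset of TERMS, always including FORCED."""
    names = list(TERMS)
    for k in range(len(names) + 1):
        for chosen in combinations(names, k):
            cols = list(FORCED)
            for term in chosen:
                cols.extend(TERMS[term])  # a categorical term enters as a whole
            yield cols


def criteria(sse_p, p, n, ssto, mse_full):
    """Selection criteria for a model with p parameters (intercept included)."""
    return {
        "adj_r2": 1 - (n - 1) / (n - p) * sse_p / ssto,
        "cp": sse_p / mse_full - (n - 2 * p),
        "aic": n * math.log(sse_p) - n * math.log(n) + 2 * p,
        "sbc": n * math.log(sse_p) - n * math.log(n) + p * math.log(n),
    }


models = list(candidate_models())
print(len(models))  # 2**6 = 64 candidate models
```

Treating each categorical variable as a single term guarantees that every candidate contains either all of its indicators or none, which avoids the partial-occupation models mentioned above.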
For each categorical variable, we needed to create n-1 indicator variables, where n is the
number of categories. Occupation was split into five separate variables: Management,
Sales, Clerical, Service, and Professional. For any other occupation, all five of these
variables are equal to zero. Race was separated into two variables, White and Hispanic;
any other race is indicated when both of these variables are equal to zero.
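The n-1 coding can be sketched in a few lines. The `dummy_code` helper and the lowercase category labels are assumptions for illustration, with "other" standing in for the reference category.

```python
def dummy_code(value, categories, reference):
    """Return 0/1 indicators for every category except the reference level."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {c: int(value == c) for c in categories if c != reference}


# Hypothetical labels; "other" is the reference level and gets no indicator.
OCCUPATIONS = ["management", "sales", "clerical", "service", "professional", "other"]

print(dummy_code("sales", OCCUPATIONS, reference="other"))
print(dummy_code("other", OCCUPATIONS, reference="other"))
```

An observation in the reference category has every indicator equal to zero, so each coefficient measures a difference from that category.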
Overall, after our exploration of the independent variables, we kept education,
experience, sex, union, race, south, and occupation.
2.3 The Model
With our predictors selected, we ran the regression in SAS. We found
that the race variables were not significant, so we removed the two race indicator
variables from our final model. The coefficient for gender changed
by less than 0.003 when we removed race, and our adjusted R-squared changed by less
than 0.01, so we were comfortable with removing race from our final
model. We then checked interaction terms to see whether sex combined with other
variables to have a significant effect on wages, but we found that no interaction term
was statistically significant in our model.
Our final model included the variables education, experience, sex, union, south,
and occupation as predictors. The fitted model is shown below:
Table 2 shows the F-value, its corresponding p-value, and the R-squared and
adjusted R-squared values. Table 3 shows the parameter estimates, standard errors,
t-values, p-values, and a 95% confidence interval for each predictor in our
final model.
F-Value    Pr > F    R-Square    Adj R-Square
27.97      <.0001    0.3485      0.3360
Table 2
Table 3
2.4 Residual Interpretation
For our final model, the normal Q-Q plot (Figure 2) shows good linearity with a
couple of possible outliers in the tails. However, the plot of residuals versus predicted
values (Figure 3) may show increasing variance. We further investigated the residuals for
each predictor and identified education as the likely cause, because its residuals
also showed potentially increasing variance (Figure 4). To remedy the residual
plot, we tried different transformations, but they made the
residual plot for education worse, as evidenced by our log transformation (Figure 5). We
therefore decided to proceed with our model even though there is possible increasing
variance in our residual versus predicted values plot.
Figure 2 Figure 3
Figure 4 Figure 5
2.5 Model Validation
In our analysis of influential cases, we found no influential cases based on
DFFITS or Cook’s distance. However, DFBETAS flagged a total of 115 cases.
Since the cases flagged by DFBETAS made up roughly 20% of our observations, and the
other two measures flagged none, we decided to keep all of our data in
our final model.
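The text does not state which cutoffs were applied, so the sketch below only shows guidelines commonly used for large data sets. The parameter count p = 11 is an assumption (intercept, five predictors, and five occupation indicators), as is the use of all 534 observations.

```python
import math

n = 534   # sample size reported in Section 1.2
p = 11    # assumed: intercept + 5 predictors + 5 occupation indicators

dffits_cutoff = 2 * math.sqrt(p / n)   # common guideline for large data sets
dfbetas_cutoff = 2 / math.sqrt(n)      # common guideline for large data sets
flagged_share = 115 / n                # share of cases flagged by DFBETAS

print(f"|DFFITS|  cutoff: {dffits_cutoff:.3f}")
print(f"|DFBETAS| cutoff: {dfbetas_cutoff:.3f}")
print(f"share flagged by DFBETAS: {flagged_share:.1%}")
```

Because DFBETAS is checked separately for every coefficient, it tends to flag far more cases than the single-number measures, which is in line with the pattern reported above.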
In our final model, we found no multicollinearity issues. All
of our predictors have VIF values close to 1, as seen in Table 4, and our average VIF is
1.47.
Table 4
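For reference, the variance inflation factor for predictor j is defined from the R-squared obtained when that predictor is regressed on all of the other predictors. The symbols below are standard notation assumed for illustration rather than taken from the SAS output.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad
\overline{\mathrm{VIF}} = \frac{1}{p-1} \sum_{j=1}^{p-1} \mathrm{VIF}_j
```

A VIF of 1.47 corresponds to an R_j-squared of about 0.32, well below the levels usually treated as a concern (a common rule of thumb flags a largest VIF above 10).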
2.6 Statistical Inference
For our final model, the F-value is 27.97 with a corresponding p-value very close
to 0, so we reject the null hypothesis that all of the slope coefficients b(i) equal 0.
Therefore, we can conclude that there is a regression relationship. Additionally, our
final model has an R-squared value of 0.3485, meaning that 34.85% of the variation in
log wages can be explained by the regression model.
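As a consistency check, the adjusted R-squared and the F-value can be recomputed from the reported R-squared. The sketch assumes that all 534 observations were used and that the model has p = 11 parameters (intercept, five predictors, and five occupation indicators); neither count is stated explicitly in the text.

```python
n = 534       # sample size from Section 1.2 (assumed to be fully used)
p = 11        # assumed: intercept + 5 predictors + 5 occupation indicators
r2 = 0.3485   # R-squared reported in Table 2

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
f_stat = (r2 / (p - 1)) / ((1 - r2) / (n - p))

print(f"adjusted R-squared: {adj_r2:.4f}")
print(f"F statistic:        {f_stat:.2f}")
```

Under these assumptions both values agree with Table 2 up to rounding of the reported R-squared.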
Since our variable of interest is sex, we ran a t-test to determine whether it is
statistically significant. The t-value is -4.99 with a corresponding p-value very close
to 0, so we reject the null hypothesis that b(1)=0. Additionally, we are 95% confident
that the true value of b(1) lies in the interval [-0.290, -0.126].
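The reported interval and t-value can be cross-checked against each other. The sketch below recovers the point estimate and an approximate standard error from the interval; it uses a normal critical value in place of the exact t critical value, which is an approximation that should be close at this sample size.

```python
from statistics import NormalDist

lower, upper = -0.290, -0.126   # 95% confidence interval reported above
t_reported = -4.99              # t-value reported above

estimate = (lower + upper) / 2          # the interval is symmetric about the estimate
z = NormalDist().inv_cdf(0.975)         # normal approximation to the t critical value
std_error = (upper - lower) / (2 * z)
t_implied = estimate / std_error

print(f"estimate:  {estimate:.3f}")
print(f"std error: {std_error:.4f}")
print(f"implied t: {t_implied:.2f}")
```

The implied t-value is close to the reported one, with the small gap attributable to rounding of the interval endpoints and the normal approximation.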
Because we used a log transformation on Y, we interpret each coefficient as follows:
when predictor X(j) increases by 1 unit, average wages change by approximately
100*b(j)%, all else constant. For categorical predictors, 100*b(j) represents the
approximate average percent difference in wages between that category and the
reference category. Therefore, for our variable of interest, sex, we conclude
that women make 20.8% less on average than men, all else equal. Additionally, we are
95% confident that the true mean difference between women’s and men’s wages is in the
interval [-29.0%, -12.6%].
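The 100*b reading is a first-order approximation. A short sketch of the exact conversion, 100*(exp(b) - 1), shows how far the two differ for coefficients of this size. The `pct_change` helper is hypothetical, and the point estimate of -0.208 is inferred from the 20.8% figure above rather than read from Table 3.

```python
import math


def pct_change(b):
    """Exact percent change in the response implied by a log-scale coefficient b."""
    return 100 * (math.exp(b) - 1)


b_sex = -0.208          # point estimate implied by the 20.8% figure above
ci = (-0.290, -0.126)   # 95% confidence interval reported above

print(f"approximate: {100 * b_sex:.1f}%   exact: {pct_change(b_sex):.1f}%")
print(f"exact interval: [{pct_change(ci[0]):.1f}%, {pct_change(ci[1]):.1f}%]")
```

On the exact scale the estimated gap is somewhat smaller in magnitude than the approximate figure, but the sign and the significance of the result are unchanged.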
3. Conclusion
Based on the 1985 Current Population Survey data, we found a
significant difference in the wages of men and women. Women make 20.8%
less on average than men. The pay gap between the genders was the largest of any
of our categorical predictors.
Although the wage gap for gender was the largest, additional factors also
explain wage differences. Interestingly, an occupation in management is associated
with an average 20.5% increase in wages, while an occupation in service is associated
with an average 20% decrease in wages. In addition, membership in a union is associated
with an average 20.6% increase in wages. Two other noteworthy factors that affect both
genders are experience and education: each additional year of experience increases
average wages by 1%, and each additional year of education increases average wages by 6.9%.
Although the Equal Pay Act was passed in 1963, the legislation did not end the
gender pay gap. Our results show that in 1985, there was still a significant
pay gap between the genders, with women making 20.8% less on average than men.
This problem is ongoing, as legislators are still attempting to accomplish the goal of the
original Equal Pay Act of 1963.
One such recent law is the Lilly Ledbetter Fair Pay Act of 2009. Lilly Ledbetter
was a production supervisor at Goodyear who was paid 40% less than her lowest-paid
male counterpart. Her case made it all the way to the Supreme Court, where Goodyear
argued, “Ms. Ledbetter had been discriminated against BUT that the discrimination took
place more than 180 days before the charges were filed. Thus, the case could not be
raised because there was a 180 day limitation as part of the law” (DeNisi and Griffin).
The Supreme Court agreed with Goodyear’s defense and took away the money
awarded to her at lower court levels.
This ruling exposed a flaw in the existing legislation, so in 2009,
President Obama signed the Lilly Ledbetter Fair Pay Act into law. The new law states
that the 180-day statute of limitations restarts with each paycheck that the employee
receives, allowing damages to be paid long after the discrimination first occurs (DeNisi
and Griffin). This provided a much-needed update to the Equal Pay Act and will
hopefully help to influence the payment practices of employers.
The gender pay gap is an ongoing issue in the United States, with legislation
struggling to change actual practices of employers. Data analysis of the 1985 CPS
reinforces this point, as women were still paid 20.8% less on average than men 22
years after the Equal Pay Act was passed. Further research into more current data is
necessary to see if this trend is a continuing issue in the United States.
Works Cited
DeNisi, Angelo S., and Ricky W. Griffin. HR2. Mason: South-Western Cengage
Learning, 2014. Print.
Sharp, Linda A., Anita M. Moorman, and Cathryn L. Claussen. Sport Law: A Managerial
Approach. Scottsdale, AZ: Holcomb Hathaway, 2007. Print.
