Insu Paek*
Department of Educational Psychology & Learning Systems, USA
*Corresponding author: Insu Paek, Associate professor, Measurement & Statistics Program, Educational Psychology & Learning Systems, Florida
State University, Tallahassee, USA, Tel: 850-644-3064; Email:
Submission: March 09, 2018; Published: May 27, 2018
Understanding Differential Item Functioning
and Item Bias in Psychological Instruments
Introduction
For a psychological test or instrument to function properly
as intended, its items should measure respondents fairly across
different groups of respondents, such as men and women. In the
psychometric literature, the concept of differential item functioning
(DIF) has been introduced to address differential group performance
on an item when the groups are equated at the same level of ability
or latent trait. This article introduces the concept of DIF while
clearly distinguishing it from item bias and from simple group
differences in performance.
Since the civil rights era of the 1960s in the United States,
inequity has been a critical social issue, and educational and
psychological testing is no exception. The use of testing as a
sorting mechanism [1] has raised equity concerns among many
people, particularly about the testing enterprise. Academic research
on group differences, and public awareness of them, has prompted
examination of whether educational and psychological tests
disadvantage minority groups. A well-known incident concerning
bias and group differences is the "Golden Rule" settlement of 1984.
In 1976, the Golden Rule Insurance Company filed a lawsuit
against the Illinois Department of Insurance and the Educational
Testing Service, charging racial bias in Illinois insurance licensing
exams. The lawsuit led to an out-of-court settlement that ended the
8-year-old suit. The gist of the settlement was the elimination of
any items showing a different proportion correct (i.e., the proportion
of yes/correct answers to an item, called the "item p-value" or
"marginal item proportion-correct") across the compared groups
(for details, see, e.g., [2]).
Even before the Golden Rule settlement, there were claims in the
academic community that some tests (e.g., IQ tests) are biased against
minority groups. Some researchers investigated item p-values and
considered an item biased if its p-value differed substantially
between the compared groups (e.g., a white majority group vs. a
black minority group). This approach is consistent with the solution
prescribed by the Golden Rule settlement. However, using the
marginal item proportion-correct in this way is flawed because it
does not distinguish true group differences from true bias. This
drawback of the Golden Rule settlement procedure has been pointed
out by many academic researchers.
For example, Gregory R. Anrig, then president of the Educational
Testing Service, announced that the Golden Rule settlement was "an
error of judgment" (on the side effects of applying the Golden Rule
procedure, see also, e.g., [3]). One could also ask: "Is it right to make
group differences negligible by manipulating the test items (by
excluding and revising items) if there is actually a real group
difference, possibly created by past or present social inequity?"
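The screening rule at the heart of the settlement can be sketched in a few lines. This is a hypothetical illustration, not the settlement's actual procedure: the data and the 0.15 cutoff are invented for the example, and the point is only that the rule compares marginal item p-values across groups.

```python
def item_p_value(responses):
    """Marginal proportion of correct (1) responses for one item in one group."""
    return sum(responses) / len(responses)

def golden_rule_flag(group_a, group_b, cutoff=0.15):
    """Flag the item when the marginal p-value gap exceeds the cutoff.

    Both the cutoff and the data below are hypothetical illustrations.
    """
    return abs(item_p_value(group_a) - item_p_value(group_b)) > cutoff

# Hypothetical scored responses (1 = correct) for one item in two groups.
majority = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]   # item p-value = 0.8
minority = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]   # item p-value = 0.4
print(golden_rule_flag(majority, minority))  # → True: the item would be flagged
```

Note that such a flag cannot, by itself, say whether the gap of 0.4 reflects bias in the item or a real difference between the groups.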
Technically, the major drawback of this marginal proportion-correct
approach is the confounding of group differences with real bias.
The marginal probability of a correct response is affected by the
population distribution (related to the group mean difference) and
by the item response function (related to item bias). That is, the
marginal probability (the observed proportion correct or incorrect)
is represented as
p(x) = ∫ P(θ)^x Q(θ)^(1-x) dF(θ)
where p(x) is the marginal probability of either x = 1 (yes/correct)
or x = 0 (no/incorrect), θ is the person latent trait (or ability), P(θ)
is the item response function, Q(θ) = 1 - P(θ), and F(θ) is the
distribution of θ. From this expression, one can see that the person
latent trait/ability and the item characteristics are confounded in
the observed proportion of x. (A similar equation can be written for
Likert-style or graded response items, showing that the observed
marginal score depends on both the item response function and
the latent trait distribution.) If we observe a large difference in the
proportion correct between the two groups, we cannot draw the
conclusion that the item is really biased. The large difference could
be due to a real group ability difference between the two groups, a
bias factor disadvantaging one group in the item, or both, which is
probably the case in many real-world applications.
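The confounding described above can be demonstrated numerically. The sketch below assumes a Rasch-type item response function and normal latent trait distributions (both illustrative choices, not taken from this article): the very same item, with no DIF whatsoever, yields different marginal proportions correct simply because the two groups differ in mean latent trait.

```python
import math

def P(theta, b=0.0):
    """Rasch-type item response function with difficulty b (illustrative)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def marginal_p(mean, sd=1.0, b=0.0, grid=2001, lo=-6.0, hi=6.0):
    """Approximate p(x=1) = ∫ P(θ) dF(θ) on a grid, with F(θ) normal(mean, sd)."""
    total, mass = 0.0, 0.0
    for i in range(grid):
        theta = lo + (hi - lo) * i / (grid - 1)
        w = math.exp(-0.5 * ((theta - mean) / sd) ** 2)  # unnormalized normal density
        total += w * P(theta, b)
        mass += w
    return total / mass

# Same item (identical P(θ), so no DIF), different group means:
# the marginal proportions correct still differ.
p_reference = marginal_p(mean=0.5)
p_focal = marginal_p(mean=-0.5)
print(p_reference > p_focal)  # → True
```

The gap between the two printed marginals comes entirely from the latent trait distributions, which is exactly why a marginal comparison cannot separate group differences from item bias.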
In subsequent years, the definition of bias and the methodology
for its detection were refined. The word "bias" has now been
replaced by the term "differential item functioning" (DIF), at least
in academia. Because of the social connotation of the word "bias",
[Mini Review. Psychology and Psychotherapy: Research Study, Crimson Publishers, ISSN 2639-0612, Volume 1, Issue 3. How to cite this article: Insu P. Understanding Differential Item Functioning and Item Bias in Psychological Instruments. Psychol Psychother Res Stud 1(3). PPRS.000514.2018. DOI: 10.31031/PPRS.2018.01.000514. Copyright © Insu Paek. Licensed under the Creative Commons Attribution 4.0 International License.]
Holland & Thayer in 1988 [4] suggested the alternative term DIF in
place of "bias". The complexity of the usage of these terms has been
a source of confusion in the communication between the technical
measurement community and the public [5]. DIF is a neutral term,
indicating the magnitude of advantage or disadvantage presented
by an item to a group, which is usually estimated through statistical
analysis. In recent years, identifying DIF items and classifying some
(or all) of them as biased items have been treated as separate tasks.
The former is a statistical matter, while the latter goes beyond
statistics, involving interpretation of the identified DIF in the
context of social justice.
A formal definition of no DIF [6-9] can be given as follows.
E(X | θ, G) = E(X | θ)
where E is the expectation operator, X is a categorical ordinal
item response (e.g., X = 1 (strongly disagree), 2 (disagree), 3 (agree),
or 4 (strongly agree) for a 4-option Likert-style item), G is a
group indicator (e.g., 1 = female and 0 = male; 1 = African American
and 0 = White), and θ is the person latent trait/ability. Sometimes, no
DIF is expressed using an observed variable Z instead of θ, which
is a proxy for θ. The above definition of no DIF states, in words,
that there is no DIF if the expected item score for one group and
the expected item score for the other group are the same when
the latent trait/ability scores are equated. Again, DIF is about a
conditional comparison between the two compared groups on the
same trait/ability level, not a marginal comparison. Those who
would like to know more about the methods of DIF detection are
referred to [10].
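The conditional comparison in this definition can be illustrated with a small sketch that stratifies on an observed proxy Z for θ (e.g., a rest score or total score). The data and the helper function are hypothetical; real analyses would use the Mantel-Haenszel procedure [4] or the other detection methods surveyed in [10].

```python
from collections import defaultdict

def conditional_gaps(records):
    """records: list of (z, group, x) with group in {0, 1}.

    Return, for each stratum of the proxy Z where both groups are observed,
    the gap in mean item score: mean(x | Z=z, G=1) - mean(x | Z=z, G=0).
    A DIF-free item shows gaps near zero at every matched level of Z.
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for z, g, x in records:
        cells[z][g].append(x)
    gaps = {}
    for z, by_group in sorted(cells.items()):
        if by_group[0] and by_group[1]:  # both groups observed at this level
            m1 = sum(by_group[1]) / len(by_group[1])
            m0 = sum(by_group[0]) / len(by_group[0])
            gaps[z] = m1 - m0
    return gaps

# Hypothetical data: at every matched level of Z the two groups answer alike,
# so each conditional gap is zero -- no evidence of DIF, whatever the
# marginal difference between the groups may be.
data = [(1, 0, 0), (1, 1, 0), (2, 0, 1), (2, 1, 1), (3, 0, 1), (3, 1, 1)]
print(conditional_gaps(data))  # → {1: 0.0, 2: 0.0, 3: 0.0}
```

The key design point mirrors the definition above: the comparison is made within strata of (a proxy for) the latent trait, never on the marginal means.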
From the standpoint of test validity, DIF and its detection are
important, and the presence of DIF calls the fairness of a test into
question. Although a test constructed without DIF cannot undo
past inequalities, it can reveal inequalities that may have been
created by past and present inequity, thereby giving people a
chance to consider the source of such differences.
References
1. Glaser R (1981) The future of testing. American Psychologist 36(9): 923-936.
2. Faggen J (1987) Golden rule revisited: Introduction. Educational Measurement: Issues and Practice 6(2): 5-8.
3. Linn RL, Drasgow F (1987) Implications of the golden rule settlement for test construction. Educational Measurement: Issues and Practice 6(2): 13-17.
4. Holland PW, Thayer DT (1988) Differential item performance and the Mantel-Haenszel procedure. In: Wainer H, Braun HI (Eds.), Test Validity, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, pp. 129-145.
5. Cole NS (1993) History and development of DIF. In: Holland PW, Wainer H (Eds.), Differential Item Functioning, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, pp. 25-29.
6. Chang H, Mazzeo J, Roussos L (1996) Detecting DIF for polytomously scored items: An adaptation of the SIBTEST procedure. Journal of Educational Measurement 33(3): 333-353.
7. Lord FM (1977) A study of item bias, using item characteristic curve theory. In: Poortinga YH (Ed.), Basic problems in cross-cultural psychology, Swets and Zeitlinger, Amsterdam, Netherlands, pp. 19-29.
8. Shealy R, Stout W (1993) A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika 58(2): 159-194.
9. Thissen D, Steinberg L, Wainer H (1993) Detection of differential item functioning using the parameters of item response models. In: Holland PW, Wainer H (Eds.), Differential Item Functioning, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, pp. 67-113.
10. Holland PW, Wainer H (Eds.) (1993) Differential item functioning. Lawrence Erlbaum Associates, Hillsdale, NJ, USA.