CHAPTER 6
ITEM ANALYSIS AND VALIDATION
TOPICS
Validation and Validity
Reliability
Item Analysis: Difficulty Index and Discrimination Index
LEARNING OUTCOMES
At the end of the lesson, you should be able to:
1. Explain the meaning of item analysis, item validity,
item difficulty, and discrimination index.
2. Determine the validity and reliability of given test
items.
3. Determine the quality of a test by its difficulty
index, discrimination index and the plausibility of
options.
Topic 1: ITEM ANALYSIS: DIFFICULTY INDEX AND
DISCRIMINATION INDEX
There are two important characteristics of an item that will be of interest to the teacher: (a) item difficulty and (b) the discrimination index.
The difficulty of an item, or item difficulty, is defined as the number of students who answer the item correctly divided by the total number of students:

Item difficulty = (number of students with the correct answer) / (total number of students)
The item difficulty is usually expressed as a percentage.
Example: What is the item difficulty index of an item if 25 students are unable to answer it correctly while 75 answer it correctly?
Here, the total number of students is 100, so the item difficulty index is 75/100, or 75%.
The following arbitrary rule is often used in the literature:
Range of Difficulty Index   Interpretation      Action
0 – 0.25                    Difficult           Revise or discard
0.26 – 0.75                 Right difficulty    Retain
0.76 and above              Easy                Revise or discard
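The difficulty-index formula and the rule of thumb above can be sketched in Python; the function names and the handling of values exactly on a boundary are illustrative assumptions, not from the source.

```python
# Illustrative helpers for the difficulty-index formula and the
# interpretation table above (names are hypothetical).

def difficulty_index(num_correct, num_students):
    """Proportion of students who answered the item correctly."""
    return num_correct / num_students

def interpret_difficulty(p):
    """Classify a difficulty index using the arbitrary rule above."""
    if p <= 0.25:
        return "Difficult - revise or discard"
    elif p <= 0.75:
        return "Right difficulty - retain"
    else:
        return "Easy - revise or discard"

# Worked example from the text: 75 of 100 students answer correctly.
p = difficulty_index(75, 100)
print(p)                        # 0.75
print(interpret_difficulty(p))  # Right difficulty - retain
```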
Index of discrimination = DU – DL
where DU = difficulty index of the upper 25% of the class
      DL = difficulty index of the lower 25% of the class
Example: Obtain the index of discrimination of an item if the upper 25% of the class had a difficulty index of 0.60 (i.e., 60% of the upper 25% got the correct answer) while the lower 25% of the class had a difficulty index of 0.20.
Here, DU = 0.60 while DL = 0.20, thus the index of discrimination = 0.60 – 0.20 = 0.40.
The discrimination index is the difference between the proportion of the top scorers who got an item correct and the proportion of the lowest scorers who got the item right. Theoretically, the index of discrimination can range from -1.0 (when DU = 0 and DL = 1) to +1.0 (when DU = 1 and DL = 0).
Example: Consider a multiple-choice test for which the following data were obtained (B* marks the correct answer):

Item 1      A     B*    C     D
Total       0     40    20    20
Upper 25%   0     15    5     0
Lower 25%   0     5     10    5
Index Range     Interpretation                                 Action
-1.0 – -0.50    Can discriminate but the item is questionable  Discard
-0.51 – 0.45    Non-discriminating item                        Revise
0.46 – 1.0      Discriminating item                            Include
The correct answer is B. Let us compute the difficulty index and the index of discrimination:

Difficulty index = (no. of students getting the correct response) / (total number of students)
                 = 40/100 = 40%, within the range of a “good item”
The discrimination index can similarly be computed:

DU = (no. of students in the upper 25% with the correct response) / (no. of students in the upper 25%)
   = 15/20 = 0.75, or 75%
DL = (no. of students in the lower 25% with the correct response) / (no. of students in the lower 25%)
   = 5/20 = 0.25, or 25%
Discrimination index = DU – DL = 0.75 – 0.25 = 0.50, or 50%
Thus, the item also has a “good discriminating power.”
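The computation above can be reproduced in a short Python sketch using the response counts from the example table (variable names are illustrative):

```python
# Response counts for item 1 from the example table (B is the key).
upper = {"A": 0, "B": 15, "C": 5, "D": 0}   # upper 25% of the class
lower = {"A": 0, "B": 5, "C": 10, "D": 5}   # lower 25% of the class
key = "B"

DU = upper[key] / sum(upper.values())   # 15/20 = 0.75
DL = lower[key] / sum(lower.values())   # 5/20  = 0.25
discrimination = DU - DL                # 0.75 - 0.25 = 0.50

print(DU, DL, discrimination)           # 0.75 0.25 0.5
```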
More Sophisticated Discrimination Index
Item discrimination refers to the ability of an item to differentiate among students on the basis of how well they know the material being tested. Various hand-calculation procedures have traditionally been used to compare item responses to total test scores using high- and low-scoring groups of students.
A good item is one that has good discriminating ability and a sufficient level of difficulty (neither too difficult nor too easy).
At the end of the item-analysis report, test items are listed according to their degrees of difficulty (easy, medium, hard) and discrimination (good, fair, poor). These distributions provide a quick overview of the test and can be used to identify items which are not performing well and which can perhaps be improved or discarded.
The item-analysis procedure for norm-referenced tests provides the following information:
1. The difficulty of the item
2. The discriminating power of the item
3. The effectiveness of each alternative

Some benefits derived from item analysis are:
1. It provides useful information for class discussion of the test.
2. It provides data which help students improve their learning.
3. It provides insights and skills that lead to the preparation of better tests in the future.
TOPIC 2
VALIDATION AND VALIDITY
VALIDATION
The process of collecting and analyzing evidence to support the meaningfulness and usefulness of the test.
VALIDITY
The extent to which a test measures what it purports to measure, or the appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes based on the test results.
Content-related evidence of validity
Refers to the content and format of the instrument.
Criterion-related evidence of validity
Refers to the relationship between scores obtained using the instrument and scores obtained using one or more other tests (often called the criterion).
Construct-related evidence of validity
Refers to the nature of the psychological construct or characteristic being measured by the test.
TOPIC 3: RELIABILITY
Reliability refers to the consistency of the scores obtained: how consistent they are for each individual from one administration of an instrument to another and from one set of items to another.
Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid outcomes. As reliability improves, validity may improve (or it may not). However, if an instrument is shown scientifically to be valid, then it is almost certain that it is also reliable.
Predictive validity compares the question with an outcome assessed at a later time. An example of predictive validity is a comparison of scores on the National Achievement Test (NAT) with first-semester grade point average (GPA) in college. Do NAT scores predict college performance?
Construct validity refers to the ability of a test to measure what it is supposed to measure. As a researcher, you may intend to measure depression but actually measure anxiety, so your research gets compromised.
The following table is a standard followed almost universally in educational testing and measurement:

Reliability     Interpretation
0.90 and above  Excellent reliability; at the level of the best standardized tests
0.80 – 0.90     Very good for a classroom test
0.70 – 0.80     Good for a classroom test; in the range of most classroom tests. There are probably a few items which could be improved.
0.60 – 0.70     Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
0.50 – 0.60     Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
0.50 or below   Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
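As a sketch, the table's bands can be encoded as a lookup function. This is a hypothetical helper; assigning boundary values such as exactly 0.80 or 0.70 to the higher band is an assumption, since the table's ranges overlap at their endpoints.

```python
def interpret_reliability(r):
    """Map a reliability coefficient to the table's interpretation.
    Boundary values go to the higher band (an assumption)."""
    if r >= 0.90:
        return "Excellent reliability; at the level of the best standardized tests"
    elif r >= 0.80:
        return "Very good for a classroom test"
    elif r >= 0.70:
        return "Good for a classroom test; a few items could be improved"
    elif r >= 0.60:
        return "Somewhat low; supplement with other measures for grading"
    elif r >= 0.50:
        return "Suggests revision unless the test is quite short"
    else:
        return "Questionable reliability; needs revision"

print(interpret_reliability(0.85))  # Very good for a classroom test
```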
Thank you
