TEST SCORES
Collected and presented by: Eman Awad El-Sawy
Ch. 4
What is the meaning of Instrumentation?
It is the process of selecting or
developing measuring devices and
methods appropriate to a given
evaluation problem. (p. 101)
WHAT’S THE DEAL WITH TESTING?
As a society, we like numbers. If something can be
quantified, it is viewed as valid or more scientific.
Machine scoring of a test is fast, efficient, and cheap.
Hand scoring of a test is slow, time-consuming, and
very expensive.
INTERPRETING TEST SCORES:
To interpret a test score, two things must be known:
1. The nature of the score itself (what kind of scoring or scaling system was used in the calculations?)
2. The basis of comparison underlying the score (what reference population or norm group does it represent?)
Types of test scores
1. Raw scores
2. Percentile ranks (centile ranks) and percentiles (centiles)
3. Stanine scores (standard nine)
4. Standard scores
   • Z-scores
   • T-scores
   • Other standard scores
5. Grade level scores
1. RAW SCORES
A raw score is the actual score made on a test: simply the total number of points an individual gets before it is converted to any formal or standardized scoring system.
Limitations: A raw score by itself is uninterpretable, since there is no way of knowing how it compares with anything else.
2. PERCENTILE RANKS (CENTILE RANKS) AND PERCENTILES (CENTILES)
Raw scores begin to have meaning when they are ranked from high to low. A convenient solution is to convert the scores into percentage values.
Two statistics are used for this purpose:
A. The percentile rank (centile rank): a number between 0 and 100 indicating the percent of cases in a norm group falling at or below a given score.
B. The percentile (centile): a point on a scale of scores at or below which a given percent of the cases falls.
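The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the ten-score norm group are hypothetical, and the `percentile` function uses one simple convention (the smallest observed score at or below which at least the given percent of cases fall). Published tests may use other interpolation rules.

```python
from math import ceil

def percentile_rank(score, norm_scores):
    """Percent of cases in the norm group falling at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)

def percentile(p, norm_scores):
    """Smallest observed score at or below which at least p percent of cases fall."""
    ordered = sorted(norm_scores)
    k = max(1, ceil(p * len(ordered) / 100))  # number of cases needed
    return ordered[k - 1]

# Hypothetical norm group of ten raw scores (far too small for real norms).
norm_group = [35, 42, 47, 50, 50, 53, 58, 61, 66, 72]

print(percentile_rank(50, norm_group))  # 50.0
print(percentile(90, norm_group))       # 66
```

Note that the two functions answer opposite questions: the first starts from a score and returns a percent, the second starts from a percent and returns a score.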
Strengths:
• They are easily understood by lay people.
• They allow exact interpretation.
• They are more appropriate for markedly skewed data than scores based on the normal probability curve.
Weaknesses:
• They can be confused with a “percentage-right” score.
• It is misleading to report results in percentage terms when the sample size is under 100.
• They only permit statements about rank (greater than, equal to, less than).
• The units are unequal: the interval between the 60th and 70th percentile ranks ≠ the interval between the 80th and 90th.
3. STANINES
CONTRACTION OF “STANDARD NINE”
• Stanines divide the normal distribution into 9 units, each of which covers the same length along the base of the normal curve (except the two units that cover the tails). Stanines have M = 5 and SD = 2 and range from 1 (lowest) to 9 (highest).
• They combine the understandability of percentages with the features of the normal curve of probability.
• Stanine scores are useful in comparing a student's performance across different content areas. For example, a 6 in Mathematics and an 8 in Reading generally indicate a meaningful difference in a student's learning for the two respective content areas.
Advantages: Stanine scores are coming into increasing use because of their simplicity and utility.
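As a rough sketch of how a stanine can be assigned, the Python below maps a z-score to one of the nine bands. It assumes the score has already been converted to a z-score and uses the conventional half-standard-deviation band width that follows from M = 5 and SD = 2, with stanine 5 centred on the mean. The function name and cut-point list are illustrative, not taken from the slides.

```python
from bisect import bisect_right

# Upper z-score boundaries of stanines 1..8; each interior band is 0.5 SD wide.
# Stanines 1 and 9 are open-ended and cover the two tails.
STANINE_CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]

def stanine(z):
    """Return the stanine (1-9) for a z-score under the half-SD convention."""
    return bisect_right(STANINE_CUTS, z) + 1

print(stanine(0.0))   # 5  (at the mean)
print(stanine(1.0))   # 7
print(stanine(-2.5))  # 1  (lower tail)
```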
4. STANDARD SCORES
Standard scores indicate a student’s relative position in a group. They express test performance in terms of standard deviation units from the mean.
They are derived from the properties of the normal probability curve and preserve the absolute differences between scores.
Disadvantages:
1. They are inappropriate if the data are markedly skewed.
2. They are difficult to explain to a lay audience.
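A minimal sketch of the two standard scores named in the list of score types, z-scores and T-scores, may help here. It assumes the usual definitions (z is the distance from the group mean in standard deviation units; T rescales z to a mean of 50 and a standard deviation of 10) and treats the group as the whole reference population, so the population standard deviation is used. The group scores are invented for illustration.

```python
from statistics import mean, pstdev

def z_score(raw, group_scores):
    """Distance of `raw` from the group mean, in standard deviation units."""
    return (raw - mean(group_scores)) / pstdev(group_scores)

def t_score(z):
    """Rescale a z-score to a scale with mean 50 and standard deviation 10."""
    return 50 + 10 * z

# Hypothetical reference group.
group = [40, 50, 60, 70, 80]

z = z_score(80, group)
print(round(z, 2))           # 1.41
print(round(t_score(z), 1))  # 64.1
```

Because both scores are linear transformations of the raw score, the relative distances between students are kept, which is the property the slide describes as preserving the absolute differences between scores.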
5. GRADE LEVEL SCORES:
They are based on the relationship between
scores on a test and the average performance
of children at each of a series of grade levels.
However, developmental characteristics of certain
age levels may be due to maturity rather than
instruction.
They are most relevant in elementary schools
where subject matter tends to be more
continuous.
Beyond the sixth grade they lose meaning.
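One way to picture the relationship between test scores and average performance at each grade level is linear interpolation between grade averages. The sketch below is an assumption for illustration only: the slides do not say how grade level scores are computed, and the grade averages, the function name, and the clamping at the lowest and highest grades are all hypothetical.

```python
def grade_equivalent(raw, grade_means):
    """Interpolate a grade level from (grade, average raw score) pairs.

    `grade_means` must be sorted by grade with strictly increasing averages.
    Scores outside the table are clamped to the lowest or highest grade.
    """
    if raw <= grade_means[0][1]:
        return float(grade_means[0][0])
    if raw >= grade_means[-1][1]:
        return float(grade_means[-1][0])
    for (g_lo, m_lo), (g_hi, m_hi) in zip(grade_means, grade_means[1:]):
        if m_lo <= raw <= m_hi:
            return g_lo + (g_hi - g_lo) * (raw - m_lo) / (m_hi - m_lo)

# Hypothetical average raw scores for grades 3, 4, and 5.
averages = [(3, 20), (4, 30), (5, 38)]

print(grade_equivalent(25, averages))  # 3.5
```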
STANDARDIZED TESTS:
They report scores based on a norm group representing a defined population.
Until this comparison group is clearly known, a satisfactory interpretation of the score is not possible.
1. Norm-referenced tests.
2. Criterion-referenced tests.
MEASUREMENT AND EVALUATION:
CRITERION- VERSUS NORM-REFERENCED TESTING
Many educators and members of the public fail to grasp the distinctions between criterion-referenced and norm-referenced testing. It is common to hear the two types of testing referred to as if they served the same purposes or shared the same characteristics. Much confusion can be eliminated if the basic differences are understood.
The following is adapted from: Popham, J. W. (1975). Educational evaluation. Englewood Cliffs, New Jersey: Prentice-Hall, Inc.
STANDARDIZED TESTS:
Criterion-Referenced Test
Criterion-referenced tests, also called mastery tests,
compare a person's performance to a set of objectives.
Anyone who meets the criterion can get a high score.
Everyone knows what the benchmarks / objectives are and
can attain mastery to meet them.
It is possible for ALL the test takers to achieve 100%
mastery.
Dimension: Purpose
Criterion-referenced tests:
• To determine whether each student has achieved specific skills or concepts.
• To find out how much students know before instruction begins and after it has finished.
Norm-referenced tests:
• To rank each student with respect to the achievement of others in broad areas of knowledge.
• To discriminate between high and low achievers.
Dimension: Content
Criterion-referenced tests:
• Measures specific skills which make up a designated curriculum.
• These skills are identified by teachers and curriculum experts.
• Each skill is expressed as an instructional objective.
Norm-referenced tests:
• Measures broad skill areas sampled from a variety of textbooks, syllabi, and the judgments of curriculum experts.
Dimension: Item characteristics
Criterion-referenced tests:
• Each skill is tested by at least four items in order to obtain an adequate sample of student performance and to minimize the effect of guessing.
• The items which test any given skill are parallel in difficulty.
Norm-referenced tests:
• Each skill is usually tested by fewer than four items.
• Items vary in difficulty.
• Items are selected that discriminate between high and low achievers.
Dimension: Score interpretation
Criterion-referenced tests:
• Each individual is compared with a preset standard for acceptable achievement. The performance of other examinees is irrelevant.
• A student's score is usually expressed as a percentage.
• Student achievement is reported for individual skills.
Norm-referenced tests:
• Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade equivalent score, or a stanine.
• Student achievement is reported for broad skill areas, although some norm-referenced tests do report student achievement for individual skills.
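The contrast in score interpretation can be sketched as follows. A criterion-referenced report compares each skill's percent correct with a preset standard and ignores other examinees. A norm-referenced report places a total score among the scores of other examinees. The 80% standard, the skill names, and all scores below are hypothetical values chosen for illustration.

```python
def criterion_report(correct_by_skill, items_per_skill, standard=80.0):
    """Per-skill percent correct and whether it meets the preset standard."""
    report = {}
    for skill, correct in correct_by_skill.items():
        pct = 100.0 * correct / items_per_skill[skill]
        report[skill] = (pct, pct >= standard)
    return report

def norm_report(total, other_totals):
    """Percentile rank of `total` among the examinees' total scores."""
    at_or_below = sum(1 for t in other_totals if t <= total)
    return 100.0 * at_or_below / len(other_totals)

# One student: four items per skill, as the item-characteristics slide suggests.
correct = {"fractions": 4, "decimals": 2}
items = {"fractions": 4, "decimals": 4}
print(criterion_report(correct, items))
# {'fractions': (100.0, True), 'decimals': (50.0, False)}

# The same student's total (6) compared with a hypothetical group of totals.
group_totals = [3, 4, 5, 6, 6, 7, 8, 8]
print(norm_report(6, group_totals))  # 62.5
```

The same performance therefore yields two different statements: mastery of one skill but not the other, and a standing at about the 62nd percentile rank in this group.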


Measurement and instrumentation
