Designing an Assessment System
Richard P. Phelps
International Research-to-Practice Conference
Nazarbayev Intellectual Schools AEO
Astana, Kazakhstan
October, 2016
© 2016, Richard P PHELPS International Research-to-Practice Conference, Astana, Kazakhstan, October, 2016 1
“If a thing exists, it exists in some amount. If it exists in some amount, then it is capable of being measured.”
−−René Descartes, Principles of Philosophy, 1644
Image of Protein Molecules Forming Memories
Albert Einstein College of Medicine, New York, January 2014
Learning Curve
Forgetting Curve (1870s)
Ebbinghaus: “Learning usually requires rehearsal or repetition”
Cognitive Load Theory
John Sweller, 1980s
Working Memory Capacity
George Miller, 1950s
Working Memory:
Ability to temporarily hold and manipulate information for cognitive tasks
Working Memory is challenged by:
new, unfamiliar information and quantity of discrete bits of information
I am thinking of a type of object, what is it?
Description 1:
They are shapes, geometric plane figures, polygons, quadrilaterals, and parallelograms with opposite equal acute angles, opposite equal obtuse angles, and four equal sides
I am thinking of a type of object, what is it?
Description 2:
Two centuries of research on learning concludes…
“…repeated retrieval during learning is the key to long-term retention.”
— Henry L. “Roddy” Roediger
Cognitive Scientists’ 6 Strategies for Effective Learning
Retrieval Practice
Spaced Practice
Dual Coding
Interleaving
Concrete Examples
Elaboration
Retrieval Practice
Implications for Teachers 1
Most teachers should test more frequently, …with smaller, shorter, low-stakes tests
Understand that useful assessment can be short and simple.
Implications for Teachers 2
Does the test format matter?
• multiple-choice?
• essay?
• short answer?
• oral?
• demonstration?
• …etc.?
Not so much.
Implications for Teachers 3
Tests provide feedback to teachers about what works and what does not.
Just as students can learn by testing each other, teachers can help each other by reviewing each other's tests.
Cognitive Psychology experiments were conducted with “formative” tests in schools and classrooms
What about systemwide, large-scale tests?
First priority: do no harm to the formative testing programs in schools and classrooms
The effect of testing on student learning
• 12-year study, read >3,000 documents
• analyzed close to 700 separate studies, and more than 1,600 separate effects
• 2,000 other studies were reviewed and found incomplete or inappropriate
• hundreds of other studies remain to be reviewed
The effect of testing on student learning
245 Qualitative studies
813 Surveys or Polls
640 Quantitative Studies:
– Experiments: School- and classroom-level
– Multivariate studies: Large-scale testing programs
Meta-analysis
A method for summarizing a large research literature with a single, comparable measure.
(0.5 effect size ≈ 1 grade level of learning)
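The rule of thumb above can be made concrete with a minimal sketch. It assumes the effect size is a standardized mean difference (Cohen's d with a pooled standard deviation), which is one common choice in meta-analysis and not necessarily the measure used in the studies summarized here. The scores and the function names `cohens_d` and `grade_levels` are hypothetical, for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treated, control):
    """Standardized mean difference, using a pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treated) - mean(control)) / pooled


def grade_levels(effect_size, per_grade=0.5):
    """Apply the slide's rule of thumb: 0.5 effect size is about 1 grade level."""
    return effect_size / per_grade


# Hypothetical end-of-term scores (not data from any study cited here).
tested = [72, 75, 78, 81, 84]    # class that took frequent short quizzes
untested = [68, 71, 74, 77, 80]  # comparison class

d = cohens_d(tested, untested)
print(f"effect size: {d:.2f}")
print(f"approximate grade levels: {grade_levels(d):.2f}")
```

Dividing by the pooled standard deviation puts studies that used different tests and score scales onto one comparable measure, which is what allows a meta-analysis to average across them.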
Findings from Phelps (2012):
• Survey study effect sizes average >1.0
• Over 90% of qualitative studies positive
• For quantitative studies, univariate effect sizes positive and stronger when:
– Testing more frequently
– Testing with feedback
– Testing with stakes
Findings from Phelps & Silva (2015)
For quantitative studies, effect sizes vary between 0.55 and 0.88:
+++ testing more frequently
++ testing with stakes
+ testing with feedback
Effect of scale on testing benefits
• size of study population
– small +0.34 over large
• scale of test administration
– small-scale +0.14 over large-scale
• responsible level of government
– local tests +0.29 over state tests
Large-scale test, tight security
Large-scale test, lax security
Besides, systemwide tests are needed for other purposes, such as…
…selection to programs with a limited number of places
…monitoring and system diagnosis
…workforce planning
…accountability
…credentialing
That’s enough!
Some large-scale test advantages
On a per-student basis, inexpensive
Cognitive laboratory pre-testing possible
Standardization offers comparisons across schools and regions.
May produce high-quality items that schools and teachers can use.
MOST IMPORTANT: provides reliable, comparative information to all those not involved in a particular school
The more systemwide decision points, the better?
Figure 1: Average TIMSS Score and Number of Quality Control Measures Used, by Country
[Chart. Horizontal axis: Number of Quality Control Measures Used (0 to 20). Vertical axis: Average Percent Correct (grades 7 & 8) (0 to 80). Series: Top-Performing Countries, Bottom-Performing Countries.]
SOURCE: Phelps, Benchmarking to the best in mathematics, Evaluation Review, 2001
Quality control has proportionally greater effect in poorer countries
Figure 2: Average TIMSS Score and Number of Quality Control Measures Used (each adjusted for GDP/capita), by Country
[Chart. Horizontal axis: Number of Quality Control Measures Used (per GDP/capita). Vertical axis: Average Percent Correct (grades 7 & 8) (per GDP/capita).]
SOURCE: Phelps, Benchmarking to the best in mathematics, Evaluation Review, 2001
IEA: TIMSS, PIRLS, CIVED, SITES, ICILS, PPP, ECES, TEDS
OECD PISA: PISA, PISA for schools, PISA for development
World Bank: READ, SABER
…provides funding for PISA
The effect of international testing programs
Freedom to design your testing
school tests
international tests
state and national tests
OECD and World Bank are run by economists
How well do economists understand PSYCH-ometrics?
Some interesting examples:
Chile’s national testing program, funded by the World Bank
OECD’s “Synergies for Better Learning” project
Some interesting oddities:
World Bank educational assessment chiefs are always Irish nationals affiliated with Boston College in the USA.
PISA is universally interpreted as an achievement test, even by the OECD. In reality, it has been an unvalidated aptitude test.
Designing an Assessment System
richard {at} nonpartisaneducation {dot} org

