QUALITY ASSURANCE ON INTERNAL ATTRIBUTES OF
GOOD LANGUAGE MEASURING DEVICES (RELIABILITY)
AMIRUL FAISAL RIZZA
TESTS: TOOLS / INSTRUMENTS
English tests are tools / instruments used to draw out evidence of the existence of English abilities.
Good instruments:
• The hidden English abilities are guaranteed to be observable.
Bad instruments:
1. Damage measurement and evaluation.
2. Cannot describe the real language abilities of the test takers.
Requirements for a good instrument
1. Reliability
2. Validity
3. Practicality / usability
4. Economy
Reliability
RELIABLE = STABLE = CONSISTENT
• A reliable test is a test that produces stable, consistent scores.
• Test scores demonstrate consistency or stability no matter who administers the test (rater or inter-rater consistency).
• The scores are consistent no matter when or where the test is administered.
Reliability
– Observed Score: the data gathered by the researcher
– True Score: the actual, unknown value that corresponds to the construct of interest
– Error:
   – Systematic Error: variation that results from constructs of disinterest
   – Unsystematic / Random Error: nonsystematic variation in the observed scores
Observed Score = True Score + (Measurement) Error
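The decomposition above can be illustrated with a small simulation. This is a minimal sketch and not part of the original slides: the true score (8.0), the systematic error (+0.5, for example a consistently lenient rater) and the spread of the random error (0.3) are hypothetical values chosen only for illustration.

```python
import random
import statistics


def observe(true_score, systematic_error, random_sd, rng):
    """One observed score = true score + systematic error + random error."""
    return true_score + systematic_error + rng.gauss(0.0, random_sd)


rng = random.Random(42)   # fixed seed so the sketch is repeatable
TRUE_SCORE = 8.0          # hypothetical true score
SYSTEMATIC_ERROR = 0.5    # hypothetical constant bias
RANDOM_SD = 0.3           # hypothetical spread of the random error

observed = [
    observe(TRUE_SCORE, SYSTEMATIC_ERROR, RANDOM_SD, rng)
    for _ in range(10_000)
]

mean_observed = statistics.fmean(observed)
spread_observed = statistics.stdev(observed)

# Random error averages out over many observations; systematic error does not.
print(f"mean of observed scores:   {mean_observed:.2f}")
print(f"spread of observed scores: {spread_observed:.2f}")
```

Under these assumptions the mean of many observations settles near the true score plus the systematic error, which suggests why repeating a measurement reduces random error but leaves systematic error in place.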
Consistency between raters (inter-rater): perfect consistency

Students   Rater 1   Rater 2
A          8         8
B          8.6       8.6
C          9         9
D          8         8
E          9.4       9.4
Consistent test

Students   Rater 1   Rater 2
A          8.2       8
B          8.6       8.8
C          8.9       9
D          8         8.1
E          9.3       9.4
Consistent across time

Students   Administered on Wednesday   Administered on Friday
A          7.8                         8
B          8.6                         8.3
C          9.1                         9
D          8                           8.2
E          9.3                         9.4
Inconsistency between raters (inter-rater)

Students   Rater 1   Rater 2
A          3         8
B          8.6       2
C          3.1       9
D          8         8.2
E          9.3       5
Inconsistency across time

Students   Administered on Wednesday   Administered on Friday
A          4                           8
B          8.6                         2
C          5                           9
D          6                           8.2
E          8                           8
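The difference between the consistent and inconsistent tables can be put into numbers. The sketch below is an illustration only: the slides do not say which statistic to use for these tables, so the Pearson correlation coefficient is an assumption here, and the helper `pearson_r` is written from scratch. The scores are copied from the five tables above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# (first column, second column) for students A..E in each table
tables = {
    "raters, perfect consistency": ([8, 8.6, 9, 8, 9.4], [8, 8.6, 9, 8, 9.4]),
    "raters, consistent":          ([8.2, 8.6, 8.9, 8, 9.3], [8, 8.8, 9, 8.1, 9.4]),
    "time, consistent":            ([7.8, 8.6, 9.1, 8, 9.3], [8, 8.3, 9, 8.2, 9.4]),
    "raters, inconsistent":        ([3, 8.6, 3.1, 8, 9.3], [8, 2, 9, 8.2, 5]),
    "time, inconsistent":          ([4, 8.6, 5, 6, 8], [8, 2, 9, 8.2, 8]),
}

results = {name: pearson_r(a, b) for name, (a, b) in tables.items()}

for name, r in results.items():
    print(f"{name:30s} r = {r:+.2f}")
```

A coefficient close to +1 indicates that the two sets of scores rank the students in nearly the same way; values near zero or below zero indicate inconsistency. A correlation only shows how closely the two columns move together, so it is one possible index of consistency rather than the only one.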
How do we determine whether a measurement is reliable?
Reliability is estimated using these APPROACHES:
• Test-Retest
• Parallel Forms
• Internal Consistency
TEST-RETEST
The same test is given twice to the same group of subjects on different testing occasions.
The same instrument is reused and the same subjects are involved.
The repetition is done on a different day.
TIME                  ACTIVITY                   PARTICIPANTS                   RESULT
Wednesday, 3/10/18    Vocabulary test            50 students of SMA 1 Bangsal   Result 1
Wednesday, 10/10/18   The same vocabulary test   50 students of SMA 1 Bangsal   Result 2
Advantages
• We only need one set of a test.
Disadvantages
• Requires more than one testing occasion.
• It is not easy to create similar conditions on different testing occasions.
• If the two administrations are too close together, the test takers may still remember the content of the test.
• If the second test comes too long after the first, the test takers' performance may be affected.
• May cause boredom, ailments and the like.
Parallel Forms
Requires two or more sets of tests.
Each set is made equal to the other set(s) in every aspect of the test.
Equal in:
• Test format
• Test length
• Level of difficulty
• Discrimination indexes used
• Time allocation
• Test content
SET A (administered on Tuesday) and SET B (administered on Friday)
→ Administered to a group of students
→ Scores produced from completing Set A and scores produced from completing Set B
→ Correlational analysis
Advantages
• Has more variation in the sets of tests.
Disadvantages
• Time-consuming to make two or more sets of tests.
• Not easy to keep the students' motivation in doing the second test.
Internal Consistency
Based on the logic that if the items in a test are highly correlated, the test is said to be reliable.
Before developing a test, the theoretical ability to be measured by the test should be defined.
The items of the test should be constructed to measure a single ability (technically called a “construct”).
Tests with higher internal consistency more accurately measure the construct intended by the test developers.
Concept level: Vocabulary ability
Dimension level: Synonym; Antonym
Indicator level: Indicator 1 and Indicator 2 under each dimension
Test item level:
• Synonym, Indicator 1: Item 1, Item 2, Item 3
• Synonym, Indicator 2: Item 1, Item 2, Item 3, Item 4
• Antonym, Indicator 1: Item 1, Item 2, Item 3
• Antonym, Indicator 2: Item 1, Item 2, Item 3, Item 4
Test level: A vocabulary test containing the assembled and selected items
Test takers' scores (hypothetical dichotomous scoring on 10 items)

Students      Item: 1  2  3  4  5  6  7  8  9  10   Total score
A                   1  1  0  1  0  1  1  0  1  0    6
B                   1  0  1  0  1  0  1  1  1  0    6
C                   0  1  1  1  1  1  1  1  1  1    9
D                   1  0  1  0  0  0  1  1  1  0    5
E                   1  0  0  0  1  1  1  1  1  1    7
F                   1  1  0  1  1  1  1  1  1  1    9
Total score         5  3  3  3  4  4  6  5  6  3
Approaches to perform internal consistency
• Split-half: split the scores into those achieved on the first half of the items and those achieved on the second half.
  The split can be into the first and second halves of the items, or into the odd-numbered and even-numbered items.
  Some drawbacks of split-half are listed after the examples.
• Inter-item estimation: the test scores are correlated with themselves within the same test. This is called inter-item correlation.
  The scores obtained on each item are correlated with one another: item 1 is correlated with items 2, 3, 4, 5, 6, 7, 8, 9, 10; item 2 is correlated with items 1, 3, 4, 5, 6, 7, 8, 9, 10; and so on.
Examples:
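As a first example, inter-item correlation can be sketched on the ten-item table of test takers' scores shown earlier. This is a minimal illustration, not the author's procedure: the helper `pearson_r` is written from scratch, and averaging the pairwise correlations into one number is an assumption made here for brevity. Items 7 and 9 were answered correctly by every test taker, so they have no variance and their correlations are undefined; the sketch skips them.

```python
from itertools import combinations
from math import sqrt

# Rows are test takers A..F, columns are items 1..10.
scores = [
    [1, 1, 0, 1, 0, 1, 1, 0, 1, 0],  # A
    [1, 0, 1, 0, 1, 0, 1, 1, 1, 0],  # B
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # C
    [1, 0, 1, 0, 0, 0, 1, 1, 1, 0],  # D
    [1, 0, 0, 0, 1, 1, 1, 1, 1, 1],  # E
    [1, 1, 0, 1, 1, 1, 1, 1, 1, 1],  # F
]


def pearson_r(x, y):
    """Pearson correlation, or None when either list has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return None
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / sqrt(ss_x * ss_y)


# items[i] holds every test taker's score on item i + 1
items = list(zip(*scores))

pairs = {}
for i, j in combinations(range(len(items)), 2):
    r = pearson_r(items[i], items[j])
    if r is not None:
        pairs[(i + 1, j + 1)] = r

mean_inter_item = sum(pairs.values()) / len(pairs)
print(f"usable item pairs: {len(pairs)}")
print(f"mean inter-item correlation: {mean_inter_item:.2f}")
```

With only six test takers the individual correlations are very unstable, so the numbers are useful only for showing the mechanics.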
Split-half: 1st half (set A, items 1 to 5)

Test takers' identity   Item: 1  2  3  4  5   Total
A                             1  1  0  1  0   3
B                             0  1  1  0  1   3
C                             1  1  1  1  1   5
D                             1  1  0  0  0   2
E                             1  0  1  1  0   3
F                             1  0  1  0  1   3
TOTAL SCORE                   5  4  4  3  3
Split-half: 2nd half (set A, items 6 to 10)

Test takers' identity   Item: 6  7  8  9  10  Total
A                             1  1  0  1  0   3
B                             0  1  1  0  1   3
C                             1  0  1  1  1   4
D                             1  1  0  0  0   2
E                             1  0  1  1  1   4
F                             0  0  1  0  1   2
TOTAL SCORE                   4  3  4  3  4
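The two half-test totals above can be correlated to obtain a split-half estimate. The sketch below assumes the Pearson correlation as the statistic. It also applies the Spearman-Brown step-up, a standard correction that the slides do not mention, which estimates the reliability of the full ten-item test from the correlation between its two five-item halves.

```python
from math import sqrt

# Totals for test takers A..F, copied from the two split-half tables.
first_half = [3, 3, 5, 2, 3, 3]    # items 1 to 5
second_half = [3, 3, 4, 2, 4, 2]   # items 6 to 10


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to a full-length estimate."""
    return 2 * r_half / (1 + r_half)


r_half = pearson_r(first_half, second_half)
r_full = spearman_brown(r_half)

print(f"correlation between halves: {r_half:.2f}")
print(f"Spearman-Brown estimate:    {r_full:.2f}")
```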
DRAWBACKS OF SPLIT-HALF
• Does not fully reflect the true value of the reliability of the test (Kline, 1993:11).
• Different splits may produce different reliability results (Cronbach, 1951).
• Test length affects the reliability of the test: the more items in the test, the more reliable the test is (Wiersma and Jurs, 1990:163).
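The second drawback can be seen directly on the set A data from the split-half example. The sketch below is illustrative only: it rebuilds the ten item scores per test taker from the two half tables and compares a first-half / second-half split with an odd / even split, using a Pearson correlation written from scratch (an assumption, since the slides do not name the statistic).

```python
from math import sqrt

# Item scores on items 1..10 (set A), test takers A..F.
scores = {
    "A": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
    "B": [0, 1, 1, 0, 1, 0, 1, 1, 0, 1],
    "C": [1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
    "D": [1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
    "E": [1, 0, 1, 1, 0, 1, 0, 1, 1, 1],
    "F": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
}


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_r(rows, left_idx, right_idx):
    """Correlate the totals on two groups of item positions (0-based)."""
    left = [sum(row[i] for i in left_idx) for row in rows]
    right = [sum(row[i] for i in right_idx) for row in rows]
    return pearson_r(left, right)


rows = list(scores.values())
first_second = split_half_r(rows, range(0, 5), range(5, 10))
odd_even = split_half_r(rows, range(0, 10, 2), range(1, 10, 2))

print(f"first half vs second half: r = {first_second:.2f}")
print(f"odd items vs even items:   r = {odd_even:.2f}")
```

The same ten items and the same six test takers yield two different coefficients depending only on how the items are divided.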
Example of inter-item estimation
Two or more raters evaluate students' speaking skills.
The scoring may be based on several aspects. A statistical analysis may be used to analyze the data, usually a t-test.
A correlational analysis may be applied to examine the closeness of the scores given by the two raters.
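A t-test on two raters' scores might look like the sketch below. Several things here are assumptions rather than the author's method: the slides do not say which kind of t-test is meant, so a paired t-test is used because both raters score the same students; the data are reused from the "Consistent test" table of Rater 1 and Rater 2 scores; and the critical value 2.776 is the two-tailed value for a 0.05 significance level with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Scores for students A..E, reused from the "Consistent test" table.
rater_1 = [8.2, 8.6, 8.9, 8.0, 9.3]
rater_2 = [8.0, 8.8, 9.0, 8.1, 9.4]

# Paired t-test: work on the per-student differences between raters.
differences = [a - b for a, b in zip(rater_1, rater_2)]
n = len(differences)
t_stat = mean(differences) / (stdev(differences) / sqrt(n))

T_CRITICAL = 2.776  # two-tailed, alpha = 0.05, df = n - 1 = 4
raters_differ = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.2f}, df = {n - 1}")
print("mean scores differ significantly" if raters_differ
      else "no significant difference between the raters' mean scores")
```

A non-significant t-test only indicates that the raters' average scores are similar. Whether they also rank the students in the same way is the question the correlational analysis addresses.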