Use of the correlation coefficient as a measure of effectiveness of a scoring
system

Suppose we are given a sample of $n$ student essays.
Let $X$ be the human score assigned to a student essay, where $x = 1, 2, \ldots, 6$ are the possible ratings. The idea of the 'true score' is that the normal score $\Phi^{-1}(x/7)$ is closer than $x$ to the 'true' score. Here $\Phi^{-1}$ is the inverse of the standard normal distribution function.
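As a minimal sketch of the normal-score transformation, the following maps each rating on the 1 to 6 scale to $\Phi^{-1}(x/7)$. The function name `normal_score` and its `n_categories` parameter are illustrative choices, not part of the original text.

```python
from statistics import NormalDist


def normal_score(rating, n_categories=6):
    """Return Phi^{-1}(rating / (n_categories + 1)) for a rating in 1..n_categories."""
    if not 1 <= rating <= n_categories:
        raise ValueError("rating must lie between 1 and n_categories")
    # NormalDist() is the standard normal; inv_cdf is its quantile function.
    return NormalDist().inv_cdf(rating / (n_categories + 1))


# Normal scores for the six possible ratings (hypothetical helper table).
normal_scores = {x: normal_score(x) for x in range(1, 7)}
for x, s in normal_scores.items():
    print(x, round(s, 3))
```

Dividing by 7 rather than 6 keeps the highest rating away from probability 1, where the inverse normal is infinite, and makes the scores symmetric about zero.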
Let $Y_1, Y_2, \ldots, Y_n$ be the machine scores assigned to the $n$ essays, with

$$Y_j = 1, 2, \ldots, 6; \qquad j = 1, 2, \ldots, n,$$

and $\Phi^{-1}(y_j/7)$ the corresponding normal score.
How related is $Y = (Y_1, Y_2, \ldots, Y_n)$ to $X$?
One can calculate the correlation coefficient between $Y$ and $X$ and test the hypothesis $H_0$: no relation.
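One way to carry out such a calculation is sketched below: the sample correlation between human and machine scores, with a permutation test of the no-relation hypothesis. The permutation test is a choice made for this illustration (the text does not name a specific test), and the scores in `human` and `machine` are made-up data.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for H0: no relation between xs and ys."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical human and machine scores for ten essays.
human = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
machine = [1, 2, 3, 3, 2, 4, 5, 5, 4, 6]
r = pearson_r(human, machine)
p = permutation_p_value(human, machine)
print(round(r, 4), p)
```

A permutation test avoids assuming normality, which seems reasonable for scores restricted to six integer values.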
How well can we predict the human score? We need measures of uncertainty, confidence
intervals, etc.
How well can the machine score predict the 'true' score $S$? Does it have a distribution? Can we predict $E\{S\}$ over the population of human raters?
Given a bivariate random variable $(X, Y)$, with $X$ denoting the human score and $Y$ the machine score:
If one has a reason to require that a machine score $Y$ be some function of $X$, then it is well known (e.g., Wilks, 1962) that the $f(X)$ minimizing $E\{Y - f(X)\}^2$ is given by

$$f(X) = E\{Y \mid X\},$$

among all functions $f$ with $E f^2(X) < \infty$.

Furthermore (Brillinger, 1966), the correlation squared, $r^2$, between $Y$ and $f(X)$ is maximized by choosing $f(X) = a + bE\{Y \mid X\}$, where $a$ and $b$ are constants.
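The extremal property can be checked numerically on a small sample, treating the sample as its own empirical distribution. The sketch below compares the squared correlation of $Y$ with three functions of $X$: the identity, the square, and the within-group mean of $Y$ (the sample analogue of the conditional expectation). The data in `pairs` are invented for illustration.

```python
from collections import defaultdict


def r_squared(us, vs):
    """Squared sample correlation between two equal-length sequences."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    suu = sum((u - mu) ** 2 for u in us)
    svv = sum((v - mv) ** 2 for v in vs)
    return suv * suv / (suu * svv)


# Hypothetical (human score, machine score) pairs.
pairs = [(1, 1), (1, 2), (2, 2), (2, 4), (3, 5), (3, 6),
         (4, 5), (4, 6), (5, 6), (5, 6), (6, 6), (6, 5)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

# Sample analogue of E{Y | X = x}: the mean of y within each x group.
groups = defaultdict(list)
for x, y in pairs:
    groups[x].append(y)
cond_mean = {x: sum(v) / len(v) for x, v in groups.items()}

r2_identity = r_squared(ys, xs)
r2_square = r_squared(ys, [x * x for x in xs])
r2_cond = r_squared(ys, [cond_mean[x] for x in xs])
# Any linear transformation a + b * E{Y | X} gives the same squared correlation.
r2_affine = r_squared(ys, [3.0 - 2.0 * cond_mean[x] for x in xs])

print(r2_identity, r2_square, r2_cond, r2_affine)
```

On these data the group-mean function attains the largest $r^2$ of the three, and the linear transformation leaves it unchanged, as the result states.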
Let's apply this last result to the construction of three scoring systems that make use of $r^2$ as a measure of the effectiveness of a scoring system.

Application 1
Consider the following scoring system.
Assign the $i$th essay the score $y_j$ if $X_i$ (its human score) is the $j$th largest of the $n$ $X$'s.
Let $r$ be the correlation between the scores $x$ and the variate values $y$.
Let's determine the $y$'s, the machine scores, by requiring that $r^2$ be as large as possible.
It turns out that one should take, up to an arbitrary linear transformation,

$$y_j = E\{X_{(j)}\},$$

where $X_{(j)}$ is the $j$th largest of the $n$ $X$'s.
That is, in fact, the often used scoring procedure of giving the $j$th largest observation the score of the expected value of the $j$th order statistic of a standard distribution.
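The scores $E\{X_{(j)}\}$ can be approximated by simulation. The sketch below assumes the standard distribution is the standard normal (the text does not say which distribution is meant) and estimates the expected order statistics by Monte Carlo; the function name, replication count, and seed are arbitrary choices for the illustration.

```python
import random


def expected_order_statistics(n, reps=20000, seed=0):
    """Monte Carlo estimate of E{X_(j)}, j = 1..n, for n standard normal draws.

    Index 0 of the result corresponds to the largest observation, matching
    the 'jth largest' convention used in the text.
    """
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(reps):
        sample = sorted((rng.gauss(0.0, 1.0) for _ in range(n)), reverse=True)
        for j, value in enumerate(sample):
            totals[j] += value
    return [t / reps for t in totals]


# Approximate scores for a sample of five essays.
order_scores = expected_order_statistics(5)
print([round(s, 2) for s in order_scores])
```

For a symmetric distribution the scores come out approximately symmetric about zero, so the middle observation of five gets a score near zero.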

Application 2
Let $C_1, C_2, \ldots, C_6$ be a fixed set of mutually exclusive and exhaustive regions.
Consider the scoring system:
Assign the $i$th essay the machine score $y_j$ if $X_i$ falls in region $C_j$.
If we want to determine the $y$'s by the requirement that the correlation between the scores $x$ and the variate values $Y$ be as large as possible, then the $y$'s are given by

$$y_j = E\{X \mid X \in C_j\}.$$
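A sample version of this rule replaces the conditional expectation by the mean of the observations falling in each region. The sketch below assumes the regions are consecutive intervals of the real line defined by cut points, which is only one possible choice of regions; the data and the three-region partition are invented to keep the example short.

```python
from bisect import bisect_right


def region_mean_scores(values, boundaries):
    """Return {region index: mean of the values falling in that region}.

    Regions are the intervals (-inf, b0), [b0, b1), ..., [b_last, +inf),
    where boundaries is a sorted list of cut points. Empty regions are omitted.
    """
    sums, counts = {}, {}
    for v in values:
        j = bisect_right(boundaries, v)  # index of the interval containing v
        sums[j] = sums.get(j, 0.0) + v
        counts[j] = counts.get(j, 0) + 1
    return {j: sums[j] / counts[j] for j in sorted(sums)}


# Hypothetical observations and a three-region partition.
values = [0.5, 1.5, 2.5, 3.0, 3.5, 4.5, 5.5, 6.0]
boundaries = [2.0, 4.0]
region_scores = region_mean_scores(values, boundaries)
print(region_scores)
```

Each essay then receives the score of the region its value falls in.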

Application 3
Consider the situation where we have $n$ independent observations on the vector variate $(X, Y, Z_1, Z_2, \ldots, Z_k)$, where

$$X = x + \varepsilon, \qquad Y = y + \eta,$$

and $y = a + bx$ is the machine score and $x$ the human score, with $x, y, \varepsilon, \eta$ random variables and $\operatorname{cov}\{\varepsilon, Z_j\} = 0 = \operatorname{cov}\{\eta, Z_j\}$ for all $j$.

In this situation the $Z_j$'s are known as instrumental variates, and the estimates

$$\hat{b}_j = \frac{\sum_i (Y_i - \bar{Y})(Z_{ji} - \bar{Z}_j)}{\sum_i (X_i - \bar{X})(Z_{ji} - \bar{Z}_j)}$$

of the parameter $b$ have been proposed.
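The instrumental-variable estimate can be illustrated by simulation. The sketch below assumes an errors-in-variables setup in which the observed $X$ and $Y$ equal the underlying scores plus independent noise, and a single instrument $Z$ that is correlated with the underlying human score but not with the noise. All parameter values, variable names, and the data-generating process are assumptions made for this example.

```python
import random


def centered_cross_sum(us, vs):
    """Sum over i of (u_i - mean(u)) * (v_i - mean(v))."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs))


def iv_slope(xs, ys, zs):
    """Instrumental-variable slope: cross-sum of (Y, Z) over cross-sum of (X, Z)."""
    return centered_cross_sum(ys, zs) / centered_cross_sum(xs, zs)


rng = random.Random(0)
a_true, b_true = 1.0, 2.0  # assumed intercept and slope
n = 20000
x_obs, y_obs, z_obs = [], [], []
for _ in range(n):
    z = rng.gauss(0.0, 1.0)            # instrument
    x = z + rng.gauss(0.0, 0.5)        # underlying human score, related to z
    y = a_true + b_true * x            # underlying machine score
    x_obs.append(x + rng.gauss(0.0, 0.5))  # observed X with measurement error
    y_obs.append(y + rng.gauss(0.0, 0.5))  # observed Y with measurement error
    z_obs.append(z)

b_iv = iv_slope(x_obs, y_obs, z_obs)
# Ordinary least-squares slope, shown for contrast; noise in X attenuates it.
b_ols = centered_cross_sum(x_obs, y_obs) / centered_cross_sum(x_obs, x_obs)
print(round(b_iv, 3), round(b_ols, 3))
```

In this setup the instrumental-variable estimate lands near the assumed slope, while the least-squares slope is pulled toward zero by the noise in $X$.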
Let's choose a function $f(Z_1, \ldots, Z_k)$, which is also an instrumental variable, in some optimum manner.

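Under regularity conditions, it may be established that

$$\hat{b} = \frac{\sum_i (Y_i - \bar{Y})(f_i - \bar{f})}{\sum_i (X_i - \bar{X})(f_i - \bar{f})}$$

is asymptotically normal with mean $b$ and variance

$$\frac{\operatorname{var}\{\eta - b\varepsilon\}}{n r^2 \operatorname{var} x}, \tag{1}$$

where $r$ is the correlation between $X$ and $f$, and $\varepsilon$ and $\eta$ are the errors in $X$ and $Y$.

Expression (1) is also the asymptotic mean-squared error of $\hat{b}$.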
We see that choosing $f$ to minimize (1) is equivalent to choosing $f$ to maximize $r^2$.
We know from the theorem that this last occurs when, up to a linear transformation,

$$f(Z_1, \ldots, Z_k) = E\{X \mid Z_1, Z_2, \ldots, Z_k\}. \tag{2}$$

Thus our a priori procedure is to select that function of $(Z_1, \ldots, Z_k)$ that may be linearly related to $X$, the human scores, as given by (2).

This third application indicates that, asymptotically, when one is using instrumental variables as
an aid to estimation one should employ that function of the instrumental variables that may prove
linearly related to the primary variables.
The analysis given above may be seen to provide a justification of the often used procedure of
employing the correlation coefficient between machine and human scores as a measure of the
effectiveness of a machine scoring system.
