Use of the correlation coefficient as a measure of effectiveness of a scoring system

Suppose we are given a sample of n student essays.
Let X be the human score assigned to a student's essay, where x = 1, 2, …, 6 are the possible ratings. The
idea of the ‘true score’ is that the normal score $\Phi^{-1}(x/7)$ is closer than x to the ‘true’ score, where $\Phi^{-1}$ is the
inverse of the standard normal distribution function.
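
As a small illustration of the normal score described above, the following sketch maps each rating x in 1, …, 6 to $\Phi^{-1}(x/7)$ using the Python standard library. The function name `normal_score` and the parameter `n_categories` are illustrative choices, not part of the original note.

```python
from statistics import NormalDist


def normal_score(rating: int, n_categories: int = 6) -> float:
    """Return Phi^{-1}(rating / (n_categories + 1)) for a rating in 1..n_categories."""
    if not 1 <= rating <= n_categories:
        raise ValueError("rating must lie in 1..n_categories")
    # NormalDist() with no arguments is the standard normal distribution.
    return NormalDist().inv_cdf(rating / (n_categories + 1))


# Normal scores for the six possible ratings.
normal_scores = {x: normal_score(x) for x in range(1, 7)}
for x, s in normal_scores.items():
    print(f"rating {x}: normal score {s:+.4f}")
```

The scores are symmetric about zero, so ratings 1 and 6 receive values of equal magnitude and opposite sign.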
Let $Y_1, Y_2, \ldots, Y_n$ be the machine scores assigned to the n essays, where

$Y_j = 1, 2, \ldots, 6$; $\Phi^{-1}(y_j/7)$ is the normal score; $j = 1, 2, \ldots, n$.
How related is $Y = (Y_1, Y_2, \ldots, Y_n)$ to X?
One can calculate the correlation coefficient between Y and X, and test the hypothesis $H_0$: no
relation.
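
One possible way to carry out such a calculation is sketched below: it computes the Pearson correlation between human and machine scores and approximates a two-sided p-value for the hypothesis of no relation with a permutation test. The permutation test is a choice made here for illustration (the note does not specify a test), and the twelve rating pairs are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the hypothesis of no relation."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any pairing between x and y
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical human and machine ratings for twelve essays.
human = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 3]
machine = [1, 2, 3, 3, 4, 4, 5, 4, 5, 6, 5, 2]

r = pearson_r(human, machine)
p = permutation_p_value(human, machine)
print(f"r = {r:.3f}, permutation p-value = {p:.4f}")
```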
How well can we predict the human score? We need measures of uncertainty, confidence
intervals, etc.
How well can the machine score predict the ‘true’ score S? Does it have a distribution? Can we
predict $E\{S\}$ over the population of human raters?
Given a bivariate random variable $(X, Y)$ with X denoting the human score and Y the machine
score:
If one has a reason to require that a machine score Y be some function of X, then it is well
known (e.g., Wilks, 1962) that the f(X) minimizing $E\{Y - f(X)\}^2$ is given by

$$f(X) = E\{Y \mid X\}, \quad \text{for any function } f \text{ with } E f^2(X) < \infty.$$

Furthermore (Brillinger, 1966), the correlation squared, $r^2$, between Y and f(X) is maximized
by choosing $f(X) = a + bE\{Y \mid X\}$, where a and b are constants.
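
The two properties just stated can be checked numerically on simulated data. The sketch below is only an illustration under assumed conditions: the relation $Y = \sqrt{X} + \text{noise}$, the sample size, and the candidate functions are hypothetical. It estimates $E\{Y \mid X = x\}$ by the sample mean of Y within each rating and compares mean-squared error and $r^2$ across several functions of X.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.integers(1, 7, size=n)                 # human scores in 1..6
Y = np.sqrt(X) + rng.normal(0.0, 0.3, size=n)  # hypothetical nonlinear relation

# Empirical conditional expectation E{Y | X = x}, one value per rating.
cond_mean = {x: Y[X == x].mean() for x in range(1, 7)}
f_opt = np.array([cond_mean[x] for x in X])


def mse(f_vals):
    """Sample mean-squared error of predicting Y by f_vals."""
    return float(np.mean((Y - f_vals) ** 2))


def r2(f_vals):
    """Squared sample correlation between Y and f_vals."""
    return float(np.corrcoef(Y, f_vals)[0, 1] ** 2)


candidates = {
    "E{Y|X}": f_opt,
    "3 - 2 E{Y|X}": 3.0 - 2.0 * f_opt,  # linear transform of E{Y|X}
    "X": X.astype(float),
    "X^2": X.astype(float) ** 2,
}
for name, f_vals in candidates.items():
    print(f"{name:>14}: mse = {mse(f_vals):.4f}, r^2 = {r2(f_vals):.4f}")
```

Within the sample, the within-rating means give the smallest mean-squared error among functions of X, and any linear transform $a + bE\{Y \mid X\}$ with $b \neq 0$ attains the same, largest, $r^2$.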
Let’s apply this last result to the construction of three scoring systems that make use of $r^2$ as a
measure of the effectiveness of a scoring system.

Application 1

Consider the following scoring system.
Assign the ith essay the score $y_j$ if $X_i$ (its human score) is the jth largest of the n X’s.
Let r be the correlation between the scores x and the variate values y.
Let’s determine the y’s, the machine scores, by requiring that $r^2$ be as large as possible.
It turns out that the solution is that one should take, up to an arbitrary linear transformation,

$$y_j = E\{X_{(j)}\},$$

where $X_{(j)}$ is the jth largest of the n X’s.
This is, in fact, the often-used scoring procedure of giving the jth largest observation the score of
the expected value of the jth order statistic of a standard distribution.
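
A rough sketch of this scoring procedure follows, assuming the standard distribution is the standard normal and approximating the expected order statistics by Monte Carlo simulation (both are assumptions made for illustration). The helper names and the five untied ratings are hypothetical; with tied ratings the assignment below would break ties arbitrarily.

```python
import numpy as np


def expected_order_stats(n, n_sim=20000, seed=0):
    """Monte Carlo estimate of E{X_(j)}, j = 1..n, for the jth LARGEST of n standard normals."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((n_sim, n))
    # Sort each simulated sample in descending order: column j-1 holds the jth largest value.
    ordered = -np.sort(-samples, axis=1)
    return ordered.mean(axis=0)


def order_stat_scores(x, expected):
    """Give the jth largest observation in x the score expected[j-1]."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(-x, kind="stable")  # indices from largest to smallest
    scores = np.empty(len(x))
    scores[order] = expected
    return scores


human = np.array([4, 1, 6, 3, 2])  # hypothetical untied human ratings
expected = expected_order_stats(len(human))
scores = order_stat_scores(human, expected)
for h, s in zip(human, scores):
    print(f"human rating {h}: order-statistic score {s:+.3f}")
```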

Application 2
Let $C_1, C_2, \ldots, C_6$ be a fixed set of mutually exclusive and exhaustive regions.
Consider the scoring system:
Assign the ith essay the machine score $y_j$ if $X_i$ falls in cluster $C_j$.
If we want to determine the y’s by the requirement that the correlation between the scores, x,
and the variate value, Y, be as large as possible, then the y’s are given by

$$y_j = E\{X \mid X \text{ is in class } C_j\}.$$
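
To make the class-mean rule concrete, the sketch below simulates a continuous variate X, splits the real line into six intervals, and scores each class by the sample mean of X within it. The standard normal distribution for X and the cut points in `edges` are hypothetical choices, not taken from the note. The resulting scores are compared with the naive scores 1, …, 6.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal(10000)                 # hypothetical continuous variate
edges = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])  # hypothetical cut points defining C_1..C_6

cls = np.digitize(X, edges)                    # class index 0..5 for each observation
class_means = np.array([X[cls == j].mean() for j in range(6)])

y_opt = class_means[cls]   # score y_j = sample mean of X within class C_j
y_index = cls + 1.0        # naive scores 1..6


def r2(u, v):
    """Squared sample correlation between u and v."""
    return float(np.corrcoef(u, v)[0, 1] ** 2)


print("class means:", np.round(class_means, 3))
print(f"r^2 with class-mean scores: {r2(X, y_opt):.4f}")
print(f"r^2 with scores 1..6:       {r2(X, y_index):.4f}")
```

Because the intervals are of unequal width, the equally spaced scores 1, …, 6 are not a linear transform of the class means, and their $r^2$ with X is lower.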

Application 3
Consider the situation where we have n independent observations on the vector variate
$(X, Y, Z_1, Z_2, \ldots, Z_k)$, where

$$X = x + \varepsilon, \qquad Y = y + \eta,$$

and $y = a + bx$ is the machine score and x the human score,
with $x, y, \varepsilon, \eta$ random variables and $\operatorname{cov}\{\varepsilon, Z_j\} = 0 = \operatorname{cov}\{\eta, Z_j\}$ for all j.

In this situation the $Z_j$’s are known as instrumental variates, and the estimates

$$\hat{b}_j = \frac{\sum_i (Y_i - \bar{Y})(Z_{ji} - \bar{Z}_j)}{\sum_i (X_i - \bar{X})(Z_{ji} - \bar{Z}_j)}$$

of the parameter b have been proposed.
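
The instrumental-variate estimates can be tried on simulated data. In the sketch below, the model coefficients, error variances, sample size, and the function name `iv_estimate` are all assumptions made for illustration. Each $Z_j$ is used in turn as the instrument, and the ordinary least-squares slope of Y on X is shown for comparison, since it is biased toward zero when X is observed with error.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 20000, 3
a, b = 0.5, 2.0                                  # hypothetical true parameters

Z = rng.standard_normal((n, k))                  # instrumental variates Z_1..Z_k
x = Z @ np.array([1.0, 0.6, 0.3]) + rng.standard_normal(n)  # latent score
eps = rng.standard_normal(n)                     # error in X, uncorrelated with Z
eta = rng.standard_normal(n)                     # error in Y, uncorrelated with Z
X = x + eps
Y = a + b * x + eta


def iv_estimate(X, Y, inst):
    """Ratio of centred cross-products of Y and of X with the instrument."""
    c = inst - inst.mean()
    return float(np.sum((Y - Y.mean()) * c) / np.sum((X - X.mean()) * c))


b_hat = [iv_estimate(X, Y, Z[:, j]) for j in range(k)]
ols_slope = iv_estimate(X, Y, X)  # using X itself as the instrument gives the OLS slope

for j, est in enumerate(b_hat, start=1):
    print(f"b_hat using Z_{j}: {est:.3f}")
print(f"OLS slope of Y on X: {ols_slope:.3f} (true b = {b})")
```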
Let’s choose a function $f(Z_1, \ldots, Z_k)$, which is also an instrumental variable, in some
optimum manner.

Under regularity conditions, it may be established that

$$\hat{b} = \frac{\sum_i (Y_i - \bar{Y})(f_i - \bar{f})}{\sum_i (X_i - \bar{X})(f_i - \bar{f})}$$

is asymptotically normal with mean b and variance

$$\frac{\operatorname{var}\{\eta - b\varepsilon\}}{n r^2 \operatorname{var} x} \qquad (1)$$

where r is the correlation between X and f.

Expression (1) is also the asymptotic mean-squared error of $\hat{b}$.
We see that choosing f to minimize (1) is equivalent to choosing f to maximize $r^2$.
We know from the theorem that this last occurs when, up to a linear transformation,

$$f(Z_1, \ldots, Z_k) = E\{X \mid Z_1, Z_2, \ldots, Z_k\}. \qquad (2)$$

Thus our a priori procedure is to select that function of $(Z_1, \ldots, Z_k)$ that may be linearly
related to X, the human scores, as given by (2).

This third application indicates that, asymptotically, when one is using instrumental variables as
an aid to estimation one should employ that function of the instrumental variables that may prove
linearly related to the primary variables.
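
A minimal numerical sketch of this recommendation follows. It assumes that $E\{X \mid Z_1, \ldots, Z_k\}$ is linear in the Z’s, so that it can be estimated by a least-squares fit of X on the instruments. That linearity, the simulated model, and the helper names are assumptions made for illustration. The fitted values are then used as the single instrument f.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 20000, 3
b = 2.0                                          # hypothetical true slope

Z = rng.standard_normal((n, k))
x = Z @ np.array([1.0, 0.6, 0.3]) + rng.standard_normal(n)  # latent score
X = x + rng.standard_normal(n)                   # observed with error
Y = 0.5 + b * x + rng.standard_normal(n)

# Estimate E{X | Z_1..Z_k} by least squares of X on (1, Z_1, ..., Z_k).
design = np.column_stack([np.ones(n), Z])
coef, *_ = np.linalg.lstsq(design, X, rcond=None)
f = design @ coef                                # fitted values used as the instrument


def r2(u, v):
    """Squared sample correlation between u and v."""
    return float(np.corrcoef(u, v)[0, 1] ** 2)


def iv_estimate(X, Y, inst):
    """Ratio of centred cross-products of Y and of X with the instrument."""
    c = inst - inst.mean()
    return float(np.sum((Y - Y.mean()) * c) / np.sum((X - X.mean()) * c))


r2_single = [r2(X, Z[:, j]) for j in range(k)]
r2_f = r2(X, f)
b_hat_f = iv_estimate(X, Y, f)

print("r^2 of X with each Z_j:", np.round(r2_single, 4))
print(f"r^2 of X with fitted f: {r2_f:.4f}")
print(f"b_hat using f: {b_hat_f:.3f} (true b = {b})")
```

In the sample, the fitted combination has an $r^2$ with X at least as large as that of any single $Z_j$, which by expression (1) corresponds to the smallest asymptotic variance among these choices.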
The analysis given above may be seen to provide a justification of the often used procedure of
employing the correlation coefficient between machine and human scores as a measure of the
effectiveness of a machine scoring system.
