InnerSoft STATS
Methods and Formulas Help
METHODS AND FORMULAS HELP V2.1 InnerSoft STATS
Mean
The arithmetic mean is the sum of a collection of numbers divided by the count of numbers in the collection:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Sample Variance
The estimator of population variance, also called the unbiased sample variance, is:
$$S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$

Source: http://en.wikipedia.org/wiki/Variance
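As an illustration, the mean and the unbiased sample variance can be sketched in plain Python (hypothetical helper functions, not part of ISSTATS):

```python
def mean(xs):
    # Arithmetic mean: sum divided by the count.
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Unbiased estimator of the population variance (n - 1 denominator).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(data))             # 5.0
print(sample_variance(data))  # 32/7 ~ 4.5714
```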
Sample Kurtosis
The estimator of population kurtosis is:

$$G_2 = \frac{k_4}{k_2^2} = \frac{(n+1)\,n}{(n-1)(n-2)(n-3)} \cdot \frac{\sum_{i=1}^{n}(x_i - \bar{x})^4}{k_2^2} - 3\,\frac{(n-1)^2}{(n-2)(n-3)}$$

where $k_2$ is the unbiased sample variance. The standard error of the sample kurtosis of a sample of size n from the normal distribution is:

$$K\ Std.\ Error = \sqrt{\frac{4\,[6n(n-1)^2(n+1)]}{(n-3)(n-2)(n+1)(n+3)(n+5)}} = \sqrt{\frac{24\,n(n-1)^2}{(n-3)(n-2)(n+3)(n+5)}}$$

Source: http://en.wikipedia.org/wiki/Kurtosis#Estimators_of_population_kurtosis
Sample Skewness
Skewness of a population sample is estimated by the adjusted Fisher–Pearson standardized moment
coefficient:
$$G = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3$$

where n is the sample size and s is the sample standard deviation.
The standard error of the skewness of a sample of size n from a normal distribution is:

$$G\ Std.\ Error = \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}}$$

Source: https://en.wikipedia.org/wiki/Skewness#Sample_skewness
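A minimal sketch of the two G estimators and the skewness standard error above, in plain Python (function names are illustrative, not part of ISSTATS):

```python
import math

def sample_skewness(xs):
    # Adjusted Fisher-Pearson standardized moment coefficient G.
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))  # sample std dev
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)

def sample_kurtosis(xs):
    # G2 estimator of the population excess kurtosis.
    n = len(xs)
    m = sum(xs) / n
    k2 = sum((x - m) ** 2 for x in xs) / (n - 1)  # unbiased sample variance
    return ((n + 1) * n / ((n - 1) * (n - 2) * (n - 3))
            * sum((x - m) ** 4 for x in xs) / k2 ** 2
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def skewness_std_error(n):
    # Standard error of G under normality.
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
```

A symmetric sample such as [1, 2, 3, 4, 5] has skewness 0 and a negative (platykurtic) G2.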
Total Variance
Variance of the entire population is:
$$\sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}$$
Source: http://en.wikipedia.org/wiki/Variance
Total Kurtosis
Kurtosis of the entire population is:
$$G_2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^4 / n}{\sigma^4} - 3$$

where n is the number of observations and σ is the total standard deviation.
Source: http://en.wikipedia.org/wiki/Kurtosis
Total Skewness
Skewness of the entire population is:
$$G = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^3 / n}{\sigma^3}$$

where n is the number of observations and σ is the total standard deviation.
Source: https://en.wikipedia.org/wiki/Skewness
Quantiles of a population
ISSTATS uses the same method as R-7, the Excel CUARTIL.INC function, SciPy (1,1), SPSS, and Minitab. $Q_p$, the estimate for the kth q-quantile, where p = k/q and h = (N−1)p + 1, is computed by

$$Q_p = x_{\lfloor h \rfloor} + (h - \lfloor h \rfloor)\big(x_{\lfloor h \rfloor + 1} - x_{\lfloor h \rfloor}\big)$$

This is linear interpolation of the modes of the order statistics for the uniform distribution on [0, 1]. When p = 1, use $x_N$.
Source: http://en.wikipedia.org/wiki/Quantile#Estimating_the_quantiles_of_a_population
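A sketch of the R-7 interpolation rule, assuming plain Python (the function name is ours, not part of ISSTATS):

```python
import math

def quantile_r7(xs, p):
    # R-7 / Excel QUARTILE.INC method: h = (N - 1) p + 1 on the sorted
    # sample, then linear interpolation between adjacent order statistics.
    s = sorted(xs)
    n = len(s)
    if p >= 1.0:
        return s[-1]            # when p = 1, use the largest observation
    h = (n - 1) * p + 1         # 1-based fractional position
    lo = math.floor(h)
    return s[lo - 1] + (h - lo) * (s[lo] - s[lo - 1])

print(quantile_r7([1, 2, 3, 4], 0.5))   # 2.5
print(quantile_r7([1, 2, 3, 4], 0.25))  # 1.75
```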
MSSD (Mean of the squared successive differences)
It is calculated as half the mean of the squared differences between consecutive observations:

$$MSSD = \frac{\sum_{i=1}^{n-1}(x_{i+1} - x_i)^2}{2(n-1)}$$
The MSSD has the desirable property that one half the MSSD is an unbiased estimator of true variance.
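The MSSD formula can be sketched as follows (illustrative Python, not the ISSTATS implementation):

```python
def mssd(xs):
    # Half the mean of squared successive differences; there are n - 1
    # consecutive differences in a series of n observations.
    n = len(xs)
    total = sum((xs[i + 1] - xs[i]) ** 2 for i in range(n - 1))
    return total / (2 * (n - 1))

print(mssd([1.0, 2.0, 4.0]))  # (1 + 4) / 4 = 1.25
```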
Pearson Chi Square Test
The value of the test-statistic is
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

Where

- $\chi^2$ is Pearson's cumulative test statistic, which asymptotically approaches a $\chi^2$ distribution with (r − 1)(c − 1) degrees of freedom.
- $O_i$ is the number of observations of type i.
- $E_i$ is the expected (theoretical) frequency of type i.
Yates's Continuity Correction
The value of the test-statistic is
$$\chi^2 = \sum_{i=1}^{n} \frac{\big(\max\{0,\ |O_i - E_i| - 0.5\}\big)^2}{E_i}$$
When $|O_i - E_i| - 0.5$ is below zero, the term contributes zero. The effect of Yates's correction is to prevent overestimation of statistical significance for small data sets. The formula is chiefly used when at least one cell of the table has an expected count smaller than 5.
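A sketch of both statistics, with and without the continuity correction (illustrative Python; inputs are flat lists of observed and expected counts):

```python
def pearson_chi2(observed, expected):
    # Sum of (O - E)^2 / E over all cells.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def yates_chi2(observed, expected):
    # Continuity-corrected version: each |O - E| is shrunk by 0.5, and a
    # cell whose corrected difference would go negative contributes zero.
    return sum(max(0.0, abs(o - e) - 0.5) ** 2 / e
               for o, e in zip(observed, expected))

print(pearson_chi2([10, 20, 30], [20, 20, 20]))  # 10.0
print(yates_chi2([10, 20, 30], [20, 20, 20]))    # 9.025
```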
Likelihood Ratio G-Test
The value of the test-statistic is
$$G = 2 \sum_{i=1}^{r} \sum_{j=1}^{c} O_{ij} \ln\!\left(\frac{O_{ij}}{E_{ij}}\right)$$

where

- $O_{ij}$ is the observed count in row i and column j
- $E_{ij}$ is the expected count in row i and column j

G has an asymptotically approximate $\chi^2$ distribution with (r − 1)(c − 1) degrees of freedom when the null hypothesis is true and n is large enough.
Mantel-Haenszel Chi-Square Test
The Mantel-Haenszel chi-square statistic tests the alternative hypothesis that there is a linear association
between the row variable and the column variable. Both variables must lie on an ordinal scale. The
Mantel-Haenszel chi-square statistic is computed as:
$$Q_{MH} = (n - 1)\,r^2$$

where r is the Pearson correlation between the row variable and the column variable and n is the sample size. Under the null hypothesis of no association, $Q_{MH}$ has an asymptotic chi-square distribution with one degree of freedom.
Fisher's Exact Test
Fisher’s exact test assumes that the row and column totals are fixed, and then uses the hypergeometric
distribution to compute probabilities of possible tables conditional on the observed row and column totals.
Fisher’s exact test does not depend on any large-sample distribution assumptions, and so it is appropriate
even for small sample sizes and for sparse tables. This test is computed for 2×2 tables such as

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

For efficient computation, the elements of the matrix A are reordered as

$$A' = \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix}$$

where a′ is the cell of A that has the minimum marginals (minimum row and column totals). The test result does not depend on the arrangement of the cells.
The left-sided p-value sums the probabilities of all tables that have an equal or smaller a′:

$$p_{left} = P(x \le a') = \sum_{i=0}^{a'} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$

where $K = a' + b'$, $N = a' + b' + c' + d'$ and $n = a' + c'$.
The right-sided p-value sums the probabilities of all tables that have an equal or larger a′:

$$p_{right} = P(x \ge a') = \sum_{i=a'}^{K} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$
Most statistical packages output, as the one-sided test result, the minimum of $p_{left}$ and $p_{right}$. The Fisher two-tailed p-value for a table A is defined as the sum of the probabilities of all tables consistent with the marginals that are as likely as, or less likely than, the observed table.
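A sketch of the three Fisher p-values under the hypergeometric model (illustrative Python; this version works directly on the given cell a without the reordering step, which is only an efficiency device, and the small tolerance in the two-tailed comparison guards against floating-point noise):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    # Hypergeometric tail probabilities for a 2x2 table with fixed margins;
    # x is the count in the first cell.
    N = a + b + c + d
    K = a + b          # row total containing a
    n = a + c          # column total containing a
    def p(x):
        return comb(K, x) * comb(N - K, n - x) / comb(N, n)
    support = range(max(0, n - (N - K)), min(K, n) + 1)
    p_left = sum(p(x) for x in support if x <= a)
    p_right = sum(p(x) for x in support if x >= a)
    # Two-tailed: all tables no more likely than the observed one.
    p_obs = p(a)
    p_two = sum(p(x) for x in support if p(x) <= p_obs + 1e-12)
    return p_left, p_right, p_two
```

For a perfectly balanced table (2, 2; 2, 2) the two-tailed p-value is 1.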
McNemar's Test
This test is computed for 2×2 tables such as

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
The value of the test-statistic is
$$\chi^2 = \frac{(b - c)^2}{b + c}$$

The statistic is asymptotically distributed as chi-squared with 1 degree of freedom.
Edwards Continuity Correction
The value of the test-statistic is
$$\chi^2 = \frac{\big(\max\{0,\ |b - c| - 1\}\big)^2}{b + c}$$

When $|b - c| - 1$ is below zero, the statistic is zero. The statistic is asymptotically distributed as chi-squared with 1 degree of freedom.
McNemar Exact Binomial
Assume that b < c. Let n = b + c, and let B(x, n, p) be the binomial probability mass function. Then

$$\text{two-sided } p\text{-value} = 2 \cdot (\text{one-sided } p\text{-value}) = 2\sum_{x=0}^{b} B(x, n, 0.5) = 2\sum_{x=0}^{b}\binom{n}{x}\,0.5^x\,0.5^{n-x} = 2\cdot\frac{1}{2^n}\sum_{x=0}^{b}\binom{n}{x}$$
If b = c, the exact p-value equals 1.0.
Mid-P McNemar Test
Assume that b < c and let n = b + c.

$$\text{Mid-}p\text{ value} = 2\sum_{x=0}^{b} B(x, n, 0.5) - B(b, n, 0.5) = 2\cdot\frac{1}{2^n}\sum_{x=0}^{b}\binom{n}{x} - \binom{n}{b}\cdot\frac{1}{2^n}$$

If b = c, the mid p-value is

$$1.0 - \frac{1}{2}\binom{n}{b}\cdot\frac{1}{2^n}$$
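Both McNemar variants can be sketched as follows (illustrative Python, not the ISSTATS implementation):

```python
from math import comb

def mcnemar_exact(b, c):
    # Exact binomial McNemar test on the discordant pairs: twice the
    # smaller binomial tail at p = 0.5, capped at 1.
    n = b + c
    k = min(b, c)
    if b == c:
        return 1.0
    tail = sum(comb(n, x) for x in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def mcnemar_midp(b, c):
    # Mid-p variant: subtract the point probability of the observed count.
    n = b + c
    k = min(b, c)
    point = comb(n, k) / 2 ** n
    if b == c:
        return 1.0 - 0.5 * point
    tail = sum(comb(n, x) for x in range(k + 1)) / 2 ** n
    return 2 * tail - point

print(mcnemar_exact(1, 5))  # 14/64 = 0.21875
print(mcnemar_midp(1, 5))   # 8/64 = 0.125
```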
Bowker’s Test of Symmetry
This test is computed for an m-by-m square table as:

$$BW = \sum_{i<j} \frac{(n_{ij} - n_{ji})^2}{n_{ij} + n_{ji}}$$

For large samples, BW has an asymptotic chi-square distribution with m(m − 1)/2 − R degrees of freedom under the null hypothesis of symmetry, where R is the number of off-diagonal cells with $n_{ij} + n_{ji} = 0$.
Risk Test
Let the data be arranged as:

                   Disease status
Risk Factor        Cohort = Present    Cohort = Absent
Present            a                   b
Absent             c                   d
Odds ratio
The odds ratio (Risk Factor = Present / Risk Factor = Absent) is computed as:
$$OR = \frac{a/b}{c/d}$$

The distribution of the log odds ratio is approximately normal:

$$X \sim N\big(\log(OR),\ \sigma^2\big)$$

The standard error for the log odds ratio is approximately

$$SE = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

The 95% confidence interval for the odds ratio is computed as

$$\left[\exp\big(\log(OR) - z_{0.025}\,SE\big)\ ;\ \exp\big(\log(OR) + z_{0.025}\,SE\big)\right]$$

To test the hypothesis that the population odds ratio equals one, the two-sided p-value is computed as

$$significance\ (2\text{-}sided) = 2\,P\!\left(z \le \frac{-|\log(OR)|}{SE}\right)$$

Source: https://en.wikipedia.org/wiki/Odds_ratio
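A sketch of the odds ratio and its 95% interval (illustrative Python; the z value is the 0.025 upper quantile of the standard normal):

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959964):
    # Logit (Woolf) interval for the odds ratio of a 2x2 table.
    or_ = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

print(odds_ratio_ci(10, 20, 5, 40))  # OR = 4.0 with its interval
```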
Relative Risk
The relative risk (for cohort Disease status = Present) is computed as
$$RR = \frac{a/(a+b)}{c/(c+d)}$$

The distribution of the log relative risk is approximately normal:

$$X \sim N\big(\log(RR),\ \sigma^2\big)$$

The standard error for the log relative risk is approximately

$$SE = \sqrt{\frac{1}{a} + \frac{1}{c} - \frac{1}{a+b} - \frac{1}{c+d}}$$

The 95% confidence interval for the relative risk is computed as

$$\left[\exp\big(\log(RR) - z_{0.025}\,SE\big)\ ;\ \exp\big(\log(RR) + z_{0.025}\,SE\big)\right]$$

To test the hypothesis that the population relative risk equals one, the two-sided p-value is computed as

$$significance\ (2\text{-}sided) = 2\,P\!\left(z \le \frac{-|\log(RR)|}{SE}\right)$$
The relative risk (for cohort Disease status = Absent) is computed as

$$RR = \frac{b/(a+b)}{d/(c+d)}$$
Epidemiology Risk
All the parameters are computed for cohort Disease status = Present.
Attributable risk represents how much the risk factor increases or decreases the risk of disease:

$$AR = \frac{a}{a+b} - \frac{c}{c+d}$$

If AR > 0 there is an increase of the risk. If AR < 0 there is a reduction of the risk.
Relative Attributable Risk

$$\frac{\dfrac{a}{a+b} - \dfrac{c}{c+d}}{\dfrac{c}{c+d}} = \frac{AR}{c/(c+d)}$$

Number Needed to Harm

$$NNH = \frac{1}{\dfrac{a}{a+b} - \dfrac{c}{c+d}} = \frac{1}{AR}$$
The number needed to harm (NNH) is an epidemiological measure that indicates how many patients on
average need to be exposed to a risk-factor over a specific period to cause harm in an average of one
patient who would not otherwise have been harmed.
A negative number would not be presented as a NNH, rather, as the risk factor is not harmful, it is
expressed as a number needed to treat (NNT) or number needed to avoid to expose to risk.
Attributable risk per unit

$$ARP = \frac{RR - 1}{RR}$$

Preventive fraction

$$PF = 1 - RR$$

The etiologic fraction is the proportion of cases in which the exposure has played a causal role in disease development.

$$EF = \frac{a - c}{a}$$

Similar parameters are computed for cohort Disease status = Absent.
Source: https://en.wikipedia.org/wiki/Relative_risk
Cohen's Kappa Test
Given a k-by-k square matrix, which collects the scores of two raters who each classify N items into k mutually exclusive categories, the equation for Cohen's kappa coefficient is
$$\hat{\kappa} = \frac{p_o - p_e}{1 - p_e}$$

Where

$$p_o = \sum_{i=1}^{k} \frac{n_{ii}}{N} = \sum_{i=1}^{k} p_{ii} \qquad and \qquad p_e = \sum_{i=1}^{k} p_{i.}\,p_{.i}$$

where

$$p_{ij} = \frac{n_{ij}}{N} \qquad p_{i.} = \sum_{j=1}^{k} \frac{n_{ij}}{N} \qquad p_{.j} = \sum_{i=1}^{k} \frac{n_{ij}}{N}$$
The asymptotic variance is computed by
$$var(\hat{\kappa}) = \frac{1}{N(1-p_e)^4}\left\{ \sum_{i=1}^{k} p_{ii}\big[(1-p_e) - (p_{.i} + p_{i.})(1 - p_o)\big]^2 + (1-p_o)^2 \sum_{i=1}^{k}\sum_{j=1,\,j\ne i}^{k} p_{ij}(p_{.i} + p_{j.})^2 - (p_o p_e - 2p_e + p_o)^2 \right\}$$

The formula is given by Fleiss, Cohen, and Everitt (1969), and modified by Fleiss (1981). The asymptotic standard error is the square root of the value given above. This standard error and the standard normal distribution N(0,1) are used to compute confidence intervals:

$$\hat{\kappa} \pm z_{\alpha/2}\sqrt{var(\hat{\kappa})}$$
To compute an asymptotic test for the kappa coefficient, ISSTATS uses a standardized test statistic T
which has an asymptotic standard normal distribution under the null hypothesis that kappa equals zero
(H0: k = 0). The standardized test statistic is computed as
$$T = \frac{\hat{\kappa}}{\sqrt{var_0(\hat{\kappa})}} \approx N(0,1)$$

Where the variance of the kappa coefficient under the null hypothesis is

$$var_0(\hat{\kappa}) = \frac{1}{N(1-p_e)^2}\left\{ p_e + p_e^2 - \sum_{i=1}^{k} p_{.i}\,p_{i.}(p_{.i} + p_{i.}) \right\}$$

Refer to Fleiss (1981).
Source: https://v8doc.sas.com/sashtml/stat/chap28/sect26.htm
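A sketch of the kappa point estimate (illustrative Python; the asymptotic variance formulas are omitted for brevity):

```python
def cohen_kappa(table):
    # table[i][j]: number of items rater 1 put in class i and rater 2 in j.
    k = len(table)
    N = sum(sum(row) for row in table)
    p = [[table[i][j] / N for j in range(k)] for i in range(k)]
    p_row = [sum(p[i]) for i in range(k)]                        # p_i.
    p_col = [sum(p[i][j] for i in range(k)) for j in range(k)]   # p_.j
    p_o = sum(p[i][i] for i in range(k))          # observed agreement
    p_e = sum(p_row[i] * p_col[i] for i in range(k))  # chance agreement
    return (p_o - p_e) / (1 - p_e)

print(cohen_kappa([[20, 5], [10, 15]]))  # 0.4
```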
Nominal by Nominal Measures of Association
Contingency Coefficient
The contingency coefficient is a measure of association between two nominal variables, giving a value between 0 and 1.

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$

Where

- $\chi^2$ is Pearson's cumulative test statistic, which asymptotically approaches a $\chi^2$ distribution with (r − 1)(c − 1) degrees of freedom.
- N is the total sample size.
Standardized Contingency Coefficient
If X and Y have the same number of categories (r = c), then the maximum value for the contingency
coefficient is calculated as:
$$c_{max} = \sqrt{\frac{r-1}{r}}$$

If X and Y have a differing number of categories (r ≠ c), then the maximum value for the contingency coefficient is calculated as

$$c_{max} = \sqrt[4]{\frac{(r-1)(c-1)}{r\,c}}$$
The standardized contingency coefficient is calculated as the ratio:
$$c_{Standardized} = \frac{C}{c_{max}}$$

which varies between 0 and 1, with 0 indicating independence and 1 dependence.
Phi coefficient
The phi coefficient is a measure of association for two nominal variables.
$$\Phi = \sqrt{\frac{\chi^2}{N}}$$

Where

- $\chi^2$ is Pearson's cumulative test statistic, which asymptotically approaches a $\chi^2$ distribution with (r − 1)(c − 1) degrees of freedom.
- N is the total sample size.
Cramer's V
Cramer's V is a measure of association between two nominal variables, giving a value between 0 and +1
(inclusive).
$$V = \sqrt{\frac{\chi^2 / N}{\min\{r-1,\ c-1\}}}$$

Where

- $\chi^2$ is Pearson's cumulative test statistic, which asymptotically approaches a $\chi^2$ distribution with (r − 1)(c − 1) degrees of freedom.
- N is the total sample size.
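The chi-square-based measures above can be sketched together (illustrative Python; the chi-square helper computes expected counts from the table margins):

```python
import math

def chi2_stat(table):
    # Pearson chi-square for an r x c contingency table of counts.
    r, c = len(table), len(table[0])
    N = sum(sum(row) for row in table)
    row = [sum(table[i]) for i in range(r)]
    col = [sum(table[i][j] for i in range(r)) for j in range(c)]
    return sum((table[i][j] - row[i] * col[j] / N) ** 2 / (row[i] * col[j] / N)
               for i in range(r) for j in range(c))

def contingency_c(table):
    chi2 = chi2_stat(table)
    return math.sqrt(chi2 / (chi2 + sum(sum(row) for row in table)))

def phi(table):
    return math.sqrt(chi2_stat(table) / sum(sum(row) for row in table))

def cramers_v(table):
    N = sum(sum(row) for row in table)
    m = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi2_stat(table) / N / m)

print(cramers_v([[10, 0], [0, 10]]))  # 1.0 (perfect association)
```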
Tschuprow's T
Tschuprow's T is a measure of association between two nominal variables, giving a value between 0 and
1 (inclusive).
$$T = \sqrt{\frac{\chi^2 / N}{\sqrt{(r-1)(c-1)}}}$$
Lambda
Asymmetric lambda, λ(C/R) or column variable dependent, is interpreted as the probable improvement in
predicting the column variable Y given knowledge of the row variable X. The range of asymmetric
lambda is {0, 1}. Asymmetric lambda (C/R) or column variable dependent is computed as
$$\lambda(C/R) = \frac{\sum_i r_i - r}{N - r}$$

The asymptotic variance is

$$var\big(\lambda(C/R)\big) = \frac{N - \sum_i r_i}{(N - r)^3}\left\{ \sum_i r_i + r - 2\sum_i (r_i \mid l_i = l) \right\}$$
Where

$$r_i = \max_j\{n_{ij}\}\,, \quad r = \max_j\{n_{.j}\}\,, \quad c_j = \max_i\{n_{ij}\}\,, \quad c = \max_i\{n_{i.}\}$$
The values of $l_i$ and l are determined as follows. Denote by $l_i$ the unique value of j such that $r_i = n_{ij}$, and let l be the unique value of j such that $r = n_{.j}$. Because of the uniqueness assumptions, ties in the frequencies or in the marginal totals must be broken in an arbitrary but consistent manner. In case of ties, l is defined as the smallest value of j such that $r = n_{.j}$.
For those columns containing a cell (i, j) for which $n_{ij} = r_i = c_j$, $cs_j$ records the row in which $c_j$ is assumed to occur. Initially $cs_j$ is set equal to −1 for all j. Beginning with i = 1, if there is at least one value j such that $n_{ij} = r_i = c_j$, and if $cs_j = -1$, then $l_i$ is defined to be the smallest such value of j, and $cs_j$ is set equal to i. Otherwise, if $n_{il} = r_i$, then $l_i$ is defined to be equal to l. If neither condition is true, then $l_i$ is taken to be the smallest value of j such that $n_{ij} = r_i$.
The asymptotic standard error is the square root of the asymptotic variance.
The formulas for lambda asymmetric λ(R/C) can be obtained by interchanging the indices.
$$\lambda(R/C) = \frac{\sum_j c_j - c}{N - c}$$

The symmetric lambda is the average of the two asymmetric lambdas, λ(C/R) and λ(R/C). Its range is {0, 1}. Lambda symmetric is computed as

$$\lambda = \frac{\sum_i r_i + \sum_j c_j - r - c}{2N - r - c}$$
The asymptotic variance is
$$var(\lambda) = \frac{1}{w^4}\left\{ wvy - 2w^2\Big[N - \sum_i\sum_j (n_{ij} \mid j = l_i,\ i = k_j)\Big] - 2v^2(N - n_{kl}) \right\}$$

Where

$$w = 2N - r - c\,, \qquad v = 2N - \sum_i r_i - \sum_j c_j\,,$$
$$x = \sum_i (r_i \mid l_i = l) + \sum_j (c_j \mid k_j = k) + r_k + c_l\,, \qquad y = 8N - w - v - 2x$$

The definitions of l and $l_i$ are given in the previous section. The values k and $k_j$ are defined in a similar way for lambda asymmetric (R/C).
Uncertainty Coefficient
The uncertainty coefficient U(C/R), or column variable dependent U, measures the proportion of uncertainty (entropy) in the column variable Y that is explained by the row variable X. Its range is {0, 1}. The uncertainty coefficient is computed as

$$U(C/R) = U_{column\ variable\ dependent} = \frac{H(X) + H(Y) - H(XY)}{H(Y)}$$
Where

$$H(X) = -\sum_i \frac{n_{i.}}{n}\ln\!\left(\frac{n_{i.}}{n}\right)\,, \quad H(Y) = -\sum_j \frac{n_{.j}}{n}\ln\!\left(\frac{n_{.j}}{n}\right)\,, \quad H(XY) = -\sum_i\sum_j \frac{n_{ij}}{n}\ln\!\left(\frac{n_{ij}}{n}\right)$$
The asymptotic variance is

$$var\big(U(C/R)\big) = \frac{1}{n^2\,H(Y)^4}\sum_i\sum_j n_{ij}\left\{ H(Y)\ln\!\left(\frac{n_{ij}}{n_{i.}}\right) + \big(H(X) - H(XY)\big)\ln\!\left(\frac{n_{.j}}{n}\right) \right\}^2$$

The asymptotic standard error is the square root of the asymptotic variance.
The formulas for the uncertainty coefficient U(R/C) can be obtained by interchanging the indices.
The symmetric uncertainty coefficient is computed as
$$U = \frac{2\,[H(X) + H(Y) - H(XY)]}{H(X) + H(Y)}$$

The asymptotic variance is

$$var(U) = 4\sum_i\sum_j \frac{n_{ij}\left\{ H(XY)\ln\!\left(\frac{n_{i.}\,n_{.j}}{n^2}\right) - \big(H(X) + H(Y)\big)\ln\!\left(\frac{n_{ij}}{n}\right) \right\}^2}{n^2\big(H(X) + H(Y)\big)^4}$$

The asymptotic standard error is the square root of the asymptotic variance.
Ordinal by Ordinal Measures of Association
Let $n_{ij}$ denote the observed frequency in cell (i, j) of an I×J contingency table. Let N be the total frequency and

$$A_{ij} = \sum_{k<i}\sum_{l<j} n_{kl} + \sum_{k>i}\sum_{l>j} n_{kl}\,, \qquad D_{ij} = \sum_{k>i}\sum_{l<j} n_{kl} + \sum_{k<i}\sum_{l>j} n_{kl}$$

$$P = \sum_i\sum_j n_{ij}\,A_{ij} \qquad and \qquad Q = \sum_i\sum_j n_{ij}\,D_{ij}$$
Gamma Coefficient
The gamma (G) statistic is based only on the number of concordant and discordant pairs of observations.
It ignores tied pairs (that is, pairs of observations that have equal values of X or equal values of Y).
Gamma is appropriate only when both variables lie on an ordinal scale. The range of gamma is {-1, 1}. If
the row and column variables are independent, then gamma tends to be close to zero.
Gamma is estimated by

$$G = \frac{P - Q}{P + Q}$$
The asymptotic variance is

$$var(G) = \frac{16}{(P+Q)^4}\left\{ \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\,(Q\,A_{ij} - P\,D_{ij})^2 \right\}$$

The asymptotic standard error is the square root of the asymptotic variance.
The variance under the null hypothesis that gamma equals zero is computed as

$$var_0(G) = \frac{4}{(P+Q)^2}\left\{ \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\,d_{ij}^2 - \frac{(P-Q)^2}{N} \right\}$$

Where $d_{ij} = A_{ij} - D_{ij}$. The asymptotic standard error under the null hypothesis that gamma equals zero is the square root of this variance.
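A direct, unoptimized sketch of the P and Q sums and the gamma estimate (illustrative Python, not the ISSTATS implementation):

```python
def concordance_counts(table):
    # P and Q as defined above: weighted sums of the A_ij and D_ij counts
    # of concordant and discordant cells relative to each cell (i, j).
    I, J = len(table), len(table[0])
    P = Q = 0
    for i in range(I):
        for j in range(J):
            A = sum(table[k][l] for k in range(I) for l in range(J)
                    if (k < i and l < j) or (k > i and l > j))
            D = sum(table[k][l] for k in range(I) for l in range(J)
                    if (k > i and l < j) or (k < i and l > j))
            P += table[i][j] * A
            Q += table[i][j] * D
    return P, Q

def gamma(table):
    P, Q = concordance_counts(table)
    return (P - Q) / (P + Q)

print(gamma([[10, 0], [0, 10]]))  # 1.0 (perfect positive association)
```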
Kendall's tau-b
Kendall’s tau-b is similar to gamma except that tau-b uses a correction for ties. Tau-b is appropriate only
when both variables lie on an ordinal scale. The range of tau-b is {-1, 1}. Kendall’s tau-b is estimated by
$$\tau_b = \frac{P - Q}{w}$$

Where

$$w_r = N^2 - \sum_i n_{i.}^2\,, \qquad w_c = N^2 - \sum_j n_{.j}^2\,, \qquad w = \sqrt{w_r\,w_c}$$

The asymptotic variance is

$$var(\tau_b) = \frac{1}{w^4}\left\{ \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}(2w\,d_{ij} + \tau_b v_{ij})^2 - N^3\tau_b^2(w_r + w_c)^2 \right\}$$

where

$$v_{ij} = w_c\,n_{i.} + w_r\,n_{.j}$$
The asymptotic standard error is the square root of the asymptotic variance.
The variance under the null hypothesis that tau-b equals zero is computed as

$$var_0(\tau_b) = \frac{4}{w_r\,w_c}\left\{ \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\,d_{ij}^2 - \frac{(P-Q)^2}{N} \right\}$$

The asymptotic standard error under the null hypothesis that tau-b equals zero is the square root of this variance.
Stuart-Kendall's tau-c
Stuart-Kendall’s tau-c makes an adjustment for table size in addition to a correction for ties. Tau-c is
appropriate only when both variables lie on an ordinal scale. The range of tau-c is {-1, 1}. Stuart-Kendall's tau-c is estimated by

$$\tau_c = \frac{m(P - Q)}{N^2(m - 1)}$$

Where m = min{I, J}. The asymptotic variance is

$$var(\tau_c) = \frac{4m^2}{N^4(m-1)^2}\left\{ \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\,d_{ij}^2 - \frac{(P-Q)^2}{N} \right\}$$

The asymptotic standard error is the square root of the asymptotic variance.
The variance under the null hypothesis that tau-c equals zero is the same as the asymptotic variance.
Somers’ D
Somers’ D(C/R) and Somers’ D(R/C) are asymmetric modifications of tau-b. C/R indicates that the row
variable X is regarded as the independent variable and the column variable Y is regarded as dependent.
Similarly, R/C indicates that the column variable Y is regarded as the independent variable and the row
variable X is regarded as dependent. Somers’ D differs from tau-b in that it uses a correction only for
pairs that are tied on the independent variable. Somers’ D is appropriate only when both variables lie on
an ordinal scale. The range of Somers’ D is {-1, 1}. Somers’ D is computed as
$$D(C/R) = D_{column\ variable\ dependent} = \frac{P - Q}{w_r}$$
The asymptotic variance is

$$var\big(D(C/R)\big) = \frac{4}{w_r^4}\left\{ \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\big[w_r\,d_{ij} - (P-Q)(N - n_{i.})\big]^2 \right\}$$

The asymptotic standard error is the square root of the asymptotic variance.
The variance under the null hypothesis that D(C/R) equals zero is computed as

$$var_0\big(D(C/R)\big) = \frac{4}{w_r^2}\left\{ \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\,d_{ij}^2 - \frac{(P-Q)^2}{N} \right\}$$

The asymptotic standard error under the null hypothesis that D(C/R) equals zero is the square root of this variance.
Formulas for Somers’ D(R/C) are obtained by interchanging the indices.
The symmetric version of Somers’ d is

$$d = \frac{P - Q}{(w_r + w_c)/2}$$

The standard error is

$$ASE(d) = \frac{2\,\sigma_{\tau_b}\,w}{w_r + w_c}$$

where $\sigma_{\tau_b}$ is the asymptotic standard error of Kendall’s tau-b.
The variance under the null hypothesis that d equals zero is computed as

$$var_0(d) = \frac{16}{(w_r + w_c)^2}\left\{ \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\,d_{ij}^2 - \frac{(P-Q)^2}{N} \right\}$$

The asymptotic standard error under the null hypothesis that d equals zero is the square root of this variance.
Confidence Bounds and One-Sided Tests
Suppose you are testing the null hypothesis H0: θ ≥ θ0 against the one-sided alternative H1: θ < θ0. Rather than give a two-sided confidence interval for θ, the more appropriate procedure in this setting is to give an upper confidence bound. This upper confidence bound has a direct relationship to the one-sided test, namely:

1. A level α test of H0: θ ≥ θ0 against the one-sided alternative H1: θ < θ0 rejects H0 exactly when the value θ0 is above the 1−α upper confidence bound.
2. A level α test of H0: θ ≤ θ0 against the one-sided alternative H1: θ > θ0 rejects H0 exactly when the value θ0 is below the 1−α lower confidence bound.
ANOVA Test
$$SS_{Total} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{Y}_{..})^2$$

$$SS_{Inter} = \sum_{i=1}^{k} n_i(\bar{Y}_{i.} - \bar{Y}_{..})^2$$

$$SS_{Intra} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{Y}_{i.})^2 = SS_{Total} - SS_{Inter}$$
DF Total = N – 1
DF Inter = k – 1
DF Intra = N – k
$$MS_{Total} = \frac{SS_{Total}}{DF_{Total}} \qquad MS_{Inter} = \frac{SS_{Inter}}{DF_{Inter}} \qquad MS_{Intra} = \frac{SS_{Intra}}{DF_{Intra}} \qquad F = \frac{MS_{Inter}}{MS_{Intra}}$$
where

- F is the result of the test
- k is the number of different groups to which the sampled cases belong
- $N = \sum_{i=1}^{k} n_i$ is the total sample size
- $n_i$ is the number of cases in the i-th group
- $y_{ij}$ is the value of the measured variable for the j-th case from the i-th group
- $\bar{Y}_{..}$ is the mean of all $y_{ij}$
- $\bar{Y}_{i.}$ is the mean of the $y_{ij}$ for group i.
The test statistic has an F-distribution with $DF_{Inter}$ and $DF_{Intra}$ degrees of freedom. Thus the null hypothesis is rejected if $F \ge F(1-\alpha)_{k-1,\,N-k}$.
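A sketch of the one-way ANOVA F statistic (illustrative Python; `groups` is a list of lists of measurements):

```python
def one_way_anova_f(groups):
    # Returns (F, DF_Inter, DF_Intra) for a one-way layout.
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    ss_inter = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_intra = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups)
    ms_inter = ss_inter / (k - 1)
    ms_intra = ss_intra / (N - k)
    return ms_inter / ms_intra, k - 1, N - k

print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```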
ANOVA Multiple Comparisons
Difference of Means
$$\bar{y}_i - \bar{y}_j$$

Standard Error of the Difference of Means Estimator

$$Std.\ Error = \sqrt{MS_{Intra}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
Scheffe’s Method
Confidence Interval for Difference of Means
$$CI(1-\alpha) = \bar{y}_i - \bar{y}_j \pm \sqrt{DF_{Inter}\; MS_{Intra}\; F(1-\alpha)_{DF_{Inter},\,DF_{Intra}}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Source: http://en.wikipedia.org/wiki/Scheff%C3%A9%27s_method
Tukey's range test HSD
Confidence Interval for Difference of Means
$$CI(1-\alpha) = \bar{y}_i - \bar{y}_j \pm q(1-\alpha)_{k,\,DF_{Intra}}\sqrt{\frac{MS_{Intra}}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Where q is the studentized range distribution.
Source: https://en.wikipedia.org/wiki/Tukey%27s_range_test
Fisher's Method LSD
If the overall ANOVA test is not significant, you must not consider any results of the Fisher test, significant or not.
Confidence Interval for Difference of Means

$$CI(1-\alpha) = \bar{y}_i - \bar{y}_j \pm t(1-\alpha/2)_{DF_{Intra}}\sqrt{MS_{Intra}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Where t is the Student's t distribution.
Bonferroni's Method
The family-wise significance level (FWER) is α = 1 − Confidence Level. Thus any comparison flagged by ISSTATS as significant is based on a Bonferroni correction:

$$\alpha' = \frac{2\alpha}{k(k-1)} \qquad p' = p\,\frac{k(k-1)}{2}$$

Where k is the number of groups.
Confidence Interval for Difference of Means

$$CI(1-\alpha) = \bar{y}_i - \bar{y}_j \pm t(1-\alpha'/2)_{DF_{Intra}}\sqrt{MS_{Intra}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Where t is the Student's t distribution.
Sidak's Method
The family-wise significance level (FWER) is α = 1 − Confidence Level. So any comparison flagged by ISSTATS as significant is based on a Sidak correction:

$$\alpha' = 1 - (1-\alpha)^{\frac{2}{k(k-1)}} \qquad p' = 1 - e^{\log(1-p)\,\frac{k(k-1)}{2}} = 1 - (1-p)^{\frac{k(k-1)}{2}}$$

Where k is the number of groups.
Confidence Interval for Difference of Means

$$CI(1-\alpha) = \bar{y}_i - \bar{y}_j \pm t(1-\alpha'/2)_{DF_{Intra}}\sqrt{MS_{Intra}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Where t is the Student's t distribution.
Welch’s Test for equality of means
The test statistic, F*, is defined as follows:

$$F^{*} = \frac{\dfrac{1}{k-1}\displaystyle\sum_{i=1}^{k} w_i(\bar{x}_i - \tilde{X})^2}{1 + \dfrac{2(k-2)}{k^2 - 1}\displaystyle\sum_{i=1}^{k} h_i}$$
where

- F* is the result of the test
- k is the number of different groups to which the sampled cases belong
- $n_i$ is the number of cases in the i-th group
- $w_i = \dfrac{n_i}{S_i^2}$
- $W = \sum_{i=1}^{k} w_i = \sum_{i=1}^{k} \dfrac{n_i}{S_i^2}$
- $\tilde{X} = \dfrac{\sum_{i=1}^{k} w_i \bar{x}_i}{W}$
- $h_i = \dfrac{(1 - w_i/W)^2}{n_i - 1}$

The test statistic has approximately an F-distribution with k − 1 and $df = \dfrac{k^2 - 1}{3\sum_{i=1}^{k} h_i}$ degrees of freedom. Thus the null hypothesis is rejected if $F^{*} \ge F(1-\alpha)_{k-1,\,df}$.
Brown–Forsythe Test for equality of means
The test statistic, F*, is defined as follows:

$$F^{*} = \frac{\displaystyle\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{X}_{..})^2}{\displaystyle\sum_{i=1}^{k}\left(1 - \frac{n_i}{N}\right) S_i^2}$$

where

- F* is the result of the test
- k is the number of different groups to which the sampled cases belong
- $n_i$ is the number of cases in the i-th group (sample size of group i)
- $N = \sum_{i=1}^{k} n_i$ is the total sample size
- $\bar{X}_{..} = \dfrac{\sum_{i=1}^{k} n_i \bar{x}_i}{N}$ is the overall mean.
The test statistic has approximately an F-distribution with k − 1 and df degrees of freedom, where df is obtained with the Satterthwaite (1941) approximation as

$$\frac{1}{df} = \sum_{i=1}^{k} \frac{c_i^2}{n_i - 1}$$

with

$$c_j = \frac{\left(1 - \dfrac{n_j}{N}\right) S_j^2}{\displaystyle\sum_{i=1}^{k}\left(1 - \frac{n_i}{N}\right) S_i^2}$$

Thus the null hypothesis is rejected if $F^{*} \ge F(1-\alpha)_{k-1,\,df}$.
Homoscedasticity Tests
Levene's Test
The test statistic, F, is defined as follows:
$$F = \frac{N-k}{k-1} \cdot \frac{\displaystyle\sum_{i=1}^{k} n_i(\bar{Z}_{i.} - \bar{Z}_{..})^2}{\displaystyle\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Z_{ij} - \bar{Z}_{i.})^2}$$
where

- F is the result of the test
- k is the number of different groups to which the sampled cases belong
- $N = \sum_{i=1}^{k} n_i$ is the total sample size
- $n_i$ is the number of cases in the i-th group
- $Y_{ij}$ is the value of the measured variable for the j-th case from the i-th group
- $Z_{ij} = |Y_{ij} - \bar{Y}_{i.}|$ where $\bar{Y}_{i.}$ is the mean of the i-th group
- $\bar{Z}_{..}$ is the mean of all $Z_{ij}$
- $\bar{Z}_{i.}$ is the mean of the $Z_{ij}$ for group i.

The test statistic has an F-distribution with k − 1 and N − k degrees of freedom. Thus the null hypothesis is rejected if $F \ge F(1-\alpha)_{k-1,\,N-k}$.
Source: http://en.wikipedia.org/wiki/Levene%27s_test
Brown–Forsythe Test for equality of variances
The test statistic, F, is defined as follows:
$$F = \frac{N-k}{k-1} \cdot \frac{\displaystyle\sum_{i=1}^{k} n_i(\bar{Z}_{i.} - \bar{Z}_{..})^2}{\displaystyle\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Z_{ij} - \bar{Z}_{i.})^2}$$

where

- F is the result of the test
- k is the number of different groups to which the sampled cases belong
- $N = \sum_{i=1}^{k} n_i$ is the total sample size
- $n_i$ is the number of cases in the i-th group
- $Y_{ij}$ is the value of the measured variable for the j-th case from the i-th group
- $Z_{ij} = |Y_{ij} - \tilde{Y}_{i.}|$ where $\tilde{Y}_{i.}$ is the median of the i-th group
- $\bar{Z}_{..}$ is the mean of all $Z_{ij}$
- $\bar{Z}_{i.}$ is the mean of the $Z_{ij}$ for group i.

The test statistic has an F-distribution with k − 1 and N − k degrees of freedom. Thus the null hypothesis is rejected if $F \ge F(1-\alpha)_{k-1,\,N-k}$.
Source: http://en.wikipedia.org/wiki/Levene%27s_test
Bartlett's Test
Bartlett's test is used to test the null hypothesis, H0 that all k population variances are equal against the
alternative that at least two are different.
If there are k samples with sizes $n_i$ and sample variances $S_i^2$, then Bartlett's test statistic is

$$\chi^2 = \frac{(N-k)\ln(S_p^2) - \displaystyle\sum_{i=1}^{k}(n_i - 1)\ln(S_i^2)}{1 + \dfrac{1}{3(k-1)}\left(\displaystyle\sum_{i=1}^{k}\frac{1}{n_i - 1} - \frac{1}{N-k}\right)}$$

where

- $N = \sum_{i=1}^{k} n_i$ is the total sample size
- $S_p^2 = \dfrac{\sum_{i=1}^{k}(n_i - 1)S_i^2}{N-k}$ is the pooled estimate of the variance.
The test statistic has approximately a chi-squared distribution with k − 1 degrees of freedom. Thus the null hypothesis is rejected if $\chi^2 \ge \chi^2_{k-1}(1-\alpha)$.
Source: http://en.wikipedia.org/wiki/Bartlett%27s_test
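A sketch of Bartlett's statistic (illustrative Python; equal group variances give a statistic of zero):

```python
import math

def bartlett_stat(groups):
    # Chi-squared statistic with k - 1 degrees of freedom.
    k = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    def var(g):
        m = sum(g) / len(g)
        return sum((x - m) ** 2 for x in g) / (len(g) - 1)
    s2 = [var(g) for g in groups]
    sp2 = sum((n[i] - 1) * s2[i] for i in range(k)) / (N - k)  # pooled variance
    num = (N - k) * math.log(sp2) - sum((n[i] - 1) * math.log(s2[i])
                                        for i in range(k))
    den = 1 + (sum(1 / (n[i] - 1) for i in range(k)) - 1 / (N - k)) / (3 * (k - 1))
    return num / den
```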
Bivariate Correlation Tests
Sample Covariance
$$S_{xy} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

where N is the sample size.
Source: http://en.wikipedia.org/wiki/Covariance#Calculating_the_sample_covariance
Sample Pearson Product-Moment Correlation Coefficient
$$r = \frac{1}{N-1}\cdot\frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{S_x S_y} = \frac{S_{xy}}{S_x S_y}$$

where $S_x$ and $S_y$ are the sample standard deviations of the paired sample $(x_i, y_i)$, $S_{xy}$ is the sample covariance, and N is the total sample size.
Source: http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#For_a_sample
Test for the Significance of the Pearson Product-Moment Correlation Coefficient
The test hypotheses are:

- H0: the sample values come from a population in which ρ = 0
- H1: the sample values come from a population in which ρ ≠ 0

The test statistic is

$$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}$$

where N is the total sample size and r is the sample Pearson product-moment correlation coefficient. The test statistic has a Student's t distribution with N − 2 degrees of freedom.
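A sketch of r and its t statistic (illustrative Python, not the ISSTATS implementation):

```python
import math

def pearson_r_t(xs, ys):
    # Sample correlation and the t statistic with N - 2 df for H0: rho = 0.
    N = len(xs)
    mx, my = sum(xs) / N, sum(ys) / N
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(N - 2) / math.sqrt(1 - r ** 2)
    return r, t

print(pearson_r_t([1, 2, 3, 4], [1, 3, 2, 4]))  # r = 0.8
```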
Spearman Correlation Coefficient
For each of the variables X and Y separately, the observations are sorted into ascending order and
replaced by their ranks. Identical values (rank ties or value duplicates) are assigned a rank equal to the
average of their positions in the ascending order of the values. Each time t observations are tied (t > 1), the quantity $t^3 - t$ is calculated and summed separately for each variable. These sums will be designated $ST_x$ and $ST_y$.
For each of the N observations, the difference between the rank of X and the rank of Y is computed as:

$$d_i = Rank(X_i) - Rank(Y_i)$$
If there are no ties in either sample, Spearman's rho (ρ) is calculated as

$$\rho = 1 - \frac{6\sum d_i^2}{N(N^2 - 1)}$$

If there are any ties in either sample, Spearman's rho (ρ) is calculated as (Siegel, 1956):

$$\rho = \frac{T_x + T_y - \sum d_i^2}{2\sqrt{T_x T_y}}$$

where

$$T_x = \frac{N(N^2 - 1) - ST_x}{12} \qquad T_y = \frac{N(N^2 - 1) - ST_y}{12}$$

If $T_x$ or $T_y$ is 0, the statistic is not computed.
Source: http://pic.dhe.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_nonpar_corr_spearman.htm
Test for the Significance of the Spearman’s Correlation Coefficient
The test hypotheses are:

- H0: the sample values come from a population in which ρ = 0
- H1: the sample values come from a population in which ρ ≠ 0

The test statistic is

$$t = \frac{\rho\sqrt{N-2}}{\sqrt{1-\rho^2}}$$

The test statistic has a Student's t distribution with N − 2 degrees of freedom.
Kendall's Tau-b Correlation Coefficient
For each of the variables X and Y separately, the observations are sorted into ascending order and
replaced by their ranks. In situations where t observations are tied, the average rank is assigned.
Each time t > 1, the following quantities are computed and summed over all groups of ties for each
variable separately.
$$T_1 = \sum (t^2 - t) \qquad T_2 = \sum (t^2 - t)(t - 2) \qquad T_3 = \sum (t^2 - t)(2t + 5)$$
Each of the N cases is compared to the others to determine with how many cases its ranking of X and Y is
concordant or discordant. The following procedure is used. For each distinct pair of cases (i, j), where i <
j the quantity
$$d_{ij} = \big[Rank(X_j) - Rank(X_i)\big]\big[Rank(Y_j) - Rank(Y_i)\big]$$
is computed. If the sign of this product is positive, the pair of observations (i, j) is concordant. If the sign
is negative, the pair is discordant. The number of concordant pairs minus the number of discordant pairs
is
$$S = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} sign(d_{ij})$$

where $sign(d_{ij})$ is defined as +1 or −1 depending on the sign of $d_{ij}$. Pairs in which $d_{ij} = 0$ are ignored in the computation of S.
If there are no ties in either sample, Kendall's tau (τ) is computed as

$$\tau = \frac{2S}{N^2 - N}$$

If there are any ties in either sample, Kendall's tau (τ) is computed as

$$\tau = \frac{2S}{\sqrt{N^2 - N - T_{1x}}\;\sqrt{N^2 - N - T_{1y}}}$$

If the denominator is 0, the statistic is not computed.
Source: http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient#Tau-b
Test for the Significance of the Kendall's Tau-b Correlation Coefficient
The variance of S is estimated by (Kendall, 1955):
METHODS AND FORMULAS HELP V2.1 InnerSoft STATS
26
Var =
(N2
− N)(2N + 5) − T3x − T3y
18
+
T2x ∗ T2y
9(N2 − N)(N − 2)
+
T1x ∗ T1y
2(N2 − N)
The significance level is obtained using

Z = \frac{S}{\sqrt{\mathrm{Var}}}

which, under the null hypothesis that the variables are statistically independent, is approximately distributed as a standard normal.
Sources: http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient#Significance_tests
http://pic.dhe.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2F
alg_nonpar_corr_kendalls.htm
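The tie-corrected variance and the Z statistic can be sketched the same way (again a hypothetical helper, not ISSTATS code; S is computed by the same brute-force pair count):

```python
import math

def kendall_z(x, y):
    # Z = S / sqrt(Var), with the tie-correction terms T1, T2, T3
    # summed over groups of tied values in each variable.
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = (x[j] - x[i]) * (y[j] - y[i])
            s += (d > 0) - (d < 0)

    def tie_terms(values):
        counts = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        t1 = sum(t * t - t for t in counts.values())
        t2 = sum((t * t - t) * (t - 2) for t in counts.values())
        t3 = sum((t * t - t) * (2 * t + 5) for t in counts.values())
        return t1, t2, t3

    t1x, t2x, t3x = tie_terms(x)
    t1y, t2y, t3y = tie_terms(y)
    n2 = n * n - n
    var = ((n2 * (2 * n + 5) - t3x - t3y) / 18
           + t2x * t2y / (9 * n2 * (n - 2))
           + t1x * t1y / (2 * n2))
    return s / math.sqrt(var)
```

With no ties the variance reduces to N(N − 1)(2N + 5)/18.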
Parametric Value at Risk
Value at Risk of a single asset
Given the time series of daily return rates for an asset, let μ be the daily mean of the return rates and σ² their daily variance, and let P be the position (holding or investment) in the asset.
One-day Expected Return is:

\mathrm{ER} = P\mu

The Standard Deviation or Volatility is the square root of the Variance:

\sigma = \sqrt{\sigma^2}

One-day Value at Risk is:

VaR_{1-\alpha} = -(\mu + z_\alpha \sigma)P

where z_\alpha is the left-tail α quantile of the standard normal distribution.
Total Value at Risk for n trading days is:

VaR_{1-\alpha}^{n\ \mathrm{days}} = VaR_{1-\alpha} \cdot \sqrt{n} = -(\mu + z_\alpha \sigma)P\sqrt{n}
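A minimal Python sketch of the single-asset formulas (hypothetical helper names, not ISSTATS code; `NormalDist` is in the standard library and `inv_cdf(alpha)` gives the left-tail quantile z_α):

```python
import math
from statistics import NormalDist

def one_day_var(returns, position, alpha=0.05):
    # VaR_{1-alpha} = -(mu + z_alpha * sigma) * P, with unbiased sigma.
    n = len(returns)
    mu = sum(returns) / n
    var = sum((r - mu) ** 2 for r in returns) / (n - 1)
    sigma = math.sqrt(var)
    z = NormalDist().inv_cdf(alpha)  # negative for alpha < 0.5
    return -(mu + z * sigma) * position

def n_day_var(returns, position, days, alpha=0.05):
    # Scale the one-day figure by the square root of the horizon.
    return one_day_var(returns, position, alpha) * math.sqrt(days)
```

A positive result is an expected loss, following the sign convention described below for the portfolio case.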
Portfolio Value at Risk
Given the time series of daily return rates on different assets, let μ_i be the daily mean of the return rates for the i-th asset, σ_i² the daily variance of its return rates, and σ_i its daily standard deviation (or volatility). The covariance of the daily return rates of the i-th and j-th assets is σ_ij. All parameters are unbiased estimates. Let P_i be the position (holding or investment) in each of these assets.
The total position is

P = \sum_{i=1}^{N} P_i
The weighting of each position is

w_i = \frac{P_i}{P}
The weighted mean of the portfolio is
μ 𝑃 = ∑ 𝑤𝑖 𝜇𝑖 =
𝑁
𝑖=1
1
𝑃
∑ 𝑃𝑖 𝜇𝑖
𝑁
𝑖=1
One-day Expected Return of the portfolio is the weighted mean of the portfolio multiplied by the total position:

\mathrm{ER} = P\mu_P = P \sum_{i=1}^{N} w_i \mu_i = \sum_{i=1}^{N} P_i \mu_i
The Portfolio Variance is

\sigma_P^2 = \begin{bmatrix} w_1 & \cdots & w_i & \cdots & w_n \end{bmatrix} \begin{bmatrix} \sigma_1^2 & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_n^2 \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_i \\ \vdots \\ w_n \end{bmatrix} = W^T M W
where W is the vector of weights and M is the covariance matrix. The i-th diagonal item of M is the daily variance of the return rates for the i-th asset; the off-diagonal items are covariances.
The Portfolio Variance can also be computed as:

\sigma_P^2 = \frac{1}{P^2} \begin{bmatrix} P_1 & \cdots & P_i & \cdots & P_n \end{bmatrix} \begin{bmatrix} \sigma_1^2 & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_n^2 \end{bmatrix} \begin{bmatrix} P_1 \\ \vdots \\ P_i \\ \vdots \\ P_n \end{bmatrix} = \frac{1}{P^2} X^T M X
where X is the vector of positions.
The Portfolio Standard Deviation or Portfolio Volatility is the square root of the Portfolio Variance:

\sigma_P = \sqrt{\sigma_P^2}

One-day Value at Risk is:

VaR_{1-\alpha} = -(\mu_P + z_\alpha \sigma_P)P
where z_\alpha is the left-tail α quantile of the standard normal distribution.
Total Value at Risk for n trading days is:

VaR_{1-\alpha}^{n\ \mathrm{days}} = VaR_{1-\alpha} \cdot \sqrt{n} = -(\mu_P + z_\alpha \sigma_P)P\sqrt{n}

VaR_{1-\alpha}^{n\ \mathrm{days}} is the minimum potential loss that a portfolio can suffer in the α% worst cases in n days.
About the Signs: A positive value of VaR is an expected loss. A negative VaR would imply the portfolio
has a high probability of making a profit.
Source: http://www.jpmorgan.com/tss/General/Risk_Management/1159360877242
Remark: Some texts about VaR express the covariance as σij = σiσjρij where ρij is the correlation
coefficient.
Remark: Sometimes VaR is taken to be the Portfolio Volatility multiplied by the position, as the expected return is assumed to be approximately zero. ISSTATS does NOT treat VaR as Portfolio Volatility and does NOT assume the expected return is zero.
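The portfolio case can be sketched directly from the position-vector form σ_P² = (1/P²)·XᵀMX (hypothetical helper, not ISSTATS code; the covariance matrix is passed in as nested lists):

```python
import math
from statistics import NormalDist

def portfolio_var(mu, cov, positions, alpha=0.05):
    # VaR_{1-alpha} = -(mu_P + z_alpha * sigma_P) * P, with
    # sigma_P^2 = (1 / P^2) * X^T M X computed from the positions X.
    n = len(positions)
    p_total = sum(positions)
    mu_p = sum(p * m for p, m in zip(positions, mu)) / p_total
    quad = sum(positions[i] * cov[i][j] * positions[j]
               for i in range(n) for j in range(n))
    sigma_p = math.sqrt(quad) / p_total
    z = NormalDist().inv_cdf(alpha)
    return -(mu_p + z * sigma_p) * p_total
```

For a single asset this reduces to the one-day formula of the previous section.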
Marginal Value at Risk
Marginal Value at Risk is the change in portfolio VaR resulting from a marginal change in the currency
(dollar, euro…) position in component i:
MVaR_i = \frac{\partial VaR}{\partial P_i}
Assuming the linearity of the risk in the parametric approach, the vector of Marginal Value at Risk is
\begin{bmatrix} MVaR_1 \\ \vdots \\ MVaR_i \\ \vdots \\ MVaR_n \end{bmatrix} = -\left( \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_i \\ \vdots \\ \mu_n \end{bmatrix} + \frac{z_\alpha}{\sigma_P} \begin{bmatrix} \sigma_1^2 & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_n^2 \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_i \\ \vdots \\ w_n \end{bmatrix} \right)

\begin{bmatrix} MVaR_1 \\ \vdots \\ MVaR_i \\ \vdots \\ MVaR_n \end{bmatrix} = -\left( \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_i \\ \vdots \\ \mu_n \end{bmatrix} + \frac{z_\alpha}{P\,\sigma_P} \begin{bmatrix} \sigma_1^2 & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_n^2 \end{bmatrix} \begin{bmatrix} P_1 \\ \vdots \\ P_i \\ \vdots \\ P_n \end{bmatrix} \right)
Total Marginal Value at Risk for n trading days is:
MVaR_i^{n\ \mathrm{days}} = MVaR_i \cdot \sqrt{n}
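The gradient form above translates almost line by line into a sketch (hypothetical helper, not ISSTATS code; it returns the vector of MVaR_i):

```python
import math
from statistics import NormalDist

def marginal_var(mu, cov, positions, alpha=0.05):
    # MVaR_i = -(mu_i + z_alpha * (M X)_i / (P * sigma_P))
    n = len(positions)
    p_total = sum(positions)
    quad = sum(positions[i] * cov[i][j] * positions[j]
               for i in range(n) for j in range(n))
    sigma_p = math.sqrt(quad) / p_total
    z = NormalDist().inv_cdf(alpha)
    mx = [sum(cov[i][j] * positions[j] for j in range(n)) for i in range(n)]
    return [-(mu[i] + z * mx[i] / (p_total * sigma_p)) for i in range(n)]
```

Multiplying each MVaR_i by P_i gives the Component VaR of the next section, and those components sum to the portfolio VaR (Euler decomposition).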
Component Value at Risk
Component Value at Risk is a partition of the portfolio VaR that indicates how the VaR would change if a given component were deleted.
CVaR_i = \frac{\partial VaR}{\partial P_i} P_i = MVaR_i \cdot P_i
Note that the sum of all component VaRs (CVaR) is the VaR for the entire portfolio:
VaR = \sum_{i=1}^{N} CVaR_i = \sum_{i=1}^{N} \frac{\partial VaR}{\partial P_i} P_i = \sum_{i=1}^{N} MVaR_i \cdot P_i
Total Component Value at Risk for n trading days is:
CVaR_i^{n\ \mathrm{days}} = CVaR_i \cdot \sqrt{n}
Source: http://www.math.nus.edu.sg/~urops/Projects/valueatrisk.pdf
Incremental Value at Risk
Incremental VaR of a given position is the VaR of the portfolio with the given position minus the VaR of
the portfolio without the given position, which measures the change in VaR due to a new position on the
portfolio:
IVaR(a) = VaR(P) − VaR(P − a)
Source:
http://www.jpmorgan.com/tss/General/Portfolio_Management_With_Incremental_VaR/1259104336084
Conditional Value at Risk, Expected Shortfall, Expected Tail Loss or Average Value at Risk
ES_{1-\alpha}^{1\ \mathrm{day}} is the expected value of the loss of the portfolio in the α% worst cases in one day.
Under Multivariate Normal Assumption, Expected Shortfall, also known as Expected Tail Loss (ETL),
Conditional Value-at-Risk (CVaR), Average Value at Risk (AVaR) and Worst Conditional Expectation,
is computed by
ES(-VaR) = -E(x \mid x < -VaR) \cdot P = -[\mu + ES(z_\alpha)\sigma] \cdot P = -[\mu + E(z \mid z < z_\alpha)\sigma] \cdot P = -\left[\mu + \frac{\int_{-\infty}^{z_\alpha} t\, e^{-t^2/2}\, dt}{\alpha\sqrt{2\pi}}\, \sigma\right] \cdot P = -\left(\mu - \frac{e^{-z_\alpha^2/2}}{\alpha\sqrt{2\pi}}\, \sigma\right) \cdot P
where z_\alpha is the left-tail α quantile of the standard normal distribution.
About the Sign: Because VaR is given by ISSTATS with a negative sign, as J.P. Morgan recommends, we take its original value to perform calculations (−VaR = μ + z_α σ). Once the ES is computed, it is reported with a negative sign. That is, a positive value of ES is an expected loss; a negative value of ES would imply the portfolio has a high probability of making a profit even in the worst cases.
Source: http://www.imes.boj.or.jp/english/publication/mes/2002/me20-1-3.pdf
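Under normality, the closed form above needs only the standard normal quantile and density (a minimal sketch with a hypothetical helper name, not ISSTATS code):

```python
import math
from statistics import NormalDist

def expected_shortfall(mu, sigma, position, alpha=0.05):
    # ES = -(mu - exp(-z_alpha^2 / 2) / (alpha * sqrt(2 * pi)) * sigma) * P
    z = NormalDist().inv_cdf(alpha)
    tail_factor = math.exp(-z * z / 2) / (alpha * math.sqrt(2 * math.pi))
    return -(mu - tail_factor * sigma) * position
```

By construction the ES is at least as large as the corresponding VaR: the average of the tail losses exceeds the tail threshold.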
Exponentially Weighted Moving Average (EWMA) Forecast
Given a series of k daily return rates {r_1, …, r_k} computed as Continuously Compounded Returns:

r_i = \ln\left(\frac{s_i}{s_{i-1}}\right)
where r_1 corresponds to the earliest date in the series and r_k to the latest or most recent date.
Assuming k > 50, and assuming that the sample mean of daily returns is zero, the EWMA estimates the one-day variance for a given sequence of k returns as:

\sigma^2 = (1 - \lambda) \sum_{i=0}^{k-1} \lambda^i\, r_{k-i}^2
where 0 < λ < 1 is the decay factor.
The one-day volatility is:

\sigma = \sqrt{\sigma^2}

For horizons greater than one day, the T-period (i.e., over T days) forecast of the volatility is:

\sigma_{T\ \mathrm{days}} = \sigma\sqrt{T}
For two return series, assuming that both averages are zero, the EWMA estimate of the one-day covariance for a given sequence of k returns is given by

\mathrm{cov}_{1,2} = \sigma_{1,2} = (1 - \lambda) \sum_{i=0}^{k-1} \lambda^i\, r_{1,k-i}\, r_{2,k-i}
The corresponding one-day correlation forecast for the two returns is given by

\rho_{1,2} = \frac{\mathrm{cov}_{1,2}}{\sigma_1 \sigma_2} = \frac{\sigma_{1,2}}{\sigma_1 \sigma_2}
For horizons greater than one day, the T-period (i.e., over T days) forecast of the covariance is:

\mathrm{cov}_{1,2}^{T\ \mathrm{days}} = \sigma_{1,2}\, T
Source: http://pascal.iseg.utl.pt/~aafonso/eif/rm/TD4ePt_2.pdf
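The weighted sum above can be sketched as a simple loop (hypothetical helper, not ISSTATS code; λ = 0.94 is the RiskMetrics daily decay factor):

```python
def ewma_variance(returns, decay=0.94):
    # sigma^2 = (1 - lambda) * sum_{i=0}^{k-1} lambda^i * r_{k-i}^2,
    # assuming the sample mean of daily returns is zero.
    total = 0.0
    weight = 1.0 - decay
    for r in reversed(returns):  # most recent return gets the largest weight
        total += weight * r * r
        weight *= decay
    return total
```

The covariance estimate is the same loop with `r * r` replaced by the product of the two series' returns on each day.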
Value at Risk of a single asset, Portfolio Value at Risk, Marginal Value at Risk, Component Value at Risk, and Incremental Value at Risk by the EWMA method.
See methods and formulas at Parametric Value at Risk.
Linear Regression
Given n equations for a regression model with p predictor variables, the i-th equation is

y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}
The n equations stacked together and written in vector form are

\begin{bmatrix} y_1 \\ \vdots \\ y_i \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ 1 & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_i \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_i \\ \vdots \\ \varepsilon_n \end{bmatrix}
In matrix notation:

Y = X\beta + \varepsilon

X is here named the design matrix, of dimensions n-by-(p+1).
If the constant is not included, the matrices are

\begin{bmatrix} y_1 \\ \vdots \\ y_i \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_i \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_i \\ \vdots \\ \varepsilon_n \end{bmatrix}
If the constant is not included, the design matrix X has dimensions n-by-p.
The estimated value of the unknown parameter β is:

\hat{\beta} = (X^T X)^{-1} X^T Y
Estimation can be carried out if, and only if, there is no perfect multicollinearity between the predictor
variables.
If the constant is not included and there is a single predictor (or mutually orthogonal predictors), the parameters can also be estimated by

\hat{\beta}_j = \frac{\sum_{i=1}^{n} x_{ij} y_i}{\sum_{i=1}^{n} x_{ij}^2}
The standardized coefficients are

\hat{\beta}_i^{st} = \frac{\hat{\beta}_i \cdot S_{x_i}}{S_y}
where
• Sxi is the unbiased standard deviation of the i-th predictor variable
• Sy is the unbiased standard deviation of the response variable y
The estimate of the standard error of each coefficient is obtained by

se(\hat{\beta}_i) = \sqrt{MSE \cdot [(X^T X)^{-1}]_{ii}}

where MSE is the mean squared error of the regression model.
It is known that

\frac{\hat{\beta}_i}{se(\hat{\beta}_i)} \sim t_{n-p-1}

where
• p is the number of predictor variables
• n is the total number of observations (number of rows in the design matrix)

If the constant is not included, the degrees of freedom for the t statistics are n − p.
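The estimator, its standard errors, and the t values can be sketched with NumPy (an assumption, since ISSTATS itself is not Python; `ols_summary` is a hypothetical helper name):

```python
import numpy as np

def ols_summary(X, y):
    # beta_hat = (X^T X)^{-1} X^T Y,
    # se(beta_i) = sqrt(MSE * [(X^T X)^{-1}]_ii),
    # t_i = beta_i / se(beta_i) with n - p - 1 degrees of freedom.
    # X is the design matrix, already including a leading column of ones.
    n, k = X.shape  # k = p + 1 when the constant is included
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    mse = resid @ resid / (n - k)
    se = np.sqrt(mse * np.diag(xtx_inv))
    return beta, se, beta / se
```

In production code the explicit inverse would normally be replaced by a solver such as `numpy.linalg.lstsq` for numerical stability.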
ANOVA for linear regression
If the constant is included.
Component   Sum of squares   Degrees of freedom   Mean of squares         F
Model       SSM              p                    MSM = SSM/p             MSM/MSE
Error       SSE              n − p − 1            MSE = SSE/(n − p − 1)
Total       SST              n − 1                MST = SST/(n − 1)
where

SSM = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \qquad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad SST = \sum_{i=1}^{n} (y_i - \bar{y})^2

and
• p is the number of predictor variables
• n is the total number of observations (number of rows in the design matrix)
• SSE = sum of squared residuals
• MSE = mean squared error of the regression model
The test statistic has an F-distribution with p and (n − p − 1) degrees of freedom. Thus the ANOVA null hypothesis is rejected if

F \geq F_{p,\,n-p-1}(1 - \alpha)

The coefficient of determination R² is defined as SSM/SST. It is output as a percentage.
The Adjusted R² is defined as 1 − MSE/MST. It is output as a percentage.
The square root of MSE is called the standard error of the regression, or standard error of the estimate.
If the constant is not included:

Component   Sum of squares   Degrees of freedom   Mean of squares     F
Model       SSM              p                    MSM = SSM/p         MSM/MSE
Error       SSE              n − p                MSE = SSE/(n − p)
Total       SST              n                    MST = SST/n

where

SSM = \sum_{i=1}^{n} \hat{y}_i^2 \qquad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad SST = \sum_{i=1}^{n} y_i^2
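The three sums of squares and R² can be checked numerically (NumPy sketch with a hypothetical helper name; constant-included case):

```python
import numpy as np

def anova_sums(X, y):
    # SSM = sum (yhat_i - ybar)^2, SSE = sum (y_i - yhat_i)^2,
    # SST = sum (y_i - ybar)^2; R^2 = SSM / SST.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta
    ssm = float(np.sum((yhat - y.mean()) ** 2))
    sse = float(np.sum((y - yhat) ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))
    return ssm, sse, sst
```

When the constant is included, SSM + SSE = SST, so R² = SSM/SST = 1 − SSE/SST.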
Unstandardized Predicted Values
The fitted values (or unstandardized predicted values) from the regression are

\hat{Y} = X\hat{\beta} = X(X^T X)^{-1} X^T Y = HY

where H is the projection matrix (also known as the hat matrix)

H = X(X^T X)^{-1} X^T
Standardized Predicted Values
Once the mean and unbiased standard deviation of the unstandardized predicted values are computed, the fitted values are standardized as

\hat{y}_i^{st} = \frac{\hat{y}_i - \bar{\hat{y}}}{S_{\hat{y}}}

When new predictions are made outside of the design matrix, they are standardized with the above values.
Prediction Intervals for Mean
Let the vector of given predictors be

X_h = (1, x_{h,1}, x_{h,2}, \ldots, x_{h,p})^T

The standard error of the fit at X_h is given by:

se(\hat{y}_h) = \sqrt{MSE \cdot X_h^T (X^T X)^{-1} X_h}

Then, the Confidence Interval for the Mean Response is

\hat{y}_h \pm t_{\alpha/2;\,n-p-1} \cdot se(\hat{y}_h)
where
• X is the design matrix
• ŷh is the "fitted value" or "predicted value" of the response when the predictor values are Xh.
• MSE is the mean squared error of the regression model
• n is the total number of observations
• p is the number of predictor variables
Prediction Intervals for Individuals
Let the vector of given predictors be

X_h = (1, x_{h,1}, x_{h,2}, \ldots, x_{h,p})^T

The standard error of the fit at X_h is given by:

se(\hat{y}_h) = \sqrt{MSE \cdot [1 + X_h^T (X^T X)^{-1} X_h]}

Then, the Confidence Interval for individuals or new observations is

\hat{y}_h \pm t_{\alpha/2;\,n-p-1} \cdot se(\hat{y}_h)
where
• X is the design matrix
• ŷh is the "fitted value" or "predicted value" of the response when the predictor values are Xh.
• MSE is the mean squared error of the regression model
• n is the total number of observations
• p is the number of predictor variables
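Both interval formulas share the quadratic form X_hᵀ(XᵀX)⁻¹X_h; a NumPy sketch (hypothetical helper name, not ISSTATS code):

```python
import numpy as np

def fit_se(X, y, xh, individual=False):
    # se(yhat_h) = sqrt(MSE * X_h^T (X^T X)^{-1} X_h) for the mean response;
    # for a new individual observation, add 1 inside the bracket.
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    mse = resid @ resid / (n - k)
    quad = xh @ xtx_inv @ xh
    return np.sqrt(mse * ((1.0 + quad) if individual else quad))
```

The interval is then the fitted value ŷ_h plus or minus the t quantile times this standard error.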
Unstandardized Residuals
The Unstandardized Residual for the i-th data unit is defined as:

\hat{e}_i = y_i - \hat{y}_i

In matrix notation

\hat{E} = Y - \hat{Y} = Y - HY = (I_{n \times n} - H)Y

where H is the hat matrix.
Standardized Residuals
The Standardized Residual for the i-th data unit is defined as:

\hat{es}_i = \frac{\hat{e}_i}{\sqrt{MSE}}

where
• êi is the unstandardized residual for the i-th data unit.
• MSE is the mean squared error of the regression model
Studentized Residuals (internally studentized residuals)
The leverage score for the i-th data unit is defined as:

h_{ii} = [H]_{ii}

the i-th diagonal element of the projection matrix (also known as the hat matrix)

H = X(X^T X)^{-1} X^T

where X is the design matrix.
The Studentized Residual for the i-th data unit is defined as:

t_i = \frac{\hat{e}_i}{\sqrt{MSE \cdot (1 - h_{ii})}}

where
• êi is the unstandardized residual for the i-th data unit.
• MSE is the mean squared error of the regression model
Source: https://en.wikipedia.org/wiki/Studentized_residual
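Leverages and studentized residuals can be sketched together (NumPy, hypothetical helper name, not ISSTATS code):

```python
import numpy as np

def studentized_residuals(X, y):
    # t_i = e_i / sqrt(MSE * (1 - h_ii)), with h_ii = [H]_ii and
    # H = X (X^T X)^{-1} X^T the hat matrix.
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    resid = y - H @ y  # unstandardized residuals
    mse = resid @ resid / (n - k)
    h = np.diag(H)
    return resid / np.sqrt(mse * (1.0 - h))
```

The diagonal of H also feeds the centered leverage, Mahalanobis distance, and Cook's distance formulas that follow.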
Centered Leverage Values
The regular leverage score for the i-th data unit is defined as:

h_{ii} = [H]_{ii}

the i-th diagonal element of the projection matrix (also known as the hat matrix)

H = X(X^T X)^{-1} X^T

where X is the design matrix.
The centered leverage value for the i-th data unit is defined as:

clv_i = h_{ii} - 1/n

where n is the number of observations.
If the intercept is not included, then the centered leverage value for the i-th data unit is defined as:

clv_i = h_{ii}

Source: https://en.wikipedia.org/wiki/Leverage_(statistics)
Mahalanobis Distance
The Mahalanobis Distance for the i-th data unit is defined as:

D_i^2 = (n - 1)(h_{ii} - 1/n) = (n - 1) \cdot clv_i

where
• hii is the i-th diagonal element of the projection matrix.
• n is the number of observations
If the intercept is not included, the Mahalanobis Distance for the i-th data unit is defined as:

D_i^2 = n \cdot h_{ii}
Source: https://en.wikipedia.org/wiki/Mahalanobis_distance
Cook’s Distance
The Cook's Distance for the i-th data unit is defined as:

D_i = \frac{\hat{e}_i^2\, h_{ii}}{MSE \cdot (p + 1) \cdot (1 - h_{ii})^2}

where
• hii is the i-th diagonal element of the projection matrix.
• p is the number of predictor variables
• êi is the unstandardized residual for the i-th data unit.
• MSE is the mean squared error of the regression model
If the intercept is not included, the Cook's Distance for the i-th data unit is defined as:

D_i = \frac{\hat{e}_i^2\, h_{ii}}{MSE \cdot p \cdot (1 - h_{ii})^2}
Source: https://en.wikipedia.org/wiki/Cook%27s_distance
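Cook's distance follows directly from the residuals and leverages above (NumPy sketch with a hypothetical helper name; intercept-included case, so p + 1 equals the number of design-matrix columns):

```python
import numpy as np

def cooks_distance(X, y):
    # D_i = e_i^2 * h_ii / (MSE * (p + 1) * (1 - h_ii)^2)
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    resid = y - H @ y
    mse = resid @ resid / (n - k)
    h = np.diag(H)
    return resid ** 2 * h / (mse * k * (1.0 - h) ** 2)
```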
Curve Estimation Models
Linear. Model whose equation is Y = b0 + (b1 * t). The series values are modeled as a linear
function of time.
Quadratic. Model whose equation is Y = b0 + (b1 * t) + (b2 * t**2). The quadratic model can be
used to model a series that "takes off" or a series that dampens.
Cubic. Model that is defined by the equation Y = b0 + (b1 * t) + (b2 * t**2) + (b3 * t**3).
Quartic. Model that is defined by the equation Y = b0 + (b1 * t) + (b2 * t**2) + (b3 * t**3) + (b4
* t**4).
Quintic. Model that is defined by the equation Y = b0 + (b1 * t) + (b2 * t**2) + (b3 * t**3) + (b4
* t**4) + (b5 * t**5).
Sextic. Model that is defined by the equation Y = b0 + (b1 * t) + (b2 * t**2) + (b3 * t**3) + (b4 *
t**4) + (b5 * t**5) + (b6 * t**6).
Logarithmic. Model whose equation is Y = b0 + (b1 * ln(t)).
Inverse. Model whose equation is Y = b0 + (b1 / t).
Power. Model whose equation is Y = b0 * (t**b1) or ln(Y) = ln(b0) + (b1 * ln(t)).
Compound. Model whose equation is Y = b0 * (b1**t) or ln(Y) = ln(b0) + (ln(b1) * t).
S-curve. Model whose equation is Y = e**(b0 + (b1/t)) or ln(Y) = b0 + (b1/t).
Logistic. Model whose equation is Y = 1 / (1/u + (b0 * (b1**t))) or ln(1/y-1/u) = ln (b0) + (ln(b1)
* t) where u is the upper boundary value. After selecting Logistic, specify the upper boundary value to
use in the regression equation. The value must be a positive number that is greater than the largest
dependent variable value.
Growth. Model whose equation is Y = e**(b0 + (b1 * t)) or ln(Y) = b0 + (b1 * t).
Exponential. Model whose equation is Y = b0 * (e**(b1 * t)) or ln(Y) = ln(b0) + (b1 * t).
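Several of these models are fitted by linearizing the equation and running ordinary least squares on the transformed data. As an illustration, the Exponential model Y = b0 * (e**(b1 * t)) can be fitted through ln(Y) = ln(b0) + (b1 * t) (a pure-Python sketch with a hypothetical helper name, not ISSTATS code):

```python
import math

def fit_exponential(t, y):
    # Fit ln(y) = ln(b0) + b1 * t by simple least squares,
    # then transform back: b0 = exp(intercept).
    n = len(t)
    ly = [math.log(v) for v in y]
    tbar = sum(t) / n
    lbar = sum(ly) / n
    b1 = (sum((a - tbar) * (b - lbar) for a, b in zip(t, ly))
          / sum((a - tbar) ** 2 for a in t))
    b0 = math.exp(lbar - b1 * tbar)
    return b0, b1
```

Note that the back-transformed fit minimizes squared error in log space, not in the original units.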
© Copyright InnerSoft 2017. All rights reserved.
The lost children of the Sinclair ZX Spectrum 128K (RANDOMIZE USR 123456)
innersoft@itspanish.org
innersoft@gmail.com
http://isstats.itspanish.org/

More Related Content

PPT
Chi-square, Yates, Fisher & McNemar
PPTX
Wilcoxon Rank-Sum Test
PPTX
PPTX
Statr session 19 and 20
PPTX
PDF
Practice test ch 10 correlation reg ch 11 gof ch12 anova
PPT
Chapter12
DOC
Mc Nemar
Chi-square, Yates, Fisher & McNemar
Wilcoxon Rank-Sum Test
Statr session 19 and 20
Practice test ch 10 correlation reg ch 11 gof ch12 anova
Chapter12
Mc Nemar

What's hot (20)

PPT
Chi square using excel
PDF
Solution to the practice test ch 10 correlation reg ch 11 gof ch12 anova
PDF
Categorical data analysis
PPTX
PPTX
Student’s t test
PPT
Mc namer test of correlation
PPTX
Goodness of Fit Notation
PPTX
Analysis of variance (ANOVA)
PPT
Nonparametric statistics
PPTX
Analysis of Variance-ANOVA
PPTX
Chi square test
PPT
My regression lecture mk3 (uploaded to web ct)
PPTX
Lesson 27 using statistical techniques in analyzing data
PPT
F test Analysis of Variance (ANOVA)
ODP
Multiple linear regression
PPT
Chapter 14
PDF
PG STAT 531 Lecture 2 Descriptive statistics
PPTX
Contingency Tables
PDF
Data Science - Part IV - Regression Analysis & ANOVA
PDF
Multiple linear regression
Chi square using excel
Solution to the practice test ch 10 correlation reg ch 11 gof ch12 anova
Categorical data analysis
Student’s t test
Mc namer test of correlation
Goodness of Fit Notation
Analysis of variance (ANOVA)
Nonparametric statistics
Analysis of Variance-ANOVA
Chi square test
My regression lecture mk3 (uploaded to web ct)
Lesson 27 using statistical techniques in analyzing data
F test Analysis of Variance (ANOVA)
Multiple linear regression
Chapter 14
PG STAT 531 Lecture 2 Descriptive statistics
Contingency Tables
Data Science - Part IV - Regression Analysis & ANOVA
Multiple linear regression
Ad

Similar to InnerSoft STATS - Methods and formulas help (20)

PPTX
Simple Regression.pptx
PDF
Econometrics 1 Slide from the masters degree 1
PPTX
What is chi square test
PDF
CFA Formula Cheat Sheet. All Topics covered
PPTX
Variance component analysis by paravayya c pujeri
PDF
Data Science Cheatsheet.pdf
DOCX
Descriptive Statistics Formula Sheet Sample Populatio.docx
PPTX
Sampling distribution.pptx
PDF
PDF
Statistical parameters
PDF
Memorization of Various Calculator shortcuts
PPTX
Company Induction process and Onboarding
PDF
A Mathematical Model for the Hormonal Responses During Neurally Mediated Sync...
PDF
A Mathematical Model for the Hormonal Responses During Neurally Mediated Sync...
PDF
Bio-L8- Correlation and Regression Analysis.pdf
PPTX
Statistics78 (2)
PPTX
Test of hypothesis test of significance
PPTX
Testing of hypothesis
PPTX
Categorical data analysis full lecture note PPT.pptx
PDF
Stat3 central tendency & dispersion
Simple Regression.pptx
Econometrics 1 Slide from the masters degree 1
What is chi square test
CFA Formula Cheat Sheet. All Topics covered
Variance component analysis by paravayya c pujeri
Data Science Cheatsheet.pdf
Descriptive Statistics Formula Sheet Sample Populatio.docx
Sampling distribution.pptx
Statistical parameters
Memorization of Various Calculator shortcuts
Company Induction process and Onboarding
A Mathematical Model for the Hormonal Responses During Neurally Mediated Sync...
A Mathematical Model for the Hormonal Responses During Neurally Mediated Sync...
Bio-L8- Correlation and Regression Analysis.pdf
Statistics78 (2)
Test of hypothesis test of significance
Testing of hypothesis
Categorical data analysis full lecture note PPT.pptx
Stat3 central tendency & dispersion
Ad

More from InnerSoft (10)

PDF
InnerSoft CAD para AutoCAD, v4.0 Manual
PDF
InnerSoft STATS - Introduction
PDF
InnerSoft STATS - Index
PDF
InnerSoft STATS - Graphs
PDF
InnerSoft STATS - Analyze
PDF
Manual InnerSoft STATS
PDF
Ingeniería de caminos rurales
PDF
InnerSoft CAD Manual
PDF
Norma 3.1 ic. trazado, de la instrucción de carreteras
PDF
Manual de InnerSoft CAD en español
InnerSoft CAD para AutoCAD, v4.0 Manual
InnerSoft STATS - Introduction
InnerSoft STATS - Index
InnerSoft STATS - Graphs
InnerSoft STATS - Analyze
Manual InnerSoft STATS
Ingeniería de caminos rurales
InnerSoft CAD Manual
Norma 3.1 ic. trazado, de la instrucción de carreteras
Manual de InnerSoft CAD en español

Recently uploaded (20)

PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
Computing-Curriculum for Schools in Ghana
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
Complications of Minimal Access Surgery at WLH
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
Insiders guide to clinical Medicine.pdf
PPTX
Lesson notes of climatology university.
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PDF
Pre independence Education in Inndia.pdf
PDF
102 student loan defaulters named and shamed – Is someone you know on the list?
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
01-Introduction-to-Information-Management.pdf
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PPTX
PPH.pptx obstetrics and gynecology in nursing
PDF
Classroom Observation Tools for Teachers
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
Abdominal Access Techniques with Prof. Dr. R K Mishra
Computing-Curriculum for Schools in Ghana
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Complications of Minimal Access Surgery at WLH
2.FourierTransform-ShortQuestionswithAnswers.pdf
Insiders guide to clinical Medicine.pdf
Lesson notes of climatology university.
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
Pre independence Education in Inndia.pdf
102 student loan defaulters named and shamed – Is someone you know on the list?
Microbial disease of the cardiovascular and lymphatic systems
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
01-Introduction-to-Information-Management.pdf
O5-L3 Freight Transport Ops (International) V1.pdf
PPH.pptx obstetrics and gynecology in nursing
Classroom Observation Tools for Teachers
Microbial diseases, their pathogenesis and prophylaxis
Module 4: Burden of Disease Tutorial Slides S2 2025

InnerSoft STATS - Methods and formulas help

  • 2. METHODS AND FORMULAS HELP V2.1 InnerSoft STATS 2 Mean The arithmetic mean is the sum of a collection of numbers divided by the number of numbers in the collection. Sample Variance The estimator of population variance, also called the unbiased sample variance, is: 𝑆2 = ∑ (𝑥𝑖 − 𝑥̅)2𝑛 𝑖=1 𝑛 − 1 Source: http://en.wikipedia.org/wiki/Variance Sample Kurtosis The estimators of population kurtosis is: 𝐺2 = 𝑘4 𝑘2 2 = (𝑛 + 1)𝑛 (𝑛 − 1)(𝑛 − 2)(𝑛 − 3) ∗ ∑ (𝑥𝑖 − 𝑥̅)4𝑛 𝑖=1 𝑘2 2 − 3 (𝑛 − 1)2 (𝑛 − 2)(𝑛 − 3) The standard error of the sample kurtosis of a sample of size n from the normal distribution is: 𝐾 𝑆𝑡𝑑. 𝐸𝑟𝑟𝑜𝑟 = √ 4[6𝑛(𝑛 − 1)2(𝑛 + 1)] (𝑛 − 3)(𝑛 − 2)(𝑛 + 1)(𝑛 + 3)(𝑛 + 5) Source: http://en.wikipedia.org/wiki/Kurtosis#Estimators_of_population_kurtosis Sample Skewness Skewness of a population sample is estimated by the adjusted Fisher–Pearson standardized moment coefficient: 𝐺 = 𝑛 (𝑛 − 1)(𝑛 − 2) ∑ ( 𝑥𝑖 − 𝑥̅ 𝑠 ) 3𝑛 𝑖=1 where n is the sample size and s is the sample standard deviation. The standard error of the skewness of a sample of size n from a normal distribution is: 𝐺 𝑆𝑡𝑑. 𝐸𝑟𝑟𝑜𝑟 = √ 6𝑛(𝑛 − 1) (𝑛 − 2)(𝑛 + 1)(𝑛 + 3) Source: https://en.wikipedia.org/wiki/Skewness#Sample_skewness Total Variance Variance of the entire population is: 𝜎2 = ∑ (𝑥𝑖 − 𝑥̅)2𝑛 𝑖=1 𝑛
  • 3. METHODS AND FORMULAS HELP V2.1 InnerSoft STATS 3 Source: http://en.wikipedia.org/wiki/Variance Total Kurtosis Kurtosis of the entire population is: 𝐺2 = ∑ (𝑥𝑖 − 𝑥̅)4𝑛 𝑖=1 𝑛 𝜎4 − 3 where n is the sample size and σ is the total standard deviation. Source: http://en.wikipedia.org/wiki/Kurtosis Total Skewness Skewness of the entire population is: 𝐺 = ∑ (𝑥𝑖 − 𝑥̅)3𝑛 𝑖=1 𝑛 𝜎3 where n is the sample size and σ is the total standard deviation. Source: https://en.wikipedia.org/wiki/Skewness Quantiles of a population ISSTATS uses the same method as R–7, Excel CUARTIL.INC function, SciPy–(1,1), SPSS and Minitab. Qp, the estimate for the kth q–quantile, where p = k/q and h = (N–1)*p + 1, is computing by Qp = Linear interpolation of the modes for the order statistics for the uniform distribution on [0, 1]. When p = 1, use xN. Source: http://en.wikipedia.org/wiki/Quantile#Estimating_the_quantiles_of_a_population MSSD (Mean of the squared successive differences) It is calculated by taking the sum of the differences between consecutive observations squared, then taking the mean of that sum and dividing by two. 𝑀𝑆𝑆𝐷 = ∑ (𝑥𝑖+1 − 𝑥𝑖)2𝑛 𝑖=1 2(𝑛 − 1) The MSSD has the desirable property that one half the MSSD is an unbiased estimator of true variance. Pearson Chi Square Test The value of the test-statistic is
  • 4. METHODS AND FORMULAS HELP V2.1 InnerSoft STATS 4 𝜒2 = ∑ (𝑂𝑖 − 𝐸𝑖)2 𝐸𝑖 𝑛 𝑖=1 Where  𝜒2 is the Pearson's cumulative test statistic, which asymptotically approaches a 𝜒2 distribution with (r - 1)(c - 1) degrees of freedom.  𝑂𝑖 is the number of observations of type i.  𝐸𝑖 is the expected (theoretical) frequency of type i Yates's Continuity Correction The value of the test-statistic is 𝜒2 = ∑ (𝑚𝑎𝑥{0, |𝑂𝑖 − 𝐸𝑖| − 0.5})2 𝐸𝑖 𝑛 𝑖=1 When |𝑂𝑖 − 𝐸𝑖| − 0.5 is below zero, the null value is computed. The effect of Yates' correction is to prevent overestimation of statistical significance for small data. This formula is chiefly used when at least one cell of the table has an expected count smaller than 5. Likelihood Ratio G-Test The value of the test-statistic is 𝐺 = 2 (∑ ∑ 𝑂𝑖𝑗 ∗ 𝑙𝑛( 𝑂𝑖𝑗 𝐸𝑖𝑗 ) 𝑐 𝑗=1 𝑟 𝑖=1 ) where  Oij is the observed count in row i and column j  Eij is the expected count in row i and column j G has an asymptotically approximate χ2 distribution with (r - 1)(c - 1) degrees of freedom when the null hypothesis is true and n is large enough. Mantel-Haenszel Chi-Square Test The Mantel-Haenszel chi-square statistic tests the alternative hypothesis that there is a linear association between the row variable and the column variable. Both variables must lie on an ordinal scale. The Mantel-Haenszel chi-square statistic is computed as: 𝑄 𝑀𝐻 = (𝑛 − 1)𝑟2 Where r is the Pearson correlation between the row variable and the column variable, n is the sample size. Under the null hypothesis of no association, has an asymptotic chi-square distribution with one degree of freedom.
  • 5. METHODS AND FORMULAS HELP V2.1 InnerSoft STATS 5 Fisher's Exact Test Fisher’s exact test assumes that the row and column totals are fixed, and then uses the hypergeometric distribution to compute probabilities of possible tables conditional on the observed row and column totals. Fisher’s exact test does not depend on any large-sample distribution assumptions, and so it is appropriate even for small sample sizes and for sparse tables. This test is computed for 2X2 tables such as 𝐴 = ( 𝑎 𝑏 𝑐 𝑑 ) For an efficient computing, the elements of the matrix A are reordered A’ = ( 𝑎′ 𝑏′ 𝑐′ 𝑑′ ) Being a’ the cell of A that have the minimum marginals (minimum row and column totals). The test result does not depend on the cells disposition. The left-sided –value sums the probability for all the tables that have equal or smaller a’. p 𝑙𝑒𝑓𝑡 = P(𝑥 ≤ 𝑎′) = ∑ ( 𝐾 = 𝑎′ + 𝑏′ 𝑖 ) ( 𝑁 − 𝐾 𝑛 − 𝑖 ) ( 𝑁 = 𝑎′ + 𝑏′ + 𝑐′ + 𝑑′ 𝑛 = 𝑎′ + 𝑐′ ) 𝑎′ 𝑖=0 The right-sided –value sums the probability for all the tables that have equal or larger a’. p 𝑟𝑖𝑔ℎ𝑡 = P(𝑥 ≥ 𝑎′) = ∑ ( 𝐾 = 𝑎′ + 𝑏′ 𝑖 ) ( 𝑁 − 𝐾 𝑛 − 𝑖 ) ( 𝑁 = 𝑎′ + 𝑏′ + 𝑐′ + 𝑑′ 𝑛 = 𝑎′ + 𝑐′ ) 𝐾=𝑎′+𝑏′ 𝑖=𝑎′ Most of the statistical packages output -as the one-sided test result- the minimum value of pleft and pright. The Fisher two-tailed p-value for a table A is defined as the sum of probabilities for all tables consistent with the marginals that are as likely as the current table. McNemar's Test This test is computed for 2X2 tables such as 𝐴 = ( 𝑎 𝑏 𝑐 𝑑 ) The value of the test-statistic is
  • 6. METHODS AND FORMULAS HELP V2.1 InnerSoft STATS 6 𝜒2 = (𝑏 − 𝑐)2 𝑏 + 𝑐 The statistic is asymptotically distributed like a chi-squared distribution with 1 degree of freedom. Edwards Continuity Correction The value of the test-statistic is 𝜒2 = (𝑚𝑎𝑥{0, |𝑏 − 𝑐| − 1})2 𝑏 + 𝑐 When |𝑏 − 𝑐| − 1 is below zero, the statistic is zero. The statistic is asymptotically distributed like a chi-squared distribution with 1 degree of freedom. McNemar Exact Binomial Assuming that b < c. Let be n = b + c, and B(x, n, p) the binomial distribution Two − sided p − value = 2 ∗ (one − sided p − value) = 2 ∗ ∑ 𝐵(𝑥, 𝑛, 0.5) 𝑏 𝑥=0 = 2 ∗ ∑ ( 𝑛 𝑥 ) ∗ 0.5 𝑥 ∗ 0.5 𝑛−𝑥 𝑏 𝑥=0 = 2 ∗ 1 2 𝑛 ∗ ∑ ( 𝑛 𝑥 ) 𝑏 𝑥=0 If b = c, the exact p-value equals 1.0. Mid-P McNemar Test Let be n = b + c. Assuming that b < c. Mid − P value = 2 ∗ ∑ 𝐵(𝑥, 𝑛, 0.5) 𝑏 𝑥=0 − 𝐵(𝑏, 𝑛, 0.5) = 2 ∗ 1 2 𝑛 ∗ ∑ ( 𝑛 𝑥 ) − ( 𝑛 𝑏 ) ∗ 1 2 𝑛 𝑏 𝑥=0 If b = c, the mid p-value is 1.0 − 1 2 ( 𝑛 𝑏 ) ∗ 1 2 𝑛 Bowker’s Test of Symmetry This test is computed for m-by-m square matrix as: 𝐵𝑊 = ∑ ∑ (𝑛𝑖𝑗 − 𝑛𝑗𝑖)2 𝑛𝑖𝑗 + 𝑛𝑗𝑖 𝑖−1 𝑗=1 𝑚−1 𝑖=1 For large samples, BW has an asymptotic chi-square distribution with M*(M - 1)/2 – R degrees of freedom under the null hypothesis of symmetry, where R is the number of off-diagonal cells with nij + nji = 0.
  • 7. METHODS AND FORMULAS HELP V2.1 InnerSoft STATS 7 Risk Test Let be Risk Factor Disease status Cohort = Present Cohort = Absent Present a b Absent c d Odds ratio The odds ratio (Risk Factor = Present / Risk Factor = Absent) is computed as: 𝑂𝑅 = 𝑎 𝑏⁄ 𝑐 𝑑⁄ The distribution of the log odds ratio is approximately normal with: 𝜒 ~ 𝑁(log(𝑂𝑅) , 𝜎2 ) The standard error for the log odds ratio is approximately 𝑆𝐸 = √ 1 𝑎 + 1 𝑏 + 1 𝑐 + 1 𝑑 The 95% confidence interval for the odds ratio is computed as [exp(log(𝑂𝑅) − 𝑧0.025 ∗ 𝑆𝐸) ; exp(log(𝑂𝑅) + 𝑧0.025 ∗ 𝑆𝐸)] To test the hypothesis that the population odds ratio equals one, is computed the two-sided p-value as 𝑠𝑖𝑔𝑛𝑖𝑓𝑖𝑐𝑎𝑛𝑐𝑒 (2 − 𝑠𝑖𝑑𝑒𝑑) = 2 ∗ 𝑃(𝑧 ≤ −|log(𝑂𝑅)| 𝑆𝐸 ) Source: https://en.wikipedia.org/wiki/Odds_ratio Relative Risk The relative risk (for cohort Disease status = Present) is computed as 𝑅𝑅 = 𝑎 𝑎 + 𝑏⁄ 𝑐 𝑐 + 𝑑⁄ The distribution of the log relative risk is approximately normal with: 𝜒 ~ 𝑁(log(𝑂𝑅) , 𝜎2 )
  • 8. METHODS AND FORMULAS HELP V2.1 InnerSoft STATS 8 The standard error for the log relative risk is approximately 𝑆𝐸 = √ 1 𝑎 + 1 𝑏 − 1 𝑎 + 𝑏 − 1 𝑐 + 𝑑 The 95% confidence interval for the relative risk is computed as [exp(log(𝑅𝑅) − 𝑧0.025 ∗ 𝑆𝐸) ; exp(log(𝑅𝑅) + 𝑧0.025 ∗ 𝑆𝐸)] To test the hypothesis that the population relative risk equals one, is computed the two-sided p-value as 𝑠𝑖𝑔𝑛𝑖𝑓𝑖𝑐𝑎𝑛𝑐𝑒 (2 − 𝑠𝑖𝑑𝑒𝑑) = 2 ∗ 𝑃(𝑧 ≤ −|log(𝑅𝑅)| 𝑆𝐸 ) The relative risk (for cohort Disease status = Absent) is computed as 𝑅𝑅 = 𝑏 𝑎 + 𝑏⁄ 𝑑 𝑐 + 𝑑⁄ Epidemiology Risk All the parameters are computed for cohort Disease status = Present. Attributable risk, represents how much the risk factor increase/decrease the risk of disease 𝐴𝑅 = 𝑎 𝑎 + 𝑏 − 𝑐 𝑐 + 𝑑 If AR > 0 there an increase of the risk. If AR < 0 there is a reduction of the risk. Relative Attributable Risk 𝑅𝑅 = 𝑎 𝑎 + 𝑏 − 𝑐 𝑐 + 𝑑 𝑐 𝑐 + 𝑑 = 𝐴𝑅 𝑐 𝑐 + 𝑑 Number Needed to Harm 𝑁𝑁𝐻 = 1 𝑎 𝑎 + 𝑏 − 𝑐 𝑐 + 𝑑 = 1 𝐴𝑅 The number needed to harm (NNH) is an epidemiological measure that indicates how many patients on average need to be exposed to a risk-factor over a specific period to cause harm in an average of one patient who would not otherwise have been harmed. A negative number would not be presented as a NNH, rather, as the risk factor is not harmful, it is expressed as a number needed to treat (NNT) or number needed to avoid to expose to risk.
Attributable risk per unit:

$$ARP = \frac{RR - 1}{RR}$$

Preventive fraction:

$$PF = 1 - RR$$

The etiologic fraction is the proportion of cases in which the exposure has played a causal role in disease development:

$$EF = \frac{a - c}{a}$$

Similar parameters are computed for cohort Disease status = Absent.

Source: https://en.wikipedia.org/wiki/Relative_risk

Cohen's Kappa Test

Given a k-by-k square matrix that collects the scores of two raters who each classify N items into k mutually exclusive categories, Cohen's kappa coefficient is:

$$\hat{\kappa} = \frac{p_o - p_e}{1 - p_e}$$

where

$$p_o = \sum_{i=1}^{k} \frac{n_{ii}}{N} = \sum_{i=1}^{k} p_{ii} \quad and \quad p_e = \sum_{i=1}^{k} p_{i.}\, p_{.i}$$

with

$$p_{ij} = \frac{n_{ij}}{N}, \qquad p_{i.} = \sum_{j=1}^{k} \frac{n_{ij}}{N}, \qquad p_{.j} = \sum_{i=1}^{k} \frac{n_{ij}}{N}$$

The asymptotic variance is computed by:

$$var(\hat{\kappa}) = \frac{1}{N(1-p_e)^4} \left\{ \sum_{i=1}^{k} p_{ii}\left[(1-p_e) - (p_{.i} + p_{i.})(1-p_o)\right]^2 + (1-p_o)^2 \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} p_{ij}(p_{.i} + p_{j.})^2 - (p_o p_e - 2p_e + p_o)^2 \right\}$$
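The kappa point estimate defined at the start of this section is just a few sums over the agreement table; before turning to the variance, a minimal sketch (function name and example table are illustrative):

```python
def cohens_kappa(table):
    """Cohen's kappa from a k-by-k agreement table (list of rows)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n                      # observed agreement
    row = [sum(table[i][j] for j in range(k)) / n for i in range(k)]  # p_i.
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # p_.j
    p_e = sum(row[i] * col[i] for i in range(k))                      # chance agreement
    return (p_o - p_e) / (1 - p_e)

kappa = cohens_kappa([[20, 5], [10, 15]])
print(kappa)
```

For this table the observed agreement is 0.7 against a chance agreement of 0.5, giving kappa = 0.4.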
The formula is given by Fleiss, Cohen, and Everitt (1969), and modified by Fleiss (1981). The asymptotic standard error is the square root of the value given above. This standard error and the standard normal distribution N(0,1) are used to compute confidence intervals:

$$\hat{\kappa} \pm z_{\alpha/2}\sqrt{var(\hat{\kappa})}$$

To compute an asymptotic test for the kappa coefficient, ISSTATS uses a standardized test statistic T which has an asymptotic standard normal distribution under the null hypothesis that kappa equals zero (H0: κ = 0). The standardized test statistic is computed as:

$$T = \frac{\hat{\kappa}}{\sqrt{var_0(\hat{\kappa})}} \approx N(0,1)$$

where the variance of the kappa coefficient under the null hypothesis is:

$$var_0(\hat{\kappa}) = \frac{1}{N(1-p_e)^2} \left\{ p_e + p_e^2 - \sum_{i=1}^{k} p_{.i}\, p_{i.}(p_{.i} + p_{i.}) \right\}$$

Refer to Fleiss (1981).

Source: https://v8doc.sas.com/sashtml/stat/chap28/sect26.htm

Nominal by Nominal Measures of Association

Contingency Coefficient

The contingency coefficient is a measure of association between two nominal variables, giving a value between 0 and 1:

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$

where
- χ² is Pearson's cumulative test statistic.
- N is the total sample size.

The significance is based on the Pearson χ² statistic, which asymptotically follows a χ² distribution with (r − 1)(c − 1) degrees of freedom.

Standardized Contingency Coefficient
If X and Y have the same number of categories (r = c), the maximum value of the contingency coefficient is calculated as:

$$c_{max} = \sqrt{\frac{r-1}{r}}$$

If X and Y have a differing number of categories (r ≠ c), the maximum value of the contingency coefficient is calculated as:

$$c_{max} = \sqrt[4]{\frac{(r-1)(c-1)}{r \cdot c}}$$

The standardized contingency coefficient is calculated as the ratio:

$$c_{Standardized} = \frac{C}{c_{max}}$$

which varies between 0 and 1, with 0 indicating independence and 1 complete dependence.

Phi Coefficient

The phi coefficient is a measure of association for two nominal variables:

$$\Phi = \sqrt{\frac{\chi^2}{N}}$$

where
- χ² is Pearson's cumulative test statistic.
- N is the total sample size.

The significance is based on the Pearson χ² statistic, which asymptotically follows a χ² distribution with (r − 1)(c − 1) degrees of freedom.

Cramer's V

Cramer's V is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive):

$$V = \sqrt{\frac{\chi^2 / N}{\min\{r-1,\ c-1\}}}$$

where
- χ² is Pearson's cumulative test statistic.
- N is the total sample size.

The significance is based on the Pearson χ² statistic, which asymptotically follows a χ² distribution with (r − 1)(c − 1) degrees of freedom.
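All three coefficients above are simple transforms of Pearson's χ². A minimal sketch (function names and the example table are illustrative):

```python
def chi_square_stat(table):
    """Pearson's chi-square statistic for an r-by-c contingency table."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    chi2 = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))
    return n, chi2

def nominal_association(table):
    """Contingency coefficient, phi and Cramer's V from a table."""
    n, chi2 = chi_square_stat(table)
    r, c = len(table), len(table[0])
    C = (chi2 / (chi2 + n)) ** 0.5              # contingency coefficient
    phi = (chi2 / n) ** 0.5                     # phi coefficient
    V = (chi2 / n / min(r - 1, c - 1)) ** 0.5   # Cramer's V
    return C, phi, V

print(nominal_association([[10, 20], [30, 40]]))
```

For a 2×2 table min{r−1, c−1} = 1, so phi and Cramer's V coincide, and C is always slightly smaller than phi.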
Tschuprow's T

Tschuprow's T is a measure of association between two nominal variables, giving a value between 0 and 1 (inclusive):

$$T = \sqrt{\frac{\chi^2 / N}{\sqrt{(r-1)(c-1)}}}$$

Lambda

Asymmetric lambda, λ(C/R) or column-variable-dependent lambda, is interpreted as the probable improvement in predicting the column variable Y given knowledge of the row variable X. The range of asymmetric lambda is [0, 1]. Asymmetric lambda (C/R) is computed as:

$$\lambda(C/R) = \frac{\sum_i r_i - r}{N - r}$$

The asymptotic variance is:

$$var(\lambda(C/R)) = \frac{\left(N - \sum_i r_i\right)\left(\sum_i r_i + r - 2\sum_i (r_i \mid l_i = l)\right)}{(N - r)^3}$$

where

$$r_i = \max_j \{n_{ij}\}, \qquad r = \max_j \{n_{.j}\}, \qquad c_j = \max_i \{n_{ij}\}, \qquad c = \max_i \{n_{i.}\}$$

The values of l_i and l are determined as follows. Denote by l_i the unique value of j such that r_i = n_ij, and let l be the unique value of j such that r = n_.j. Because of the uniqueness assumptions, ties in the frequencies or in the marginal totals must be broken in an arbitrary but consistent manner. In case of ties, l is defined as the smallest value of j such that r = n_.j. For those columns containing a cell (i, j) for which n_ij = r_i = c_j, cs_j records the row in which c_j is assumed to occur. Initially cs_j is set equal to −1 for all j. Beginning with i = 1, if there is at least one value j such that n_ij = r_i = c_j, and if cs_j = −1, then l_i is defined to be the smallest such value of j, and cs_j is set equal to i. Otherwise, if n_il = r_i, then l_i is defined to be equal to l. If neither condition is true, then l_i is taken to be the smallest value of j such that n_ij = r_i.

The asymptotic standard error is the square root of the asymptotic variance.

The formulas for lambda asymmetric λ(R/C) can be obtained by interchanging the indices.
$$\lambda(R/C) = \frac{\sum_j c_j - c}{N - c}$$

The symmetric lambda is the average of the two asymmetric lambdas, λ(C/R) and λ(R/C). Its range is [0, 1]. Lambda symmetric is computed as:

$$\lambda = \frac{\sum_i r_i + \sum_j c_j - r - c}{2N - r - c}$$

The asymptotic variance is:

$$var(\lambda) = \frac{1}{w^4} \left\{ wvy - 2w^2\left[N - \sum_i \sum_j (n_{ij} \mid j = l_i,\ i = k_j)\right] - 2v^2(N - n_{kl}) \right\}$$

where

$$w = 2N - r - c, \qquad v = 2N - \sum_i r_i - \sum_j c_j,$$

$$x = \sum_i (r_i \mid l_i = l) + \sum_j (c_j \mid k_j = k) + r_k + c_l, \qquad y = 8N - w - v - 2x$$

The definitions of l and l_i are given in the previous section. The values k and k_j are defined in a similar way for lambda asymmetric (R/C).

Uncertainty Coefficient

The uncertainty coefficient U(C/R), or column-variable-dependent U, measures the proportion of uncertainty (entropy) in the column variable Y that is explained by the row variable X. Its range is [0, 1]. The uncertainty coefficient is computed as:

$$U(C/R) = U_{column\ variable\ dependent} = \frac{H(X) + H(Y) - H(XY)}{H(Y)}$$

where

$$H(X) = -\sum_i \frac{n_{i.}}{N}\ln\left(\frac{n_{i.}}{N}\right), \qquad H(Y) = -\sum_j \frac{n_{.j}}{N}\ln\left(\frac{n_{.j}}{N}\right), \qquad H(XY) = -\sum_i \sum_j \frac{n_{ij}}{N}\ln\left(\frac{n_{ij}}{N}\right)$$

The asymptotic variance is:

$$var(U(C/R)) = \frac{1}{N^2\,[H(Y)]^4} \sum_i \sum_j n_{ij} \left\{ H(Y)\ln\left(\frac{n_{ij}}{n_{i.}}\right) + \left[H(X) - H(XY)\right]\ln\left(\frac{n_{.j}}{N}\right) \right\}^2$$

The asymptotic standard error is the square root of the asymptotic variance.
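The U(C/R) point estimate above only needs the three entropies. A minimal sketch (function name and example table are illustrative):

```python
import math

def uncertainty_coefficient(table):
    """U(C/R): proportion of entropy in the column variable explained by
    the row variable, from a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    hx = -sum(p / n * math.log(p / n) for p in rows if p > 0)                 # H(X)
    hy = -sum(p / n * math.log(p / n) for p in cols if p > 0)                 # H(Y)
    hxy = -sum(v / n * math.log(v / n) for row in table for v in row if v > 0)  # H(XY)
    return (hx + hy - hxy) / hy

u = uncertainty_coefficient([[30, 10], [10, 30]])
print(u)
```

For an independent table (all cells equal) the coefficient is 0; for the diagonal-heavy table above it is about 0.19.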
The formulas for the uncertainty coefficient U(R/C) can be obtained by interchanging the indices.

The symmetric uncertainty coefficient is computed as:

$$U = \frac{2\left[H(X) + H(Y) - H(XY)\right]}{H(X) + H(Y)}$$

The asymptotic variance is:

$$var(U) = \frac{4 \sum_i \sum_j n_{ij} \left\{ H(XY)\ln\left(\dfrac{n_{i.}\, n_{.j}}{N^2}\right) - \left[H(X) + H(Y)\right]\ln\left(\dfrac{n_{ij}}{N}\right) \right\}^2}{N^2\left[H(X) + H(Y)\right]^4}$$

The asymptotic standard error is the square root of the asymptotic variance.

Ordinal by Ordinal Measures of Association

Let n_ij denote the observed frequency in cell (i, j) of an I×J contingency table. Let N be the total frequency, and:

$$A_{ij} = \sum_{k<i} \sum_{l<j} n_{kl} + \sum_{k>i} \sum_{l>j} n_{kl}$$

$$D_{ij} = \sum_{k>i} \sum_{l<j} n_{kl} + \sum_{k<i} \sum_{l>j} n_{kl}$$

$$P = \sum_i \sum_j n_{ij} A_{ij} \quad and \quad Q = \sum_i \sum_j n_{ij} D_{ij}$$

Gamma Coefficient

The gamma (G) statistic is based only on the number of concordant and discordant pairs of observations. It ignores tied pairs (that is, pairs of observations that have equal values of X or equal values of Y). Gamma is appropriate only when both variables lie on an ordinal scale. The range of gamma is [-1, 1]. If the row and column variables are independent, then gamma tends to be close to zero. Gamma is estimated by:

$$G = \frac{P - Q}{P + Q}$$

The asymptotic variance is
$$var(G) = \frac{16}{(P+Q)^4} \left\{ \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}\,(Q A_{ij} - P D_{ij})^2 \right\}$$

The asymptotic standard error is the square root of the asymptotic variance.

The variance under the null hypothesis that gamma equals zero is computed as:

$$var_0(G) = \frac{4}{(P+Q)^2} \left\{ \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}\, d_{ij}^2 - \frac{(P-Q)^2}{N} \right\}$$

where d_ij = A_ij − D_ij.

The asymptotic standard error under the null hypothesis that gamma equals zero is the square root of this variance.

Kendall's Tau-b

Kendall's tau-b is similar to gamma except that tau-b uses a correction for ties. Tau-b is appropriate only when both variables lie on an ordinal scale. The range of tau-b is [-1, 1]. Kendall's tau-b is estimated by:

$$\tau_b = \frac{P - Q}{w}$$

where

$$w_r = N^2 - \sum_i n_{i.}^2, \qquad w_c = N^2 - \sum_j n_{.j}^2, \qquad w = \sqrt{w_r w_c}$$

The asymptotic variance is:

$$var(\tau_b) = \frac{1}{w^4} \left\{ \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}\,(2w\, d_{ij} + \tau_b v_{ij})^2 - N^3 \tau_b^2 (w_r + w_c)^2 \right\}$$

where

$$v_{ij} = w_c\, n_{i.} + w_r\, n_{.j}$$

The asymptotic standard error is the square root of the asymptotic variance.

The variance under the null hypothesis that tau-b equals zero is computed as
$$var_0(\tau_b) = \frac{4}{w_r w_c} \left\{ \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}\, d_{ij}^2 - \frac{(P-Q)^2}{N} \right\}$$

The asymptotic standard error under the null hypothesis that tau-b equals zero is the square root of this variance.

Stuart-Kendall's Tau-c

Stuart-Kendall's tau-c makes an adjustment for table size in addition to a correction for ties. Tau-c is appropriate only when both variables lie on an ordinal scale. The range of tau-c is [-1, 1]. Stuart-Kendall's tau-c is estimated by:

$$\tau_c = \frac{m(P - Q)}{N^2(m - 1)}$$

where m = min{I, J}. The asymptotic variance is:

$$var(\tau_c) = \frac{4m^2}{N^4(m-1)^2} \left\{ \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}\, d_{ij}^2 - \frac{(P-Q)^2}{N} \right\}$$

The asymptotic standard error is the square root of the asymptotic variance. The variance under the null hypothesis that tau-c equals zero is the same as the asymptotic variance.

Somers' D

Somers' D(C/R) and Somers' D(R/C) are asymmetric modifications of tau-b. C/R indicates that the row variable X is regarded as the independent variable and the column variable Y is regarded as dependent. Similarly, R/C indicates that the column variable Y is regarded as the independent variable and the row variable X is regarded as dependent. Somers' D differs from tau-b in that it uses a correction only for pairs that are tied on the independent variable. Somers' D is appropriate only when both variables lie on an ordinal scale. The range of Somers' D is [-1, 1]. Somers' D (column variable dependent) is computed as:

$$D(C/R) = D_{column\ variable\ dependent} = \frac{P - Q}{w_r}$$

The asymptotic variance is
$$var(D(C/R)) = \frac{4}{w_r^4} \left\{ \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}\left[w_r\, d_{ij} - (P-Q)(N - n_{i.})\right]^2 \right\}$$

The asymptotic standard error is the square root of the asymptotic variance.

The variance under the null hypothesis that D(C/R) equals zero is computed as:

$$var_0(D(C/R)) = \frac{4}{w_r^2} \left\{ \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}\, d_{ij}^2 - \frac{(P-Q)^2}{N} \right\}$$

The asymptotic standard error under the null hypothesis that D(C/R) equals zero is the square root of this variance.

Formulas for Somers' D(R/C) are obtained by interchanging the indices.

The symmetric version of Somers' d is:

$$d = \frac{P - Q}{\dfrac{w_r + w_c}{2}}$$

The standard error is:

$$ASE(d) = \frac{2\sigma_{\tau_b}\, w}{w_r + w_c}$$

where σ_τb is the asymptotic standard error of Kendall's tau-b. The variance under the null hypothesis that d equals zero is computed as:

$$var_0(d) = \frac{16}{(w_r + w_c)^2} \left\{ \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}\, d_{ij}^2 - \frac{(P-Q)^2}{N} \right\}$$

The asymptotic standard error under the null hypothesis that d equals zero is the square root of this variance.

Confidence Bounds and One-Sided Tests

Suppose you are testing the null hypothesis H0: θ ≥ θ0 against the one-sided alternative H1: θ < θ0. Rather than give a two-sided confidence interval for θ, the more appropriate procedure is to give an upper confidence bound in this setting. This upper confidence bound has a direct relationship to the one-sided test, namely:
1. A level α test of H0: θ ≥ θ0 against the one-sided alternative H1: θ < θ0 rejects H0 exactly when the value θ0 is above the 1−α upper confidence bound.
2. A level α test of H0: θ ≤ θ0 against the one-sided alternative H1: θ > θ0 rejects H0 exactly when the value θ0 is below the 1−α lower confidence bound.

ANOVA Test

$$SS_{Total} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{Y}_{..})^2$$

$$SS_{Inter} = \sum_{i=1}^{k} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$$

$$SS_{Intra} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{Y}_{i.})^2 = SS_{Total} - SS_{Inter}$$

DF Total = N − 1
DF Inter = k − 1
DF Intra = N − k

$$MS_{Total} = \frac{SS_{Total}}{DF_{Total}}, \qquad MS_{Inter} = \frac{SS_{Inter}}{DF_{Inter}}, \qquad MS_{Intra} = \frac{SS_{Intra}}{DF_{Intra}}$$

$$F = \frac{MS_{Inter}}{MS_{Intra}}$$

where
- F is the result of the test
- k is the number of different groups to which the sampled cases belong
- N = Σ n_i is the total sample size
- n_i is the number of cases in the i-th group
- y_ij is the value of the measured variable for the j-th case from the i-th group
- Ȳ.. is the mean of all y_ij
- Ȳ_i. is the mean of the y_ij for group i.

The test statistic has an F-distribution with DF Inter and DF Intra degrees of freedom. Thus the null hypothesis is rejected if:

$$F \ge F_{k-1,\ N-k}(1 - \alpha)$$
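The sums of squares and the F statistic above can be sketched in plain Python (function name and example data are illustrative, not part of ISSTATS):

```python
def one_way_anova(groups):
    """One-way ANOVA F statistic and degrees of freedom for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total                       # overall mean
    ss_inter = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_intra = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_inter, df_intra = k - 1, n_total - k
    f = (ss_inter / df_inter) / (ss_intra / df_intra)                   # MS_Inter / MS_Intra
    return f, df_inter, df_intra

f, df1, df2 = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f, df1, df2)
```

The p-value would then come from the F distribution with (df1, df2) degrees of freedom (e.g. via a statistical library).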
ANOVA Multiple Comparisons

Difference of Means:

$$\bar{y}_i - \bar{y}_j$$

Standard Error of the Difference of Means Estimator:

$$Std.\ Error = \sqrt{MS_{Intra} \cdot \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Scheffé's Method

Confidence Interval for Difference of Means:

$$CI(1-\alpha) = \bar{y}_i - \bar{y}_j \pm \sqrt{DF_{Inter} \cdot MS_{Intra} \cdot F_{DF_{Inter},\ DF_{Intra}}(1-\alpha) \cdot \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Source: http://en.wikipedia.org/wiki/Scheff%C3%A9%27s_method

Tukey's Range Test (HSD)

Confidence Interval for Difference of Means:

$$CI(1-\alpha) = \bar{y}_i - \bar{y}_j \pm q_{k,\ DF_{Intra}}(1-\alpha)\sqrt{\frac{MS_{Intra}}{2} \cdot \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

where q is the studentized range distribution.

Source: https://en.wikipedia.org/wiki/Tukey%27s_range_test

Fisher's Method (LSD)

If the overall ANOVA test is not significant, you must not consider any result of the Fisher test, whether significant or not.

Confidence Interval for Difference of Means:

$$CI(1-\alpha) = \bar{y}_i - \bar{y}_j \pm t_{DF_{Intra}}(1-\alpha/2)\sqrt{MS_{Intra} \cdot \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

where t is the Student's t distribution.

Bonferroni's Method

The family-wise significance level (FWER) is α = 1 − Confidence Level. Thus any comparison flagged by ISSTATS as significant is based on a Bonferroni Correction:
$$\alpha' = \frac{2\alpha}{k(k-1)} \qquad\qquad p' = p \cdot \frac{k(k-1)}{2}$$

where k is the number of groups.

Confidence Interval for Difference of Means:

$$CI(1-\alpha) = \bar{y}_i - \bar{y}_j \pm t_{DF_{Intra}}(1-\alpha'/2)\sqrt{MS_{Intra} \cdot \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

where t is the Student's t distribution.

Sidak's Method

The family-wise significance level (FWER) is α = 1 − Confidence Level. So any comparison flagged by ISSTATS as significant is based on a Sidak Correction:

$$\alpha' = 1 - (1-\alpha)^{\frac{2}{k(k-1)}} \qquad\qquad p' = 1 - e^{\log(1-p)\,\frac{k(k-1)}{2}} = 1 - (1-p)^{\frac{k(k-1)}{2}}$$

where k is the number of groups.

Confidence Interval for Difference of Means:

$$CI(1-\alpha) = \bar{y}_i - \bar{y}_j \pm t_{DF_{Intra}}(1-\alpha'/2)\sqrt{MS_{Intra} \cdot \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

where t is the Student's t distribution.

Welch's Test for Equality of Means

The test statistic, F*, is defined as follows:

$$F^* = \frac{\dfrac{\sum_{i=1}^{k} w_i(\bar{x}_i - \tilde{X})^2}{k-1}}{1 + \dfrac{2(k-2)}{k^2-1} \sum_{i=1}^{k} h_i}$$

where
- F* is the result of the test
- k is the number of different groups to which the sampled cases belong
- n_i is the number of cases in the i-th group
- w_i = n_i / S_i²
- W = Σ w_i = Σ n_i / S_i²
- X̃ = Σ w_i x̄_i / W
- h_i = (1 − w_i/W)² / (n_i − 1)

The test statistic has approximately an F-distribution with k−1 and df = (k²−1)/(3 Σ h_i) degrees of freedom. Thus the null hypothesis is rejected if:

$$F^* \ge F_{k-1,\ df}(1-\alpha)$$

Brown–Forsythe Test for Equality of Means

The test statistic, F*, is defined as follows:

$$F^* = \frac{\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{X}_{..})^2}{\sum_{i=1}^{k} \left(1 - \dfrac{n_i}{N}\right) S_i^2}$$

where
- F* is the result of the test
- k is the number of different groups to which the sampled cases belong
- n_i is the number of cases in the i-th group (sample size of group i)
- N = Σ n_i is the total sample size
- X̄.. = Σ n_i x̄_i / N is the overall mean.

The test statistic has approximately an F-distribution with k−1 and df degrees of freedom, where df is obtained with the Satterthwaite (1941) approximation as:

$$\frac{1}{df} = \sum_{i=1}^{k} \frac{c_i^2}{n_i - 1} \quad with \quad c_j = \frac{\left(1 - \dfrac{n_j}{N}\right) S_j^2}{\sum_{i=1}^{k} \left(1 - \dfrac{n_i}{N}\right) S_i^2}$$

Thus the null hypothesis is rejected if:

$$F^* \ge F_{k-1,\ df}(1-\alpha)$$

Homoscedasticity Tests

Levene's Test

The test statistic, F, is defined as follows:

$$F = \frac{N-k}{k-1} \cdot \frac{\sum_{i=1}^{k} n_i(\bar{Z}_{i.} - \bar{Z}_{..})^2}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Z_{ij} - \bar{Z}_{i.})^2}$$
where
- F is the result of the test
- k is the number of different groups to which the sampled cases belong
- N = Σ n_i is the total sample size
- n_i is the number of cases in the i-th group
- Y_ij is the value of the measured variable for the j-th case from the i-th group
- Z_ij = |Y_ij − Ȳ_i.|, where Ȳ_i. is the mean of the i-th group
- Z̄.. is the mean of all Z_ij
- Z̄_i. is the mean of the Z_ij for group i.

The test statistic has an F-distribution with k−1 and N−k degrees of freedom. Thus the null hypothesis is rejected if F ≥ F_{k−1, N−k}(1−α).

Source: http://en.wikipedia.org/wiki/Levene%27s_test

Brown–Forsythe Test for Equality of Variances

The test statistic, F, is defined as follows:

$$F = \frac{N-k}{k-1} \cdot \frac{\sum_{i=1}^{k} n_i(\bar{Z}_{i.} - \bar{Z}_{..})^2}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Z_{ij} - \bar{Z}_{i.})^2}$$

where
- F is the result of the test
- k is the number of different groups to which the sampled cases belong
- N = Σ n_i is the total sample size
- n_i is the number of cases in the i-th group
- Y_ij is the value of the measured variable for the j-th case from the i-th group
- Z_ij = |Y_ij − Ỹ_i.|, where Ỹ_i. is the median of the i-th group
- Z̄.. is the mean of all Z_ij
- Z̄_i. is the mean of the Z_ij for group i.

The test statistic has an F-distribution with k−1 and N−k degrees of freedom. Thus the null hypothesis is rejected if F ≥ F_{k−1, N−k}(1−α).

Source: http://en.wikipedia.org/wiki/Levene%27s_test

Bartlett's Test

Bartlett's test is used to test the null hypothesis H0 that all k population variances are equal against the alternative that at least two are different. If there are k samples with sizes n_i and sample variances S_i², then Bartlett's test statistic is:

$$\chi^2 = \frac{(N-k)\ln(S_p^2) - \sum_{i=1}^{k} (n_i - 1)\ln(S_i^2)}{1 + \dfrac{1}{3(k-1)}\left(\sum_{i=1}^{k} \dfrac{1}{n_i - 1} - \dfrac{1}{N-k}\right)}$$

where
- N = Σ n_i is the total sample size
- S_p² = Σ (n_i − 1)S_i² / (N − k) is the pooled estimate of the variance.

The test statistic has approximately a chi-squared distribution with k−1 degrees of freedom. Thus the null hypothesis is rejected if χ² ≥ χ²_{k−1}(1−α).

Source: http://en.wikipedia.org/wiki/Bartlett%27s_test

Bivariate Correlation Tests

Sample Covariance

$$S_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

where N is the total sample size.

Source: http://en.wikipedia.org/wiki/Covariance#Calculating_the_sample_covariance

Sample Pearson Product-Moment Correlation Coefficient

$$r = \frac{1}{N-1} \cdot \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{S_x S_y} = \frac{S_{xy}}{S_x S_y}$$

where S_x and S_y are the sample standard deviations of the paired sample (x_i, y_i), S_xy is the sample covariance and N is the total sample size.

Source: http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#For_a_sample

Test for the Significance of the Pearson Product-Moment Correlation Coefficient

The test hypotheses are:
- H0: the sample values come from a population in which ρ = 0
- H1: the sample values come from a population in which ρ ≠ 0

The test statistic is:

$$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}$$

where
- N is the total sample size
- r is the sample Pearson product-moment correlation coefficient
The test statistic has a Student's t distribution with N−2 degrees of freedom.

Spearman Correlation Coefficient

For each of the variables X and Y separately, the observations are sorted into ascending order and replaced by their ranks. Identical values (rank ties or value duplicates) are assigned a rank equal to the average of their positions in the ascending order of the values. Each time t observations are tied (t > 1), the quantity t³ − t is calculated and summed separately for each variable. These sums will be designated ST_x and ST_y.

For each of the N observations, the difference between the rank of X and the rank of Y is computed as:

$$d_i = Rank(X_i) - Rank(Y_i)$$

If there are no ties in either sample, Spearman's rho (ρ) is calculated as:

$$\rho = 1 - \frac{6\sum d_i^2}{N(N^2 - 1)}$$

If there are any ties in either sample, Spearman's rho (ρ) is calculated as (Siegel, 1956):

$$\rho = \frac{T_x + T_y - \sum d_i^2}{2\sqrt{T_x T_y}}$$

where

$$T_x = \frac{N(N^2-1) - ST_x}{12} \qquad T_y = \frac{N(N^2-1) - ST_y}{12}$$

If T_x or T_y is 0, the statistic is not computed.

Source: http://pic.dhe.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_nonpar_corr_spearman.htm

Test for the Significance of the Spearman's Correlation Coefficient

The test hypotheses are:
- H0: the sample values come from a population in which ρ = 0
- H1: the sample values come from a population in which ρ ≠ 0

The test statistic is:

$$t = \frac{\rho\sqrt{N-2}}{\sqrt{1-\rho^2}}$$

The test statistic has a Student's t distribution with N−2 degrees of freedom.
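Both coefficients above are easy to sketch: Pearson's r with its t statistic directly, and Spearman's rho computed equivalently as the Pearson correlation of the mid-ranks (which matches the tie-corrected formula above). Function names and example data are illustrative:

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation and the t statistic for H0: rho = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return r, t

def ranks(v):
    """Ranks with ties replaced by the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1        # average 1-based rank of the tied block
        for idx in order[i:j + 1]:
            r[idx] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))[0]

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
print(spearman_rho([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```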
Kendall's Tau-b Correlation Coefficient

For each of the variables X and Y separately, the observations are sorted into ascending order and replaced by their ranks. In situations where t observations are tied, the average rank is assigned. Each time t > 1, the following quantities are computed and summed over all groups of ties for each variable separately:

$$T_1 = \sum (t^2 - t), \qquad T_2 = \sum (t^2 - t)(t - 2), \qquad T_3 = \sum (t^2 - t)(2t + 5)$$

Each of the N cases is compared to the others to determine with how many cases its ranking of X and Y is concordant or discordant. The following procedure is used. For each distinct pair of cases (i, j), where i < j, the quantity:

$$d_{ij} = [Rank(X_j) - Rank(X_i)][Rank(Y_j) - Rank(Y_i)]$$

is computed. If the sign of this product is positive, the pair of observations (i, j) is concordant. If the sign is negative, the pair is discordant. The number of concordant pairs minus the number of discordant pairs is:

$$S = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} sign(d_{ij})$$

where sign(d_ij) is defined as +1 or −1 depending on the sign of d_ij. Pairs in which d_ij = 0 are ignored in the computation of S.

If there are no ties in either sample, Kendall's tau (τ) is computed as:

$$\tau = \frac{2S}{N^2 - N}$$

If there are any ties in either sample, Kendall's tau (τ) is computed as:

$$\tau = \frac{2S}{\sqrt{N^2 - N - T_{1x}}\,\sqrt{N^2 - N - T_{1y}}}$$

If the denominator is 0, the statistic is not computed.

Source: http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient#Tau-b

Test for the Significance of the Kendall's Tau-b Correlation Coefficient

The variance of S is estimated by (Kendall, 1955):
$$Var = \frac{(N^2-N)(2N+5) - T_{3x} - T_{3y}}{18} + \frac{T_{2x} T_{2y}}{9(N^2-N)(N-2)} + \frac{T_{1x} T_{1y}}{2(N^2-N)}$$

The significance level is obtained using:

$$Z = \frac{S}{\sqrt{Var}}$$

which, under the null hypothesis, is approximately distributed as a standard normal when the variables are statistically independent.

Sources:
http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient#Significance_tests
http://pic.dhe.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_nonpar_corr_kendalls.htm

Parametric Value at Risk

Value at Risk of a Single Asset

Given the time series of daily return rates for an asset, let μ be the daily mean of the return rates and σ² the daily variance of the return rates. Let P be the position, holding or investment in the asset.

One-day Expected Return is:

$$ER = P\mu$$

The Standard Deviation or Volatility is the square root of the Variance:

$$\sigma = \sqrt{\sigma^2}$$

One-day Value at Risk is:

$$VaR_{1-\alpha} = -(\mu + z_\alpha \sigma)P$$

where z_α is the left-tail α quantile of the standard normal distribution.

Total Value at Risk for n trading days is:

$$VaR_{1-\alpha}^{n\ days} = VaR_{1-\alpha} \cdot \sqrt{n} = -(\mu + z_\alpha \sigma)P\sqrt{n}$$

Portfolio Value at Risk

Given the time series of daily return rates on different assets, let μ_i be the daily mean of the return rates for the i-th asset, σ_i² the daily variance of the return rates for the i-th asset, and σ_i the daily standard deviation (or volatility) of the return rates for the i-th asset. The covariance of the daily return rates of the i-th and j-th assets is σ_ij. All parameters are unbiased estimates. Given the holdings, positions or investments in each of these assets, P_i, the total position is:
$$P = \sum_{i=1}^{N} P_i$$

The weighting of each position is:

$$w_i = \frac{P_i}{P}$$

The weighted mean of the portfolio is:

$$\mu_P = \sum_{i=1}^{N} w_i \mu_i = \frac{1}{P} \sum_{i=1}^{N} P_i \mu_i$$

One-day Expected Return of the portfolio is the weighted mean of the portfolio multiplied by the total position:

$$ER = P\mu_P = P \sum_{i=1}^{N} w_i \mu_i = \sum_{i=1}^{N} P_i \mu_i$$

The Portfolio Variance is:

$$\sigma_P^2 = W^T M W$$

where W is the vector of weights and M is the covariance matrix. The i-th item in the diagonal of M is the daily variance of the return rates for the i-th asset; the items outside the diagonal are covariances. Portfolio Variance can also be computed as:

$$\sigma_P^2 = \frac{1}{P^2}\, X^T M X$$

where X is the vector of positions.

The Portfolio Standard Deviation or Portfolio Volatility is the square root of the Portfolio Variance:

$$\sigma_P = \sqrt{\sigma_P^2}$$

One-day Value at Risk is:

$$VaR_{1-\alpha} = -(\mu_P + z_\alpha \sigma_P)P$$
where z_α is the left-tail α quantile of the standard normal distribution.

Total Value at Risk for n trading days is:

$$VaR_{1-\alpha}^{n\ days} = VaR_{1-\alpha} \cdot \sqrt{n} = -(\mu_P + z_\alpha \sigma_P)P\sqrt{n}$$

VaR_{1−α}^{n days} is the minimum potential loss that a portfolio can suffer in the α% worst cases over n days.

About the signs: a positive value of VaR is an expected loss. A negative VaR would imply the portfolio has a high probability of making a profit.

Source: http://www.jpmorgan.com/tss/General/Risk_Management/1159360877242

Remark: Some texts about VaR express the covariance as σ_ij = σ_i σ_j ρ_ij, where ρ_ij is the correlation coefficient.

Remark: Sometimes VaR is taken to be the Portfolio Volatility multiplied by the position, as the expected return is supposed to be approximately zero. ISSTATS does NOT equate VaR with Portfolio Volatility and does NOT assume the expected return is zero.

Marginal Value at Risk

Marginal Value at Risk is the change in portfolio VaR resulting from a marginal change in the currency (dollar, euro, ...) position in component i:

$$MVaR_i = \frac{\partial VaR}{\partial P_i}$$

Assuming the linearity of the risk in the parametric approach, the vector of Marginal Value at Risk is:

$$\begin{bmatrix} MVaR_1 \\ \vdots \\ MVaR_n \end{bmatrix} = -\left( \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} + \frac{z_\alpha}{\sigma_P}\, M W \right) = -\left( \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} + \frac{z_\alpha}{P\,\sigma_P}\, M X \right)$$

where M is the covariance matrix, W the vector of weights and X the vector of positions.

Total Marginal Value at Risk for n trading days is:

$$MVaR_i^{n\ days} = MVaR_i \cdot \sqrt{n}$$

Component Value at Risk

Component Value at Risk is a partition of the portfolio VaR that indicates how much the VaR would change if the given component were deleted.
$$CVaR_i = \frac{\partial VaR}{\partial P_i} P_i = MVaR_i \cdot P_i$$

Note that the sum of all component VaRs (CVaR) is the VaR of the entire portfolio:

$$VaR = \sum_{i=1}^{N} CVaR_i = \sum_{i=1}^{N} \frac{\partial VaR}{\partial P_i} P_i = \sum_{i=1}^{N} MVaR_i \cdot P_i$$

Total Component Value at Risk for n trading days is:

$$CVaR_i^{n\ days} = CVaR_i \cdot \sqrt{n}$$

Source: http://www.math.nus.edu.sg/~urops/Projects/valueatrisk.pdf

Incremental Value at Risk

Incremental VaR of a given position is the VaR of the portfolio with the given position minus the VaR of the portfolio without the given position. It measures the change in VaR due to a new position in the portfolio:

IVaR(a) = VaR(P) − VaR(P − a)

Source: http://www.jpmorgan.com/tss/General/Portfolio_Management_With_Incremental_VaR/1259104336084

Conditional Value at Risk, Expected Shortfall, Expected Tail Loss or Average Value at Risk

ES_{1−α}^{1 day} is the expected value of the loss of the portfolio in the α% worst cases in one day. Under the multivariate normal assumption, Expected Shortfall, also known as Expected Tail Loss (ETL), Conditional Value-at-Risk (CVaR), Average Value at Risk (AVaR) and Worst Conditional Expectation, is computed by:

$$ES(-VaR) = -E(x \mid x < -VaR) \cdot P = -\left[\mu + E(z \mid z < z_\alpha)\sigma\right] \cdot P = -\left[\mu + \frac{\int_{-\infty}^{z_\alpha} t\, e^{-t^2/2}\, dt}{\alpha\sqrt{2\pi}}\,\sigma\right] \cdot P = -\left(\mu - \frac{e^{-z_\alpha^2/2}}{\alpha\sqrt{2\pi}}\,\sigma\right) \cdot P$$

where z_α is the left-tail α quantile of the standard normal distribution.

About the sign: because VaR is given by ISSTATS with a negative sign, as J.P. Morgan recommends, its original value is used to perform the calculations (−VaR = μ + z_α σ). Once the ES is computed, it is reported with the sign changed. That means a positive value of ES is an expected loss; a negative value of ES would imply the portfolio has a high probability of making a profit even in the worst cases.

Source: http://www.imes.boj.or.jp/english/publication/mes/2002/me20-1-3.pdf
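The one-day VaR and ES of a single asset follow directly from the closed forms above. A minimal sketch (function names and example figures are illustrative; the bisection-based normal quantile is a stand-in for a statistical library's routine):

```python
import math

def norm_ppf(p):
    """Standard normal quantile via bisection on the erf-based CDF."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def var_es(mu, sigma, position, alpha=0.05, days=1):
    """Parametric VaR and Expected Shortfall (positive values = expected loss)."""
    z = norm_ppf(alpha)                      # left-tail alpha quantile
    scale = position * math.sqrt(days)       # sqrt-of-time scaling
    var = -(mu + z * sigma) * scale
    es = -(mu - math.exp(-z * z / 2) / (alpha * math.sqrt(2 * math.pi)) * sigma) * scale
    return var, es

var1, es1 = var_es(mu=0.0005, sigma=0.02, position=1_000_000, alpha=0.05)
print(var1, es1)
```

As expected, ES exceeds VaR at the same confidence level, since it averages the losses beyond the VaR threshold.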
Exponentially Weighted Moving Average (EWMA) Forecast

Given a series of k daily return rates {r_1, ..., r_k} computed as continuously compounded returns:

$$r_i = \ln\left(\frac{s_i}{s_{i-1}}\right)$$

where r_1 corresponds to the earliest date in the series and r_k to the latest or most recent date. Assuming k > 50 and that the sample mean of daily returns is zero, the EWMA estimate of the one-day variance for a given sequence of k returns is:

$$\sigma^2 = (1-\lambda) \sum_{i=0}^{k-1} \lambda^i r_{k-i}^2$$

where 0 < λ < 1 is the decay factor. The one-day volatility is:

$$\sigma = \sqrt{\sigma^2}$$

For horizons greater than one day, the T-period (i.e., over T days) volatility forecast is:

$$\sigma^{T\ days} = \sigma\sqrt{T}$$

For two return series, assuming that both averages are zero, the EWMA estimate of the one-day covariance for a given sequence of k returns is given by:

$$cov_{1,2} = \sigma_{1,2} = (1-\lambda) \sum_{i=0}^{k-1} \lambda^i r_{1,k-i}\, r_{2,k-i}$$

The corresponding one-day correlation forecast for the two returns is given by:

$$\rho_{1,2} = \frac{cov_{1,2}}{\sigma_1 \sigma_2} = \frac{\sigma_{1,2}}{\sigma_1 \sigma_2}$$

For horizons greater than one day, the T-period (i.e., over T days) covariance forecast is:

$$cov_{1,2}^{T\ days} = \sigma_{1,2}\, T$$

Source: http://pascal.iseg.utl.pt/~aafonso/eif/rm/TD4ePt_2.pdf

Value at Risk of a single asset, Portfolio Value at Risk, Marginal Value at Risk, Component Value at Risk and Incremental Value at Risk by the EWMA method: see methods and formulas at Parametric Value at Risk.
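The EWMA variance recursion above weights the most recent squared return with weight (1−λ) and decays older observations geometrically. A minimal sketch (function name, decay factor and example returns are illustrative; λ = 0.94 is the usual RiskMetrics daily choice):

```python
import math

def ewma_volatility(returns, lam=0.94):
    """EWMA one-day volatility forecast from a return series
    (earliest first), assuming a zero mean as in RiskMetrics."""
    k = len(returns)
    # weight lam**0 on the latest return r_k, lam**(k-1) on the earliest r_1
    var = (1 - lam) * sum(lam ** i * returns[k - 1 - i] ** 2 for i in range(k))
    return math.sqrt(var)

returns = [0.01, -0.02, 0.015, -0.005, 0.01]
print(ewma_volatility(returns))
```

Note the weights here are not renormalized to sum to one; for long series (the document assumes k > 50) the truncation error (1−λ)Σ_{i≥k} λⁱ is negligible.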
Linear Regression

Given n equations for a regression model with p predictor variables, the i-th equation is:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip} + \varepsilon_i$$

The n equations stacked together and written in matrix form are:

$$Y = X\beta + \varepsilon$$

where Y is the n-vector of responses, β = (β_0, β_1, ..., β_p)ᵀ, ε is the n-vector of errors, and X is the design matrix, of dimensions n-by-(p+1), whose first column is a column of ones. If the constant is not included, the column of ones is dropped and the design matrix X has dimensions n-by-p.

The estimated value of the unknown parameter β is:

$$\hat{\beta} = (X^T X)^{-1} X^T Y$$

Estimation can be carried out if, and only if, there is no perfect multicollinearity between the predictor variables.

If the constant is not included and there is a single predictor (regression through the origin), the coefficient can also be estimated by:

$$\hat{\beta}_j = \frac{\sum_{i=1}^{n} x_{ij}\, y_i}{\sum_{i=1}^{n} x_{ij}^2}$$

The standardized coefficients are:

$$\hat{\beta}_i^{st} = \hat{\beta}_i \cdot \frac{S_{x_i}}{S_y}$$

where
- S_xi is the unbiased standard deviation of the i-th predictor variable
- S_y is the unbiased standard deviation of the response variable y.

The estimate of the standard error of each coefficient is obtained by:

$$se(\hat{\beta}_i) = \sqrt{MSE \cdot (X^T X)^{-1}_{ii}}$$

where MSE is the mean squared error of the regression model. It is known that:

$$\frac{\hat{\beta}_i}{se(\hat{\beta}_i)} \sim t_{n-p-1}$$

where
- p is the number of predictor variables
- n is the total number of observations (number of rows in the design matrix).

If the constant is not included, the degrees of freedom for the t statistics are n−p.

ANOVA for Linear Regression

If the constant is included:

Component | Sum of squares | Degrees of freedom | Mean of squares   | F
Model     | SSM            | p                  | MSM = SSM/p       | MSM/MSE
Error     | SSE            | n−p−1              | MSE = SSE/(n−p−1) |
Total     | SST            | n−1                | MST = SST/(n−1)   |

with

$$SSM = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

where
- p is the number of predictor variables
- n is the total number of observations (number of rows in the design matrix)
- SSE is the sum of squared residuals
- MSE is the mean squared error of the regression model.

The test statistic has an F-distribution with p and (n−p−1) degrees of freedom. Thus the ANOVA null hypothesis is rejected if:

$$F \ge F_{p,\ n-p-1}(1-\alpha)$$

The coefficient of determination R² is defined as SSM/SST. It is output as a percentage.
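For the one-predictor case with a constant, the estimator β̂ = (XᵀX)⁻¹XᵀY and the ANOVA decomposition SST = SSM + SSE reduce to a few sums. A minimal sketch (function name and example data are illustrative, not ISSTATS output):

```python
def ols_fit(x, y):
    """Simple linear regression y = b0 + b1*x via the normal equations,
    with the ANOVA decomposition SST = SSM + SSE and R^2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx                  # determinant of X^T X
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    fitted = [b0 + b1 * a for a in x]
    ybar = sy / n
    ssm = sum((f - ybar) ** 2 for f in fitted)             # model sum of squares
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))     # residual sum of squares
    sst = sum((b - ybar) ** 2 for b in y)                  # total sum of squares
    return b0, b1, ssm / sst, ssm, sse, sst

b0, b1, r2, ssm, sse, sst = ols_fit([1, 2, 3, 4], [2, 4, 5, 8])
print(b0, b1, r2)
```

The decomposition SSM + SSE = SST holds exactly (up to floating-point error) because the model includes a constant term.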
The Adjusted R² is defined as 1 − MSE/MST. It is output as a percentage.
The square root of MSE is called the standard error of the regression, or standard error of the estimate.
If the constant is not included:
Component   Sum of squares   Degrees of freedom   Mean of squares   F
Model       SSM              p                    MSM = SSM/p       MSM/MSE
Error       SSE              n−p                  MSE = SSE/(n−p)
Total       SST              n                    MST = SST/n
being
SSM = Σi ŷi²
SSE = Σi (yi − ŷi)²
SST = Σi yi²
Unstandardized Predicted Values
The fitted values (or unstandardized predicted values) from the regression are
Ŷ = Xβ̂ = X(XᵀX)⁻¹ Xᵀ Y = HY
where H is the projection matrix (also known as the hat matrix):
H = X(XᵀX)⁻¹ Xᵀ
Standardized Predicted Values
Once the mean and unbiased standard deviation of the unstandardized predicted values have been computed, the fitted values are standardized as
ŷi^st = (ŷi − mean(ŷ)) / Sŷ
When new predictions are made outside of the design matrix, they are standardized with the same mean and standard deviation.
Prediction Intervals for Mean
Define the vector of given predictors as
Xh = (1, xh1, xh2, …, xhp)ᵀ
The standard error of the fit at Xh is given by
se(ŷh) = √( MSE · Xhᵀ (XᵀX)⁻¹ Xh )
Then the confidence interval for the mean response is
ŷh ± t(α/2; n−p−1) · se(ŷh)
Where
 X is the design matrix
 ŷh is the "fitted value" or "predicted value" of the response when the predictor values are Xh
 MSE is the mean squared error of the regression model
 n is the total number of observations
 p is the number of predictor variables
Prediction Intervals for Individuals
Define the vector of given predictors as
Xh = (1, xh1, xh2, …, xhp)ᵀ
The standard error of the prediction at Xh is given by
se(ŷh) = √( MSE · [1 + Xhᵀ (XᵀX)⁻¹ Xh] )
Then the confidence interval for individuals or new observations is
ŷh ± t(α/2; n−p−1) · se(ŷh)
Where
 X is the design matrix
 ŷh is the "fitted value" or "predicted value" of the response when the predictor values are Xh
 MSE is the mean squared error of the regression model
 n is the total number of observations
 p is the number of predictor variables
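Both intervals above can be sketched with NumPy and SciPy. This is a minimal illustration on made-up data for a single-predictor model; the names are ours, not InnerSoft STATS code:

```python
import numpy as np
from scipy import stats

# Made-up data for a model y = b0 + b1*x (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.9, 5.1, 7.0, 9.2, 10.8, 13.1])
n, p = len(x), 1

X = np.column_stack([np.ones(n), x])            # design matrix
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
MSE = np.sum((y - X @ beta_hat) ** 2) / (n - p - 1)
XtX_inv = np.linalg.inv(X.T @ X)

# Prediction at a new point x_h = 3.5
X_h = np.array([1.0, 3.5])
y_h = X_h @ beta_hat

# Standard error of the fit:  sqrt(MSE * X_h' (X'X)^(-1) X_h)
se_mean = np.sqrt(MSE * X_h @ XtX_inv @ X_h)
# Standard error for a new individual observation (extra "1 +" term):
se_ind = np.sqrt(MSE * (1 + X_h @ XtX_inv @ X_h))

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - p - 1)
ci_mean = (y_h - t_crit * se_mean, y_h + t_crit * se_mean)
ci_ind = (y_h - t_crit * se_ind, y_h + t_crit * se_ind)
```

The individual interval is always wider than the mean interval, because a new observation carries its own error variance on top of the uncertainty in the fitted mean.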
Unstandardized Residuals
The unstandardized residual for the i-th data unit is defined as
êi = yi − ŷi
In matrix notation:
Ê = Y − Ŷ = Y − HY = (I − H)Y
where H is the hat matrix and I is the n-by-n identity matrix.
Standardized Residuals
The standardized residual for the i-th data unit is defined as
êi^st = êi / √MSE
Where
 êi is the unstandardized residual for the i-th data unit
 MSE is the mean squared error of the regression model
Studentized Residuals (internally studentized residuals)
The leverage score for the i-th data unit is defined as
hii = [H]ii
the i-th diagonal element of the projection matrix (also known as the hat matrix) H = X(XᵀX)⁻¹Xᵀ, where X is the design matrix.
The studentized residual for the i-th data unit is defined as
ti = êi / √( MSE · (1 − hii) )
Where
 êi is the unstandardized residual for the i-th data unit
 MSE is the mean squared error of the regression model
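The three residual types above can be sketched together with NumPy. A minimal illustration on made-up data; variable names are ours:

```python
import numpy as np

# Made-up data for a single-predictor model with intercept.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.9, 5.1, 7.0, 9.2, 10.8, 13.1])
n, p = len(x), 1

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
y_hat = H @ y                             # fitted values
e = y - y_hat                             # unstandardized residuals (Y - HY)
MSE = np.sum(e ** 2) / (n - p - 1)

e_std = e / np.sqrt(MSE)                  # standardized residuals
h = np.diag(H)                            # leverage scores h_ii
t_int = e / np.sqrt(MSE * (1 - h))        # internally studentized residuals
```

Note that H is symmetric and idempotent (H·H = H), and with an intercept in the model the residuals sum to zero.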
Source: https://en.wikipedia.org/wiki/Studentized_residual
Centered Leverage Values
The regular leverage score for the i-th data unit is defined as
hii = [H]ii
the i-th diagonal element of the projection matrix (also known as the hat matrix) H = X(XᵀX)⁻¹Xᵀ, where X is the design matrix.
The centered leverage value for the i-th data unit is defined as
clvi = hii − 1/n
where n is the number of observations. If the intercept is not included, then the centered leverage value for the i-th data unit is defined as
clvi = hii
Source: https://en.wikipedia.org/wiki/Leverage_(statistics)
Mahalanobis Distance
The Mahalanobis distance for the i-th data unit is defined as
Di² = (n − 1)·(hii − 1/n) = (n − 1)·clvi
Where
 hii is the i-th diagonal element of the projection matrix
 n is the number of observations
If the intercept is not included, the Mahalanobis distance for the i-th data unit is defined as
Di² = n·hii
Source: https://en.wikipedia.org/wiki/Mahalanobis_distance
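Leverage and Mahalanobis distance, as defined above for the intercept-included case, can be sketched as follows. A minimal illustration on made-up data in which the last observation is deliberately far from the others:

```python
import numpy as np

# Made-up data; the last x value is far from the rest (high leverage).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 10.0])
y = np.array([2.9, 5.1, 7.0, 9.2, 10.8, 21.0])
n = len(x)

X = np.column_stack([np.ones(n), x])      # design matrix with intercept
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                            # leverage scores h_ii

clv = h - 1.0 / n                         # centered leverage values
D2 = (n - 1) * clv                        # squared Mahalanobis distances
```

With an intercept, every hii is at least 1/n, so the centered leverage values are non-negative, and the trace of H equals the number of fitted coefficients (here 2).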
Cook's Distance
The Cook's distance for the i-th data unit is defined as
Di = êi²·hii / [ MSE·(p + 1)·(1 − hii)² ]
Where
 hii is the i-th diagonal element of the projection matrix
 p is the number of predictor variables
 êi is the unstandardized residual for the i-th data unit
 MSE is the mean squared error of the regression model
If the intercept is not included, the Cook's distance for the i-th data unit is defined as
Di = êi²·hii / [ MSE·p·(1 − hii)² ]
Source: https://en.wikipedia.org/wiki/Cook%27s_distance
Curve Estimation Models
Linear. Model whose equation is Y = b0 + (b1 * t). The series values are modeled as a linear function of time.
Quadratic. Model whose equation is Y = b0 + (b1 * t) + (b2 * t**2). The quadratic model can be used to model a series that "takes off" or a series that dampens.
Cubic. Model whose equation is Y = b0 + (b1 * t) + (b2 * t**2) + (b3 * t**3).
Quartic. Model whose equation is Y = b0 + (b1 * t) + (b2 * t**2) + (b3 * t**3) + (b4 * t**4).
Quintic. Model whose equation is Y = b0 + (b1 * t) + (b2 * t**2) + (b3 * t**3) + (b4 * t**4) + (b5 * t**5).
Sextic. Model whose equation is Y = b0 + (b1 * t) + (b2 * t**2) + (b3 * t**3) + (b4 * t**4) + (b5 * t**5) + (b6 * t**6).
Logarithmic. Model whose equation is Y = b0 + (b1 * ln(t)).
Inverse. Model whose equation is Y = b0 + (b1 / t).
Power. Model whose equation is Y = b0 * (t**b1), or ln(Y) = ln(b0) + (b1 * ln(t)).
Compound. Model whose equation is Y = b0 * (b1**t), or ln(Y) = ln(b0) + (ln(b1) * t).
S-curve. Model whose equation is Y = e**(b0 + (b1/t)), or ln(Y) = b0 + (b1/t).
Logistic. Model whose equation is Y = 1 / (1/u + (b0 * (b1**t))), or ln(1/Y − 1/u) = ln(b0) + (ln(b1) * t), where u is the upper boundary value. After selecting Logistic, specify the upper boundary value to use in the regression equation. The value must be a positive number greater than the largest dependent variable value.
Growth. Model whose equation is Y = e**(b0 + (b1 * t)), or ln(Y) = b0 + (b1 * t).
Exponential. Model whose equation is Y = b0 * (e**(b1 * t)), or ln(Y) = ln(b0) + (b1 * t).
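Several of the models above are fit by linearizing with logarithms, as the second form of each equation shows. A minimal sketch on made-up data, fitting the Exponential model by regressing ln(Y) on t (names are ours, not InnerSoft STATS code):

```python
import numpy as np

# Made-up series following Y = 2 * exp(0.3 * t) exactly (illustrative only).
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = 2.0 * np.exp(0.3 * t)

# Exponential model: ln(Y) = ln(b0) + b1 * t  ->  ordinary linear fit.
X = np.column_stack([np.ones(len(t)), t])
coef = np.linalg.solve(X.T @ X, X.T @ np.log(Y))
b0, b1 = np.exp(coef[0]), coef[1]
```

The same pattern applies to the Power, Compound, S-curve, and Growth models: transform to the linear form, fit by ordinary least squares, then back-transform the coefficients.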
© Copyright InnerSoft 2017. All rights reserved.
The lost children of the Sinclair ZX Spectrum 128K (RANDOMIZE USR 123456)
innersoft@itspanish.org
innersoft@gmail.com
http://isstats.itspanish.org/