Fundamental of Statistics and Types of Correlations

Statistics
Dr Rajesh Verma
Assistant Professor in Psychology
Govt. College Adampur, Hisar (Haryana)

Parametric and Non-Parametric Statistics.
Parametric Statistics – Data that meet the
assumptions of the Normal Probability Curve (NPC) are
analyzed through parametric tests, which are
based on the parameters of the normal curve.
Assumptions – The data should be
normally distributed and
the sample size should be large (n > 30).
Nonparametric Statistics – Not based on the parameters
of the normal curve. Used when the data violate the
assumptions of the NPC or the sample size is
limited (e.g., n < 30).

Descriptive and Inferential Statistics.
Descriptive Statistics – The statistical procedure of
describing the fundamental characteristics, features and
properties of the collected data with the help of certain
measures:
(i) Mean, median, mode (central tendency),
(ii) Range, standard deviation, variance (dispersion), and
(iii) Standard error.
The data are also represented in graphs, pie charts and
tables so that the trends they show can be
grasped easily.
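The three groups of measures listed above can be sketched with Python's standard library. The scores are hypothetical and invented for illustration, and the sample (n - 1) formulas are an assumption, since the slide does not say which version it intends.

```python
import math
import statistics

# Hypothetical scores, invented for illustration only.
scores = [4, 7, 7, 8, 9, 10, 11]

# (i) Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# (ii) Dispersion (sample formulas, n - 1 in the denominator: an assumption)
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)
std_dev = statistics.stdev(scores)

# (iii) Standard error of the mean = SD / sqrt(n)
std_error = std_dev / math.sqrt(len(scores))

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", round(variance, 3), "SD:", round(std_dev, 3))
print("standard error:", round(std_error, 3))
```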

Inferential Statistics – Inference means a conclusion
reached on the basis of evidence and reasoning.
Inferential statistics is the type of statistics in which conclusions are drawn
from a sample and generalized to the population from
which the sample was drawn. The data collected from the
sample are analyzed and inferences are
drawn. Analysis of Variance is among the most often used
methods; like other inferential tests, it estimates the
probability that the
characteristics observed in
the sample arose by chance.

Error – A hypothesis is tested using inferential
statistics, which are based on probability theory; therefore,
no hypothesis test is foolproof. There is
always a chance of making an error when drawing
conclusions. Two types of error can
occur: type I error
and type II error.

Type I Error – Rejecting a true null hypothesis (false
positive or incorrect rejection). The test indicates a
significant difference between two groups or a
relationship between two variables when in fact there is
none. E.g. a pathological test shows a disease when there
is none. The probability of making a type I error is the
level of significance set for the hypothesis test; it is usually
.05 and is denoted by α (alpha). To decrease the chance of a type I
error, lower the
level of significance (e.g. from .05 to .01).
A larger sample does not change α;
it reduces the type II error instead.

Type II Error – Not rejecting a false null hypothesis
(false negative or incorrect non-rejection). The test indicates
no significant difference between two groups or no
relationship between two variables when in fact there is one.
E.g. a pathological test fails to detect a disease when
there is one. The probability of making a type II error is
β; the power of the
test is 1 − β. To
decrease the
chance of a
type II error,
increase the
sample size.

Setting Up the Level of Significance
In everyday English, "significant" means ‘important’, while in
Statistics "significant" means probably true (not due to
chance). A research finding may be true without being
important. When statisticians say a result is "highly
significant" they mean it is very probably true. They do not
(necessarily) mean it is highly important.
The popular levels of significance in the social sciences are .05 and .01.
The confidence with which a researcher rejects or retains a
null hypothesis depends upon the level of significance.

Meaning of Levels of Confidence
Level | Amount of confidence | Interpretation
.05 | 95% | If the experiment is repeated 100 times, on only five occasions will the obtained mean fall outside the limits µ ± 1.96 SE.
.01 | 99% | If the experiment is repeated 100 times, on only one occasion will the obtained mean fall outside the limits µ ± 2.58 SE.
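The multipliers 1.96 and 2.58 in the table are the two-tailed cut-offs of the standard normal curve for α = .05 and α = .01. A minimal sketch of where they come from follows; the values of `mu` and `se` are hypothetical and only serve to show the resulting limits.

```python
from statistics import NormalDist


def two_tailed_critical_z(alpha):
    """z value that leaves alpha / 2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)


z_05 = two_tailed_critical_z(0.05)
z_01 = two_tailed_critical_z(0.01)

# Hypothetical population mean and standard error, for illustration only.
mu, se = 50.0, 2.0
limits_95 = (mu - z_05 * se, mu + z_05 * se)
limits_99 = (mu - z_01 * se, mu + z_01 * se)

print("critical z at .05:", round(z_05, 2))
print("critical z at .01:", round(z_01, 2))
print("95% limits:", limits_95)
print("99% limits:", limits_99)
```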

Before running the experiment to test the hypothesis, the
researcher has to decide on the level of confidence or
significance:
(i) .05 or 5% level of significance – A criterion for
rejecting a null hypothesis (if a hypothesis is rejected at the
5% level, it is said that the chances are 95 out of 100 that the
hypothesis is not true and only 5 out of 100 that it is
true).
(ii) .01 or 1% level of significance – A criterion for
rejecting a null hypothesis (if a hypothesis is rejected at this
level, the chances are 99 out of
100 that the hypothesis is not
true and only 1 out
of 100 that it is true).

Setting up Levels of Confidence
(i) Sample size – N < 30 or N > 30.
(ii) Type of test – One-tailed or two-tailed test.
(iii) State the null hypothesis and the alternative hypothesis.
(iv) Set the criterion for a decision, i.e. .05 or .01.
(v) Level of significance – The
probability used to define the very unlikely sample outcomes if
the null hypothesis is true.
(vi) Critical region – The region composed of
extreme sample values that are very unlikely outcomes if the null
hypothesis is true. The boundaries of the critical region are
determined by the alpha level. If the sample data fall in the critical
region, the null hypothesis is rejected.
(vii) Collect data and compute
the sample statistics.
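The steps above can be sketched as a one-sample z test. This assumes a large sample (N > 30) and a known population standard deviation, so that the normal curve applies. The function name `z_test_decision` and all numbers are hypothetical, and the one-tailed branch assumes an upper-tail alternative.

```python
import math
from statistics import NormalDist


def z_test_decision(sample_mean, mu0, sigma, n, alpha, two_tailed=True):
    """Return (z, critical value, decision) for a one-sample z test."""
    se = sigma / math.sqrt(n)               # standard error of the mean
    z = (sample_mean - mu0) / se            # sample statistic
    tail_area = alpha / 2 if two_tailed else alpha
    critical = NormalDist().inv_cdf(1 - tail_area)  # boundary of the critical region
    in_critical_region = abs(z) >= critical if two_tailed else z >= critical
    return z, critical, "reject H0" if in_critical_region else "retain H0"


# Hypothetical study: H0 says mu = 100; a sample of 36 has mean 106, sigma = 15.
z, crit_05, decision_05 = z_test_decision(106, 100, 15, 36, alpha=0.05)
_, crit_01, decision_01 = z_test_decision(106, 100, 15, 36, alpha=0.01)

print("z =", round(z, 2))
print("at .05:", round(crit_05, 3), decision_05)
print("at .01:", round(crit_01, 3), decision_01)
```

With these numbers the same sample leads to rejection at the .05 level but not at the .01 level, which is why the criterion has to be fixed before the data are collected.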

Product Moment Correlation Coefficient
(i) Definition – Correlation is a measure of
association between two variables.
The relationship between two variables can be
- Linear – A relationship between two
variables that can be plotted as a straight line.
- Non-linear – A relationship that cannot be
plotted as a
straight line.

(ii) Direction of Relationship – Positive or negative.
(iii) Strength of Relationship – The correlation ranges from
-1 to +1 (perfect). The correlation between two
variables is expressed as a number
called the
correlation coefficient, which is
denoted by r. The nearer the
correlation coefficient is to -1 or +1, the stronger the relationship.

The strength can be expressed as a percentage,
i.e. r² × 100, the percentage of variance the two variables share.
For example, if r = .73 then r² = .5329 and .5329 × 100 = 53.29%.
The correlation coefficient describes the association
between two variables in the sample; hence, it is a descriptive
statistic.
(iv) Measures of Correlation –
- Pearson Product Moment Method (data must be on an
interval or ratio scale),
- Partial Correlation,
- Biserial Correlation,
- Point Biserial Correlation,
- Tetrachoric and Phi coefficient,
- Spearman Rank Order (data on an ordinal scale), and
- Kendall’s Tau (data on an ordinal scale).

Pearson Product Moment Method
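X | Y | XY | X² | Y²
11 | 13 | 143 | 121 | 169
13 | 16 | 208 | 169 | 256
16 | 14 | 224 | 256 | 196
9 | 10 | 90 | 81 | 100
6 | 8 | 48 | 36 | 64
17 | 16 | 272 | 289 | 256
7 | 9 | 63 | 49 | 81
12 | 12 | 144 | 144 | 144
5 | 7 | 35 | 25 | 49
14 | 15 | 210 | 196 | 225
Total: ΣX = 110, ΣY = 120, ΣXY = 1437, ΣX² = 1366, ΣY² = 1540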
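The table gives the column totals but this text does not print the formula or the resulting r. The sketch below assumes the usual raw-score formula, r = (NΣXY − ΣXΣY) / √[(NΣX² − (ΣX)²)(NΣY² − (ΣY)²)], and applies it to the ten X, Y pairs from the table.

```python
import math

# The ten X, Y pairs from the table above.
x = [11, 13, 16, 9, 6, 17, 7, 12, 5, 14]
y = [13, 16, 14, 10, 8, 16, 9, 12, 7, 15]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Raw-score formula for Pearson's r (assumed; not printed in the slides).
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print("r =", round(r, 4))
```

The computed sums reproduce the totals in the table (1437, 1366, 1540), which is a quick check that the pairs were read correctly.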

Pearson Product Moment Method Example (your book)

Significance of Correlation Coefficient
When Pearson’s correlation coefficient is
computed as a descriptive index of the relationship
between two variables in the sample, significance
testing is not required. Interpreting the correlation
from its value and direction is enough.
However, when the correlation is computed as an
estimate of the population
correlation,
statistical significance
testing is required.
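The slides do not say which test is used for this purpose. One common choice, assumed here, is the t test with n − 2 degrees of freedom, t = r√(n − 2) / √(1 − r²). In the sketch, r = .73 is the value used earlier in the slides, n = 10 is hypothetical, and 2.306 is the standard two-tailed .05 critical value of t for 8 degrees of freedom.

```python
import math


def t_for_r(r, n):
    """t statistic for testing H0: population correlation = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.73          # value used in the slides' r-squared example
n = 10            # hypothetical number of pairs
t = t_for_r(r, n)

# Two-tailed critical t at the .05 level for df = 8, from a standard t table.
critical_t = 2.306
significant = abs(t) > critical_t

print("t =", round(t, 3), "df =", n - 2, "significant at .05:", significant)
```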

Other Types of Correlation (Phi-Coefficient)
Other types means correlations other than Pearson's r.
They are of three types:
(i) Special types of Pearson correlation (Point-Biserial
Correlation and Phi Coefficient),
(ii) Non-Pearson correlations (Biserial and
Tetrachoric), and
(iii) Rank order correlations
(Spearman’s rho
and Kendall’s
tau).

Special type of Pearson Correlations
(a) Point-Biserial Correlation (rPB) (PBS) – Dichotomous
variables are variables that have
two mutually exclusive categories, e.g. male-female,
rural-urban etc. They are not continuous, so Pearson's
r cannot be applied until the categories are coded numerically.
For example, to correlate gender with another variable,
let males = 0 and females = 1, or allot any
other two numbers to
males and females.
So PBS is used
when one variable
is dichotomous
and the other is
continuous.
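A minimal sketch of the 0/1 coding idea, using hypothetical data invented for illustration. It computes the ordinary Pearson r on the coded variable and compares it with the point-biserial shortcut r_pb = ((M1 − M0) / S_y) × √(p × q), where S_y is the population standard deviation; the two should agree.

```python
import math
from statistics import mean, pstdev

# Hypothetical data: group is the dichotomous variable coded 0/1.
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [10, 12, 14, 12, 16, 18, 17, 19]


def pearson_r(x, y):
    """Plain Pearson r from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Point-biserial shortcut using group means and proportions.
m1 = mean([s for s, g in zip(score, group) if g == 1])
m0 = mean([s for s, g in zip(score, group) if g == 0])
p = group.count(1) / len(group)
q = 1 - p
r_pb = (m1 - m0) / pstdev(score) * math.sqrt(p * q)

r_pearson = pearson_r(group, score)
print("Pearson r on coded data:", round(r_pearson, 4))
print("point-biserial formula: ", round(r_pb, 4))
```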

Example of Point Biserial Correlation

Phi Coefficient (ϕ)
When both variables are dichotomous,
the Pearson correlation is called the Phi
Coefficient (ϕ).
The formula and other
calculations
remain the same.

Significance of Phi Coefficient
The value of the Phi Coefficient is squared and
multiplied by n, which gives the equivalent Chi Square (χ²);
the resulting value is checked against the Chi Square table. If
the computed value is less than the tabulated value we
retain the null hypothesis, and vice versa.
The
relationship between
Chi Square and the
Phi Coefficient is
$\chi^2 = n\phi^2$, i.e. $\phi = \sqrt{\chi^2 / n}$.
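The procedure described above (square ϕ, multiply by n, compare with the Chi Square table) can be sketched as follows. The 2×2 cell counts are hypothetical, the cell-count formula for ϕ is the standard one rather than something printed in the slides, and 3.841 is the standard tabulated χ² value for 1 degree of freedom at the .05 level.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table with rows (a, b) and (c, d)."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Hypothetical cell counts, invented for illustration.
a, b, c, d = 30, 10, 15, 25
n = a + b + c + d

phi = phi_coefficient(a, b, c, d)
chi_square = n * phi ** 2          # chi-square equivalent of phi

critical_chi_square = 3.841        # df = 1, .05 level, from a standard table
decision = "reject H0" if chi_square > critical_chi_square else "retain H0"

print("phi =", round(phi, 4))
print("chi-square =", round(chi_square, 4), "->", decision)
```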

Biserial Correlation Coefficient (rb)
The Biserial Correlation Coefficient is calculated
when both variables are continuous but one of them
is measured dichotomously. For example, the correlation
between Academic Achievement (AA) and IQ where
subjects are classified discretely as those with Low IQ
and High IQ.
Take AA as Y and IQ as X
Low IQ = 0 and High IQ = 1
Formula to calculate rb
$r_b = \frac{Y_1 - Y_0}{S_y} \times \frac{P_0 P_1}{H}$
Y1 = Mean of the Y scores corresponding to X = 1
Y0 = Mean of the Y scores corresponding to X = 0
Sy = Standard deviation of Y
P0 = Proportion of cases with X = 0 (n0/n)
P1 = Proportion of cases with X = 1 (n1/n)
H = Height (ordinate) of the normal distribution curve at the point that
divides P0 and P1.
Suppose:
X | Y
0 | 80
1 | 75
1 | 90
0 | 64
1 | 48
Therefore:
Y1 = (75 + 90 + 48)/3 = 213/3 = 71
Y0 = (80 + 64)/2 = 144/2 = 72
P0 = n0/n = 2/5 (n0 = 2, total = 5)
P1 = n1/n = 3/5 (n1 = 3, total = 5)
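The worked example stops at P0 and P1. A sketch of the remaining steps follows, assuming the usual biserial formula r_b = ((Y1 − Y0) / Sy) × (P0 × P1 / H). Two further assumptions are made: Sy is taken as the population standard deviation (the slides do not specify), and H is obtained from the standard normal curve at the point that cuts off the proportion P0.

```python
from statistics import NormalDist, mean, pstdev

# Data from the worked example: X is the dichotomized IQ, Y is achievement.
x = [0, 1, 1, 0, 1]
y = [80, 75, 90, 64, 48]

y1 = mean([v for v, g in zip(y, x) if g == 1])   # mean of Y where X = 1
y0 = mean([v for v, g in zip(y, x) if g == 0])   # mean of Y where X = 0
p0 = x.count(0) / len(x)
p1 = x.count(1) / len(x)

s_y = pstdev(y)                      # population SD of Y (assumption)

# H: ordinate of the standard normal curve at the z that splits off p0.
z_cut = NormalDist().inv_cdf(p0)
h = NormalDist().pdf(z_cut)

r_b = (y1 - y0) / s_y * (p0 * p1 / h)

print("Y1 =", y1, "Y0 =", y0, "P0 =", p0, "P1 =", p1)
print("Sy =", round(s_y, 3), "H =", round(h, 4))
print("r_b =", round(r_b, 4))
```

With only five cases the result is purely illustrative; the near-zero value reflects the fact that the two group means (71 and 72) barely differ.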

Tetrachoric Correlation
When both variables are continuous but
measured as dichotomous, the technique used to
assess the correlation between them is the
Tetrachoric Correlation. In other words, it is a correlation
between two dichotomous variables that have
underlying continuous distributions. The Tetrachoric
Correlation coefficient rtet can, for example, express the degree of
association between the ratings of two raters: “0” indicates
no agreement while “1”
indicates perfect
agreement.

Tetrachoric Correlation Example from
IGNOU Book
For example, attitude towards females and attitude
towards liberalization are the two variables that are to be
correlated. In this case we collect the data dichotomously
i.e. Positive or Negative Attitude. So, 0 represents
‘negative attitude’ and 1 ‘positive attitude’ on both the
variables. The correlation between these two variables
will be computed using Tetrachoric correlation.

Check the cosine value in a cosine
table or on the internet.
Therefore, the correlation between
attitude towards women and
liberalization is moderately
strong and positive.
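The book's own figures are not reproduced in this text. The reference to a cosine table suggests the widely used cosine approximation, r_tet ≈ cos(180° / (1 + √(ad / bc))), where a and d are the cells in which the two variables agree; that this is the formula intended is an assumption. The cell counts below are hypothetical and will not match the book's result.

```python
import math


def tetrachoric_cos_approx(a, b, c, d):
    """Cosine approximation to the tetrachoric correlation.

    a, d: counts where both variables agree (both 1, both 0).
    b, c: counts where they disagree.
    """
    if b * c == 0:
        raise ValueError("approximation is undefined when b or c is zero")
    ratio = math.sqrt((a * d) / (b * c))
    return math.cos(math.pi / (1 + ratio))   # pi radians = 180 degrees


# Hypothetical 2x2 counts, invented for illustration.
r_tet = tetrachoric_cos_approx(a=40, b=10, c=10, d=40)
r_none = tetrachoric_cos_approx(a=25, b=25, c=25, d=25)

print("r_tet =", round(r_tet, 3))
print("no association:", round(r_none, 3))
```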

Rank Order Correlations
When the data are on an ordinal scale, rank order correlations
are used. It means the data are presented as ranks such as first
rank, second rank, third rank etc. In other words, the individuals
are assigned ranks. The rank-order correlations are also applicable
when the relationship between two variables is not linear but
monotonic. A monotonic relationship means that as one variable increases,
the other consistently increases and never decreases, or consistently decreases
and never increases. We have two methods under Rank Order
Correlations, i.e.
(i) Spearman’s Rank Order
Correlation or Spearman’s
Rho ρ (rs).
(ii) Kendall’s Tau (τ)

Spearman’s Rank order Correlation or
Spearman’s Rho ρ (rs).
Charles Spearman developed this technique in
1904. It is computed when the data are in rank order or
when the agreement between two judges is assessed. The range is -1 to
+1 and the interpretation depends upon the sign and
value of the coefficient.
$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$
Where, D = Difference
between ranks of X and Y
n = Number of pairs of ranks

Example
Subject | Rank (X) | Rank (Y) | Difference (D) | D²
1 | 2 | 6 | -4 | 16
2 | 4 | 8 | -4 | 16
3 | 6 | 3 | 3 | 9
4 | 7 | 1 | 6 | 36
5 | 8 | 9 | -1 | 1
6 | 1 | 2 | -1 | 1
7 | 9 | 10 | -1 | 1
8 | 10 | 4 | 6 | 36
9 | 3 | 5 | -2 | 4
10 | 5 | 7 | -2 | 4
Total | | | | 124
$r_s = 1 - \frac{6(124)}{10(10^2 - 1)}$
= 1 - 744/990
= 1 - 0.752
= 0.248
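The calculation above can be checked with a short sketch that applies r_s = 1 − 6ΣD² / (n(n² − 1)) to the ten pairs of ranks from the example. The function name `spearman_rho` is illustrative, and the formula as coded assumes there are no tied ranks.

```python
# Ranks of the ten subjects from the example table.
rank_x = [2, 4, 6, 7, 8, 1, 9, 10, 3, 5]
rank_y = [6, 8, 3, 1, 9, 2, 10, 4, 5, 7]


def spearman_rho(rx, ry):
    """Spearman's rho from two lists of untied ranks."""
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


rho, sum_d2 = spearman_rho(rank_x, rank_y)
print("sum of D squared =", sum_d2)
print("rho =", round(rho, 3))
```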

Spearman’s Rho ρ (rs) for Tied Ranks
If two or more subjects get the same score, first
allot ranks to all subjects as if there were no ties. Then find the
tied scores, add up the ranks they occupy and
calculate the average of those ranks. For example, if
two subjects with the
same score occupy
ranks 6 and 7,
each of them receives
the rank
(6 + 7)/2 = 6.5.
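The averaging rule can be sketched as a small helper. The function name `average_ranks` and the scores are hypothetical, and the sketch assumes that the highest score receives rank 1.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest.

    Tied scores share the average of the ranks they would occupy.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared_rank = ((pos + 1) + (end + 1)) / 2   # mean of first and last rank in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


# Hypothetical scores: the two 72s occupy ranks 6 and 7.
scores = [95, 90, 88, 85, 80, 72, 72, 60]
ranks = average_ranks(scores)
print(ranks)
```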

Spearman’s Rho ρ (rs) Formula for Tied Ranks

Spearman’s Rho ρ Example for Tied Ranks from
IGNOU Book

Significance Testing of Spearman’s Rho ρ
Check the table for significance (remember to consider
whether the test is two-tailed or one-tailed). If the
calculated value is more than the tabulated value, reject
the null hypothesis, and vice versa. In our case the calculated
value is 0.902, which is higher than the table values (0.648
at the .05 significance level and 0.794 at the .01 significance level
for two-tailed tests).
Hence, we reject the
null hypothesis
and accept the
alternative
hypothesis.

Kendall’s Tau (τ)
It was developed by Kendall in 1938 as an
alternative to Spearman's rho. The range is -1 to +1.
The coefficient is interpreted on the basis of its sign
and value. It is based on the concordance and
discordance of two sets of ranks of variables X and
Y.

Example to Calculate Kendall’s Tau (τ)
Suppose we asked two booksellers to rank 8
subjects as per the demand for their books among
students. The subjects are listed in the order of bookseller X's ranks.
C is concordance: for each subject, the number of Y ranks below
it in the table that are greater than its own Y rank.
D is discordance: for each subject, the number of Y ranks below
it in the table that are smaller than its own Y rank.
Subject | Bookseller (X) | Bookseller (Y) | C | D
Psychology | 1 | 3 | 5 | 2
Maths | 2 | 2 | 5 | 1
Geography | 3 | 4 | 4 | 1
History | 4 | 6 | 2 | 2
Phy Edu | 5 | 1 | 3 | 0
Hindi | 6 | 7 | 1 | 1
English | 7 | 8 | 0 | 1
Biology | 8 | 5 | |
Total | | | 20 | 8

Example to Calculate Kendall’s Tau (τ)
Formula to calculate τ:
$\tau = \frac{C - D}{C + D}$
Where, C is concordance = 20
D is discordance = 8
Therefore, $\tau = \frac{20 - 8}{20 + 8} = \frac{12}{28} = 0.4286$
The correlation between the two booksellers' rankings is
moderate and positive.
To test the significance of τ, check the table at
.05 or .01: if the calculated value is greater than the table value,
the null hypothesis is rejected, and vice versa.
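The counting of C and D in the bookseller example can be sketched as follows. The function name `concordant_discordant` is illustrative, and the sketch assumes there are no tied ranks, as in the example.

```python
# Ranks given by the two booksellers, in the order of the example table.
rank_x = [1, 2, 3, 4, 5, 6, 7, 8]
rank_y = [3, 2, 4, 6, 1, 7, 8, 5]


def concordant_discordant(rx, ry):
    """Count concordant (C) and discordant (D) pairs after sorting by X."""
    ys = [y for _, y in sorted(zip(rx, ry))]
    c = d = 0
    for i in range(len(ys)):
        for j in range(i + 1, len(ys)):
            if ys[j] > ys[i]:
                c += 1      # a rank below is greater: concordant
            elif ys[j] < ys[i]:
                d += 1      # a rank below is smaller: discordant
    return c, d


c, d = concordant_discordant(rank_x, rank_y)
tau = (c - d) / (c + d)
print("C =", c, "D =", d, "tau =", round(tau, 4))
```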

References:
1. Retrieved on 12 November to 23 November
2019 from http://egyankosh.ac.in/

Meet You Soon
With Next Video
vermasujit@yahoo.com
Thanks
