F unit 5.pptx

Analysis of Data
• Analysis means the computation of certain
indices or measures, along with a search for
patterns of relationship that exist among the
data groups.

Analysis of data
• Descriptive analysis
• Statistical analysis

• Descriptive analysis is largely the study of
distributions of one variable.
• This sort of analysis may be in respect of one
variable (unidimensional analysis), in respect
of two variables (bivariate analysis), or in
respect of more than two variables
(multivariate analysis).

Correlation analysis studies the joint variation
of two or more variables for determining the
amount of correlation between two or more
variables.
Causal analysis is concerned with the study of
how one or more variables affect changes in
other variables.
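
As a minimal sketch of how the "amount of correlation" between two variables can be quantified, the snippet below computes Pearson's correlation coefficient by hand. The function name `pearson_r` and the data (hours studied and exam scores) are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam score for five students
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = pearson_r(hours, score)
print(round(r, 3))
```

A value near +1 or -1 indicates strong joint variation; it does not by itself establish that one variable causes the other.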

Multiple regression analysis: This analysis is
adopted when the researcher has one
dependent variable that is presumed to be a
function of two or more independent
variables. The objective of this analysis is to
make a prediction about the dependent
variable based on its covariance with all the
concerned independent variables.
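
The idea of predicting one dependent variable from two or more independent variables can be sketched with ordinary least squares solved through the normal equations. This is an illustrative sketch only: the helper names (`solve_linear_system`, `fit_multiple_regression`) and the noise-free data are assumptions, and real analyses would normally rely on a statistics package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_multiple_regression(xs, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(row) for row in xs]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise)
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
outcome = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]
coefs = fit_multiple_regression(predictors, outcome)
print([round(c, 6) for c in coefs])
```

Because the example data contain no noise, the fitted coefficients recover the intercept and the two slopes used to generate them.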

Multiple discriminant analysis: This analysis is
appropriate when the researcher has a single
dependent variable that cannot be measured
but can be classified into two or more groups
on the basis of some attribute. The objective of
this analysis is to predict an entity's
likelihood of belonging to a particular
group based on several predictor variables.

Multivariate analysis of variance (MANOVA):
This analysis is an extension of two-way ANOVA,
wherein the ratio of among-group variance to
within-group variance is worked out on a set
of variables.
Canonical analysis: This analysis can be used in
case of both measurable and non-measurable
variables for the purpose of simultaneously
predicting a set of dependent variables from
their joint covariance with a set of
independent variables.

Inferential analysis is concerned with the various
tests of significance for testing hypotheses in
order to determine with what validity data can
be said to indicate some conclusion or
conclusions. It is also concerned with the
estimation of population values. It is mainly
on the basis of inferential analysis that the
task of interpretation (i.e., the task of drawing
inferences and conclusions) is performed.

CHARACTERISTICS OF STATISTICAL
METHOD
• Aggregative study
• Quantitative study
• Relevant study

IMPORTANCE & MERITS OF
STATISTICAL METHOD
• Helpful in planning
• Helpful in administration
• Helpful in business
• Helpful in objective study of social phenomena
• Useful for social investigators & social researchers
• Brings about greater precision in our thinking &
study

PARAMETRIC TEST
Parametric tests are those that make
assumptions about the parameters of the
population distribution from which the
sample is drawn. This is often the assumption
that the population data are normally
distributed. Non-parametric tests are
“distribution-free” and, as such, can be used
for non-Normal variables.

• In statistics, a parametric test is a kind of
hypothesis test that supports generalizations
about a parameter, such as the mean, of the
population from which the sample was drawn. A
common example is the t-test, which is based on
Student's t-statistic.
• The t-statistic rests on the underlying
assumption that the variable is normally
distributed. The mean is known, or is assumed
to be known, and the population variance is
estimated from the sample. The variables of
concern are assumed to be measured on an
interval scale.

ASSUMPTION IN PARAMETRIC TEST
1. Normality – Data in each group should be
normally distributed.
2. Equal Variance – Data in each group should
have approximately equal variance.
3. Independence – Data in each group should be
randomly and independently sampled from
the population.
4. Scale – Data should be measured on an
interval or ratio scale.

Normality refers to the degree to which data
follow a Gaussian distribution, or “bell-shaped”
curve, showing a central tendency and a
symmetrical distribution relative to the mean. It
is important for the data in each group being
compared to exhibit characteristics of normality.
Because parametric tests compare means
between data groups, the means must be a
faithful representation of the data. Parametric
tests’ p values are strictly valid only when the
data are approximately normally distributed.

Homoscedasticity, or homogeneity of variance, is
the other primary assumption for parametric
tests. This refers to the dispersion pattern of data
and how similar this pattern is between groups
being compared. Not only is it assumed that data
in each comparison group demonstrate a normal
distribution, but each group is also assumed to
exhibit similar levels of variability or “noise” for
parametric tests to compare them accurately.
When groups exhibit different variances,

When parametric tests are used
• When the data has a normal distribution
• When the measurement scale is interval or
ratio
Types of Parametric test–
• Two-sample t-test
• Paired t-test
• Analysis of variance (ANOVA)
• Pearson coefficient of correlation

What is a Non-Parametric Test?
A non-parametric test does not require the
population to follow any particular distribution.
It is a type of hypothesis test that does not
depend on underlying distributional assumptions,
which is why it is also known as distribution-free
testing. Non-parametric tests typically concern
the median rather than the mean, and the data are
measured at the ordinal or nominal level. A
non-parametric test is usually performed when the
independent variables are non-metric.

Non-parametric tests, also known as
distribution-free tests, are considered less
powerful because they use less information in
their calculations and make fewer assumptions
about the data set.

Parametric Tests | Non-Parametric Tests
Normal distribution | Skewed distribution
Quantitative data | Qualitative data
Measurement scale: interval & ratio | Measurement scale: nominal & ordinal
More powerful | Less powerful
Compare mean & standard deviation | Compare percentage & proportion
Complete information about population | Incomplete information about population
Certain assumptions about population | No assumptions are made about population
Applicability is in variables | Applicability is in variables & attributes
E.g. t-test, Z-test, F-test, ANOVA | E.g. Chi-square test, Kruskal-Wallis test, Mann-Whitney test

Types Of Parametric Test
• Student's t-test: Used when the samples are small and the population variances are unknown. It compares the means of two small independent samples, or a sample mean with the population mean.
• One-sample t-test: Compares the mean of a single group of observations with a specified value.
• Unpaired two-sample t-test: Compares the means of two independent samples drawn from normal populations with equal or unknown variances.
• Paired two-sample t-test: Used for paired observations, such as two measurements taken on the same subjects.
• ANOVA: Analysis of variance is used to test for differences among the mean values of more than two groups.
• One-way ANOVA: Useful when the groups being compared differ by only one factor.
• Two-way ANOVA: Used when the groups being compared differ by two factors.
• Pearson's correlation coefficient: Measures the strength of the linear relationship between two continuous, quantitative variables.
• Z-test: Tests the difference between two means when the samples are large or the population variance is known.
• Z-test for proportions: Tests the difference between two proportions.
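
To make the unpaired and paired t-tests listed above concrete, here is a rough sketch that computes only the t statistic and its degrees of freedom (a p-value would additionally need the t distribution, which is not in Python's standard library). The function names and the sample values are hypothetical, and the unpaired version assumes equal variances (pooled variance).

```python
import math
import statistics

def two_sample_t(sample_a, sample_b):
    """Return (t, df) for the unpaired two-sample t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

def paired_t(before, after):
    """Return (t, df) for the paired t-test on after-minus-before differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical measurements for two independent groups
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.5, 4.0, 4.8, 4.6]
t_unpaired, df_unpaired = two_sample_t(group_a, group_b)

# Hypothetical before/after measurements on the same four subjects
t_paired, df_paired = paired_t([10, 12, 9, 11], [12, 13, 11, 12])
print(t_unpaired, df_unpaired, t_paired, df_paired)
```

The resulting t value is compared against the t distribution with the stated degrees of freedom to judge significance.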

TYPES OF NON-PARAMETRIC TEST
One-sample sign test: Tests whether the median of a population differs from a target or reference value.
One-sample Wilcoxon signed-rank test: Also compares the population median with a target value, but assumes the data come from a symmetric distribution.
Friedman test: Tests for differences among three or more related groups when the dependent variable is ordinal; it can also be used for continuous data.
Goodman and Kruskal's gamma: A measure of association for ranked (ordinal) variables.
Kruskal-Wallis test: Tests whether the medians of two or more independent groups differ. The calculations use the ranks of the data points.
Mann-Kendall trend test: Detects trends in time-series data.
Mann-Whitney test: Compares two independent groups. The dependent variable must be continuous or ordinal.
Mood's median test: Compares the medians of two or more independent samples.
Spearman rank correlation: Estimates the strength of the monotonic relationship between two sets of ranked data.
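
As an illustration of how a rank-based test avoids distributional assumptions, the sketch below computes the Mann-Whitney U statistic by counting, over all pairs, how often a value from one group exceeds a value from the other. The function name and data are hypothetical, and no p-value is computed here.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs (a, b) with a > b count 1, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical ordinal scores for two independent groups
group_a = [3, 5, 8]
group_b = [1, 2, 4, 6]
u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)
print(u_a, u_b)
```

The two U values always add up to the product of the group sizes, so a value of U far from half that product suggests the groups differ.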

What is a Parametric Test?
In statistics, a parametric test is a kind of hypothesis test
that supports generalizations about the mean of the original
population. A common example is the t-test, which is based
on Student's t-statistic.
The t-statistic rests on the underlying assumption that the
variable is normally distributed. The value of the mean is
known, or it is assumed to be known. The population
variance is estimated from the sample. The variables of
concern are assumed to be measured on an interval scale.

The Chi-Square test
Observed frequencies, by number of group members (columns):
Conformed? | 2 | 4 | 6 | 8 | 10
Yes | 20 | 50 | 75 | 60 | 30
No | 80 | 50 | 25 | 40 | 70
Apparently, conformity is less likely with fewer or more group
members…

• 20 + 50 + 75 + 60 + 30 = 235 conformed out of
500 experiments.
• Overall likelihood of conforming = 235/500 =
.47

Calculating the expected, in general
• Null hypothesis: variables are independent
• Recall that under independence:
P(A)*P(B)=P(A&B)
• Therefore, calculate the marginal probability
of B and the marginal probability of A.
Multiply P(A)*P(B)*N to get the expected cell
count.
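
The rule P(A)*P(B)*N can be sketched for the conformity data as follows. The variable names are illustrative choices; the counts are the observed frequencies from the conformity table.

```python
# Observed counts: rows are "conformed?" and columns are group sizes 2, 4, 6, 8, 10
observed = {
    "Yes": [20, 50, 75, 60, 30],
    "No": [80, 50, 25, 40, 70],
}

n = sum(sum(row) for row in observed.values())
row_totals = {label: sum(row) for label, row in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]

# Expected count under independence: P(row) * P(column) * N
expected = {
    label: [(row_totals[label] / n) * (col_total / n) * n for col_total in col_totals]
    for label in observed
}

for label, counts in expected.items():
    print(label, [round(c, 1) for c in counts])
```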

Expected frequencies if no
association between group size and
conformity…
By number of group members (columns):
Conformed? | 2 | 4 | 6 | 8 | 10
Yes | 47 | 47 | 47 | 47 | 47
No | 53 | 53 | 53 | 53 | 53

• Do observed and expected differ more than
expected due to chance?

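Chi-Square test

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

Degrees of freedom = (rows-1)*(columns-1)=(2-1)*(5-1)=4

$$\chi^2_4 = \frac{(20-47)^2}{47} + \frac{(50-47)^2}{47} + \frac{(75-47)^2}{47} + \frac{(60-47)^2}{47} + \frac{(30-47)^2}{47} + \frac{(80-53)^2}{53} + \frac{(50-53)^2}{53} + \frac{(25-53)^2}{53} + \frac{(40-53)^2}{53} + \frac{(70-53)^2}{53} \approx 79.5$$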
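
A small sketch of the same calculation in code, summing (observed - expected)^2 / expected over the ten cells of the conformity table. The helper `chi_square_sf_even_df` is a hypothetical name; it uses the closed-form upper-tail probability that holds only when the degrees of freedom are even, which is the case here (df = 4).

```python
import math

observed = [[20, 50, 75, 60, 30], [80, 50, 25, 40, 70]]
expected = [[47] * 5, [53] * 5]

# Sum of (observed - expected)^2 / expected over all cells
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with even df."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

p_value = chi_square_sf_even_df(chi_square, df)
print(round(chi_square, 2), df, p_value)
```

A very small p-value means the observed counts differ from the expected counts by more than chance alone would plausibly produce.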

The Chi-Square distribution:
is sum of squared normal deviates

$$\chi^2_{df} = \sum_{i=1}^{df} Z_i^2; \quad \text{where } Z \sim \text{Normal}(0,1)$$

The expected value and variance of a chi-square:
E(x)=df
Var(x)=2(df)
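
The claim that a chi-square variable is a sum of squared standard normal deviates, with mean df and variance 2(df), can be checked informally by simulation. This is a sketch under assumed settings (df = 4, 20,000 draws, a fixed seed); the function name is hypothetical.

```python
import random
import statistics

def simulate_chi_square(df, draws, seed=0):
    """Draw chi-square values as sums of df squared Normal(0, 1) deviates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(draws)]

samples = simulate_chi_square(df=4, draws=20000)
sample_mean = statistics.mean(samples)     # should be close to df = 4
sample_var = statistics.variance(samples)  # should be close to 2 * df = 8
print(round(sample_mean, 2), round(sample_var, 2))
```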


Chi-Square test

$$\chi^2_4 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \approx 79.5$$

Degrees of freedom = (rows-1)*(columns-1)=(2-1)*(5-1)=4
Rule of thumb: if the chi-square statistic is much greater than its degrees of freedom,
this indicates statistical significance. Here 79.5>>4.

Chi-square example: recall data…

 | Brain tumor | No brain tumor | Total
Own a cell phone | 5 | 347 | 352
Don’t own a cell phone | 3 | 88 | 91
Total | 8 | 435 | 443

$$\hat{p}_{tumor/cellphone} = \frac{5}{352} = .014; \quad \hat{p}_{tumor/nophone} = \frac{3}{91} = .033; \quad \hat{p} = \frac{8}{443} = .018$$

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}}} = \frac{.0142 - .0330}{\sqrt{\frac{(.018)(.982)}{352} + \frac{(.018)(.982)}{91}}} = \frac{-.0188}{.0157} \approx -1.20$$
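
As a rough sketch, the two-proportion Z statistic for the cell phone data can be computed directly from the four cell counts. The function name is hypothetical; the two-sided p-value uses the standard normal tail via `math.erfc`.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for the difference of two proportions, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err

# 5 tumors among 352 cell phone owners, 3 tumors among 91 non-owners
z = two_proportion_z(5, 352, 3, 91)
# Two-sided p-value from the standard normal distribution
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(round(z, 2), round(p_two_sided, 3))
```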

Same data, but use Chi-square test

 | Brain tumor | No brain tumor | Total
Own | 5 (cell a) | 347 (cell b) | 352
Don’t own | 3 (cell c) | 88 (cell d) | 91
Total | 8 | 435 | 443

$$p_{tumor} = \frac{8}{443} = .018; \quad p_{cellphone} = \frac{352}{443} = .795$$

$$p_{tumor} \times p_{cellphone} = \frac{8}{443} \times \frac{352}{443} = .01435$$

Expected in cell a = .01435*443 = 6.36; 1.64 in cell c; 345.64 in cell b; 89.36 in cell d

df = (R-1)*(C-1) = 1

$$\chi^2_1 = \frac{(5-6.36)^2}{6.36} + \frac{(3-1.64)^2}{1.64} + \frac{(347-345.64)^2}{345.64} + \frac{(88-89.36)^2}{89.36} \approx 1.44 \quad \text{NS}$$

note: $Z^2 = (-1.20)^2 = 1.44$

Expected value in cell c ≈ 1.6 (less than 5), so technically
should use a Fisher’s exact test here! Next term…
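
For a 2x2 table, the chi-square statistic and a two-sided Fisher's exact p-value can both be sketched with the standard library. The function names are hypothetical, and the two-sided Fisher rule used here (summing the probabilities of all tables no more likely than the observed one) is one common convention, assumed for illustration.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 table [[a, b], [c, d]] (shortcut formula)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value from the hypergeometric distribution."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Probability that cell a equals k, given the fixed margins
        return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum probabilities of tables at most as likely as the observed one
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))

# Cell phone data: a=5, b=347, c=3, d=88
chi_sq = chi_square_2x2(5, 347, 3, 88)
chi_sq_p = math.erfc(math.sqrt(chi_sq / 2))  # upper tail of chi-square with df = 1
fisher_p = fisher_exact_two_sided(5, 347, 3, 88)
print(round(chi_sq, 2), round(chi_sq_p, 3), round(fisher_p, 3))
```

With one degree of freedom the chi-square statistic equals the square of the two-proportion Z statistic, so the two tests give the same p-value; Fisher's exact test avoids the large-sample approximation when an expected count is small.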
