Page 1 of 15
DESCRIPTION OF THE TOPIC
Choosing the right statistical method for data analysis is always a challenge, as it depends on a
host of factors.
Before we discuss the major determinants of choice of a method in detail, it is also important to
understand that one should have a Research/Data Analysis blueprint of the study one is
undertaking.
1. Research/Data Analysis Blueprint
Generally, the research starts with a broad research question that is often divided into more
measurable, narrower objectives (See Figure 1). Each objective is achieved by splitting the subject
matter into certain statistically testable hypotheses.
Items     Description of Topic
Course    Data Analysis for Social Science Teachers
Topic     Choosing the Right Statistical Method for Data Analysis
Figure 1: The Research Blueprint--Objective Hypotheses Mapping
There is no standard rule as to how many hypotheses a research objective can have. One research
objective might have one or two or more hypotheses. However, it is important that each objective
be split into one or more testable hypotheses.
In order that one is clear about how a hypothesis is tested, one must identify the variables
associated with each of the hypotheses (see Figure 2). There is no rule as to how many variables a
hypothesis will have. There could be a hypothesis with just one variable (such as test of population
mean to be equal to a number) or there could be two variables (like tests of hypothesis of
association or difference) or even more (like factor analysis/multiple regression).
Each of the variables is then identified as a dependent or independent variable, given the nature of
the hypothesis being tested. Further, against each variable, its level of measurement is noted as
Nominal, Ordinal, Interval or Ratio. Often the nominal and ordinal levels are combined into
Categorical, whereas the interval and ratio levels are labeled as Numerical.
Categorical variables are also called non-metric or non-parametric variables. Numerical variables
are also called metric or parametric variables; some authors also call them continuous variables.
Figure 2: The Research Blueprint—Objective-Hypothesis-Variable-Test Mapping
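The objective-hypothesis-variable mapping described above can be written down as a small data structure. The following is only a sketch: the class names, field names and the example objective are hypothetical and are not taken from the course material.

```python
from dataclasses import dataclass, field
from typing import List

LEVELS = ("nominal", "ordinal", "interval", "ratio")


@dataclass
class Variable:
    name: str
    role: str   # "dependent" or "independent"
    level: str  # one of LEVELS

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError(f"unknown level of measurement: {self.level}")

    @property
    def kind(self) -> str:
        # Nominal/ordinal are grouped as categorical, interval/ratio as numerical.
        return "categorical" if self.level in ("nominal", "ordinal") else "numerical"


@dataclass
class Hypothesis:
    statement: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Objective:
    description: str
    hypotheses: List[Hypothesis] = field(default_factory=list)


# Hypothetical blueprint entry: one objective, one hypothesis, two variables.
objective = Objective(
    description="Compare test scores across teaching methods",
    hypotheses=[
        Hypothesis(
            statement="Mean score is equal for both teaching methods",
            variables=[
                Variable("teaching_method", "independent", "nominal"),
                Variable("test_score", "dependent", "ratio"),
            ],
        )
    ],
)
```

Recording the role and level of measurement for every variable in this way makes the later questions (how many variables, what level, what kind of hypothesis) easy to answer for each hypothesis.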
2. Major Determinants of Choice of a Statistical Method
The choice of a particular statistical method is generally determined by the following:
a) Number and Level of Measurement of Variables
b) Distribution of the variable
c) Dependence and Independence Structure
d) Nature of the Hypothesis
e) Sample Size
We shall now briefly discuss the above:
2.1. Level of Measurement of Variables
We know that there are four levels of measurement:
a) Nominal
b) Ordinal
c) Interval
d) Ratio
Often the nominal and ordinal levels are combined into Categorical, whereas the interval and
ratio levels are labeled as Numerical. Categorical variables are also called non-metric or
non-parametric variables. Numerical variables are also called metric or parametric variables;
some authors also call them continuous variables.
While choosing a particular test, we shall be asking the question:
What is the level of measurement of the data?
--Nominal/Ordinal/Interval/Ratio
Or simply Categorical or Numerical?
2.2. Distribution of Underlying Variables
Depending on the level of measurement, the data might follow a distribution such as the Normal,
Binomial or Poisson, or it might not follow any particular distribution. Variables measured on
nominal and ordinal scales are generally treated as not having any distribution, whereas numerical
variables might follow a normal or some other distribution. The tests used when categorical
variables are involved are called non-parametric or distribution-free tests. The tests used with
numerical variables are called parametric tests.
While choosing a particular test we shall be asking the question:
Is the data parametric (measured on a numerical scale) or non-parametric (measured on
a categorical scale)?
2.3. Nature of Hypothesis
Broadly a hypothesis can be categorized as:
a) Hypothesis of Association/Causation and
b) Hypothesis of Differences
The hypothesis of association/causation examines the nature and strength of the relationship
between variables. Correlation and regression are examples.
The hypothesis of difference examines whether the two populations differ on a parameter like
mean. Using hypothesis of difference, we generally test the equality of two or more population
means.
While choosing a particular test, we shall be asking the question:
What is the nature of the hypothesis?
---Hypothesis of Association/Causation OR Hypothesis of Differences
2.4. No. of Variables in the Hypothesis
The number of variables associated with a hypothesis is also an important determinant of the
choice of a statistical technique.
Based on the number of variables, we sometimes even classify the statistical techniques as
Univariate (involving one variable) /Bi-variate (two variables)/ Multivariate (more than two)
techniques.
While choosing a particular test, we shall be asking the question:
How many Variables are there in the hypothesis?
-- One or two or more than two
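As a small illustration of this question, the classification by number of variables can be sketched as a function. The function name is hypothetical; the three labels follow the text.

```python
def classify_by_variable_count(n_variables: int) -> str:
    """Label a hypothesis by how many variables it involves."""
    if n_variables < 1:
        raise ValueError("a hypothesis needs at least one variable")
    if n_variables == 1:
        return "univariate"
    if n_variables == 2:
        return "bivariate"
    return "multivariate"
```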
3. An approach for Choosing a Statistical Method
Several authors present different approaches to choose a statistical method. An approach generally
involves starting with one of the above determinants and drilling down with other determinants.
For instance, we might start with the question: What is the nature of the hypothesis? Then, ask the
question: How many variables are involved? And then ask: What is the level of measurement of
each of the variables? And so on. Alternatively, we might start with, say, the number of variables
in the hypothesis, then the nature of the hypothesis and so on.
We suggest starting with the number of variables. The following sections present
self-explanatory flowcharts of how to choose a test once you have started with the question: How many
variables are involved in the hypothesis? One, two, or more than two? Accordingly, the sections
are titled Statistical Methods for Univariate/Bi-variate/Multivariate Data.
3.1. Statistical Methods for Univariate Data
Figure 3 presents the flowchart of how a method can be chosen when the hypothesis involves just
one variable.
Figure 3: Statistical Methods for Univariate Data
We first ask what we are trying to do: are we trying to describe the data, or are we trying
to make an inference? Making an inference with univariate data generally involves testing
whether the population mean equals a particular value, for example whether µ = 3.
Let us look at the first branch: descriptive statistics.
The kind of descriptive statistics we can use to describe univariate data depends directly
on the level of measurement of the variable.
● For nominal data, the mode is the only available measure of central tendency. Further, we do
not have any measure of spread or variance when data is on a nominal scale.
● When data is on an ordinal scale, we have two choices of central tendency: the mode and the
median. We can use the interquartile range as a measure of dispersion.
● When data is measured on an interval or ratio scale, we can use all three measures of central
tendency, i.e. mean, median and mode. We can also use several measures of dispersion,
such as the interquartile range, range, variance and standard deviation.
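The three cases above can be sketched with Python's standard `statistics` module. This is an illustration only: the function name `describe` and the sample data are hypothetical, and ordinal categories are assumed to be coded as integers so that a median and quartiles can be computed.

```python
import statistics


def describe(values, level):
    """Return the descriptive measures the text allows for a level of measurement."""
    # The mode is available at every level, including nominal.
    summary = {"mode": statistics.mode(values)}
    if level == "nominal":
        return summary

    # Ordinal and above: median and interquartile range become meaningful.
    q1, _, q3 = statistics.quantiles(values, n=4)
    summary["median"] = statistics.median(values)
    summary["iqr"] = q3 - q1
    if level == "ordinal":
        return summary

    # Interval/ratio: mean and the remaining measures of dispersion.
    summary["mean"] = statistics.mean(values)
    summary["range"] = max(values) - min(values)
    summary["variance"] = statistics.variance(values)
    summary["stdev"] = statistics.stdev(values)
    return summary


# Hypothetical data for each level.
colours = ["red", "blue", "red", "green"]   # nominal
ratings = [1, 2, 2, 3, 3, 3]                # ordinal, coded 1..3
scores = [2, 4, 4, 5, 7, 9]                 # interval/ratio

nominal_summary = describe(colours, "nominal")
ordinal_summary = describe(ratings, "ordinal")
numeric_summary = describe(scores, "ratio")
```

Note that `statistics.quantiles` supports more than one way of computing quartiles, so the interquartile range can differ slightly from what other software reports.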
On the other hand, we may be interested in testing whether the population mean equals a
particular value, such as µ = 3. We call this a hypothesis of difference involving a single
variable. Because the univariate data is on a numerical scale (interval or ratio), we use the
one-sample t-test.
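As a minimal sketch of what the one-sample t-test computes, the test statistic is the difference between the sample mean and the hypothesised value, divided by the standard error of the mean. The function name and the sample values below are hypothetical; the p-value lookup is left to a statistics package or a t-table.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # sample standard deviation (n - 1 in the denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical sample tested against mu0 = 3.
sample = [2.5, 3.1, 2.8, 3.6, 3.0, 3.4, 2.9, 3.3]
t_stat, df = one_sample_t(sample, 3)
```

The resulting statistic is compared with the t distribution with `df` degrees of freedom to decide whether to reject the hypothesis µ = 3.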
3.2. Statistical Methods for Bi-variate Data
Quite often, we will be interested in testing the hypothesis that involves two variables or
sometimes we also have one variable measured across two samples.
Figure 4 presents the flowchart of how a method can be chosen when the hypothesis involves two
variables or two samples measured on one variable.
Figure 4: Statistical Methods for Bi-variate Data
We will start with the question:
What is the nature of the hypothesis?
---Hypothesis of Association/Causation OR Hypothesis of Differences
1. Hypothesis of Difference: A hypothesis of difference in this context generally involves testing
for the equality of two population means (that is, whether µ1 = µ2).
Then, we can ask this question:
Is this data parametric or non-parametric?
When the data is parametric (meaning the underlying variable has a distribution), we ask
whether the samples are independent or dependent. With independent samples, we measure
one variable on two separate samples, whereas dependent samples generally involve repeated
measurements (twice) of the same variable on a single sample.
If the samples are independent, we use an independent-samples t-test; otherwise we use a
paired-samples t-test.
For non-parametric data, we use the Mann-Whitney U test to test the hypothesis of difference.
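The three statistics named above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not a replacement for a statistics package: the function names are hypothetical, the independent-samples version uses the pooled-variance (Student) form, which assumes equal population variances, and only the test statistics are computed, not p-values.

```python
import math
import statistics


def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


def paired_t(first, second):
    """t statistic and degrees of freedom on per-subject differences (second - first)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n)), n - 1


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a's value exceeds b's; ties count one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical data.
group_a, group_b = [1, 2, 3], [4, 5, 6]
t_ind, df_ind = independent_t(group_a, group_b)
t_pair, df_pair = paired_t([1, 2, 3], [2, 4, 3])
u_a = mann_whitney_u(group_a, group_b)
```

The two U values, one per sample, always add up to the product of the two sample sizes; the smaller of the two is the one usually compared with the critical value.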
2. Hypothesis of Association: For a hypothesis of association, we again ask whether
the data is parametric or non-parametric. If the data is parametric, the next-level question is
whether we want to look at the association between the two variables or at a cause-effect
relationship. With association, we simply try to find out whether the two variables
are related. With causation, one of the variables is dependent and the other is
independent, and we want to see to what extent the independent variable explains the changes
in the dependent variable.
For parametric data, when we are examining association, the measure is the Pearson
correlation coefficient. For causation, we use regression.
For non-parametric data, we ask a next-level question: is the data measured on a nominal
or an ordinal scale? If it is measured on a nominal scale, we use the Chi-square test of
association. If the data is measured on an ordinal scale, we use Spearman’s rank correlation.
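The four measures named in this branch can be sketched as follows. The function names and the small data sets are hypothetical; Spearman's coefficient is computed here as the Pearson correlation of average ranks, and only the statistics themselves are returned, without significance tests.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two numerical variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_regression(x, y):
    """Least-squares slope and intercept for y explained by x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def average_ranks(values):
    """Rank from 1, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat, (len(row_totals) - 1) * (len(col_totals) - 1)


slope, intercept = simple_regression([1, 2, 3], [2, 4, 6])
chi2, chi2_df = chi_square([[10, 20], [20, 10]])
```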
3.3. Statistical Methods for Multivariate Data
Figure 5 presents the flowchart of how a method can be chosen when the hypothesis involves more
than two variables.
Figure 5: Statistical Methods for Multivariate Data (1)
With multivariate data, we again start with the same question: is it a hypothesis of
difference or a hypothesis of association?
Under the hypothesis of difference, we again need to know whether the data is parametric or
non-parametric. When the data is parametric, we use ANOVA; if the data is
non-parametric, we use the Kruskal-Wallis test.
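For the parametric branch, the one-way ANOVA F statistic compares the variation between group means with the variation within groups. The sketch below assumes the simplest one-way layout; the function name and the three groups are hypothetical, and the p-value lookup is omitted.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numerical samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three groups.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F relative to the F distribution with (`df_b`, `df_w`) degrees of freedom suggests that at least one group mean differs from the others.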
When testing a hypothesis of association, we ask the question: What is the level of measurement
of the dependent variable, i.e., numerical or categorical?
When the dependent variable is numerical, the next question is whether all the independent
variables are also numerical. If all the independent variables are also numerical, then we use
Multiple Regression.
When the dependent variable is categorical, we look at the type of the independent variables. If
all the independent variables are numerical, then we use Multiple Discriminant Analysis. We may
also have a case where one or two independent variables are categorical and the other variables are
numerical. In this case we use Logistic Regression.
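The questions in this branch can be sketched as a small lookup function. The function name and string labels are hypothetical, and combinations that the text does not address (for example, a numerical dependent variable with some categorical independent variables) deliberately return `None` rather than guessing.

```python
def choose_multivariate_association_method(dependent_kind, independent_kinds):
    """Map variable kinds to the method named in the text, or None if not covered."""
    allowed = {"numerical", "categorical"}
    kinds = set(independent_kinds)
    if dependent_kind not in allowed or not kinds or not kinds <= allowed:
        raise ValueError("kinds must be 'numerical' or 'categorical'")

    all_numerical = kinds == {"numerical"}
    mixed = kinds == allowed

    if dependent_kind == "numerical":
        return "Multiple Regression" if all_numerical else None
    if all_numerical:
        return "Multiple Discriminant Analysis"
    if mixed:
        return "Logistic Regression"
    return None  # all independent variables categorical: not covered in the text
```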
Figure 6 presents the flowchart of how a method is chosen in some special cases involving more
than two variables.
Figure 6: Statistical Methods for Multivariate Data (2)
When we are interested in variable/dimension reduction (that is, when there is no dependent and
independent relation between the variables), or when we are working at the item level and would
like to group the items into certain variables, we use Factor Analysis. Factor
analysis has two variants: exploratory factor analysis and confirmatory factor analysis.
Sometimes we are interested in grouping the cases or respondents (not
the variables) of our study based on some criteria; in such cases we use Cluster Analysis.
The major difference between Factor Analysis and Cluster Analysis is this:
in Factor Analysis, several variables or items are grouped into fewer dimensions or
variables; in Cluster Analysis, the respondents or subjects in the study are grouped into
clusters.
We might also have a situation where we examine several relationships at once and there are
multiple dependencies. In that case, we use Structural Equation Modelling.
4. Choosing between the Z-test and t-test
A common source of confusion is when to use a Z-test and when to use a t-test.
In the previous discussion, wherever we used a t-test, a Z-test may also be a possibility.
Figure 7 presents the flowchart of how to choose between a t-test and a Z-test.
Figure 7: Choosing between Z test and t-test
We start with the question: Is the population normal? If the population is normal, then we ask
another question: Is the standard deviation of the population known?
If the population is normal and the standard deviation of the population is known, we use the
Z-test. If the standard deviation of the population is not known, then we use the t-test.
If the population is not normal, then we ask whether the sample size is greater than
or equal to 30. If it is, we go back to the same question: Is the standard deviation of the
population known? If the standard deviation of the population is known, we use the Z-test. If it
is not known, we use the t-test.
If the sample size is less than 30, we need to ask whether it is a large population. If it is a
large population, we use the Binomial test; if it is not a large population, we use the
Hypergeometric test.
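The decision path just described can be sketched as one function. The function and parameter names are hypothetical; the threshold of 30 and the four outcomes are taken directly from the text.

```python
def choose_test(population_normal, sigma_known, sample_size, large_population=None):
    """Follow the Z-test / t-test decision path described in the text."""
    # A normal population, or a sample of at least 30, leads to the sigma question.
    if population_normal or sample_size >= 30:
        return "Z-test" if sigma_known else "t-test"

    # Non-normal population with a small sample: the population size decides.
    if large_population is None:
        raise ValueError("large_population must be given for small, non-normal samples")
    return "Binomial test" if large_population else "Hypergeometric test"
```

For example, `choose_test(population_normal=False, sigma_known=False, sample_size=45)` follows the large-sample branch and returns `"t-test"`.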
