DESCRIPTION OF THE TOPIC
Items     | Description of the Topic
Course    | Data Analysis for Social Science Teachers
Topic     | Introduction to Data Analysis
Module Id | 1.1
Introduction
In recent years, data analysis has gained considerable importance in research, possibly because empirical evidence provides a firm basis for accepting or rejecting the proposed hypotheses. The choice of statistical technique depends on the nature of the research problem or question and on the nature of the data set.
Research questions that address a research gap or problem may concern the degree of relationship among variables, the significance of group differences, the prediction of group membership, the underlying structure of a set of variables, or the time course of events.
To identify associations between two or more variables, correlation and regression techniques may be adopted when the data suit parametric methods, and chi-square techniques when they do not. These analyses may take the form of bivariate correlation and regression, multiple correlation and regression, canonical correlation, multiple discriminant analysis or logit regression. Bivariate correlation is a good starting point for identifying the degree of relationship between two continuous variables, such as job satisfaction and family satisfaction; either variable may be treated as the dependent variable (DV) or the independent variable (IV), as the research question requires, because the correlation coefficient is the same either way. Bivariate regression, however, requires one variable to be defined as the DV and the other as the IV. Although bivariate correlation and regression are not multivariate techniques, they form the basis of multivariate analysis (MVA).
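A minimal sketch of the difference between the two bivariate tools, using made-up satisfaction scores (the data and function names are illustrative assumptions, not taken from Pattusamy and Jacob): swapping the two variables leaves the correlation unchanged but changes the regression slope, which is why regression needs a designated DV.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient; symmetric in x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def regression_slope(iv, dv):
    """Least-squares slope for predicting dv from iv; NOT symmetric."""
    mx, my = mean(iv), mean(dv)
    sxy = sum((a - mx) * (b - my) for a, b in zip(iv, dv))
    sxx = sum((a - mx) ** 2 for a in iv)
    return sxy / sxx


# Hypothetical 1-7 satisfaction scores for six respondents (illustration only)
job_sat = [3, 4, 5, 5, 6, 7]
family_sat = [2, 4, 5, 6, 6, 7]

print(pearson_r(job_sat, family_sat), pearson_r(family_sat, job_sat))  # same value
print(regression_slope(job_sat, family_sat))  # family satisfaction as the DV
print(regression_slope(family_sat, job_sat))  # job satisfaction as the DV
```

With these numbers the slope is 1.2 when family satisfaction is the DV and 0.75 when job satisfaction is the DV; their product, 0.9, equals the squared correlation.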
1. Importance of Multivariate Analysis (MVA)
For watching a movie to be a pleasant experience, the lighting, the projection and the sound effects in the theatre must be optimal. Other factors that may contribute to a pleasant viewing experience include, but are not limited to, seating arrangements, air-conditioning and the odour of the hall. To study or measure the pleasantness of watching a movie in a theatre, all of these factors must be studied together, not in isolation. Even an unpleasant parking experience may reduce the pleasantness of watching the movie. The real value of measuring the pleasantness of a movie-watching experience therefore lies in measuring all the influencing factors together.
This is the essence of multivariate analysis: analysing multiple variables simultaneously gives a better basis for drawing inferences than conducting separate univariate analyses on each variable. Statistical techniques that simultaneously analyse multiple measurements on the individuals or objects under investigation are known as multivariate analysis (MVA). MVA may involve multiple variables in a single relationship or in a set of multiple relationships.
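The module does not spell out why separate univariate analyses give a poorer picture. One commonly cited statistical reason, offered here as an illustration rather than as the author's argument, is that running many separate tests inflates the chance of at least one false positive. Assuming k independent tests, each at significance level alpha and with every null hypothesis true, that chance is 1 - (1 - alpha)^k:

```python
def familywise_error(alpha, k):
    """Probability of at least one false positive across k independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


for k in (1, 3, 5, 10):
    print(k, round(familywise_error(0.05, k), 3))
```

With ten independent tests at alpha = 0.05, the chance of at least one spurious "significant" result is already about 40%.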
In a truly multivariate scenario, all variables must be:
i. Random in nature,
ii. Inter-related, and
iii. Interpreted in unison.
Reading Pattusamy and Jacob's (2015) paper testing the Greenhaus and Allen model will help in following the forthcoming discussion and in answering the questions at the end of this module. The theoretical model is shown in Figure 1.
Figure 1 - Theoretical Model
According to the model in Figure 1, family-work conflict (FWC) has a negative effect on job satisfaction (JS), while family-work facilitation (FWF) has a positive effect on job satisfaction. Similarly, work-family facilitation (WFF) has a positive effect on family satisfaction (FS), while work-family conflict (WFC) has a negative effect on family satisfaction. Both job satisfaction and family satisfaction positively influence feelings of work-family balance, which in turn positively influence life satisfaction (LS). All of these statements are hypotheses; they can be stated conclusively only if empirical data support them. Appropriate statistical methods allow the data to be analysed so that well-grounded inferences and conclusions can be drawn.
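The hypotheses above form a small directed model, and it can help to write them down as data. The sketch below encodes each hypothesised path with its expected sign; "WFB" is a hypothetical abbreviation for feelings of work-family balance, introduced only for this example. In a linear path model an indirect effect is the product of the path coefficients along a chain, so its expected sign is the product of the signs. This is a logical consequence of the stated hypotheses, not a result from the paper.

```python
# Hypothesised paths read off the description of Figure 1:
# (cause, effect) -> expected sign (+1 positive, -1 negative)
PATHS = {
    ("FWC", "JS"): -1,
    ("FWF", "JS"): +1,
    ("WFF", "FS"): +1,
    ("WFC", "FS"): -1,
    ("JS", "WFB"): +1,
    ("FS", "WFB"): +1,
    ("WFB", "LS"): +1,
}


def chains(start, end, paths=PATHS):
    """All directed chains of variables from start to end (the model is acyclic)."""
    if start == end:
        return [[end]]
    result = []
    for cause, effect in paths:
        if cause == start:
            for rest in chains(effect, end, paths):
                result.append([start] + rest)
    return result


def implied_sign(chain, paths=PATHS):
    """Expected sign of an indirect effect = product of the signs along the chain."""
    sign = 1
    for cause, effect in zip(chain, chain[1:]):
        sign *= paths[(cause, effect)]
    return sign


for source in ("FWC", "FWF", "WFF", "WFC"):
    for chain in chains(source, "LS"):
        print(" -> ".join(chain), "+" if implied_sign(chain) > 0 else "-")
```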
Univariate statistical tests involve a single dependent variable. Examples include, but are not limited to, t-tests of means, analysis of variance (ANOVA), analysis of covariance (ANCOVA) and simple linear regression (with one dependent and one independent variable). Having discussed the importance of data analysis, let us take a quick look at the multivariate techniques that we are likely to study in detail during this course.
The next section leads us to the classification of MVA.
2. Classification of MVA
MVA can be classified as Dependence techniques and Interdependence techniques.
2.1 Dependence techniques (used when one or more variables are identified as dependent variables to be explained or predicted by independent variables, e.g. multiple regression analysis)
i. Multiple regression and multiple correlation
ii. Multiple Discriminant Analysis (MDA) and Logistic Regression
iii. Canonical Correlation Analysis
iv. Multivariate Analysis of Variance and Covariance
v. Conjoint Analysis
vi. Structural Equation Modelling (SEM) and Confirmatory Factor Analysis (CFA)
2.1.1 Multiple Regression
Let us presume that previous research has established that cars with higher engine capacity and higher unladen weight have lower fuel efficiency (possibly validated using correlation analysis). If a researcher wants to predict fuel efficiency from engine capacity and unladen weight, fuel efficiency is treated as the dependent variable, while engine capacity and unladen weight are treated as the independent variables. The researcher collects data on the fuel efficiency, engine capacity and unladen weight of about 100 or more cars (that run on the same type of fuel) and would typically use the multiple regression (MR) method to predict fuel efficiency. MR requires a metric dependent variable; the independent variables (two or more) are normally metric, although non-metric independent variables can be included through dummy coding.
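A minimal sketch of the MR fit for this example, assuming ordinary least squares with two predictors solved from the centred normal equations. The six cars and their values are hypothetical (a real study would use the 100 or more cars mentioned above), and the fuel figures were generated from an exact linear rule so that the fit can be checked.

```python
def mean(values):
    return sum(values) / len(values)


def fit_two_predictor_ols(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via centred normal equations."""
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero only if the predictors are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det  # Cramer's rule for the 2 x 2 system
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical data for six cars
engine_cc = [800, 1000, 1200, 1500, 2000, 2400]
unladen_kg = [850, 900, 1100, 1200, 1400, 1500]
# Fuel efficiency (km/l) generated as 30 - 0.004*cc - 0.006*kg, so the fit is exact
fuel_kmpl = [21.7, 20.6, 18.6, 16.8, 13.6, 11.4]

b0, b1, b2 = fit_two_predictor_ols(fuel_kmpl, engine_cc, unladen_kg)
print(f"fuel = {b0:.2f} + ({b1:.4f} * cc) + ({b2:.4f} * kg)")
print("prediction for a 1300 cc, 1000 kg car:", b0 + b1 * 1300 + b2 * 1000)
```

Both slopes come out negative, matching the presumed finding that larger and heavier cars are less fuel-efficient.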
2.1.2 Multiple Discriminant Analysis (MDA) and Logit Analysis
If the dependent variable is non-metric, either dichotomous (yes/no, men/women) or with more than two categories, MDA is an appropriate technique, provided the independent variables are metric. MDA helps to understand group differences and to predict the likelihood that an observation or object belongs to a specific group. Returning to the example discussed under MR, suppose we had data on the engine capacity and unladen weight of 100 or more cars (that run on the same type of fuel) and wanted to classify them as big or small cars; MDA would then be a relevant technique.
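For two groups, discriminant analysis reduces to Fisher's linear discriminant. The sketch below assumes equal group sizes and priors, and that the big/small label is already known for a set of training cars; all data are hypothetical. The weights are the pooled within-group scatter matrix inverted and applied to the difference in group means, and a new car is assigned to the group on whose side of the midpoint its discriminant score falls.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def scatter(rows, m):
    """Within-group sums of squares and cross-products for two variables."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_discriminant(group_a, group_b):
    """Return (weights, threshold) of Fisher's two-group linear discriminant."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Midpoint of the projected group means (assumes equal priors)
    threshold = (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1])) / 2
    return w, threshold


def classify(car, w, threshold, label_a="Small", label_b="Big"):
    """Assign the car to group A if its discriminant score exceeds the threshold."""
    score = w[0] * car[0] + w[1] * car[1]
    return label_a if score > threshold else label_b


# Hypothetical training cars: (engine capacity in cc, unladen weight in kg)
small_cars = [(800, 750), (1000, 850), (1200, 950), (1000, 900)]
big_cars = [(1800, 1300), (2000, 1400), (2200, 1450), (2400, 1600)]

w, threshold = fisher_discriminant(small_cars, big_cars)
for new_car in [(900, 800), (2300, 1500)]:
    print(new_car, "->", classify(new_car, w, threshold))
```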
Logit analysis, also known as logistic regression, combines features of MR and MDA. Although its regression principle is similar to that of MR, the DV in logit regression is not metric, as in MR, but non-metric (typically dichotomous), as in MDA. Logit regression is further distinguished by its ability to accommodate both metric and non-metric IVs and by not requiring the assumption of multivariate normality.
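A minimal logistic regression sketch with one metric IV, assuming hypothetical data on engine capacity and a 0/1 "big car" label. It maximises the log-likelihood by plain gradient ascent for transparency; statistical packages typically use Newton-Raphson (iteratively reweighted least squares) instead.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(x, y, lr=0.5, iterations=20_000):
    """Fit P(y = 1) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)  # residual on the probability scale
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical data: engine capacity (litres) and whether the car is classed as big (1) or not (0)
engine_litres = [0.8, 1.0, 1.2, 1.4, 1.5, 1.6, 1.8, 2.0, 2.2, 2.4]
is_big = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(engine_litres, is_big)
for litres in (1.0, 1.6, 2.2):
    print(litres, "L -> P(big) =", round(sigmoid(b0 + b1 * litres), 2))
```

The DV here is dichotomous, as in MDA, while the model predicts a probability through a regression-style linear predictor, as in MR.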
2.1.3 Canonical Correlation Analysis
If there are multiple metric dependent variables and multiple metric independent variables to be correlated, the right tool is canonical correlation analysis, which determines the association between the two sets of variables. For example, we might study the relationship between a set of engine performance measures (the DVs, such as indicated horsepower (IHP) and brake horsepower (BHP)) and a set of IVs (such as engine capacity, unladen weight of the car and age of the car).
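The squared canonical correlations are the eigenvalues of Sxx^-1 Sxy Syy^-1 Syx, where the S blocks are the covariance matrices within and between the two variable sets. The sketch below restricts each set to two variables so that the 2 x 2 eigenvalues can be found with the quadratic formula. The numbers are toy values, constructed so that Y1 = X1 + 2*X2 exactly; the first canonical correlation should therefore be 1.

```python
from math import sqrt


def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def cov_block(us, vs):
    """2 x 2 matrix of covariances between the variables in us and those in vs."""
    return [[cov(u, v) for v in vs] for u in us]


def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def canonical_correlations(xs, ys):
    """Canonical correlations between two sets of two metric variables each."""
    sxx, syy = cov_block(xs, xs), cov_block(ys, ys)
    sxy, syx = cov_block(xs, ys), cov_block(ys, xs)
    m = mul2(mul2(inv2(sxx), sxy), mul2(inv2(syy), syx))
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = sqrt(max(tr * tr - 4 * det, 0.0))
    eigenvalues = [(tr + disc) / 2, (tr - disc) / 2]  # squared canonical correlations
    return [sqrt(max(ev, 0.0)) for ev in eigenvalues]


# Toy data: X set (e.g. two predictors) and Y set (e.g. two outcomes)
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y1 = [a + 2 * b for a, b in zip(x1, x2)]  # an exact linear function of the X set
y2 = [3, 1, 2, 5, 4, 6]

print(canonical_correlations([x1, x2], [y1, y2]))
```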
2.1.4 Multivariate Analysis of Variance and Covariance
To simultaneously explore the relationship between several categorical independent variables (also called treatments) and two or more metric dependent variables, an appropriate technique is multivariate analysis of variance (MANOVA). If the analysis requires removing the effect of uncontrolled metric independent variables (known as covariates) on the dependent variables, multivariate analysis of covariance (MANCOVA) is used. Both MANOVA and MANCOVA may be carried out as one-way or factorial designs. In our car example, with fuel efficiency among the DVs, the age of the car could be treated as a covariate.
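A common MANOVA test statistic is Wilks' lambda, the determinant of the within-groups sums-of-squares-and-cross-products (SSCP) matrix divided by that of the total SSCP matrix. Values near 1 mean the group mean vectors barely differ, and values near 0 mean they differ strongly. The sketch below covers a one-way design with two DVs; the groups and values are hypothetical, and the F approximation used for significance testing is omitted, as is the covariate adjustment of MANCOVA.

```python
def mean_vector(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def sscp(rows, centre):
    """2 x 2 sums of squares and cross-products of the rows about a centre."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        dev = (r[0] - centre[0], r[1] - centre[1])
        for i in range(2):
            for j in range(2):
                m[i][j] += dev[i] * dev[j]
    return m


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups):
    """One-way MANOVA Wilks' lambda = det(W) / det(T), where T = W + B."""
    all_rows = [r for g in groups for r in g]
    total = sscp(all_rows, mean_vector(all_rows))
    within = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        s = sscp(g, mean_vector(g))
        within = [[within[i][j] + s[i][j] for j in range(2)] for i in range(2)]
    return det2(within) / det2(total)


# Hypothetical data: three treatment groups, two metric DVs per car
# (fuel efficiency in km/l, and a second metric outcome on an arbitrary scale)
groups = [
    [(20, 5.0), (21, 6.0), (19, 5.5)],
    [(15, 8.0), (16, 9.0), (14, 8.5)],
    [(10, 12.0), (11, 13.0), (9, 12.5)],
]
print("Wilks' lambda:", wilks_lambda(groups))
```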
2.1.5 Conjoint Analysis
Conjoint analysis is a contemporary dependence technique that helps a decision-maker (such as a product design head) evaluate the importance of attributes (typically product attributes) and of their levels. Suppose a car has three attributes: airbags (2, 4 or 6 airbags), speakers for infotainment (2, 4 or 6 speakers) and steering wheel height adjustment (low, medium or high). With three attributes at three levels each, there are 3 × 3 × 3 = 27 possible combinations; for example, a car enthusiast may prefer 6 airbags, 4 speakers and a medium steering wheel height. To learn which combinations car enthusiasts prefer, we could ask them to rate all 27 combinations. With conjoint analysis, however, it is possible to capture a prospective car buyer's preferences from as few as 9 carefully chosen combinations (a fractional factorial design). Conjoint analysis is particularly helpful in product design simulation studies.
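A sketch of how 9 profiles can stand in for 27, assuming a simple Latin-square fraction in which the level of the third attribute is (i + j) mod 3. In this fraction every pair of levels of any two attributes appears exactly once. The respondent's ratings are hypothetical and generated from an additive utility, so the estimated part-worths can be checked. Averaging ratings by level is a shortcut that coincides with the usual dummy-coded regression estimate only for balanced designs such as this one.

```python
from collections import defaultdict
from itertools import product

AIRBAGS = [2, 4, 6]
SPEAKERS = [2, 4, 6]
HEIGHT = ["low", "medium", "high"]

full_factorial = list(product(AIRBAGS, SPEAKERS, HEIGHT))  # all 27 profiles

# 9-run fraction: the level index of the third attribute is (i + j) mod 3
fraction = [(AIRBAGS[i], SPEAKERS[j], HEIGHT[(i + j) % 3])
            for i in range(3) for j in range(3)]


def part_worths(profiles, ratings):
    """Main-effect part-worths: mean rating at each level minus the grand mean.
    Valid here because every level pair appears equally often in the design."""
    grand = sum(ratings) / len(ratings)
    worths = []
    for attr in range(3):
        totals, counts = defaultdict(float), defaultdict(int)
        for profile, r in zip(profiles, ratings):
            totals[profile[attr]] += r
            counts[profile[attr]] += 1
        worths.append({lvl: totals[lvl] / counts[lvl] - grand for lvl in totals})
    return worths


# Hypothetical respondent whose ratings follow an additive utility (illustration only)
TRUE_WORTHS = [{2: -1.0, 4: 0.0, 6: 1.0},
               {2: -0.5, 4: 0.5, 6: 0.0},
               {"low": -0.5, "medium": 0.5, "high": 0.0}]
ratings = [5.0 + sum(TRUE_WORTHS[a][p[a]] for a in range(3)) for p in fraction]

print(len(full_factorial), "profiles in full; rated only", len(fraction))
print(part_worths(fraction, ratings))
```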
2.1.6 Structural Equation Modelling (SEM) and Confirmatory Factor Analysis (CFA)
While multiple regression examines a single relationship between a DV and multiple IVs, SEM can examine multiple relationships simultaneously. Generally, a CFA is carried out before the SEM. An SEM consists of a measurement model and a structural model. The structural model may have one or more DVs and one or more IVs, with all the relationships among them defined. Each DV and IV may be either unidimensional or multidimensional, and each dimension may be measured using scale items as indicators. The CFA shows the contribution of each scale item to its dimension and the extent to which the items measure the same construct; in this way the measurement model is evaluated. After the validity and reliability of the measurement model are established, the structural model is evaluated to support or reject the hypotheses. Hence, SEM supports the simultaneous assessment of relationships and accommodates multi-item scales.
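Two widely used checks on a measurement model are composite reliability (CR) and average variance extracted (AVE), both computed from the standardised CFA loadings of a construct's items. The sketch below uses hypothetical loadings for four job-satisfaction items. The thresholds CR >= 0.7 and AVE >= 0.5 are commonly used rules of thumb, not fixed requirements.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    error = sum(1 - lam ** 2 for lam in loadings)  # error variance of each standardised item
    return s ** 2 / (s ** 2 + error)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardised loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


# Hypothetical standardised CFA loadings of four job-satisfaction items
js_loadings = [0.80, 0.75, 0.70, 0.85]
cr = composite_reliability(js_loadings)
ave = average_variance_extracted(js_loadings)
print(f"CR = {cr:.3f}, AVE = {ave:.3f}")
print("meets common rules of thumb:", cr >= 0.7 and ave >= 0.5)
```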
2.2 Interdependence techniques (used when no variable is designated as dependent or independent; all variables in the set are analysed simultaneously, e.g. factor analysis)
a) Factor Analysis (both Principal Component Analysis and Common Factor Analysis)
b) Cluster Analysis
c) Perceptual Mapping (also called Multidimensional Scaling)
d) Correspondence Analysis
2.2.1 Factor Analysis
The objective of factor analysis is to condense a larger number of measured variables into a smaller set of meaningful factors (or variates) with minimal loss of information. This can be done either by principal component analysis (PCA) or by common factor analysis. Suppose a prospective car buyer is considering the colour of the car, the aerodynamic design, body-coloured bumpers, a height-adjustable steering column, driver seat height adjustment, a touch screen for infotainment, ABS and airbags. If the car buyer's opinions are captured on a 7-point Likert scale, either PCA or common factor analysis may group these eight variables into three factors: external features (colour of the car, aerodynamic design, body-coloured bumpers), internal features (height-adjustable steering column, driver seat height adjustment, touch screen for infotainment) and safety features (ABS and airbags). Factor analysis thus helps us reduce eight variables to three meaningful factors (variates).
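A sketch of the first step of PCA: computing the eigenvalues of the items' correlation matrix and applying Kaiser's eigenvalue-greater-than-one rule to decide how many components to retain. The matrix covers four of the eight items (two external-feature items and the two safety items), and its correlations are hypothetical. The Jacobi rotation method is used only to keep the example dependency-free. Computing loadings and rotating them (for example with varimax) before interpretation is omitted.

```python
from math import sqrt


def jacobi_eigenvalues(matrix, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + sqrt(theta * theta + 1))
                c = 1 / sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Hypothetical correlations among four items: colour, aerodynamic design, ABS, airbags
R = [[1.0, 0.8, 0.1, 0.1],
     [0.8, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.7],
     [0.1, 0.1, 0.7, 1.0]]

eigenvalues = jacobi_eigenvalues(R)
print([round(e, 3) for e in eigenvalues])
print("components retained (eigenvalue > 1):", sum(e > 1 for e in eigenvalues))
```

For this matrix the eigenvalues are about 1.956, 1.544, 0.3 and 0.2, so two components are retained, matching the two item groups built into the correlations.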
2.2.2 Cluster Analysis
Continuing the car example, suppose we have data on the engine capacities of about 130 cars, ranging from 799 cc to 2399 cc, and we want to place these cars in three groups: small, medium and large. Cluster analysis would be a suitable technique. A cluster analysis algorithm places objects in homogeneous groups based on the characteristics specified by the researcher; in our example, the cars would be grouped by engine capacity, although clustering can also be based on several characteristics at once. Either hierarchical or non-hierarchical clustering procedures may be adopted. Hierarchical methods may be either agglomerative or divisive; common hierarchical algorithms include single, complete and average linkage, as well as the centroid and Ward's methods. Non-hierarchical clustering most popularly uses the k-means algorithm, which assigns objects to clusters once the number of clusters has been specified. Whether to adopt a hierarchical or a non-hierarchical procedure depends on the researcher's judgement and the problem being studied.
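A minimal sketch of non-hierarchical clustering: Lloyd's k-means on a single variable, engine capacity. The 14 capacities are hypothetical, and initialising the centroids at evenly spaced order statistics of the sorted data is an assumption made here for reproducibility; library implementations usually use randomised or k-means++ initialisation.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on one variable (requires k >= 2).
    Initial centroids: evenly spaced order statistics of the sorted data (an assumption)."""
    data = sorted(values)
    centroids = [data[round(i * (len(data) - 1) / (k - 1))] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:  # assignment step: nearest centroid
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: centroid = mean of its cluster (kept unchanged if the cluster is empty)
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


# Hypothetical engine capacities (cc) for 14 cars
engine_cc = [799, 998, 1086, 1197, 1199, 1248, 1462, 1497, 1498,
             1956, 1999, 2179, 2393, 2399]
centroids, clusters = kmeans_1d(engine_cc, k=3)
for label, centre, members in zip(["small", "medium", "large"], centroids, clusters):
    print(f"{label}: centre {centre:.0f} cc, cars {members}")
```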
2.2.3 Perceptual Mapping
Suppose we consider two dimensions of a car, fuel efficiency and driving comfort, and want to know how car enthusiasts perceive the brands currently on the market, that is, how the brands are positioned in their minds on these dimensions. The right technique is perceptual mapping (PM), also known as multidimensional scaling (MDS). MDS helps a researcher to determine the perceived relative image of objects (here, car brands) along the two dimensions. In MDS, unlike factor or cluster analysis, a solution can be obtained for each respondent, and there is no variate. The researcher must choose between similarity and preference data, between disaggregate and aggregate analysis, and between compositional and decompositional methods. Although earlier MDS programs were predominantly non-metric in output, contemporary programs provide metric output.
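A sketch of one metric MDS algorithm, classical (Torgerson) scaling: double-centre the squared distances and take the leading eigenvectors, found here by power iteration with deflation. In a real study the input would be respondents' perceived dissimilarities between brands. Here the distances are computed from hypothetical brand positions only to show that the map is recovered, up to rotation, reflection and translation.

```python
from math import dist, sqrt


def classical_mds(d, dims=2, iterations=500):
    """Classical (Torgerson) MDS: coordinates whose distances reproduce matrix d."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    # Double centring: B = -1/2 * J D^2 J, a matrix of inner products
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [1.0 + 0.1 * i * i for i in range(n)]  # arbitrary starting vector
        for _ in range(iterations):  # power iteration for the leading eigenvector of b
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to extract (dims exceeds the rank)
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        for i in range(n):
            coords[i][k] = v[i] * sqrt(max(lam, 0.0))
        # Deflation: remove this component so the next pass finds the next one
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical perceived positions of four brands on (fuel efficiency, comfort)
brands = {"A": (1, 1), "B": (4, 2), "C": (2, 5), "D": (6, 6)}
names = list(brands)
d = [[dist(brands[p], brands[q]) for q in names] for p in names]

for name, (x, y) in zip(names, classical_mds(d)):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```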
2.2.4 Correspondence Analysis
If we have non-metric data, such as car colours and car size classes (small, medium and large), and want to position these categories on a perceptual map, the technique to adopt is correspondence analysis (CA). CA starts with a cross-tabulation of the two attributes (colour and size), converts the non-metric frequencies into metric measures, reduces the dimensionality and finally produces the perceptual map. CA is well suited to the multivariate representation of interdependence among non-metric data.
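A sketch of the non-metric-to-metric step of CA, using a hypothetical colour-by-size cross-tabulation. Each cell is converted to a chi-square standardised residual, (observed - expected) / sqrt(expected). The total inertia is the chi-square statistic divided by the sample size. CA would then take a singular value decomposition of these residuals (divided by sqrt(n)) to obtain the map coordinates; that step is omitted here.

```python
from math import sqrt


def standardized_residuals(table):
    """Chi-square standardised residuals (o - e) / sqrt(e) of a contingency table."""
    n = sum(sum(r) for r in table)
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    res = []
    for i, r in enumerate(table):
        res_row = []
        for j, o in enumerate(r):
            e = row_tot[i] * col_tot[j] / n  # expected count under independence
            res_row.append((o - e) / sqrt(e))
        res.append(res_row)
    return res, n


def total_inertia(table):
    """Total inertia = chi-square / n; CA splits this across the map dimensions."""
    res, n = standardized_residuals(table)
    chi2 = sum(x * x for row in res for x in row)
    return chi2 / n


# Hypothetical counts: rows = colour (white, red, black), columns = size (small, medium, large)
colour_by_size = [[30, 20, 10],
                  [10, 25, 15],
                  [5, 15, 30]]
res, n = standardized_residuals(colour_by_size)
for label, row in zip(["white", "red", "black"], res):
    print(label, [round(x, 2) for x in row])
print("total inertia:", round(total_inertia(colour_by_size), 3))
```

Large positive residuals mark colour-size combinations that occur more often than independence would predict; these are the categories CA would place close together on the map.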
3. Nature of Data
The following table summarises the nature of the data for each technique:

Multivariate technique  | DV                 | IV
Canonical correlation   | Metric, non-metric | Metric, non-metric
MANOVA                  | Metric             | Non-metric
ANOVA                   | Metric             | Non-metric
MDA                     | Non-metric         | Metric
Multiple regression     | Metric             | Metric, non-metric
Conjoint analysis       | Non-metric, metric | Non-metric
SEM                     | Metric             | Metric, non-metric
4. Some Generic Tips for Performing Multivariate Analysis
When performing MVA on a research problem, the researcher may find the following tips helpful:
1. Ensure that the results are both statistically and practically significant.
2. Use an adequate sample size: neither too small nor excessively large.
3. Understand the nature of the data clearly.
4. Use the minimum number of variables needed to obtain the desired results.
5. Identify and eliminate errors.
6. Validate the results rigorously.
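Tip 1 separates statistical significance from practical significance. A minimal sketch of why both matter, assuming two equal-sized groups with a common standard deviation: the t statistic grows with the square root of the sample size, while the effect size (Cohen's d) does not. The means and standard deviation below are hypothetical.

```python
from math import sqrt


def pooled_t_and_d(mean1, mean2, sd, n_per_group):
    """Student's t (equal n, common SD) and Cohen's d for the difference of two means."""
    d = (mean1 - mean2) / sd                             # practical size of the difference
    t = (mean1 - mean2) / (sd * sqrt(2 / n_per_group))   # statistical test statistic
    return t, d


# The same hypothetical 0.1-SD difference at two sample sizes
for n in (20, 10_000):
    t, d = pooled_t_and_d(5.1, 5.0, 1.0, n)
    print(f"n per group = {n:>6}: t = {t:.2f}, d = {d:.2f}")
```

With 10,000 respondents per group the t statistic is about 7, far beyond conventional critical values, yet d stays at 0.1, below the 0.2 that Cohen labelled a small effect.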
I hope the above content gives you a fair idea of the multivariate techniques that we will cover in this course and a snapshot of their applications. For further learning, I also suggest the MIT OpenCourseWare course by Rudin et al. (2011), titled “Statistical Thinking and Data Analysis”.
Although I suggested reading Pattusamy and Jacob (2015) at the beginning of this discussion, the examples throughout have related to cars. If you have understood how the MVA techniques discussed apply to the variables in the car example, you should be able to answer a few fundamental data-analysis questions about the variables in the paper. Here are your challenges.
Self-Assessment:
Suggest appropriate statistical tests to answer the following research questions, and justify your choice of technique.
1. Are men more satisfied with their jobs than women?
2. Does life satisfaction vary with age?
3. Will feelings of work-life balance influence the relationship between job
satisfaction and life satisfaction?
4. Would there be a difference in the strength of the relationship between family
satisfaction and life satisfaction between men and women?
5. Would it be possible to categorize men who are highly and moderately satisfied in
their lives?
References
1. Hair, J. F., Black, W. C., Babin, B. J. and Anderson, R. E., Multivariate Data Analysis, 7th Edition, Pearson Education (South Asia), pp. 89-149.
2. Pattusamy, M. and Jacob, J. (2015), A test of Greenhaus and Allen (2011) model on work-family balance, Current Psychology, Springer.
3. Rudin, C., Chang, A. and Bisias, D. (2011), 15.075J Statistical Thinking and Data Analysis, Fall 2011, Massachusetts Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu. License: Creative Commons BY-NC-SA.
4. Tabachnick, B. G. and Fidell, L. S., Using Multivariate Statistics, 6th Edition, Pearson Education Inc., pp. 612-680.
5. Zumbo, B. D. (2014), Univariate tests. In: Michalos, A. C. (Ed.), Encyclopedia of Quality of Life and Well-Being Research, Springer, Dordrecht.
***************************************************************************

More Related Content

PPT
cfa in mplus
PPT
Simple (and Simplistic) Introduction to Econometrics and Linear Regression
PDF
IRJET- An Empirical Study on the Relationship Between Meditation, Emotion...
PPT
Factor anaysis scale dimensionality
PDF
Canonical correlation
PDF
cfa in mplus
Simple (and Simplistic) Introduction to Econometrics and Linear Regression
IRJET- An Empirical Study on the Relationship Between Meditation, Emotion...
Factor anaysis scale dimensionality
Canonical correlation

What's hot (19)

PDF
Otto_Elmgart_Noise_Vol_Struct
DOC
Unit iv statistical tools
PDF
PDF
Econometrics notes (Introduction, Simple Linear regression, Multiple linear r...
PPT
Slides sem on pls-complete
PPT
Overview of Multivariate Statistical Methods
DOC
Ash bus 308 week 4 quiz
PDF
Geert van Kollenburg-masterthesis
DOC
Ash bus 308 week 4 quiz
DOCX
Cb sem and pls-sem
DOC
Ash bus 308 week 4 quiz
PPTX
Discriminant analysis
PPTX
s.analysis
PDF
Multi-dimensional time series based approach for Banking Regulatory Stress Te...
PPTX
Factor analysis in R by Aman Chauhan
PPTX
Discriminant analysis
PDF
Optimised random mutations for
PDF
Evaluating competing predictive distributions
PPT
Chapter 05
Otto_Elmgart_Noise_Vol_Struct
Unit iv statistical tools
Econometrics notes (Introduction, Simple Linear regression, Multiple linear r...
Slides sem on pls-complete
Overview of Multivariate Statistical Methods
Ash bus 308 week 4 quiz
Geert van Kollenburg-masterthesis
Ash bus 308 week 4 quiz
Cb sem and pls-sem
Ash bus 308 week 4 quiz
Discriminant analysis
s.analysis
Multi-dimensional time series based approach for Banking Regulatory Stress Te...
Factor analysis in R by Aman Chauhan
Discriminant analysis
Optimised random mutations for
Evaluating competing predictive distributions
Chapter 05
Ad

Similar to Introduction to data analysis (20)

DOCX
Data Analytics Notes
PDF
2012-Nathans-PARE-RegressionGuidebook.pdf
DOCX
A researcher in attempting to run a regression model noticed a neg.docx
PPTX
Data-Analysis.pptx
DOCX
7Repeated Measures Designs for Interval DataLearnin.docx
PPTX
Multivariate Analysis AND Multivariate Analysis.pptx
PPTX
Introduction to regression
PPTX
binary logistic assessment methods and strategies
DOCX
Advanced StatisticsUnit 5There are several r.docx
PDF
STRUCTURAL EQUATION MODEL (SEM)
PDF
Brm unit iv - cheet sheet
DOCX
Manova Report
PPTX
ders 6 Panel data analysis.pptx
PDF
Factor analysis using spss 2005
PPTX
Factor analysis (1)
PPTX
Factor Analysis (Marketing Research)
PDF
Solution Manual for Statistics for The Behavioral Sciences, 10th Edition
DOCX
Chapter 12Choosing an Appropriate Statistical TestiStockph.docx
PDF
Empirical Analysis of the Bias-Variance Tradeoff Across Machine Learning Models
PDF
Empirical Analysis of the Bias-Variance Tradeoff Across Machine Learning Models
Data Analytics Notes
2012-Nathans-PARE-RegressionGuidebook.pdf
A researcher in attempting to run a regression model noticed a neg.docx
Data-Analysis.pptx
7Repeated Measures Designs for Interval DataLearnin.docx
Multivariate Analysis AND Multivariate Analysis.pptx
Introduction to regression
binary logistic assessment methods and strategies
Advanced StatisticsUnit 5There are several r.docx
STRUCTURAL EQUATION MODEL (SEM)
Brm unit iv - cheet sheet
Manova Report
ders 6 Panel data analysis.pptx
Factor analysis using spss 2005
Factor analysis (1)
Factor Analysis (Marketing Research)
Solution Manual for Statistics for The Behavioral Sciences, 10th Edition
Chapter 12Choosing an Appropriate Statistical TestiStockph.docx
Empirical Analysis of the Bias-Variance Tradeoff Across Machine Learning Models
Empirical Analysis of the Bias-Variance Tradeoff Across Machine Learning Models
Ad

More from RajaKrishnan M (20)

PPTX
Shortcomings of Demat Account
PPTX
Demat Account Services
PPTX
Depository Participant
PPTX
Services provided in Mobile Banking
PPTX
Ombudsman scheme
PPTX
Factors affecting share price
PPTX
Rights of investors
PPTX
Loss of Confidence of small investors
PPTX
Facilities by BSE
PPTX
Technological forces fueling e-commerce
PPTX
Encryption and Decryption
PPTX
Meaning, Anatomy and Forces Fueling e-commerce
PPTX
Forces Fueling e-commerce
PPTX
Inter Organizational e-commerce
PDF
Factors for the success of m-commerce
PPTX
Advantages of E-Commerce
PPTX
Types of E-Commerce
PPTX
E-Commerce and E- Businesss
PPTX
PPTX
Electronic Data Interchange & Internet
Shortcomings of Demat Account
Demat Account Services
Depository Participant
Services provided in Mobile Banking
Ombudsman scheme
Factors affecting share price
Rights of investors
Loss of Confidence of small investors
Facilities by BSE
Technological forces fueling e-commerce
Encryption and Decryption
Meaning, Anatomy and Forces Fueling e-commerce
Forces Fueling e-commerce
Inter Organizational e-commerce
Factors for the success of m-commerce
Advantages of E-Commerce
Types of E-Commerce
E-Commerce and E- Businesss
Electronic Data Interchange & Internet

Recently uploaded (20)

PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PPTX
master seminar digital applications in india
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PPTX
Cell Structure & Organelles in detailed.
PPTX
Institutional Correction lecture only . . .
PDF
TR - Agricultural Crops Production NC III.pdf
PPTX
GDM (1) (1).pptx small presentation for students
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
PPH.pptx obstetrics and gynecology in nursing
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
Insiders guide to clinical Medicine.pdf
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
Pre independence Education in Inndia.pdf
PDF
Sports Quiz easy sports quiz sports quiz
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Renaissance Architecture: A Journey from Faith to Humanism
master seminar digital applications in india
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
STATICS OF THE RIGID BODIES Hibbelers.pdf
Cell Structure & Organelles in detailed.
Institutional Correction lecture only . . .
TR - Agricultural Crops Production NC III.pdf
GDM (1) (1).pptx small presentation for students
Final Presentation General Medicine 03-08-2024.pptx
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPH.pptx obstetrics and gynecology in nursing
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
Insiders guide to clinical Medicine.pdf
VCE English Exam - Section C Student Revision Booklet
Pre independence Education in Inndia.pdf
Sports Quiz easy sports quiz sports quiz

Introduction to data analysis

  • 1. P a g e 1 | 8 DESCRIPTION OF THE TOPIC Items Description of the Topic Course Data Analysis for Social Science Teachers Topic Introduction to Data Analysis Module Id 1.1 Introduction In the recent past, quite a bit of importance has been given to data analysis in research. One of the possible reasons is that empirical evidence establishes a firm grounding to either accept or reject the proposed hypotheses. The choice of the statistical technique depends on the nature of the research problem or question and also on the nature of the data set. The research questions to solve a research gap or problem may be related to identifying the degree of relationships among variables, checking for the significance of group differences, predicting of group memberships or structure, or it could be time-related. In order to identify associations between two or more variables, depending on whether their nature of being parametric or non-parametric, correlation, and regression or chi- square techniques may be adopted. This can be done as a Bi-variate correlation and regression, multiple correlation and regression, Canonical correlation, Multiple Discriminant Analysis, and Log-it regression. The bi-variate correlation is a good starting point to identify the degree of relationship between two continuous variables, such as job and family satisfaction where either of them can be treated as a DV and IV as the research question may be. But bi-variate regression would require one of them to be defined as the DV and the other as the IV. Although these are not multivariate techniques, they form the basis of the Multivariate Analysis (MVA). 1. Importance of Multivariate Analysis (MVA) If watching a movie needs to be a pleasant experience, the lighting, the projected film light and sound effects in the theatre must be optimum. The other factors that may contribute to a pleasant viewing experience may include, but not limited to, seating arrangements, air- conditioning and hall odour. 
If one has to study or measure the pleasantness of watching a movie in a theatre, all of the above factors must be studied together and not in isolation. There is a possibility of an unpleasant parking experience that may negatively impact the pleasantness of watching a movie. So, the real value of measuring the pleasantness of a movie-watching experience lies in measuring all the influencing factors together. This is exactly what Multivariate analysis is all about. So, analysis of multiple variables simultaneously would result in a better picture to arrive at inferences instead of multiple uni-variate analyses done with the individual variables. Statistical Techniques that simultaneously analyse multiple measurements of the observed variables are known as Multivariate Analysis (MVA). We may perform MVA by using multiple variables in a single relationship or in multiple relationships.
  • 2. P a g e 2 | 8 In a truly multivariate scenario, all variables must be: i. Random in nature, ii. Inter-related, and iii. Interpreted in unison. Reading the paper related to testing the Greenhaus and Allen model by Pattusamy and Jacob (2015) will help in understanding our forthcoming discussions and answering a few questions in the end. The theoretical model is shown in Figure 1. Figure 1 - Theoretical Model From Figure 1, it is seen that family-work conflict (FWC) will have a negative effect on job satisfaction (JS) while family-work facilitation (FWF) will have a positive effect on job satisfaction. Similarly, work-family facilitation (WFF) will have a positive effect on family satisfaction (FS) while work-family conflict (WFC) will have a negative effect on family satisfaction. Both job and family satisfaction will influence feelings of work-family balance positively which in turn will positively influence life satisfaction (LS). All the above statements have been hypothesized and can be stated conclusively if we have empirical data to establish the stated hypotheses. The use of appropriate statistical methods will facilitate the data analysis to arrive at well-grounded inferences and conclusions. Univariate statistical tests involve one dependent variable. Examples include, but are not limited to, t-tests of means, analysis of variance (ANOVA), analysis of covariance and simple linear regression (with one dependent and one independent variable). Having said so much about the importance of data analysis, let us have a quick look at a few multivariate techniques that we are likely to study in detail during the course of this study. The next section leads us to the classification of MVA.
  • 3. P a g e 3 | 8 2. Classification of MVA MVA can be classified as Dependence techniques and Interdependence techniques. 2.1 Dependence techniques (used when there are one or more dependent variables and independent variables. Eg. Multiple regression analysis) i. Multiple regression and multiple correlation ii. Multiple Discriminant Analysis (MDA) and Logistic Regression iii. Canonical Correlation Analysis iv. Multivariate Analysis of Variance and Covariance v. Conjoint Analysis vi. Structural Equation Modelling (SEM) and Confirmatory Factor Analysis (CFA) 2.1.1 Multiple Regression Let us presume that some previous research has established that cars with higher engine capacity and higher unladen weight offer lesser fuel efficiency (possibly validated using a correlation analysis). If a researcher wants to predict the fuel efficiency based on engine capacity and unladen weight, then fuel efficiency is treated as the dependent variable while engine capacity and unladen weight are treated as the independent variables. The researcher collects data on fuel efficiency, engine capacity and an unladen weight of about 100 cars or more (that run on the same type of fuel) and would possibly use the multiple regression (MR) method to predict fuel efficiency. In order to use the MR method the dependent and the independent variables (two or more) must be metric data. 2.1.2 Multiple Discriminant Analysis (MDA) and Logit Analysis If the dependent variable is dichotomous (Yes/No, Men / Women) type, then MDA is an appropriate technique. The independent variables need to be metric data. MDA helps to understand group differences and to predict the possibility that an observation or object would belong to a specific group. 
An example that we had discussed in MR in the previous section, suppose we had data on the engine capacity and unladen weight of about 100 plus cars (that run on the same type of fuel) and if we want to classify them as Big and Small cars, then MDA would be a relevant technique. Logit Analysis also is known as Logistics regression is a combination of MR and MDA. Although the regression principle is similar to that of MR, the DV in Logit regression need not be metric as in the case of MR but can be a dichotomous variable as in MDA. Another distinguishing fact of Logit regression is that it can accommodate both metric and non-metric IVs and overlook the multivariate normality assumption.
  • 4. P a g e 4 | 8 2.1.3 Canonical Correlation Analysis If there are multiple metric dependent and metric independent variables to be correlated and regressed, then the right tool is Canonical Correlation Analysis. We actually try to determine the associations between two sets of variables. For example, we might study the relationship between a number of indices of fuel efficiency (the DVs such as Indicated Horse Power (IHP) and Brake Horse Power (BHP)) and the IVs (such as engine capacity, unladen weight of the car, and age of the car). 2.1.4 Multivariate Analysis of Variance and Covariance In-order to simultaneously explore the relationship between multiple categorical independent variables, which are also called treatments, and two or more metric dependent variables, an ideal technique would be the Multivariate Analysis of Variance and Covariance (MANOVA). If the analysis requires the elimination of the effect of the uncontrolled metric independent variables, which are known as covariates, on the dependent variables, then the multivariate analysis of covariance (MANCOVA) is used. Both MANOVA and MANCOVA may be done as one way or factorial. In our car example with fuel efficiency as the DV, age of the car can be treated as a covariate. 2.1.5 Conjoint Analysis Conjoint Analysis is a contemporary dependence technique that would help a decision-maker (product design head) evaluate the importance of attributes (typically product attributes) along with its levels. Let us say we have three attributes of a car, namely, airbags (2, 4, or 6 airbags), speakers for infotainment (2, 4 or 6 speakers) and steering wheel height adjustment (low, medium and high). If we want to know popular combinations preferred by car enthusiasts, we may have to ask them to rate all of the 27 combinations. For example a car enthusiast may prefer 6 airbags, 4 speakers and medium height for his steering wheel. Likewise, there are 27 possible combinations. 
However, using conjoint analysis, it is possible to capture a prospective car buyer's ratings from a carefully chosen subset of just 9 (or more) combinations. Conjoint analysis helps a great deal in product design simulation studies.
2.1.6 Structural Equation Modelling (SEM) and Confirmatory Factor Analysis (CFA)
While multiple regression examines a single relationship between a DV and multiple IVs, SEM makes it possible to examine multiple relationships simultaneously. Generally, a CFA is done prior to the SEM. An SEM consists of a measurement model and a structural model. The structural model may have one or more DVs and one or more IVs, with all the relationships among them defined. Each of the DVs and IVs may be either uni- or multi-dimensional, and each dimension may be measured using scale items or indicators. The CFA shows the contribution of each scale item to its dimension and the extent to which the items measure the same underlying construct; this is how the measurement model is evaluated. After the validity and reliability of the
measurement model are established, the structural model is evaluated to test the hypotheses, that is, to support or reject them. Hence, SEM supports the simultaneous assessment of relationships and accommodates multi-item scales.
2.2 Interdependence Techniques
These techniques have no dependent or independent variables; instead, all the variables in the set are analysed together (e.g., Factor Analysis). They include:
a) Factor Analysis (both Principal Component Analysis and Common Factor Analysis)
b) Cluster Analysis
c) Perceptual Mapping (also called Multidimensional Scaling)
d) Correspondence Analysis
2.2.1 Factor Analysis
The objective of factor analysis is to reduce the number of measured variables to a smaller set of meaningful factors (or variates) with minimal loss of information. This can be done either by the PCA method or by common factor analysis. Suppose a prospective car buyer is considering the colour of the car, the aerodynamic design, body-coloured bumpers, a height-adjustable steering column, driver seat height adjustment, a touch screen for infotainment, ABS and airbags. If the buyer's opinions are captured on a 7-point Likert scale, either PCA or common factor analysis may group these eight variables into three groups, namely, external features (colour of the car, aerodynamic design, body-coloured bumpers), internal features (height-adjustable steering column, driver seat height adjustment, touch screen for infotainment) and safety features (ABS and airbags). Factor analysis thus reduces eight variables to three meaningful factors (variates).
2.2.2 Cluster Analysis
In the car example we have been discussing, suppose we have data on the engine capacities of about 130 cars, ranging from a minimum of 799 cc to a maximum of 2399 cc, and we want to place these 130 cars in three groups, namely, small, medium and large cars. Cluster analysis would be a recommended technique.
A cluster analysis algorithm places objects in homogeneous groups based on the characteristics specified by the researcher. In our example, the cars would be grouped by engine capacity, although clustering can also be based on multiple characteristics. Either hierarchical or non-hierarchical clustering procedures may be adopted. Hierarchical methods may be either agglomerative or divisive; common hierarchical algorithms include the single, complete and average linkage methods, as well as the centroid and Ward methods. Non-hierarchical clustering, by contrast, most commonly uses the k-means algorithm, which places objects into clusters once the number of clusters has been specified. The choice between hierarchical and non-hierarchical procedures depends on the researcher and on the problem defined.
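A minimal sketch of the non-hierarchical (k-means) procedure for the single-variable car example follows, in plain Python. The engine capacities are hypothetical values inside the 799 cc to 2399 cc range of the example (not the actual 130 cars), and the function name `kmeans_1d` and the evenly spaced starting centroids are illustrative choices; statistical packages typically use random or k-means++ starts and repeat the run several times.

```python
# Hypothetical engine capacities in cc (illustrative only).
capacities = [799, 998, 1086, 1197, 1199, 1248, 1462, 1497, 1498, 1591,
              1956, 1995, 1999, 2179, 2393, 2399]


def kmeans_1d(values, k=3, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on one variable.

    Centroids start at evenly spaced positions in the sorted data so the
    result is deterministic.
    """
    data = sorted(values)
    step = (len(data) - 1) / (k - 1)
    centroids = [float(data[round(i * step)]) for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each car joins the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans_1d(capacities)
for label, members in zip(["small", "medium", "large"], clusters):
    print(label, members)
```

Because the centroids start in ascending order and the data have a single dimension, the clusters here stay in ascending order, so they can be labelled small, medium and large directly.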
2.2.3 Perceptual Mapping
Suppose we consider two dimensions of a car, namely, fuel efficiency and driving comfort, and we want to know how the car brands currently available in the market are positioned in the minds of car enthusiasts. The right technique is Perceptual Mapping (PM), also known as Multidimensional Scaling (MDS). MDS helps a researcher determine the perceived relative image of the objects (here, cars) on the chosen dimensions. In MDS, unlike in factor or cluster analysis, a solution can be obtained for each respondent, and there is no variate. The researcher chooses between similarity and preference data, between disaggregate and aggregate analysis, and between compositional and decompositional methods. Although earlier MDS programs produced predominantly non-metric output, contemporary programs provide metric output.
2.2.4 Correspondence Analysis
If we have non-metric data, such as car colours and a car-size classification (small, medium and large), and we want to position the cars on a perceptual map, the technique to adopt is Correspondence Analysis (CA). It starts with a cross-tabulation of the two attributes (here, colour and car size), then converts the non-metric data into a metric form, then reduces the dimensions, and finally prepares the perceptual map. CA is the best option for a multivariate representation of interdependence for non-metric data.
3. Nature of Data
The following table summarizes the nature of the data each technique uses:
Multivariate technique | Dependent variable (DV) | Independent variable (IV)
Canonical Correlation | Metric, non-metric | Metric, non-metric
MANOVA | Metric | Non-metric
ANOVA | Metric | Non-metric
MDA | Non-metric | Metric
Multiple Regression | Metric | Metric, non-metric
Conjoint Analysis | Non-metric, metric | Non-metric
SEM | Metric | Metric, non-metric
4. Some Generic Tips to Perform Multivariate Analysis
While performing MVA on a research problem, the researcher may find the following tips helpful:
1. Ensure that the findings have both statistical and practical significance.
2. Use an adequate sample size, neither too small nor too large.
3. Clearly understand the nature of the data.
4. Use the minimum number of variables in the model needed to obtain the desired results.
5. Identify and eliminate errors.
6. Validate the results thoroughly.
I hope the above content gives you a fair idea of the multivariate techniques that we will be covering in this course and a snapshot of their applications. For further learning, I also suggest the open courseware by Rudin et al. (2011), titled “Statistical Thinking and Data Analysis”. Although at the beginning of this discussion I suggested reading the paper by Pattusamy and Jacob (2015), I have used examples relating to cars throughout. If you have understood how the MVA techniques discussed here apply to the variables in the car example, you should be able to answer a few fundamental questions about data analysis with respect to the variables in that paper. Here are your challenges.
Self-Assessment: Suggest appropriate statistical tests to answer the following research questions, and justify your choice of technique where you can.
1. Are men more satisfied with their jobs than women?
2. Does life satisfaction vary with age?
3. Will feelings of work-life balance influence the relationship between job satisfaction and life satisfaction?
4. Does the strength of the relationship between family satisfaction and life satisfaction differ between men and women?
5. Would it be possible to categorize men who are highly and moderately satisfied with their lives?
References
1. Hair, J. F., Black, W. C., Babin, B. J. and Anderson, R. E. Multivariate Data Analysis, 7th Edition, Pearson Education (South Asia), pp. 89-149.
2. Pattusamy, M. and Jacob, J. (2015). A test of Greenhaus and Allen (2011) model on work-family balance. Current Psychology, Springer.
3. Rudin, C., Chang, A. and Bisias, D. (2011). 15.075J Statistical Thinking and Data Analysis, Fall 2011. Massachusetts Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu. License: Creative Commons BY-NC-SA.
4. Tabachnick, B. G. and Fidell, L. S. Using Multivariate Statistics, 6th Edition, Pearson Education Inc., pp. 612-680.
5. Zumbo, B. D. (2014). Univariate Tests. In: Michalos, A. C. (ed.), Encyclopedia of Quality of Life and Well-Being Research. Springer, Dordrecht.