FACTOR ANALYSIS
Course Teacher
Dr. R. GangaiSelvi
(Assoc. Professor - Statistics)
Department of Statistics
By
Namadara Sandhya
1st year PhD, 202211004
Department of Agricultural Microbiology
STA 602 (1+1)
Factor analysis – objective – designing and assumptions
– various rotations
FACTOR ANALYSIS
🞭 It is a dimension reduction technique.
🞭 It is used when an analysis involves a large number of variables and it is not possible to deal with all of them simultaneously.
🞭 Factor analysis is of two types:
1. Exploratory Factor Analysis (EFA)
2. Confirmatory Factor Analysis (CFA)
🞭 EFA is used when the structure of the underlying factors is unknown and is to be determined.
🞭 CFA is used when the structure of the underlying factors is already known and it is required to check whether the collected data confirm that structure or not.
EXPLORATORY FACTOR ANALYSIS
🞭 The main objectives of EFA are:
1. To identify the underlying dimensions or factors that explain the variation (or correlations) among a set of variables.
2. To obtain a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent analysis.
3. To obtain a smaller set of salient variables from a large set for use in subsequent analysis.
EFA VERSUS PRINCIPAL COMPONENT ANALYSIS
🞭 Both Exploratory Factor Analysis (EFA) and Principal Component Analysis (PCA) are termed data reduction techniques, and the two cannot be separated from each other: PCA can be viewed as one method of performing EFA. PCA obtains uncorrelated linear combinations of the variables under study that explain the variation (or correlation) in the dataset, but by itself it does not answer the 2nd and 3rd objectives of EFA, i.e. how many factors should be retained and which variable should be considered within which factor.
MODEL FOR FACTOR ANALYSIS
🞭 Factor Analysis is based on a model in which the observed vector is partitioned into an unobserved systematic part and an unobserved error part.
🞭 The components of the error part are considered independent, whereas the systematic part is taken as a linear combination of a relatively small number of unobserved factor variables.
🞭 This model separates the effect of the factors from the error.
🞭 The model for Factor Analysis is defined as:
X = μ + Λf + U
🞭 X is the (p×1) vector of observed variables. It may be considered as the scores on a battery of tests.
🞭 μ is the (p×1) vector of the average scores of these tests in the population.
🞭 f is the (m×1) vector of unobserved variables called common factors. These are the scores on hidden (underlying) abilities whose linear combinations enter the test scores.
🞭 Λ is the (p×m) matrix of component loadings or factor loadings. It consists of the coefficients of the linear combinations of the factors.
🞭 U is the (p×1) vector of random error terms.
ASSUMPTIONS FOR FACTOR ANALYSIS
1. The mean of the random error term is 0, i.e. E[U] = 0.
2. The mean of the common factors is 0, i.e. E[f] = 0.
3. The variance of the error term is ψi, i.e. V(ui) = ψi; i = 1, 2, …, p.
4. The error terms are independent of each other, i.e. Cov(ui, uj) = 0; i ≠ j = 1, 2, …, p.
Assumptions 3 and 4 can be written collectively as V(U) = Ψ = diag[ψ1, ψ2, …, ψp].
5. The variance of the common factors is given by V(f) = Φ. If the factors are considered orthogonal then V(f) = I.
6. The common factors and error terms are independent of each other, i.e. Cov(ui, fj) = 0; i = 1, 2, …, p & j = 1, 2, …, m.
A small simulation illustrating these assumptions is sketched below.
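To make the model and assumptions concrete, here is a minimal simulation sketch in Python; all names and values are illustrative and not part of the original slides. It draws orthogonal factors and independent errors, forms X = μ + Λf + U, and checks that the implied covariance ΛΛ' + Ψ matches the sample covariance.

```python
# Minimal simulation sketch of X = mu + Lambda f + U under assumptions 1-6
# (orthogonal factors, diagonal Psi); all values here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 6, 2, 5000                        # variables, factors, observations

mu = np.zeros(p)                            # population mean vector
Lam = rng.uniform(-1, 1, size=(p, m))       # factor loadings (p x m)
psi = rng.uniform(0.1, 0.5, size=p)         # specific variances: V(U) = diag(psi)

f = rng.standard_normal((n, m))             # common factors, V(f) = I (orthogonal)
U = rng.standard_normal((n, p)) * np.sqrt(psi)  # independent errors
X = mu + f @ Lam.T + U                      # observed data, one row per observation

# The sample covariance should be close to the implied Sigma = Lam Lam' + Psi:
Sigma_model = Lam @ Lam.T + np.diag(psi)
print(np.abs(np.cov(X, rowvar=False) - Sigma_model).max())  # small for large n
```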
ESTIMATION OF PARAMETERS
🞭 Now consider the variance of the X vector:
V(X) = V(μ + Λf + U) = ΛV(f)Λ' + V(U)
or, Σ = ΛΦΛ' + Ψ
If the factors are considered orthogonal then: Σ = ΛΛ' + Ψ
🞭 Therefore, in factor analysis there are basically two types of parameters involved:
1. pm parameters in the matrix Λ.
2. p parameters in the diagonal matrix Ψ.
🞭 Therefore there are a total of pm + p = p(m+1) parameters which are required to be estimated.
🞭 There are several methods for obtaining the estimates of these parameters, among which the two most commonly used are:
1. Principal component method
2. Method of maximum likelihood
USING PRINCIPAL COMPONENT METHOD
🞭 It is discussed in detail in the lecture on Principal Component Analysis.
🞭 It has the following steps:
1. First transform the matrix of all variables under consideration to a matrix X such that the mean of X is 0.
2. Obtain the variance-covariance matrix of X, Σ (or its MLE), under the assumption that X is normally distributed.
3. Obtain the characteristic roots of Σ and arrange them in descending order (λ1 ≥ λ2 ≥ … ≥ λp).
4. For each distinct eigenvalue obtain the corresponding eigenvector.
5. Normalize these eigenvectors by dividing them by their norms (β(1), β(2), …, β(p)).
6. Then obtain the principal components by multiplying these β(i)'s with X (i.e. β(1)'X, β(2)'X, …, β(p)'X).
7. If the units of measurement of the variables are not the same, it is better to use the correlation matrix in place of the variance-covariance matrix.
🞭 Using these steps the estimates of the elements of Λ can be obtained.
🞭 Now, for obtaining the estimate of the elements of Ψ, we can use:
Ψ = Σ – ΛΛ'
(a sketch of these steps is given below).
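A minimal numpy sketch of these steps follows; the function and variable names are illustrative, the data are assumed to sit in an (n × p) array, and only the diagonal part of Σ − ΛΛ' is kept for Ψ (the usual convention, since Ψ is assumed diagonal).

```python
# Sketch of the principal component method of factor extraction.
import numpy as np

def pc_factor_estimates(X, m):
    """Estimate Lambda and Psi with m factors from an (n x p) data array X."""
    Xc = X - X.mean(axis=0)                      # step 1: center the variables
    R = np.corrcoef(Xc, rowvar=False)            # steps 2/7: correlation matrix
    eigval, eigvec = np.linalg.eigh(R)           # steps 3-5: eigenvalues/vectors
    order = np.argsort(eigval)[::-1]             # arrange in descending order
    eigval, eigvec = eigval[order], eigvec[:, order]
    Lam = eigvec[:, :m] * np.sqrt(eigval[:m])    # loadings: sqrt(lambda_j) * beta_(j)
    Psi = np.diag(np.diag(R - Lam @ Lam.T))      # diagonal of Sigma - Lambda Lambda'
    return Lam, Psi
```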
USING METHOD OF MAXIMUM LIKELIHOOD
🞭 In this method it is assumed that the vector X has a multivariate normal distribution with mean μ and variance-covariance matrix Σ, i.e. X ~ Np(μ, Σ).
🞭 Let X1, X2, …, Xn be a random sample from the above distribution. Then the log-likelihood function can be written as:
ln L(μ, Σ) = –(np/2)·ln(2π) – (n/2)·ln|Σ| – (1/2)·Σi (Xi – μ)'Σ⁻¹(Xi – μ), i = 1, 2, …, n
🞭 Putting Σ = ΛΛ' + Ψ in the log-likelihood we get:
ln L(μ, Λ, Ψ) = –(np/2)·ln(2π) – (n/2)·ln|ΛΛ' + Ψ| – (1/2)·Σi (Xi – μ)'(ΛΛ' + Ψ)⁻¹(Xi – μ)
🞭 However, it is not easy to obtain the estimates of Λ and Ψ in closed form.
🞭 Many methods can be used to maximize it, among which the main ones are the steepest descent method, the Newton-Raphson iterative procedure, and the scoring method.
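As a hedged alternative to coding the iterations by hand, scikit-learn's FactorAnalysis fits the same model by an iterative maximum-likelihood procedure (SVD-based, rather than the Newton-Raphson or scoring methods named above); the random array below is only a placeholder for real data.

```python
# Hedged sketch: ML factor extraction via scikit-learn's FactorAnalysis.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))    # placeholder data; substitute your own

fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
Lam_hat = fa.components_.T           # (p x m) estimated loading matrix
Psi_hat = fa.noise_variance_         # estimated specific variances (diag of Psi)
```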
COMPUTATION OF FACTOR SCORES
🞭 For obtaining the estimates of the factor scores (f), the factor analysis model is reconsidered:
X = μ + Λf + U
🞭 It is fitted in the same manner as a linear regression model. Instead of Λ, its estimate obtained by one of the methods stated above is used, and the model becomes:
X = μ̂ + Λ̂f + U
🞭 The estimate of f can then be obtained by using:
1. Ordinary Least Squares (OLS Method)
2. Weighted Least Squares (Bartlett's Method)
3. Regression Method
1. OLS Method:
🞭 In this method the estimates are obtained by minimizing the error sum of squares (U'U). The estimate of the factor scores is given by:
f̂ = (Λ̂'Λ̂)⁻¹Λ̂'(X – μ̂)
2. Bartlett's Method:
🞭 In the OLS method V(U) is taken as the identity matrix, but in factor analysis it is the matrix Ψ, of which the identity matrix is only a special case. Bartlett therefore suggested using weighted least squares. Using this method the estimate of the factor scores is obtained as:
f̂ = (Λ̂'Ψ̂⁻¹Λ̂)⁻¹Λ̂'Ψ̂⁻¹(X – μ̂)
3. Regression Method:
🞭 In this method the factor scores are obtained by using the maximum likelihood method.
🞭 Here the joint distribution of X and f is taken as multivariate normal of dimension p+m, with mean vector (μ', 0')' and a covariance matrix with blocks V(X) = ΛΛ' + Ψ, Cov(X, f) = Λ and V(f) = I.
🞭 Then, by using the conditional expectation, it is obtained that:
E(f | X) = Λ'(ΛΛ' + Ψ)⁻¹(X – μ)
🞭 Using the estimates of Λ and Ψ, the estimate of the factor scores will be:
f̂ = Λ̂'(Λ̂Λ̂' + Ψ̂)⁻¹(X – μ̂)
🞭 Here μ̂ (the sample mean) is the estimate of μ. A sketch covering all three estimators is given below.
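The three estimators can be written directly in numpy. This is a sketch under assumed inputs: estimates Lam (p × m), a vector psi of specific variances, the sample mean mu, and a single observation x; all names are illustrative.

```python
# Sketch of the three factor-score estimators for one observation x.
import numpy as np

def factor_scores(x, mu, Lam, psi):
    d = x - mu
    Psi_inv = np.diag(1.0 / psi)
    ols = np.linalg.solve(Lam.T @ Lam, Lam.T @ d)            # (L'L)^-1 L'(x - mu)
    bartlett = np.linalg.solve(Lam.T @ Psi_inv @ Lam,
                               Lam.T @ Psi_inv @ d)          # (L'Psi^-1 L)^-1 L'Psi^-1 (x - mu)
    Sigma = Lam @ Lam.T + np.diag(psi)
    regression = Lam.T @ np.linalg.solve(Sigma, d)           # L'(LL'+Psi)^-1 (x - mu)
    return ols, bartlett, regression
```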
ROTATION OF FACTORS
The unrotated output maximizes the variance accounted for by the first and subsequent factors, and forces the factors to be orthogonal. This data compression comes at the cost of having most items load on the early factors and, usually, of having many items load substantially on more than one factor. Rotation serves to make the output more understandable by seeking so-called "simple structure": a pattern of loadings in which each item loads strongly on only one of the factors and much more weakly on the other factors. Rotation is of two types:
1. Orthogonal rotation
2. Oblique rotation
ORTHOGONAL ROTATION
🞭 It is a transformational system used in factor analysis in which the different underlying or latent variables are required to remain separated from, or uncorrelated with, one another. Three different methods can be used for orthogonal rotation:
1. Varimax rotation: An orthogonal rotation of the factor axes that maximizes the variance of the squared loadings of a factor (column) on all the variables (rows) in a factor matrix, which has the effect of differentiating the original variables by extracted factor. A varimax solution yields results which make it as easy as possible to identify each variable with a single factor. It is the most frequently used rotation method (a sketch is given below).
2. Quartimax rotation: An orthogonal alternative which minimizes the number of factors needed to explain each variable. This type of rotation often generates a general factor on which most variables load to a high or medium degree.
3. Equimax rotation: A compromise between the Varimax and Quartimax criteria.
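For illustration, here is a common numpy formulation of Kaiser's varimax iteration (repeated SVDs of the criterion gradient); this is a generic sketch, not code from the slides, and the function name is illustrative.

```python
# Varimax rotation of a (p x m) loading matrix via the standard SVD iteration.
import numpy as np

def varimax(Lam, max_iter=100, tol=1e-6):
    p, m = Lam.shape
    R = np.eye(m)                       # accumulated rotation matrix
    crit = 0.0
    for _ in range(max_iter):
        L = Lam @ R
        # Gradient of the varimax criterion, expressed in the original frame:
        G = Lam.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt                      # nearest orthogonal matrix to G
        new_crit = s.sum()
        if new_crit < crit * (1 + tol):
            break                       # criterion has stopped improving
        crit = new_crit
    return Lam @ R, R                   # rotated loadings and rotation matrix
```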
OBLIQUE ROTATION
🞭 It is a transformational system used in factor analysis when two or more factors (i.e., latent variables) are correlated. Oblique rotation reorients the factors so that they fall closer to clusters of vectors representing manifest variables, thereby simplifying the mathematical description of the manifest variables. Two methods are used for oblique rotation:
1. Direct oblimin rotation
2. Promax rotation
🞭 The Promax method is similar to the Direct oblimin method but is computationally faster. A sketch using an oblique rotation follows below.
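A hedged sketch of a promax rotation, assuming the third-party factor_analyzer package (pip install factor_analyzer), which the slides themselves do not use; its Rotator class supports both promax and oblimin and, for oblique methods, also reports the factor correlation matrix.

```python
# Hedged sketch: promax rotation with factor_analyzer's Rotator.
import numpy as np
from factor_analyzer.rotator import Rotator

loadings = np.array([[0.8, 0.2], [0.7, 0.3],
                     [0.2, 0.9], [0.1, 0.8]])   # illustrative unrotated loadings
rotator = Rotator(method="promax")              # or method="oblimin"
rotated = rotator.fit_transform(loadings)
print(np.round(rotated, 3))
print(rotator.phi_)   # factor correlation matrix (oblique factors are correlated)
```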
FACTOR ANALYSIS: AN EXAMPLE USING SPSS
🞭 For performing Exploratory Factor Analysis (EFA) using SPSS, the following steps are used.
🞭 Click on Analyze → Dimension Reduction → Factor.
EXAMPLE (CONTD.)
🞭 It will open the Factor Analysis window. Put all the variables required for EFA in the Variables box, then click on Extraction.
EXAMPLE (CONTD.)
🞭 Click on the Descriptives button; it will open a new window. In this window select Coefficients under Correlation Matrix, and KMO and Bartlett's test of sphericity. Click on Continue.
EXAMPLE (CONTD.)
🞭 On clicking Extraction, a window will open. Select Correlation matrix and Scree plot. For the number of factors to be extracted you can choose any option; here extraction based on eigenvalues is selected. By default it takes Eigenvalue > 1, which can be changed. Click on Continue.
EXAMPLE (CONTD.)
🞭 On clicking Rotation, a window will open. Select Varimax rotation, as it is the most commonly used method (as per requirement one can choose any other rotation method). Click on Continue.
EXAMPLE (CONTD.)
🞭 Click on Continue, then click on Scores → Save as variables → Display factor score coefficient matrix. Click on Continue.
EXAMPLE (CONTD.)
🞭 Click on Options. It will open a new window. Select Sorted by size and click on Continue. Then click on OK.
EXAMPLE (CONTD.)
🞭 The output of SPSS shows a number of tables. The interpretation of these tables is as follows:
🞭 Table 1: Correlation Matrix
As most of the variables are highly correlated, it can be said that factor analysis is suitable for the data and will give very good results.
Correlation Matrix
                     Price    Engine   Horse-   Wheel-   Width    Length   Curb     Fuel      Fuel
                     (000s)   size     power    base                       weight   capacity  efficiency
Price in thousands    1.000    0.624    0.841    0.108    0.328    0.155    0.527    0.424    -0.492
Engine size           0.624    1.000    0.837    0.473    0.692    0.542    0.761    0.667    -0.737
Horsepower            0.841    0.837    1.000    0.282    0.535    0.385    0.611    0.505    -0.616
Wheelbase             0.108    0.473    0.282    1.000    0.681    0.840    0.651    0.657    -0.497
Width                 0.328    0.692    0.535    0.681    1.000    0.706    0.723    0.663    -0.602
Length                0.155    0.542    0.385    0.840    0.706    1.000    0.629    0.571    -0.448
Curb weight           0.527    0.761    0.611    0.651    0.723    0.629    1.000    0.865    -0.820
Fuel capacity         0.424    0.667    0.505    0.657    0.663    0.571    0.865    1.000    -0.802
Fuel efficiency      -0.492   -0.737   -0.616   -0.497   -0.602   -0.448   -0.820   -0.802    1.000
EXAMPLE (CONTD.)
🞭 Table 2: It shows the results of the KMO and Bartlett's tests:
1. Kaiser-Meyer-Olkin Measure of Sampling Adequacy: It shows the proportion of variance in the variables that might be caused by underlying factors. A higher value indicates the usefulness of the analysis.
2. Bartlett's test of sphericity: It is used to test the null hypothesis that the correlation matrix is an identity matrix. A p-value smaller than 0.05 shows that the correlation matrix is not an identity matrix and factor analysis may be useful.
🞭 Here the value of the KMO measure is 0.843, which shows that FA is useful in this case, and Bartlett's test shows that the correlation matrix is not an identity matrix.
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy      0.843
Bartlett's Test of Sphericity   Approx. Chi-Square   1407.020
                                df                   36
                                Sig.                 <0.001
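For reference, Bartlett's statistic has a simple closed form, −(n − 1 − (2p + 5)/6)·ln|R|, compared against a χ² distribution with p(p − 1)/2 degrees of freedom (here p = 9 gives df = 36, as in Table 2). A Python sketch with illustrative names:

```python
# Sketch of Bartlett's test of sphericity for an (n x p) data array.
import numpy as np
from scipy import stats

def bartlett_sphericity(data):
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)                        # correlation matrix
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, df, stats.chi2.sf(chi2, df)                   # statistic, df, p-value
```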
EXAMPLE (CONTD.)
🞭 Table 3: Communalities: It shows two values, Initial and Extraction. The initial communalities show what percentage of the variation in a variable is explained by the other variables. The extraction communalities show what percentage of the variation in a variable is explained by the factors.
Communalities
                     Initial   Extraction
Price in thousands   1.000     0.853
Engine size          1.000     0.838
Horsepower           1.000     0.878
Wheelbase            1.000     0.868
Width                1.000     0.745
Length               1.000     0.797
Curb weight          1.000     0.854
Fuel capacity        1.000     0.762
Fuel efficiency      1.000     0.726
EXAMPLE (CONTD.)
🞭 Table 4: Total Variance Explained: The table is divided into three parts. The first part shows the initial eigenvalues, which indicate what percentage of the variance is explained by a particular factor (% of variance) and what percentage is explained by that factor together with the previous factors (cumulative %). The second part shows how many factors are extracted from the data, or in other words how many factors are sufficient to explain the variation in the data. As a rule of thumb, factors having eigenvalue > 1.0, or enough factors to reach a cumulative extraction of more than 70%, are considered sufficient to explain the data. The third part shows the rotated sums of squared loadings, the result obtained by rotating the factors; rotation distributes the % of variance explained approximately equally across the retained factors.
🞭 In our results the eigenvalues of the first two factors are more than 1.0, and together they explain 81% of the total variation in the data. Therefore these two factors can be considered sufficient for the data. In the initial solution the first factor explains 64% and the second factor 17% of the total variation, whereas in the rotated solution the first factor explains 43% and the second factor 38%.
Total Variance Explained
            Initial Eigenvalues             Extraction Sums of              Rotation Sums of
                                            Squared Loadings                Squared Loadings
Component   Total   % of Var.   Cum. %      Total   % of Var.   Cum. %      Total   % of Var.   Cum. %
1           5.804   64.490      64.490      5.804   64.490      64.490      3.911   43.457      43.457
2           1.517   16.860      81.349      1.517   16.860      81.349      3.410   37.892      81.349
3           0.623    6.918      88.267
4           0.338    3.757      92.025
5           0.247    2.747      94.772
6           0.155    1.719      96.491
7           0.139    1.547      98.038
8           0.114    1.266      99.305
9           0.063    0.695     100.000
EXAMPLE (CONTD.)
🞭 Scree Plot: It is another method to obtain the required number of factors. In this plot the eigenvalue is plotted against the factor number. The point after which the curve becomes parallel to the horizontal axis gives the last factor to be selected. In the given example the curve becomes parallel to the horizontal axis after the 2nd factor; therefore only two factors are retained. A plotting sketch is given below.
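A minimal matplotlib sketch of this scree plot, built from the eigenvalues reported in Table 4:

```python
# Scree plot of the eigenvalues reported in Table 4.
import matplotlib.pyplot as plt

eigenvalues = [5.804, 1.517, 0.623, 0.338, 0.247, 0.155, 0.139, 0.114, 0.063]
plt.plot(range(1, 10), eigenvalues, marker="o")
plt.axhline(1.0, linestyle="--", color="grey")   # Kaiser criterion: eigenvalue > 1
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree Plot")
plt.show()
```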
EXAMPLE (CONTD.)
🞭 Table 5: Component Matrix: This table shows the correlations of the factors with the variables under consideration. It is helpful in detecting the structure of the factors. A variable is said to be contained in a factor if its correlation with that factor is the maximum among all the factors. In the example, 8 of the 9 variables (Curb weight, Engine size, Fuel capacity, Fuel efficiency, Width, Horsepower, Length, Wheelbase) are more highly correlated with the 1st factor than with the second and are said to be contained in the 1st factor, whereas the 9th (Price in thousands) is said to be contained in the 2nd factor. However, the correlations of the 9th variable with the two factors are approximately similar in magnitude, so it could be placed in either factor. This is a drawback of the component matrix, and therefore the rotated component matrix is used.
EXAMPLE (CONTD.)
Component Matrix
                     Component 1   Component 2
Curb weight           0.923         0.039
Engine size           0.882        -0.243
Fuel capacity         0.865         0.119
Fuel efficiency      -0.845         0.106
Width                 0.829         0.241
Horsepower            0.771        -0.533
Length                0.732         0.512
Wheelbase             0.722         0.588
Price in thousands    0.610        -0.694

Rotated Component Matrix
                     Component 1   Component 2
Wheelbase             0.931         0.040
Length                0.887         0.104
Width                 0.779         0.371
Fuel capacity         0.725         0.486
Curb weight           0.716         0.585
Price in thousands   -0.005         0.924
Horsepower            0.221         0.911
Engine size           0.498         0.768
Fuel efficiency      -0.562        -0.641
EXAMPLE (CONTD.)
🞭 Table 6: Rotated Component Matrix: This table shows the correlations of the retained factors with the variables after applying Varimax rotation. It is helpful in detecting the structure of the factors. A variable is said to be contained in a factor if its correlation with that factor is the maximum among all the factors. In the example, 5 variables (Wheelbase, Length, Width, Fuel capacity, Curb weight) are highly correlated with the 1st factor and are said to be contained in the 1st factor. The other 4 variables (Price in thousands, Horsepower, Engine size, Fuel efficiency) are more highly correlated with the 2nd factor than with the first and are said to be contained in the 2nd factor.
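The whole SPSS workflow above can be reproduced in Python. This is a hedged end-to-end sketch assuming the third-party factor_analyzer package and a hypothetical CSV file holding the nine car variables; the file name is illustrative.

```python
# Hedged end-to-end EFA sketch mirroring the SPSS steps above.
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (calculate_bartlett_sphericity,
                                             calculate_kmo)

cars = pd.read_csv("car_sales.csv")            # hypothetical file with the 9 variables

chi2, p_value = calculate_bartlett_sphericity(cars)    # adequacy checks (Table 2)
kmo_per_variable, kmo_overall = calculate_kmo(cars)

fa = FactorAnalyzer(n_factors=2, method="principal", rotation="varimax")
fa.fit(cars)
print(fa.loadings_)                # rotated component matrix, as in Table 6
print(fa.get_communalities())      # extraction communalities, as in Table 3
print(fa.get_factor_variance())    # variance explained, as in Table 4
scores = fa.transform(cars)        # regression-method factor scores
```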