📊 Great question! In the context of Factor Analysis, the term "underlying structure" refers to:
👉 The hidden or unobserved (latent) factors that explain the patterns of relationships
(correlations) among a set of observed variables.
🔍 In simple terms:
Imagine you have a long questionnaire with 20 different questions about a person’s job
experience—like job satisfaction, working hours, stress, communication, etc.
These questions are observed variables.
They may seem different, but they could all be influenced by a few underlying
themes, such as:
o "Work Environment"
o "Job Satisfaction"
o "Supervisor Relationship"
These themes or dimensions are the underlying structure — they are not directly measured,
but they cause the observed responses to be related.
🧠 Why it matters in Factor Analysis:
Factor analysis identifies and groups together variables that are highly correlated and
reveals the common factor(s) behind them. This helps reduce many variables into fewer
factors that explain most of the data.
📘 In your quoted paragraph:
The "underlying structure" means those common factors or latent variables that the
observed variables share.
Factor analysis is the tool that finds and describes this hidden structure.
What Is Factor Analysis?
Factor Analysis is a statistical method used to:
👉 Identify hidden (underlying) patterns or structures among a large group of variables.
Imagine a big questionnaire with 20 items about students' feelings toward school. Factor analysis
helps reduce that to just a few key categories, like:
Academic stress
Teacher support
Peer relationships
So instead of 20 separate items, we can talk about 3 broader factors.
🎯 Why Use Factor Analysis?
It helps to:
Simplify large sets of data (data reduction)
Identify clusters of related variables
Find latent factors (things you can't directly measure, like satisfaction, motivation,
attitude)
🔍 How It Works (Simplified)
You give a survey with many related questions.
🔢 For example:
1. I like my job.
2. I feel satisfied with my supervisor.
3. My work environment is comfortable.
4. I feel respected at work.
5. My supervisor listens to me.
People who rate high on question 1 usually rate high on 4 too. Why?
➡️Because both questions relate to job satisfaction.
Factor Analysis looks for these relationships and groups the questions that are highly correlated.
🛠️Two Main Types
1. Exploratory Factor Analysis (EFA)
🔹 You explore the data to discover the structure
🔹 Use when you don’t know how many factors exist
2. Confirmatory Factor Analysis (CFA)
🔹 You test a hypothesis (e.g., “I think there are 3 factors”)
🔹 Used when structure is already expected
🟨 In your textbook chapter, the focus is on Exploratory Factor Analysis (EFA).
📋 Steps in Factor Analysis (from the textbook)
1. Select Variables
Use items that are interrelated (correlation above 0.3 recommended).
2. Check Assumptions
Sample size: Bigger is better. A rule of thumb is 5–10 participants per variable
Use:
o KMO test (Kaiser-Meyer-Olkin) → Should be ≥ 0.6
o Bartlett’s Test of Sphericity → Should be significant (p < 0.05)
3. Extract Factors
Use a method like:
Principal Axis Factoring
Based on Eigenvalues (≥1 = important factors)
🎯 This tells us how many factors to keep.
4. Rotate the Factors
Rotation makes interpretation easier:
Varimax (orthogonal, no correlation between factors)
Promax (oblique, allows correlation between factors)
5. Interpret the Factors
Check factor loadings:
A factor loading shows how strongly each item relates to a factor
Loadings ≥ 0.40 are generally considered meaningful
You then label the factors based on the types of items they include.
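If you want to see Step 3 (the eigenvalue rule) in action, here is a minimal Python/NumPy sketch. The correlation matrix is made up for illustration (two pairs of related items), not taken from the textbook:

```python
import numpy as np

# Hypothetical correlation matrix for 4 items: items 1-2 and items 3-4
# form two related pairs, with only weak cross-correlations.
R = np.array([
    [1.0, 0.7, 0.1, 0.1],
    [0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
])

# Eigenvalues of the correlation matrix, sorted largest first.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Kaiser criterion: keep factors whose eigenvalue is >= 1.
n_factors = int(np.sum(eigenvalues >= 1))
print(eigenvalues.round(2))           # two eigenvalues above 1, two below
print("Factors to keep:", n_factors)  # Factors to keep: 2
```

The eigenvalues always sum to the number of variables (here, 4), which is why 1 is the cutoff: a factor with an eigenvalue below 1 explains less variance than a single original variable does.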
📊 Output You See in SPSS
KMO and Bartlett’s Test
Communalities (how much of each item’s variance is explained)
Eigenvalues Table
Scree Plot (to decide number of factors visually)
Rotated Factor Matrix (to see how items load on each factor)
📝 Writing the Results
A typical write-up includes:
“Factor analysis was conducted using Principal Axis Factoring and Varimax rotation. Three
factors were extracted, accounting for 65% of the total variance. KMO = 0.78, Bartlett’s test was
significant (p < .001).”
Then you describe each factor based on the items it includes.
⚠️Limitations
Subjective decisions (e.g., naming factors)
Needs large sample size
Assumes linear relationships
Cannot show cause-and-effect
🧠 Real-Life Example from the Book
In your textbook:
They analyzed job satisfaction data
Found 3 key factors:
1. General Satisfaction
2. Working Conditions
3. Supervisor Relationship
Each factor had a group of items that loaded strongly, and they interpreted what each group
meant.
✅ Summary
| Step | What Happens |
| --- | --- |
| 1. Select Variables | Choose items that might be related |
| 2. Check Assumptions | Use KMO and Bartlett's test |
| 3. Extract Factors | Use eigenvalues and scree plot |
| 4. Rotate | Use Varimax or Promax |
| 5. Interpret | Examine factor loadings and name factors |
| 6. Report | Write up results clearly |
Factor analysis is a statistical method that groups related variables based on how similarly they
behave (their shared variability).
Example: Student Survey
Imagine you give a survey to students with these 3 questions:
1. I enjoy my classes.
2. I look forward to going to school.
3. I feel motivated to study.
Let’s say a lot of students answer these in similar ways:
If a student gives a high score to Question 1, they also give high scores to 2 and 3.
If a student gives a low score to one, they often give low scores to the others too.
✅ What does this mean?
These 3 questions are moving together — they share variability.
➡️This shared pattern tells us something:
All 3 questions are likely measuring the same hidden idea, such as "motivation to learn".
📊 What Factor Analysis Does
Factor analysis sees that:
These 3 questions are highly correlated (they go up/down together)
So it groups them into one factor (like Motivation)
This is called:
Capitalizing on shared variability
(📌 i.e., finding and using the patterns where variables behave similarly)
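This "moving together" pattern is easy to simulate. Here is a small Python/NumPy sketch with made-up data: one hidden motivation score drives all three answers, plus some item-specific noise (the 0.5 noise level is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n_students = 1000  # hypothetical sample size

# One hidden (latent) "motivation" score per student.
motivation = rng.normal(size=n_students)

# Each answer = shared motivation + item-specific noise.
q1 = motivation + 0.5 * rng.normal(size=n_students)  # "I enjoy my classes"
q2 = motivation + 0.5 * rng.normal(size=n_students)  # "I look forward to going to school"
q3 = motivation + 0.5 * rng.normal(size=n_students)  # "I feel motivated to study"

# Because the answers share variability, they are highly correlated.
R = np.corrcoef([q1, q2, q3])
print(R.round(2))  # off-diagonal correlations around 0.8
```

Factor analysis works in the opposite direction: it starts from a correlation matrix like this and infers that one hidden factor could explain it.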
What are factor loadings?
Factor loadings are numbers that show how strongly a variable is connected to a factor.
Think of it like this:
A factor is a hidden idea (like “Job Satisfaction”)
A loading shows how much one question (like “I enjoy my job”) is related to that factor
👉 It’s just like a correlation.
🧮 Loadings look like this:
| Question / Variable | Factor 1 (Satisfaction) | Factor 2 (Supervisor) |
| --- | --- | --- |
| I enjoy my job | 0.75 | 0.10 |
| I feel happy with my supervisor | 0.20 | 0.80 |
0.75 = strong relationship with Factor 1
0.80 = strong relationship with Factor 2
Values range from –1.00 to +1.00, just like correlation
o Positive means they move together
o Negative means they move in opposite directions
o 0 means no relationship
📌 Key points from the paragraph:
| Sentence | Simple meaning |
| --- | --- |
| "The main set of results..." | The most important output from factor analysis is the list of factor loadings. |
| "A factor loading is interpreted as..." | A loading shows how strongly a variable (like a survey question) is linked to a factor (like "Satisfaction"). |
| "Like correlations..." | Loadings are similar to correlations and range from –1.00 to +1.00. |
| "Variables typically will have loadings on all factors..." | A variable might show some connection to many factors... |
| "...but will usually have high loadings on only one factor." | ...but usually, it’s strongly linked to just one. That’s the one we pay attention to. |
🧠 Visual Example
Let’s say we run a factor analysis on 6 survey questions. We get this table:
| Variable | Factor 1 | Factor 2 |
| --- | --- | --- |
| Q1: I enjoy my work | 0.78 | 0.12 |
| Q2: My boss listens to me | 0.11 | 0.81 |
| Q3: I feel productive | 0.74 | 0.09 |
Q1 and Q3 have high loadings on Factor 1 → they belong to the Job Satisfaction factor
Q2 has a high loading on Factor 2 → it belongs to the Supervisor Support factor
✅ Summary:
A factor loading tells you how closely related each variable is to a factor.
It’s like a correlation — higher values (closer to 1 or –1) mean a stronger connection.
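The rule "assign each item to the factor it loads highest on" can be written in a few lines of Python, using the hypothetical loadings from the table above:

```python
# Hypothetical loadings: item -> (loading on Factor 1, loading on Factor 2).
loadings = {
    "Q1: I enjoy my work":       (0.78, 0.12),
    "Q2: My boss listens to me": (0.11, 0.81),
    "Q3: I feel productive":     (0.74, 0.09),
}

# Assign each item to the factor with the largest absolute loading.
assignment = {}
for item, row in loadings.items():
    best = max(range(len(row)), key=lambda j: abs(row[j]))
    assignment[item] = f"Factor {best + 1}"

for item, factor in assignment.items():
    print(f"{item} -> {factor}")
```

Using the absolute value matters: a loading of –0.75 is just as strong as +0.75, only in the opposite direction.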
Correlation Value Range and Meaning
| Correlation Value (r) | Strength of Relationship | Direction | What it Means |
| --- | --- | --- | --- |
| +1.00 | Perfect | Positive | As one goes up, the other goes up equally |
| +0.70 to +0.99 | Strong | Positive | Variables rise together strongly |
| +0.40 to +0.69 | Moderate | Positive | One rises, the other tends to rise |
| +0.10 to +0.39 | Weak | Positive | Small upward relationship |
| 0.00 | None | — | No relationship |
| –0.10 to –0.39 | Weak | Negative | One goes up, the other goes down slightly |
| –0.40 to –0.69 | Moderate | Negative | One goes up, the other tends to go down |
| –0.70 to –0.99 | Strong | Negative | One increases, the other strongly decreases |
| –1.00 | Perfect | Negative | Exact opposite movement |
📌 In Factor Analysis:
The factor loadings are correlations between each variable and the underlying factor.
For example:
A loading of 0.75 means a strong positive relationship with the factor
A loading of –0.60 means a strong negative relationship
A loading of 0.20 means a weak relationship — usually not important
✅ Rule of Thumb in Factor Analysis:
| Loading Value | Interpretation |
| --- | --- |
| ≥ 0.70 | Very strong loading |
| ≥ 0.50 | Strong loading |
| ≥ 0.30 | Acceptable / Moderate |
| < 0.30 | Usually ignored or considered weak |
What does a negative factor loading mean?
A negative value means that the variable and the factor are inversely related.
As the factor goes up, the variable goes down, and vice versa.
🔍 Example
Let’s say you’re measuring a factor called "Job Stress."
| Survey Item | Factor Loading |
| --- | --- |
| "I feel overwhelmed at work" | +0.80 |
| "I am calm during the workday" | –0.75 |
🔸 Both of these items relate to the same factor: Job Stress.
🔸 But the second one is reverse-coded (low stress = high calmness), so it has a negative
loading.
✅ So yes, we use negative values to:
Interpret inverse relationships
Understand that the variable is related to the factor, just in the opposite direction
📌 Important Notes:
You should not ignore negative loadings — they tell part of the story.
In interpretation, sometimes we reverse the sign or reverse-code items so all values are
easier to read.
Negative loadings are valid and useful — especially when the survey includes positively and negatively worded items.
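Reverse-coding, mentioned above, is a simple transformation: new score = (scale minimum + scale maximum) − old score. A Python sketch for a hypothetical 1–5 Likert item:

```python
# Reverse-code a 1-5 Likert item so that high values mean high stress,
# matching the direction of the other items on the Job Stress factor.
SCALE_MIN, SCALE_MAX = 1, 5

# Hypothetical answers to "I am calm during the workday".
calm_scores = [5, 4, 2, 1, 3]

reversed_scores = [SCALE_MIN + SCALE_MAX - s for s in calm_scores]
print(reversed_scores)  # [1, 2, 4, 5, 3]
```

After reverse-coding, the item would load positively on the Job Stress factor instead of negatively.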
🧠 Summary:
Yes, negative values in factor loadings are useful.
They show a strong relationship in the opposite direction between a variable and a factor.
What is a communality (h²)?
A communality tells you how much of a variable’s information is explained by the factors.
🔍 Think of it like this:
Let’s say you ask:
“I enjoy my job” — this is one variable in your factor analysis.
If the communality for this variable is:
0.80 → 80% of the variable is explained by the factors
0.20 → Only 20% is explained; most of it is unrelated to the factors
📊 How it's calculated:
It’s usually the sum of the squared factor loadings for that variable across all factors.
Example:
Factor Loadings for variable "Enjoy job"
Factor 1: 0.70
Factor 2: 0.40
Communality = (0.70)² + (0.40)² = 0.49 + 0.16 = 0.65
So, 65% of the variation in this question is explained by the factors.
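The same calculation takes only a couple of lines of Python (using the loadings from the example above):

```python
# Communality (h²) = sum of squared factor loadings for one variable.
loadings = [0.70, 0.40]  # "Enjoy job" on Factor 1 and Factor 2

communality = round(sum(l ** 2 for l in loadings), 2)
print(communality)  # 0.49 + 0.16 = 0.65
```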
🧠 Summary:
| Term | Simple Meaning |
| --- | --- |
| Communality (h²) | How much of the variable is explained by the factors |
| High value (close to 1.0) | The factor model explains the variable very well |
| Low value (close to 0.0) | The variable is mostly unexplained by the factors |
| Calculated as | Sum of squared loadings for that variable |
📌 Final Simple Sentence:
Communalities tell us how well each question or variable is represented by the factors found in
factor analysis. Higher = better.
Summary:
| Term | Meaning |
| --- | --- |
| Communality | Total variance in a variable explained by all factors |
| Formula | Sum of squared factor loadings for that variable |
| Good Value | Typically > 0.40 is considered acceptable |
Example Table:
Let’s say we have 3 factors and 1 variable:
| Variable: "I feel successful at work" | Factor 1 | Factor 2 | Factor 3 |
| --- | --- | --- | --- |
| Loadings | 0.60 | 0.30 | 0.10 |
✅ Step-by-step:
Square each loading:
→ 0.60² = 0.36
→ 0.30² = 0.09
→ 0.10² = 0.01
Add them up:
→ 0.36 + 0.09 + 0.01 = 0.46
📌 So, the communality = 0.46
➡️That means 46% of this variable’s variation is explained by all three factors together.
Detailed Explanation:
| Term | What it does | Meaning in context |
| --- | --- | --- |
| Factor | A hidden (latent) variable | It explains patterns in your data |
| Loading | A correlation-like value | It shows how much a variable relates to a factor |
| Communality | Total variance explained by all factors | Calculated by summing squared loadings |
🔍 Think of it like this:
Imagine:
Factor = light source 💡
Variable = a shadow on the wall
Loading = how clearly the shadow is shaped by the light
So:
The factor is the thing doing the explaining.
The loading is the strength of the connection between each variable and that factor.
📌 Conclusion:
✅ Factors are what explain the variable’s variance.
✅ Loadings tell us how much each factor contributes to explaining the variable.
That’s why we square and add the loadings across all factors — to measure how much of the
variable is explained (communality).
A “linear combination” is a mathematical expression where several variables are combined
using addition and multiplication by constants (called weights or coefficients).
📌 For example:
If you have three variables:
X₁ = exam score
X₂ = homework score
X₃ = attendance
A linear combination might look like:
Factor 1 = 0.6·X₁ + 0.3·X₂ + 0.1·X₃
This means the factor is a single value calculated from the three original variables by
multiplying each by a weight and adding the results.
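Evaluated in Python with hypothetical scores for one student (the weights 0.6, 0.3, 0.1 are illustrative, not estimated from data):

```python
# One student's hypothetical scores.
exam, homework, attendance = 80, 90, 95

# Linear combination: multiply each variable by its weight, then add.
factor_1 = 0.6 * exam + 0.3 * homework + 0.1 * attendance
print(factor_1)  # 48.0 + 27.0 + 9.5 = 84.5
```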
🔎 In PCA and Factor Analysis:
You start with many variables (like X₁, X₂, X₃, ..., X₁₀).
These variables may be related (correlated).
The method creates new variables (factors) that are combinations of the original ones.
These factors summarize or explain the patterns in the original data.
🎯 Purpose of Linear Combinations:
Reduce complexity: Instead of working with 10 original variables, you might use 2 or 3
factors.
Reveal structure: The combinations show what common patterns or traits the variables
share.
“In both principal components analysis and factor analysis, linear combinations (the factors)
of original variables are produced, and a small number of these combinations typically account
for the majority of the variability within the set of intercorrelations among the original
variables.”
This means that the main goal of PCA and Factor Analysis is:
To reduce a large set of related variables into a smaller set of unobserved factors that still
capture the core information.
This paragraph compares Principal Components Analysis (PCA) and Factor Analysis (FA)
and explains why PCA is often used, especially in exploratory research. Here's a breakdown of
what it means:
🔍 Key Concepts Explained
1. Variance vs. Covariance
PCA analyzes variance: It looks at all variability in the data, including:
o Shared variance (common between variables)
o Unique variance (specific to each variable)
o Error variance (random noise)
Factor analysis (FA) analyzes only shared variance: It ignores unique and error
variance because these don't help in identifying the underlying structure (factors).
📌 Summary: PCA = total variance; FA = only shared (common) variance.
2. Why PCA is Often Preferred in Exploratory Analysis
PCA is mathematically simpler and doesn’t assume a specific model about what causes
the variability.
When you don’t yet know the structure of your data (i.e., it’s exploratory), PCA helps
by extracting the maximum amount of variance using fewer components.
These components are orthogonal (i.e., uncorrelated with each other), which helps in
reducing redundancy in the data.
3. Terminology Difference
When PCA is used: the extracted results are called components.
When FA is used: the extracted results are called factors.
4. What Will Be Discussed Next
The text ends by saying that, although PCA and FA are different, the rest of the chapter will
focus only on PCA.
✅ So, What Does It Mean in Simple Terms?
PCA uses all the data's variability (even noise) to find new, uncorrelated variables (components)
that summarize the data best. In contrast, FA focuses only on what's shared between variables.
Since PCA is simpler and often better for finding patterns when you don't know much about the
data yet, it's the method they’ll focus on in this chapter.
Would you like me to turn this into a slide or visual summary for your presentation?
You said:
give me examples?
ChatGPT said:
Sure! Here are some simple, clear examples to help you understand the difference between
Principal Components Analysis (PCA) and Factor Analysis (FA), especially regarding
variance vs. shared variance.
🔸 Imagine You Collected Data on These 3 Variables:
Math Score
Science Score
Reading Score
You want to reduce the number of variables to better understand patterns or structure.
🧠 Principal Components Analysis (PCA) Example
Let’s say these three scores are:
Highly correlated (students who score high in one subject tend to score high in others).
But they also contain some noise and individual characteristics (like a student being
very good at science but weak in reading).
🟢 What PCA Does:
PCA creates new variables (components) by using all the variability in the scores.
Component 1 might be:
📘 0.6*Math + 0.6*Science + 0.6*Reading
→ This explains the most total variation in the data (including shared and unique parts).
Component 2 might capture what remains, like students who are better at reading than
math.
✅ PCA uses all variance — shared, unique, and even error. It compresses data while keeping as much of the total information as possible.
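Here is a hypothetical NumPy simulation of that idea: three correlated test scores, with PCA done as an eigendecomposition of their correlation matrix (the sample size and noise level are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n_students = 500  # hypothetical sample size

# Simulated scores: a shared "ability" signal plus subject-specific noise.
ability = rng.normal(size=n_students)
math = ability + 0.6 * rng.normal(size=n_students)
science = ability + 0.6 * rng.normal(size=n_students)
reading = ability + 0.6 * rng.normal(size=n_students)
X = np.column_stack([math, science, reading])

# PCA on the correlation matrix; total variance = number of variables = 3.
R = np.corrcoef(X, rowvar=False)
eigenvalues = np.linalg.eigvalsh(R)[::-1]  # largest first

explained = eigenvalues[0] / eigenvalues.sum()
print(f"Component 1 explains {explained:.0%} of total variance")
```

Because the three scores share one strong signal, the first component alone captures most of the total variance, which is exactly the compression PCA is designed for.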
🧠 Factor Analysis (FA) Example
Using the same test scores...
🔵 What FA Does:
FA only looks at shared variance — the part of variability that is common across
variables.
It assumes there's a hidden "factor" like General Academic Ability, which causes high
scores across all subjects.
Example Factor:
Factor 1: Academic Ability
📘 0.7*Math + 0.7*Science + 0.7*Reading
→ This factor tries to represent the underlying cause for the similarity between
variables.
The unique variance (like being unusually good at science) and error are ignored.
✅ FA is about discovering the underlying hidden causes or latent traits.
🧩 Summary Table
| Feature | PCA | Factor Analysis (FA) |
| --- | --- | --- |
| Focuses on | Total variance (shared + unique + error) | Shared variance only |
| Goal | Data reduction | Identify hidden factors (causes) |
| Uses | All variability in the data | Only common variability |
| Example use | Simplifying data | Finding constructs like intelligence |
| Result called | Component | Factor |
Would you like me to turn this into a PowerPoint slide?
You said:
We can use factor analysis alone, without PCA?
ChatGPT said:
Yes, you can use factor analysis alone without using PCA — and in fact, Factor Analysis
(FA) and Principal Components Analysis (PCA) are two different techniques, even though
they are sometimes confused because they both reduce variables and often give similar-looking
results.
✅ You Can Use Factor Analysis Without PCA
Factor Analysis can be performed without ever using PCA, especially when:
✅ Your goal is to identify latent constructs or hidden variables (e.g., intelligence,
anxiety, motivation).
✅ You are not interested in reducing total variance, but rather in explaining the shared
variance.
✅ You want to model the underlying structure or cause of the correlations between
variables.
In this case, you would use:
Common Factor Analysis (also called Exploratory Factor Analysis or EFA),
and choose an extraction method such as:
o Principal Axis Factoring (PAF),
o Maximum Likelihood (ML),
o or others like Unweighted Least Squares (ULS).
❌ PCA Is Not Required for FA
PCA is sometimes used as an extraction method to get starting values or simplify the
process,
But it is not required, and not true factor analysis, because PCA includes all sources of
variance, while factor analysis does not.
🧠 Easy Analogy
PCA = compressing a photo to reduce size while keeping the full picture (all colors and
noise).
FA = removing only the background noise from a photo to focus on the subject (shared
features).
🔍 In Practice
Statistical software like SPSS or R lets you choose whether you want to use PCA or a "real"
factor analysis method (like Principal Axis Factoring).
Would you like me to show you how to choose these options in SPSS or add this explanation
to your PowerPoint?
You said:
when to use PCA and Factor Analysis?
ChatGPT said:
Great question! Choosing between Principal Components Analysis (PCA) and Factor
Analysis (FA) depends on your goal — whether you want to summarize data or reveal hidden
structures.
✅ When to Use PCA (Principal Components Analysis)
🔍 Use PCA when:
1. ✅ Your goal is data reduction — to reduce many variables into fewer components while
keeping as much of the total variance as possible.
2. ✅ You want to summarize patterns in the data, not discover hidden causes.
3. ✅ You do not assume any underlying latent factors (causes).
4. ✅ You need to prepare variables for further analysis (e.g., regression, clustering).
📊 Examples:
Reducing 20 survey items to 3 components to use in regression.
Simplifying image data (like in face recognition).
Identifying patterns in financial indicators.
✅ When to Use Factor Analysis (Exploratory FA)
🔍 Use FA when:
1. ✅ Your goal is to identify underlying latent variables (factors) that explain the shared
variance among observed variables.
2. ✅ You believe unobserved constructs cause the relationships between variables.
3. ✅ You want to build or validate a scale (e.g., measuring depression, anxiety,
motivation).
4. ✅ You are interested in theoretical interpretation of what each factor represents.
📊 Examples:
Discovering psychological traits from a personality questionnaire.
Finding underlying skills from academic test scores.
Understanding customer satisfaction dimensions from survey responses.
🔄 Summary Table
| Feature | PCA | Factor Analysis (FA) |
| --- | --- | --- |
| Goal | Data reduction | Discover hidden constructs |
| Type of variance analyzed | Total variance (shared + unique + error) | Shared variance only |
| Based on | Variance | Covariance |
| Underlying model? | ❌ No | ✅ Yes |
| Result names | Components | Factors |
| Interpretation | Mathematical combinations | Latent psychological/social factors |
📝 Rule of Thumb:
✅ Use PCA if your purpose is to reduce variables.
✅ Use Factor Analysis if your purpose is to explore or confirm a theoretical structure.
🔹 In Principal Component Analysis (PCA):
Yes, the components are always orthogonal (i.e., uncorrelated). This is a key characteristic of
PCA — each principal component is mathematically constructed to be uncorrelated with the
others.
🔹 In Factor Analysis (FA):
Not necessarily. The factors in factor analysis can be either uncorrelated or correlated — it depends on the rotation method used.
Here’s how it works:
✅ 1. Orthogonal Rotation (Uncorrelated Factors)
Example: Varimax rotation
Assumes that factors are not correlated with each other.
Simpler and easier to interpret, but may not reflect reality if constructs are related.
✅ 2. Oblique Rotation (Correlated Factors)
Examples: Promax, Oblimin
Allows factors to be correlated with each other.
More realistic in social sciences and psychology where constructs are usually related
(e.g., anxiety and depression).
🟡 Summary:
In PCA, components must be uncorrelated.
In Factor Analysis, factors can be:
o Uncorrelated (if using orthogonal rotation),
o Correlated (if using oblique rotation).
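The first point, that PCA components are always uncorrelated, can be checked numerically. A Python/NumPy sketch with arbitrary made-up correlated data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary correlated data: mix 3 independent columns through a matrix.
mixing = np.array([
    [1.0, 0.5, 0.3],
    [0.0, 1.0, 0.4],
    [0.0, 0.0, 1.0],
])
X = rng.normal(size=(300, 3)) @ mixing

# PCA by hand: eigendecomposition of the covariance of centered data.
Xc = X - X.mean(axis=0)
_, eigenvectors = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Component scores = data projected onto the eigenvectors.
scores = Xc @ eigenvectors

# Correlations between different components are zero (up to rounding error).
score_corr = np.corrcoef(scores, rowvar=False)
off_diagonal = np.abs(score_corr - np.eye(3)).max()
print(off_diagonal)  # essentially 0
```

This orthogonality is built into the math: projecting onto the eigenvectors of the covariance matrix diagonalizes it, so the component scores cannot correlate. Factors after an oblique rotation have no such constraint.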