Copyright © 2012, SAS Institute Inc. All rights reserved.
EXPLORING VARIABLE CLUSTERING
AND IMPORTANCE IN JMP
CHRIS GOTWALT AND RYAN PARKER
VARIABLE
CLUSTERING
INTRODUCTION
• Variable clustering is a dimension-reduction method that reduces the
number of input variables to be used in a predictive model.
• Reduces inputs by finding groups of similar variables so that a single variable
can represent each group.
• Helps reduce the effects of collinearity among the input variables.
• Developed by SAS/STAT Development Director Warren Sarle.
VARIABLE
CLUSTERING
AN ITERATIVE ALGORITHM
• Iteratively splits and assigns variables to clusters.
• Sample iterations for variables in Wine Quality data set:
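[Figure: tree of cluster splits across Iteration 1, Iteration 2, and Iteration 3. The starting cluster contains all five variables: Alcohol, Citric Acid, pH, Sugar, Sulfur Dioxide. Cluster labels shown in the tree: (Alcohol, Citric Acid, Sulfur Dioxide); (Alcohol, Sugar); (pH, Sulfur Dioxide); (pH, Sugar); (Citric Acid).]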
VARIABLE
CLUSTERING
ALGORITHM DETAILS
• At each iteration the cluster with the largest second eigenvalue is split.
• Variables within this cluster are assigned to two new clusters based on each
variable’s correlation with the first two orthoblique rotated principal
components.
• After the split, variables from other clusters are reassigned to one of the new
clusters if they have a higher correlation with the new cluster.
• Ends when the second eigenvalue of all clusters is less than one.
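The split-selection and stopping rules above can be sketched in a few lines. This is a minimal illustration, not JMP's implementation: it assumes a correlation matrix `corr` and clusters given as lists of column indices (both names are hypothetical), and it covers only the choice of which cluster to split. The orthoblique rotation and the reassignment step are left out.

```python
import numpy as np

def second_eigenvalue(corr, members):
    """Second-largest eigenvalue of the correlation submatrix for one cluster."""
    if len(members) < 2:
        return 0.0  # a single-variable cluster has no second component
    sub = corr[np.ix_(members, members)]
    eigvals = np.linalg.eigvalsh(sub)  # eigenvalues in ascending order
    return float(eigvals[-2])

def cluster_to_split(corr, clusters, threshold=1.0):
    """Index of the cluster with the largest second eigenvalue,
    or None when every second eigenvalue is below the threshold."""
    seconds = [second_eigenvalue(corr, c) for c in clusters]
    worst = int(np.argmax(seconds))
    return worst if seconds[worst] >= threshold else None
```

With two uncorrelated blocks of highly correlated variables, a single cluster holding all four variables is flagged for splitting, while the two-block partition ends the iterations.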
VARIABLE
CLUSTERING
REDUCING EACH CLUSTER TO A SINGLE VARIABLE
[Figure: example clusters; variable labels shown: pH, Sugar, pH, Citric Acid.]
• Each cluster can be reduced to a single
variable for modeling.
• There are two ways to do this:
1. We can use the most representative
variable from each cluster.
2. Alternatively, the cluster component from
each cluster can be used.
VARIABLE
CLUSTERING
MOST REPRESENTATIVE VARIABLES
• These are variables that best represent each cluster.
• Each has the highest correlation with the other variables in its cluster.
• Most representative variables provide a clear interpretation when used.
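One way to make "highest correlation with the variables in its cluster" concrete is sketched below. The scoring rule (mean squared correlation with the other cluster members) is an assumption chosen for illustration; the slides do not state the exact criterion JMP uses. The names `corr` and `members` are hypothetical.

```python
import numpy as np

def most_representative(corr, members):
    """Cluster member with the highest mean squared correlation with the
    other members (an assumed criterion, for illustration only)."""
    if len(members) == 1:
        return members[0]
    sub = corr[np.ix_(members, members)] ** 2
    np.fill_diagonal(sub, 0.0)  # ignore each variable's correlation with itself
    scores = sub.sum(axis=1) / (len(members) - 1)
    return members[int(np.argmax(scores))]
```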
VARIABLE
CLUSTERING
CLUSTER COMPONENTS
• New variables created using the first principal component of each cluster.
• Provide a way to combine variables in each cluster into a single variable.
• Similar to traditional principal components analysis (PCA) except that each
cluster component only uses variables from that cluster.
• Interpretation not as clear when compared to most representative variables.
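A cluster component as described above can be sketched as the first principal component of the standardized variables in one cluster. This is a minimal sketch assuming a numeric data matrix `X` (rows are observations) and a list `members` of column indices; both names are hypothetical, and JMP's scaling conventions may differ.

```python
import numpy as np

def cluster_component(X, members):
    """Scores on the first principal component of the standardized
    columns of X listed in `members`."""
    Z = X[:, members]
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    if len(members) == 1:
        return Z[:, 0]  # a single variable is its own component
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    w = eigvecs[:, -1]  # eigenvector for the largest eigenvalue
    if w.sum() < 0:     # the sign is arbitrary; prefer mostly positive weights
        w = -w
    return Z @ w
```

Because only the columns in `members` enter the calculation, each component depends on its own cluster alone, which is the difference from ordinary PCA noted above.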
VARIABLE
CLUSTERING
DEMO: IMPORTANT TERMS
• RSquare with Own Cluster
• The RSquare a variable has with variables in its cluster.
• RSquare with Next Closest
• The RSquare a variable has with variables in the next most similar cluster.
• 1-RSquare Ratio
• Relative similarity between a variable’s own cluster and the next closest cluster.
• Values should be less than 1 for a well-assigned variable.
• A value greater than 1 indicates that the variable should be moved to the next closest cluster.
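The ratio is computed from the two RSquare values defined above as (1 − RSquare with Own Cluster) / (1 − RSquare with Next Closest). The function below is a small sketch with hypothetical argument names.

```python
def one_minus_rsquare_ratio(r2_own, r2_next):
    """(1 - RSquare with Own Cluster) / (1 - RSquare with Next Closest)."""
    return (1.0 - r2_own) / (1.0 - r2_next)
```

For example, a variable with RSquare 0.8 with its own cluster and 0.2 with the next closest cluster has a ratio of about 0.25, well below 1. If the two values were swapped, the ratio would be about 4, suggesting the variable belongs in the other cluster.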
VARIABLE
IMPORTANCE
INTRODUCTION
• Provides a general way to assess the importance of variables for predictive
models in JMP.
• Insight is in terms of practical significance of input variables.
• Based on functional decomposition ideas of I. M. Sobol.
VARIABLE
IMPORTANCE
FUNCTIONAL DECOMPOSITION
• I. M. Sobol showed that we can decompose a function $f(X_1, \dots, X_p)$ into a
sum of functions of lower-dimensional inputs:
• $f(X_1, \dots, X_p) = f_0 + f_1(X_1) + \cdots + f_p(X_p) + f_{12}(X_1, X_2) + \cdots$
• The decomposition has a function for each $X_i$, each pair $(X_i, X_j)$, etc.
• The variability of these lower-dimensional functions assesses the importance of
the input variables.
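To make "variability of these lower-dimensional functions" concrete: when the inputs are independent, the terms of the decomposition are uncorrelated, so the variance of $f$ splits term by term. The symbols $V_i$ and $V_{ij}$ below are notation introduced here for illustration; they do not appear on the slides.

```latex
\operatorname{Var}\bigl(f(X_1, \dots, X_p)\bigr)
  = \sum_{i} V_i + \sum_{i<j} V_{ij} + \cdots + V_{12 \ldots p},
\qquad
V_i = \operatorname{Var}\bigl(f_i(X_i)\bigr), \quad
V_{ij} = \operatorname{Var}\bigl(f_{ij}(X_i, X_j)\bigr)
```

Dividing each term by the total variance gives shares that sum to 1, which is what allows them to be read as relative importance.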
VARIABLE
IMPORTANCE
IMPORTANCE EFFECTS
• Assessment of variable importance is in terms of effect indices.
• These indices are numbers between 0 and 1 indicating relative importance.
• Main effect indices measure variability of predictions due to a single input.
• They do not account for interaction effects.
• Total effect indices measure the total variability of predictions due to the input.
• Combines all main and higher order interaction effects.
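The difference between main and total effect indices can be seen in a small Monte Carlo sketch. It assumes independent inputs, each uniform on [0, 1], and uses two standard estimators from the sensitivity-analysis literature (Saltelli's for main effects, Jansen's for total effects). These are swapped in for illustration; the slides do not say which estimators JMP uses. The function name and arguments are hypothetical.

```python
import numpy as np

def sobol_indices(f, p, n=100_000, seed=0):
    """Monte Carlo estimates of main and total effect indices for a
    vectorized function f of p independent Uniform(0, 1) inputs."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, p))
    B = rng.random((n, p))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    main = np.empty(p)
    total = np.empty(p)
    for i in range(p):
        AB = A.copy()
        AB[:, i] = B[:, i]  # A with column i taken from B
        fAB = f(AB)
        main[i] = np.mean(fB * (fAB - fA)) / var          # Saltelli estimator
        total[i] = 0.5 * np.mean((fA - fAB) ** 2) / var   # Jansen estimator
    return main, total
```

For an additive function such as `lambda X: X[:, 0] + 2 * X[:, 1]`, the main and total indices agree up to sampling noise. For a pure interaction such as `lambda X: X[:, 0] * X[:, 1]`, each total index exceeds the corresponding main index because it also counts the interaction term. Being estimates, the values can fall slightly outside [0, 1].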
VARIABLE
IMPORTANCE
DISTRIBUTION OF INPUT VARIABLES
• Variability in predictions is due to the distribution of input variables.
• JMP 11 provides three input variable distribution options:
1. Independent Uniform
2. Independent Resampled
3. Dependent Resampled
• Monte Carlo estimation procedure used for independent cases.
• K-nearest neighbors estimation used for the dependent case.
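The two independent options can be sketched as input samplers for a Monte Carlo procedure. This is one reading of the option names, not JMP's code: "Independent Uniform" is taken to mean uniform draws between each input's observed minimum and maximum, and "Independent Resampled" to mean resampling each input's observed values separately. The function names are hypothetical, and the dependent K-nearest neighbors case is not covered.

```python
import numpy as np

def sample_independent_uniform(X, n, rng):
    """Draw each input uniformly between its observed minimum and maximum."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return rng.uniform(lo, hi, size=(n, X.shape[1]))

def sample_independent_resampled(X, n, rng):
    """Resample each input's observed values independently of the others."""
    cols = [rng.choice(X[:, j], size=n, replace=True) for j in range(X.shape[1])]
    return np.column_stack(cols)
```

Both samplers break any dependence between inputs. The resampled version keeps each input's observed marginal distribution, while the uniform version keeps only its range.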
VARIABLE
IMPORTANCE
USE RESAMPLED INPUTS?
[Figure: two panels, labeled "Uniform Acceptable" and "Resampled Needed".]
VARIABLE
IMPORTANCE
MARGINAL INFERENCE
[Figure: main effects; values shown: 0.16 and 0.03.]
VARIABLE
IMPORTANCE
DEMO
