Dimension Reduction
CS5122 DESCRIPTIVE & PREDICTIVE ANALYTICS
DILUM BANDARA
DILUM.BANDARA@UOM.LK
Recommender Systems
Use knowledge about the preferences of a group of users for certain items to help predict the interest level of other users from the same group
Collaborative filtering
◦ Widely used method for recommender systems
◦ Tries to find traits of shared interest among users in a group to
help predict likes & dislikes of other users within the group
Source: Roberto Mirizzi
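A minimal sketch of user-based collaborative filtering, assuming hypothetical toy ratings and cosine similarity computed over co-rated items (many systems mean-centre ratings first, i.e. use Pearson correlation). A missing rating is predicted as the similarity-weighted average of the ratings given to that item by the k most similar users.

```python
import math

# Hypothetical toy ratings on a 1-5 scale; None marks an item the user has not rated.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4, "D": None},
    "bob":   {"A": 4, "B": 3, "C": 4, "D": 3},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 5},
    "dave":  {"A": 5, "B": 2, "C": 5, "D": 4},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = [i for i in u if u[i] is not None and v.get(i) is not None]
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item, k=2):
    """Similarity-weighted average of the item's ratings from the user's k nearest neighbours."""
    candidates = [
        (cosine_similarity(ratings[user], other), other[item])
        for name, other in ratings.items()
        if name != user and other.get(item) is not None
    ]
    nearest = sorted(candidates, reverse=True)[:k]
    total = sum(sim for sim, _ in nearest)
    return sum(sim * r for sim, r in nearest) / total if total > 0 else None

print(round(predict(RATINGS, "alice", "D"), 2))
```

Here bob and dave, whose co-rated scores track alice's most closely, are the two neighbours, so carol's rating of item D does not enter the prediction.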
Methods Employed for the Netflix Prize Problem
Nearest Neighbor methods
◦ k-NN with variations
Matrix factorization
◦ Probabilistic Latent Semantic Analysis
◦ Probabilistic Matrix Factorization
◦ Expectation Maximization for Matrix Factorization
◦ Singular Value Decomposition
◦ Regularized Matrix Factorization
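The matrix-factorization family can be illustrated with a small regularized factorization trained by stochastic gradient descent. This is a sketch of one common variant, not the method of any particular Netflix Prize entry; the toy ratings, the number of factors k, the learning rate and the penalty are all assumptions. Each user and item gets a k-dimensional factor vector, and a rating is predicted by their dot product.

```python
import random

# Hypothetical observed ratings as (user, item, rating) triples; other cells are unknown.
RATINGS = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 3, 4),
    (4, 1, 1), (4, 2, 5), (4, 3, 4),
]

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=3000, seed=0):
    """Learn user factors P and item factors Q so that P[u]·Q[i] ≈ r for observed (u, i, r)."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on (r - P[u]·Q[i])^2 + reg * (|P[u]|^2 + |Q[i]|^2); factor 2 folded into lr
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict_rating(P, Q, u, i):
    return sum(p * q for p, q in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    return (sum((r - predict_rating(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

P, Q = factorize(RATINGS, n_users=5, n_items=4)
print("training RMSE:", round(rmse(P, Q, RATINGS), 3))
print("estimate for user 1, item 1:", round(predict_rating(P, Q, 1, 1), 2))
```

User 1 never rated item 1, so the second line is a genuine prediction. The L2 penalty (`reg`) keeps the factors small so the model does not simply memorise the few observed ratings.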
Dimension Reduction
Statistical methods that provide information about points scattered in multivariate space
◦ Simplify complex relationships between cases and/or variables
◦ Make it easier to recognize patterns by
  ◦ Identifying & describing the dimensions that underlie the input data
  ◦ Identifying sets of variables with similar behavior & using only a few of them
Consider a 2D scatter of points that show a high degree of correlation …
[Figure: scatter of y against x with the means x̄ and ȳ marked and an orthogonal regression line]
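An orthogonal regression line minimises perpendicular (rather than vertical) distances to the points, and it coincides with the first principal axis of the 2 × 2 sample covariance matrix. A worked form, writing s_xx and s_yy for the sample variances and s_xy for the sample covariance (notation assumed here, not taken from the slides):

```latex
S = \begin{pmatrix} s_{xx} & s_{xy} \\ s_{xy} & s_{yy} \end{pmatrix},
\qquad
\lambda_{1,2} = \frac{s_{xx} + s_{yy}}{2} \pm \sqrt{\left(\frac{s_{xx} - s_{yy}}{2}\right)^{2} + s_{xy}^{2}}

\theta = \tfrac{1}{2}\,\operatorname{atan2}\!\left(2 s_{xy},\; s_{xx} - s_{yy}\right),
\qquad
(x, y) = (\bar{x}, \bar{y}) + t\,(\cos\theta,\ \sin\theta)
```

λ1 is the variance along the line and λ2 the variance perpendicular to it. Because λ1 + λ2 = s_xx + s_yy, the ratio λ1 / (s_xx + s_yy) measures how much of the scatter a single rotated axis retains.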
Rotated Data
The 1st new variable may capture so much of the information content in the original dataset that we can ignore the remaining axis
[Figure: length and width axes rotated into “size” and “shape” axes]
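The rotation can be sketched for 2-D data: centre both variables, rotate them by the angle of the first principal axis, and measure how much of the total variance lands on the first new axis. The length and width values are hypothetical toy numbers, and the closed-form angle used here applies only to two variables.

```python
import math

def principal_rotation(xs, ys):
    """Rotate centred 2-D data onto its principal axes.

    Returns (pc1_scores, pc2_scores, share_of_total_variance_on_pc1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxx = sum(a * a for a in dx) / (n - 1)
    syy = sum(b * b for b in dy) / (n - 1)
    sxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle of the first principal axis
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * a + s * b for a, b in zip(dx, dy)]   # "size"-like axis
    pc2 = [-s * a + c * b for a, b in zip(dx, dy)]  # "shape"-like axis
    var1 = sum(p * p for p in pc1) / (n - 1)
    return pc1, pc2, var1 / (sxx + syy)

LENGTH = [4.0, 4.5, 5.0, 5.5, 6.0]  # hypothetical measurements
WIDTH = [2.0, 2.3, 2.4, 2.8, 3.0]
size, shape, share = principal_rotation(LENGTH, WIDTH)
print(f"share of variance on the first axis: {share:.3f}")
```

With these numbers over 99% of the variance falls on the first axis, the situation in which the second (“shape”) axis could be ignored. The two score columns are uncorrelated by construction.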
Principal Components Analysis (PCA)
Why?
•Clarify relationships among variables
•Clarify relationships among cases
When?
•Significant correlations exist among variables
How?
•Define new axes (components)
•Examine correlation between axes & variables
•Find scores of cases on new axes
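The “How?” steps can be sketched end to end in plain Python (no NumPy, to stay dependency-free): standardise the variables, form the correlation matrix, eigen-decompose it (here with the classical cyclic Jacobi method, chosen only for brevity), then compute loadings and case scores. The data matrix is a hypothetical toy example; in practice a library routine such as R's `prcomp` would normally be used.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def jacobi_eigen(S, sweeps=100, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Angle that zeroes A[p][q]: tan(2θ) = 2 A[p][q] / (A[q][q] - A[p][p])
                theta = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                J = [[float(i == j) for j in range(n)] for i in range(n)]
                J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
                A = matmul(transpose(J), matmul(A, J))  # similarity transform J^T A J
                V = matmul(V, J)                        # accumulate eigenvectors
    return [A[i][i] for i in range(n)], V

def pca(X):
    """Correlation-based PCA: eigenvalues, loadings[var][pc] and scores[case][pc], largest first."""
    n, m = len(X), len(X[0])
    cols = transpose(X)
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - mu) ** 2 for v in c) / (n - 1)) for c, mu in zip(cols, means)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(m)] for row in X]  # standardised data
    R = [[v / (n - 1) for v in row] for row in matmul(transpose(Z), Z)]  # correlation matrix
    evals, V = jacobi_eigen(R)
    order = sorted(range(m), key=lambda k: evals[k], reverse=True)
    evals = [evals[k] for k in order]
    V = [[V[i][k] for k in order] for i in range(m)]
    # Loading = correlation of a variable with a component = eigenvector entry * sqrt(eigenvalue)
    loadings = [[V[i][k] * math.sqrt(max(evals[k], 0.0)) for k in range(m)] for i in range(m)]
    scores = matmul(Z, V)  # scores of cases on the new axes
    return evals, loadings, scores

DATA = [  # hypothetical: 8 cases x 4 variables
    [2.5, 2.4, 1.2, 0.5], [0.5, 0.7, 2.9, 1.9], [2.2, 2.9, 1.1, 0.8], [1.9, 2.2, 1.5, 1.1],
    [3.1, 3.0, 0.8, 0.3], [2.3, 2.7, 1.4, 0.9], [2.0, 1.6, 1.9, 1.6], [1.0, 1.1, 2.6, 2.0],
]
evals, loadings, scores = pca(DATA)
print("eigenvalues:", [round(e, 3) for e in evals])
```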
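[Figure: variables x1 to x4 shown against components pc1 and pc2, with reference correlations r = 1, r = 0 and r = -1; a variable's component loading is its correlation with the component]
Eigenvalue: sum of all squared loadings on one component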
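For two standardised variables with correlation r, the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 − r with eigenvectors (1, 1)/√2 and (1, −1)/√2, so the eigenvalue definition can be checked by hand. A sketch, assuming loadings are taken as eigenvector entry × √eigenvalue (the correlation of each variable with the component):

```python
import math

def two_variable_pca(r):
    """Closed-form PCA of the correlation matrix [[1, r], [r, 1]].

    Returns [(eigenvalue, [loading_x1, loading_x2]), ...], largest eigenvalue first."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie in [-1, 1]")
    comps = []
    for ev, vec in ((1.0 + r, (1.0, 1.0)), (1.0 - r, (1.0, -1.0))):
        unit = [v / math.sqrt(2.0) for v in vec]  # unit-length eigenvector
        comps.append((ev, [u * math.sqrt(ev) for u in unit]))  # loading = entry * sqrt(eigenvalue)
    return sorted(comps, key=lambda c: c[0], reverse=True)

for ev, loads in two_variable_pca(0.6):
    squared_sum = sum(x * x for x in loads)
    print(f"eigenvalue={ev:.2f} loadings={[round(x, 3) for x in loads]} sum of squared loadings={squared_sum:.2f}")
```

At r = 1 the first component carries all the variance (eigenvalue 2, both loadings 1); at r = 0 both eigenvalues are 1 and no reduction is possible.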
Eigenvalues
Sum of all eigenvalues = 100% of the variance in the original data
Proportion accounted for by each eigenvalue = ev/n
◦ n = # of variables
With a correlation matrix, the variance of each variable = 1
◦ If an eigenvalue < 1, its component explains less variance than a single original variable
◦ But 0.7 may be a better threshold…
‘Scree plots’ show the trade-off between loss of information & simplification
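These rules can be applied mechanically. A sketch, assuming the eigenvalues come from a correlation matrix (so they sum to n) and using made-up eigenvalues: it reports each component's proportion and cumulative proportion of variance, flags the components kept under the 1.0 and 0.7 thresholds, and prints a crude text version of a scree plot.

```python
def summarize_eigenvalues(evals):
    """Per-component proportion, cumulative proportion and retention flags.

    Assumes correlation-matrix eigenvalues, so they sum to n (the number of variables)."""
    n = len(evals)
    rows, cum = [], 0.0
    for k, ev in enumerate(sorted(evals, reverse=True), start=1):
        cum += ev / n
        rows.append({
            "pc": k,
            "eigenvalue": ev,
            "proportion": ev / n,
            "cumulative": cum,
            "keep_at_1.0": ev >= 1.0,  # explains at least as much as one original variable
            "keep_at_0.7": ev >= 0.7,  # the more lenient 0.7 threshold
        })
    return rows

def text_scree(rows, width=40):
    """Crude text scree plot: one bar per component, scaled to the largest eigenvalue."""
    top = rows[0]["eigenvalue"]
    return "\n".join(
        f"PC{r['pc']:<2} {'#' * round(width * r['eigenvalue'] / top):<{width}} {r['eigenvalue']:.2f}"
        for r in rows
    )

EVALS = [2.9, 0.9, 0.15, 0.05]  # hypothetical eigenvalues for 4 standardised variables
rows = summarize_eigenvalues(EVALS)
for r in rows:
    print(r)
print(text_scree(rows))
```

With these values the 1.0 rule keeps one component and the 0.7 rule keeps two; the sharp drop after PC2 in the bars is the “elbow” a scree plot is read for.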
R Example
If the ranges of the variables are very different, the data need to be scaled 1st
◦ Else, variables with larger ranges will dominate the final result
Examples
◦ Flower dimension dataset
◦ Panel Study of Income Dynamics
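The effect of scaling can be seen with two variables whose ranges differ by a factor of about 1000. A sketch with hypothetical toy data: without scaling, the first principal axis points almost entirely along the large-range variable; after standardising (equivalent to PCA on the correlation matrix), both variables contribute equally.

```python
import math

def first_pc(xs, ys, standardize):
    """Leading principal axis (unit vector) and its share of total variance for 2-D data."""
    n = len(xs)

    def prep(v):
        m = sum(v) / n
        d = [a - m for a in v]
        if standardize:  # divide by the sample standard deviation -> correlation-based PCA
            sd = math.sqrt(sum(a * a for a in d) / (n - 1))
            d = [a / sd for a in d]
        return d

    dx, dy = prep(xs), prep(ys)
    a = sum(u * u for u in dx) / (n - 1)
    c = sum(v * v for v in dy) / (n - 1)
    b = sum(u * v for u, v in zip(dx, dy)) / (n - 1)
    theta = 0.5 * math.atan2(2 * b, a - c)  # angle of the first principal axis
    lam1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    return (math.cos(theta), math.sin(theta)), lam1 / (a + c)

X = [1, 2, 3, 4, 5]                    # hypothetical small-range variable
Y = [1000, 3000, 2000, 5000, 4000]     # hypothetical large-range variable
print("unscaled:", first_pc(X, Y, standardize=False))
print("scaled:  ", first_pc(X, Y, standardize=True))
```

In R, this choice corresponds to the `scale.` argument of `prcomp` (e.g. `prcomp(x, scale. = TRUE)`); the mapping is noted for orientation only.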