Curse of dimensionality
Introduction
In the world of data science, dealing with high-dimensional data
Principal Component Analysis (PCA)
PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables.
The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible while staying orthogonal to the preceding ones. PCA is sensitive to the relative scaling of the original variables, so variables measured in different units are usually standardized first. It is also unsupervised: it does not consider the response variable (in the case of supervised learning).
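To make the properties above concrete, here is a minimal sketch of PCA computed with a singular value decomposition in NumPy. The toy data, the helper name pca, and the rescaling factor are assumptions made for illustration, not part of any particular library. The sketch checks that the component scores are uncorrelated, that the explained variance decreases from one component to the next, and that rescaling a single variable changes the result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 observations of 3 variables, two of them correlated.
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2.0 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

def pca(X):
    """Return (scores, explained_variance_ratio) via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T             # the orthogonal transformation
    var = S**2 / (len(X) - 1)      # variance captured by each component
    return scores, var / var.sum()

scores, ratio = pca(X)

# The components are linearly uncorrelated: off-diagonal covariances are ~0.
cov = np.cov(scores, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))

# Rescaling one variable (for example, changing its units) changes the result.
X_rescaled = X * np.array([1.0, 1.0, 100.0])
_, ratio_rescaled = pca(X_rescaled)

# Standardizing each variable first removes the dependence on units.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
_, ratio_std = pca(X_std)

print("explained variance ratio (raw):         ", ratio.round(3))
print("explained variance ratio (rescaled):    ", ratio_rescaled.round(3))
print("explained variance ratio (standardized):", ratio_std.round(3))
print("largest off-diagonal covariance:", np.abs(off_diag).max())
```

In the rescaled case the inflated variable dominates the first component purely because of its units, which is the usual argument for standardizing before PCA.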
Partial Least Squares (PLS)
PLS, on the other hand, is a method that bears some relation to principal components regression. Unlike PCA, it uses the response variable to guide the data compression process. Hence, it is particularly useful when we need to predict an outcome variable.
PLS looks for directions in the X space that explain as much as possible of the variance in the Y space; in practice, it finds components of X that have maximal covariance with Y. PLS regression is particularly well suited to cases where the predictor matrix has more variables than observations and where the X variables are multicollinear.
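As a rough illustration of how the response guides the compression, the sketch below computes only the first PLS component for a single response variable. In that case the weight vector is proportional to X^T y, the unit direction whose scores have the largest covariance with y. The toy data (50 variables, 30 observations) and all variable names are assumptions for illustration. A full PLS fit would go on to deflate X and extract further components, which this sketch does not do.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: more variables (50) than observations (30).
n, p = 30, 50
X = rng.normal(size=(n, p))
X[:, 0] *= 10.0                           # high-variance column, unrelated to y
y = X[:, 1] + 0.1 * rng.normal(size=n)    # y is driven by a low-variance column

# Center both blocks before extracting directions.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First PLS weight vector for a single response: proportional to X^T y.
w = Xc.T @ yc
w /= np.linalg.norm(w)
t_pls = Xc @ w                            # first PLS score vector

# First PCA direction: top right-singular vector of the centered X.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
v = Vt[0]
t_pca = Xc @ v                            # first PCA score vector

def covariance(a, b):
    """Sample covariance of two already-centered vectors."""
    return float(a @ b) / (len(a) - 1)

cov_pls = abs(covariance(t_pls, yc))
cov_pca = abs(covariance(t_pca, yc))
print(f"|cov(score, y)|  PLS: {cov_pls:.3f}  PCA: {cov_pca:.3f}")
```

By construction, no unit direction can give scores with a larger covariance with y than the PLS one, so the PLS value printed here is never smaller than the PCA value.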
PCA vs PLS
While both PCA and PLS are used for dimensionality reduction, the choice between the two depends on the specific problem at hand. PCA is a good choice when the goal is to reduce the dimensionality of the independent variables without considering a dependent variable. PLS, on the other hand, takes the dependent variable into account, which can result in better predictive performance.
Choosing Your Champion 😊
So, PCA or PLS? It depends on your objective:
i. General Dimensionality Reduction. Go with PCA for overall data compression and improved algorithm performance. (default to this!)
ii. Prediction with High-Dimensional Data. Choose PLS if your primary goal is to predict a specific target variable, especially when there are many features or the features are strongly correlated.
Conclusion
In conclusion, both PCA and PLS serve as effective remedies for the curse of dimensionality. They allow us to simplify our data