Interpretable Discriminative Dimensionality
Reduction and Feature Selection
on the Manifold
Babak Hosseini*, Barbara Hammer
*Bielefeld University (formerly)
Dortmund University (currently)
Twitter: @Babak_hss
ECML 2019, 19 September 2019
Babak Hosseini, Barbara Hammer ECML 2019, 19 September 2019
Outline:
• Introduction
• Proposed Method
• Experiments
• Conclusion
Dimensionality reduction (DR):
• Mapping:
• For visualization purposes
• To reduce data complexity
Relational representation:
• No vectorial representation 𝑿 anymore
DR on manifold:
dim. reduction
Input space
Relational rep.
Feature space
Projected space
Interpretation of the projection:
dim. reduction
?
?
Feature space
Projected space
Class-based Interpretation:
• Applicable to kernel-based DR methods
Class-based Interpretation:
• Kernel-PCA
• Embedding dimensions
• Each 𝒖𝑖 is reconstructed from a selection of data points
Q: are all of them selected from one class?
• If yes → dimension 𝒖𝑖 represents (or is related to) class q
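To make this concrete, here is a minimal numpy sketch (not the paper's implementation) of how kernel PCA yields a coefficient matrix 𝑨 whose column i reconstructs embedding dimension 𝒖𝑖 from the training samples; the per-eigenvalue rescaling of standard kernel PCA is omitted for brevity, since only the pattern of non-zero coefficients matters for the class-based reading:

```python
import numpy as np

def kernel_pca_coefficients(K, d):
    """Kernel PCA on a precomputed kernel matrix K (n x n).
    Returns A (n x d): column i holds the coefficients a_i with which
    embedding dimension u_i = sum_s a_si * phi(x_s) is reconstructed
    from the training samples in the RKHS.
    (Eigenvector rescaling by 1/sqrt(lambda_i) omitted for brevity.)"""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    vals, vecs = np.linalg.eigh(J @ K @ J)   # ascending eigenvalues
    order = np.argsort(vals)[::-1][:d]       # keep the top-d eigenpairs
    return vecs[:, order]

# toy check on an RBF kernel over random 2-D points
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
A = kernel_pca_coefficients(np.exp(-sq / 2.0), 3)
```

Inspecting which classes the non-zero entries of each column of `A` belong to is then exactly the question posed on this slide.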
Class-based Interpretation:
• a & b: each dim. uses all classes
• c & d: each dim. uses almost only one class
• Separation of data in the label-space
Class-based Interpretation:
Supervised kernel-based DR methods
• e.g., K-FDA (kernel Fisher discriminant analysis)
• Within-class (𝑆𝑊) and between-class (𝑆𝐵) covariance matrices
• Good class-separation
• Weak class-based interpretation
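For reference, the standard K-FDA objective (textbook form, not copied from the slides) searches for projection directions as expansions over the training data, 𝒘 = Σ𝑖 𝛼𝑖 𝜙(𝒙𝑖), and maximizes the Fisher criterion over the expansion coefficients:

```latex
J(\boldsymbol{\alpha}) \;=\;
\frac{\boldsymbol{\alpha}^{\top} S_B \, \boldsymbol{\alpha}}
     {\boldsymbol{\alpha}^{\top} S_W \, \boldsymbol{\alpha}}
```

Maximizing J yields directions with good class separation, but nothing in the criterion forces the non-zero entries of 𝜶 to concentrate on one class — they typically spread over all classes, which is why the class-based interpretation is weak.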
Outline:
• Introduction
• Proposed Method
• Experiments
• Conclusion
Notations:
• Training Matrix:
• Label matrix:
• Mapping to RKHS (rel. rep.)
• Embedding dimensions
• Embedding of 𝒙
Objectives:
• O1: Increasing the class-based interpretation of embedding dimensions.
• O2: The embedding should make the classes more separated in the LD space.
• O3: The classes should be locally more condensed in the embedded space.
• O4: Performing feature selection if a multiple kernel representation is provided.
Objectives:
• O1: Increasing the class-based interpretation of embedding dimensions.
• O2: The embedding should make the classes more separated in the LD space.
• O3: The classes should be locally more condensed in the embedded space.
• O4: Performing feature selection if a multiple kernel representation is provided.
Optimization framework:
Objectives:
• O1: Increasing the class-based interpretation of embedding dimensions.
• O2: The embedding should make the classes more separated in the LD space.
• O3: The classes should be locally more condensed in the embedded space.
• O4: Performing feature selection if a multiple kernel representation is provided.
Interpretability term (O1):
•
• Embedding vector:
1. 𝑎𝑠𝑖, 𝑎𝑡𝑖 both non-zero → small
2. 𝑎𝑠𝑖 = 0 or 𝑎𝑡𝑖 = 0 → large
Reconstruction → close data points in RKHS
Smooth labeling in local neighborhoods
Objectives:
• O1: Increasing the class-based interpretation of embedding dimensions.
• O2: The embedding should make the classes more separated in the LD space.
• O3: The classes should be locally more condensed in the embedded space.
• O4: Performing feature selection if a multiple kernel representation is provided.
Inter-class dissimilarity (O2):
•
•
• Projected vectors
Goal:
• To reduce the similarity between 𝒙𝑖 and the other classes in the embedded space
Objectives:
• O1: Increasing the class-based interpretation of embedding dimensions.
• O2: The embedding should make the classes more separated in the LD space.
• O3: The classes should be locally more condensed in the embedded space.
• O4: Performing feature selection if a multiple kernel representation is provided.
Intra-class similarity (O3):
•
•
• Works on the non-zero entries of each 𝒂𝑠 belonging to class(𝒙𝑖)
Goal:
• For 𝒙𝑖: if 𝛾𝑠𝑖 is large → embedding dimension 𝒖𝑠: 𝒂𝑠 is constructed from class(𝒙𝑖)
Objectives:
• O1: Increasing the class-based interpretation of embedding dimensions.
• O2: The embedding should make the classes more separated in the LD space.
• O3: The classes should be locally more condensed in the embedded space.
• O4: Performing feature selection if a multiple kernel representation is provided.
Feature-selection (O4):
• m projections:
•
•
• Multiple-kernel representation of 𝑿
Feature-selection (O4):
• : dim. 𝑚 in 𝒙
e.g.:
• multivariate time-series
• multi-view image data
• multi-domain information
• …
• Scaling of the RKHS:
Goal:
• Given the supervised information 𝑯
• 𝛽𝑚 ≠ 0 → dimension 𝑚 is chosen
Feature-selection (O4):
Goal:
• Given the supervised information 𝑯
• 𝛽𝑚 ≠ 0 → dimension 𝑚 is chosen
• Injecting this into the optimization framework
• affine constraint + non-negativity constraint → interpretable solution 𝜷
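Assuming the affine plus non-negativity constraints define the probability simplex (𝛽𝑚 ≥ 0, Σ𝑚 𝛽𝑚 = 1 — our reading, not a quote from the paper), the combined kernel and the constraint set can be sketched as follows; `project_to_simplex` uses the well-known sorting-based Euclidean projection, which naturally produces sparse, interpretable weights:

```python
import numpy as np

def combined_kernel(kernels, beta):
    """Weighted sum of base kernels: K_beta = sum_m beta_m * K_m,
    with beta on the simplex (beta_m >= 0, sum_m beta_m = 1)."""
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)
    return sum(b * K for b, K in zip(beta, kernels))

def project_to_simplex(v):
    """Euclidean projection onto {beta : beta >= 0, sum(beta) = 1}.
    Sorting-based algorithm; zeroes out small components, so the
    surviving non-zero beta_m mark the selected features."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

beta = project_to_simplex(np.array([0.9, 0.4, -0.2]))   # -> [0.75, 0.25, 0.0]
Ks = [np.eye(3), np.ones((3, 3)), np.full((3, 3), 0.5)]
K = combined_kernel(Ks, beta)
```

Note how the third base kernel receives weight exactly zero — the sparsity that makes 𝜷 readable as a feature-selection result.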
Optimization scheme:
Convexity of the terms:
• PSD
• non-convex term (w.r.t. 𝑨):
• relaxation of the opt. problem
• Alternating opt. scheme
Optimization scheme:
• Closed-form solution
• ADMM algorithm
• QP
Outline:
• Introduction
• Proposed Method
• Experiments
• Conclusion
Datasets:
Different domains:
• face, text, image, etc.
• UCI & feature-selection repositories
• A wide range of dimensions
Alternative methods:
• Supervised: K-FDA, LDR, SDR, KDR
• Unsupervised: JSE, S-KPCA, KEDR
Dimensionality reduction results:
• Classification accuracy (%)
• 1-NN classifier based on the projected data
• 10-fold CV
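The evaluation protocol above (1-NN accuracy under 10-fold cross-validation, applied to the projected data) can be sketched with plain numpy — a generic re-implementation of the protocol, not the authors' evaluation code:

```python
import numpy as np

def one_nn_accuracy(X_train, y_train, X_test, y_test):
    # 1-NN: predict the label of the closest training point (Euclidean)
    d = ((X_test[:, None] - X_train[None]) ** 2).sum(-1)
    pred = y_train[d.argmin(axis=1)]
    return (pred == y_test).mean()

def k_fold_accuracy(X, y, k=10, seed=0):
    """Mean k-fold CV accuracy of a 1-NN classifier on (projected) data."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        accs.append(one_nn_accuracy(X[train], y[train], X[test], y[test]))
    return float(np.mean(accs))

# sanity check on two well-separated blobs (stand-in for projected data)
X = np.vstack([np.random.default_rng(1).normal(-5, 0.5, (20, 2)),
               np.random.default_rng(2).normal(5, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
acc = k_fold_accuracy(X, y, k=10)
```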
Dimensionality reduction results:
Interpretation of the embedding dimension:
• Interpretability measure 𝑰𝑖:
• becomes 1 if 𝒂𝑖 is reconstructed using one class
• close to 0.5 if 𝒂𝑖 is reconstructed using all the classes
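The slide does not show the exact definition of 𝑰𝑖; one plausible formalization consistent with the stated behavior is the fraction of coefficient mass carried by the dominant class (equal to 1 for a single-class reconstruction, and 0.5 in the two-class case when both classes contribute equally). Treat this as a hypothetical stand-in, not the paper's formula:

```python
import numpy as np

def interpretability(a, y):
    """Fraction of |a|'s mass carried by the dominant class.
    Hypothetical stand-in for the paper's measure I_i: equals 1 when a
    is reconstructed from a single class; for a two-class problem it
    approaches 0.5 when both classes contribute equally."""
    w = np.abs(np.asarray(a, dtype=float))
    per_class = np.array([w[y == c].sum() for c in np.unique(y)])
    return per_class.max() / w.sum()

y = np.array([0, 0, 1, 1])
single = interpretability(np.array([0.7, 0.3, 0.0, 0.0]), y)   # one class
mixed = interpretability(np.array([0.5, 0.0, 0.25, 0.25]), y)  # both classes
```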
Interpretation of the embedding dimension:
• Projecting the embedding dimensions onto the label-space:
• 𝑳 = 𝑯𝑨
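Reading 𝑯 as the one-hot label matrix from the notation slide (our assumption), 𝑳 = 𝑯𝑨 gives, for each embedding dimension, the total coefficient mass contributed by each class — a label-space view of the dimensions:

```python
import numpy as np

# H: c x n one-hot label matrix (column j marks the class of x_j);
# A: n x d coefficient matrix (column i reconstructs dimension u_i).
y = np.array([0, 0, 1, 1, 2])
H = np.zeros((3, 5))
H[y, np.arange(5)] = 1.0

A = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.0, 0.7],
              [0.1, 0.6],
              [0.0, 0.0]])

# L[q, i] = coefficient mass that class q contributes to dimension u_i
L = H @ A
```

Here dimension 1 is dominated by class 0 and dimension 2 by class 1 — the kind of class-aligned column pattern the slide's label-space plot visualizes.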
Feature selection result:
• MK representation of the data
• non-zero entries in 𝜷:
• alternative methods:
• MKL algorithms: MKL-TR, MKL-DR, KNMF-MKL, and DMKL
• Classification accuracy &
Conclusion:
• A novel method for discriminative dimensionality reduction.
• Focused on the local neighborhoods in RKHS
• Aimed at class-based interpretation of the embedding dimensions.
• A good trade-off between interpretation and separation of classes.
• Feature-selection extension using multiple-kernel data representation.
• Thank you very much!
• Questions?
Twitter: @Babak_hss