Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning

Hamid Eghbal-zadeh 1,2, Matthias Dorfer 1, Gerhard Widmer 1,2

Outline: Motivation, Covariance Analysis, WCCN, DWCCA, Results, Summary
Motivation

● Convolutional Neural Networks (CNNs) learn useful features and build good representations
● CNNs are also known to generalise well to unseen data
● Many benchmark datasets have similar train/test distributions
● What happens when there is a distribution mismatch between training and test?

Distribution mismatch:
When the distribution of the data in the training and validation sets differs from that of the test set
● Speaker Recognition: training on English, testing on Chinese
● Acoustic Scene Classification: training on scenes in one country, testing on scenes from another country, recorded in another period of time

Performance of end-to-end CNNs (no mismatch vs mismatched):
● We use the DCASE2016 (no mismatch) and DCASE2017 (mismatched) datasets¹
● Same training and validation sets, different test set
● We compare several end-to-end CNNs
1) Detection and Classification of Acoustic Scenes and Events, http://dcase.community

Covariance Analysis of the Representation

Covariance Eigenvalue Analysis:
● We train a VGG network on the no-mismatch and mismatched datasets using spectrograms
● We analyse the internal representation of the VGG
● We use covariance analysis
○ Eigenvalues of the covariance matrix
○ Visualisation of the representations projected via PCA
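
The two analysis steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `covariance_eigen_analysis` and the random array standing in for VGG representations are hypothetical.

```python
import numpy as np

def covariance_eigen_analysis(reps, n_components=2):
    """Return sorted covariance eigenvalues and a PCA projection of `reps`.

    reps: array of shape (n_samples, n_features), one row per example.
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (reps.shape[0] - 1)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending
    # order, so reorder them to put the largest first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # PCA projection: coordinates along the leading eigenvectors.
    projected = centered @ eigvecs[:, :n_components]
    return eigvals, projected

rng = np.random.default_rng(0)
# Hypothetical stand-in for network representations: 200 examples, 16 dims.
reps = rng.normal(size=(200, 16)) * np.linspace(3.0, 0.5, 16)
eigvals, projected = covariance_eigen_analysis(reps)
print(eigvals[:3])
print(projected.shape)
```

Plotting `eigvals` for the train, validation and test representations side by side gives the kind of eigenvalue comparison the slides refer to; `projected` is what a 2-D scatter plot would show.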

[Figure: no mismatch vs. mismatched; panels: Train, Validation, Test]

Visualisation of the VGG representations:
[Figure: no mismatch vs. mismatched; panels: Train, Validation, Test]

Within-Class Covariance Normalisation (WCCN)

Within-Class Covariance Normalisation¹,²:
● Proposed for speaker recognition to reduce false positives and false negatives
● Used to reduce the within-class variability in features such as GMM supervectors or i-vectors
1) Hatch, Andrew O., et al. "Within-class covariance normalization for SVM-based speaker recognition." Ninth International Conference on Spoken Language Processing. 2006.
2) Hatch, Andrew O., et al. "Generalized linear kernels for one-versus-all classification: application to speaker recognition." 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006) Proceedings. Vol. 5. IEEE, 2006.
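
As a rough illustration of how WCCN reduces within-class variability, the sketch below follows the commonly used formulation: estimate the within-class covariance S_w, find B with B Bᵀ = S_w⁻¹ via a Cholesky factorisation, and map each feature vector x to Bᵀx. The helper names, the small `eps` regulariser and the synthetic data are assumptions made for this example.

```python
import numpy as np

def within_class_covariance(x, y):
    """Scatter of each example around its own class mean, averaged."""
    dim = x.shape[1]
    s_w = np.zeros((dim, dim))
    for label in np.unique(y):
        xc = x[y == label]
        centered = xc - xc.mean(axis=0, keepdims=True)
        s_w += centered.T @ centered
    return s_w / x.shape[0]

def fit_wccn(x, y, eps=1e-6):
    """Return B with B @ B.T == inv(S_w + eps * I)."""
    s_w = within_class_covariance(x, y)
    s_w += eps * np.eye(s_w.shape[0])  # assumed regulariser for stability
    return np.linalg.cholesky(np.linalg.inv(s_w))

rng = np.random.default_rng(0)
# Synthetic data: 4 classes, 50 examples each, 8 dimensions.
y = np.repeat(np.arange(4), 50)
class_means = rng.normal(scale=5.0, size=(4, 8))
noise = rng.normal(size=(200, 8)) * np.linspace(4.0, 0.5, 8)
x = class_means[y] + noise

b = fit_wccn(x, y)
x_wccn = x @ b  # row-wise form of x -> B^T x
s_w_after = within_class_covariance(x_wccn, y)
print(np.round(np.diag(s_w_after), 3))
```

After the projection, the within-class covariance of the fitting data is (up to `eps`) the identity, so no direction carries disproportionate within-class variance.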

Deep Within-Class Covariance Analysis

Deep Within-Class Covariance Analysis (DWCCA):
● A deep learning compatible version of WCCN
● A statistical DL layer, trained end-to-end using SGD with minibatches
● Can be placed anywhere in the network to reduce the within-class variability
● In training, B is equal to B_b in the forward pass
● Gradients w.r.t. B are computed and used in the backward pass
● A running average is computed for test time (similar to batch normalisation)
● Compatible with different supervised tasks (classification, detection, metric learning, ...) and data (raw audio, ...)
● Can be used with different supervised losses (CCE, BCE, l2, ...)
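
The forward-pass behaviour described above can be sketched as a small NumPy class. This is a forward-only illustration under several assumptions, not the authors' layer: the class name `DWCCASketch`, the `momentum` and `eps` parameters, and the choice to keep a running average of the within-class covariance (the slides do not say which quantity is averaged) are all hypothetical. In an automatic differentiation framework the minibatch projection would additionally be differentiated through in the backward pass, which this sketch does not implement.

```python
import numpy as np

class DWCCASketch:
    """Hypothetical forward-only sketch of a DWCCA-style layer."""

    def __init__(self, dim, momentum=0.9, eps=1e-3):
        self.momentum = momentum           # assumed running-average factor
        self.eps = eps                     # assumed regulariser
        self.running_s_w = np.eye(dim)     # used at test time

    def _projection(self, s_w):
        reg = s_w + self.eps * np.eye(s_w.shape[0])
        return np.linalg.cholesky(np.linalg.inv(reg))

    @staticmethod
    def _within_class_cov(x, y):
        dim = x.shape[1]
        s_w = np.zeros((dim, dim))
        for label in np.unique(y):
            xc = x[y == label]
            centered = xc - xc.mean(axis=0, keepdims=True)
            s_w += centered.T @ centered
        return s_w / x.shape[0]

    def forward(self, x, y=None, training=False):
        if training:
            # Minibatch estimate: project with B_b and update the average.
            s_b = self._within_class_cov(x, y)
            self.running_s_w = (self.momentum * self.running_s_w
                                + (1.0 - self.momentum) * s_b)
            b = self._projection(s_b)
        else:
            # Test time: no labels needed, use the running statistics.
            b = self._projection(self.running_s_w)
        return x @ b

rng = np.random.default_rng(0)
layer = DWCCASketch(dim=6)
scales = np.linspace(3.0, 0.5, 6)
means = rng.normal(scale=4.0, size=(3, 6))
for _ in range(100):  # simulated training minibatches
    y = rng.integers(0, 3, size=64)
    x = means[y] + rng.normal(size=(64, 6)) * scales
    out_train = layer.forward(x, y, training=True)

y_test = rng.integers(0, 3, size=500)
x_test = means[y_test] + rng.normal(size=(500, 6)) * scales
out_test = layer.forward(x_test)
print(out_train.shape, out_test.shape)
```

As with batch normalisation, the labels are only needed while training; at test time the layer applies a fixed projection derived from the accumulated statistics.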

Results

Within-Class Covariance Eigenvalue Analysis (without DWCCA):
[Figure: no mismatch vs. mismatched; panels: Train, Validation, Test]

Within-Class Covariance Eigenvalue Analysis (with DWCCA):
[Figure: no mismatch vs. mismatched; panels: Train, Validation, Test]

Eigenvalue Analysis (with vs. without DWCCA):
[Figure: no mismatch vs. mismatched; panels: Train, Validation, Test]

K-NN classification results on VGG representations:
[Figure/table: no mismatch vs. mismatched; panels: Validation, Test]
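
A k-NN evaluation of this kind classifies each held-out representation by a vote among its nearest training representations. The sketch below is a generic illustration: the value of `k`, the Euclidean distance and the synthetic two-dimensional data are assumptions, since the slides do not state the settings used.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=5):
    """Majority vote among the k nearest training points (Euclidean)."""
    # Squared distances via broadcasting: shape (n_test, n_train).
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]                      # (n_test, k) labels
    n_classes = int(train_y.max()) + 1
    counts = np.stack(
        [(votes == c).sum(axis=1) for c in range(n_classes)], axis=1
    )
    return counts.argmax(axis=1)

rng = np.random.default_rng(0)
# Hypothetical stand-in for learned representations: 3 separated classes.
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
train_y = np.repeat(np.arange(3), 40)
train_x = means[train_y] + rng.normal(size=(120, 2))
test_y = np.repeat(np.arange(3), 20)
test_x = means[test_y] + rng.normal(size=(60, 2))

pred = knn_predict(train_x, train_y, test_x)
accuracy = (pred == test_y).mean()
print(accuracy)
```

Because k-NN has no trainable parameters of its own, its accuracy reflects how well the representation separates the classes, which is why it is a convenient probe for comparing representations.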

End-to-end classification:
[Figure/table: no mismatch vs. mismatched. Legend: *: single model, single-channel features; (marker lost): multi-channel features; (marker lost): ensemble of various models]

End-to-end class-wise F1:
[Figure: mismatched vs. no mismatch]

Summary

Summary:
● We analysed the covariance of the representations in a VGG network
● We showed that the more mismatch there is between training and test, the more the within-class variability increases in the representation
● We proposed Deep Within-Class Covariance Analysis, a deep learning compatible layer capable of significantly reducing the within-class variability of a network's representation
● We showed empirically that DWCCA improves generalisation when the training and test sets have mismatched distributions

Thank you for your attention!
Come to the poster for more discussion.
hamid.eghbal-zadeh@jku.at
heghbalz
