Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning
Hamid Eghbal-zadeh 1,2, Matthias Dorfer 1, Gerhard Widmer 1,2
Motivation Covariance Analysis WCCN DWCCA Results Summary
Motivation
● Convolutional neural networks (CNNs) learn useful features and build good representations
● CNNs are also known to generalize to unseen data
● Many benchmark datasets have similar train/test distributions
● What happens when the training and test distributions are mismatched?
Distribution mismatch:
When the distribution of the data in the training and validation sets differs from that of the test set
● Speaker recognition: training on English, testing on Chinese
● Acoustic scene classification: training on scenes from one country, testing on scenes from another country, recorded in a different period of time
Performance of end-to-end CNNs (no mismatch vs. mismatched):
● We use the DCASE2016 (no mismatch) and DCASE2017 (mismatched) datasets 1
● Same training and validation sets, different test set
● We look at several end-to-end CNNs
1) Detection and Classification of Acoustic Scenes and Events, http://dcase.community
Covariance Analysis of the Representation
Covariance Eigenvalue Analysis:
● We train a VGG network on spectrograms in both the no-mismatch and the mismatched setting
● We analyse the internal representation of the VGG
● We use covariance analysis
○ Eigenvalues of the covariance matrix
○ Visualisation of the representations projected via PCA
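The two analysis steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the array `features` stands in for VGG representations (one row per example) and is filled with synthetic data here, and the function names are hypothetical.

```python
import numpy as np


def covariance_eigenvalues(features):
    """Eigen-decompose the covariance matrix of the features.

    Returns eigenvalues sorted in descending order together with the
    matching eigenvectors (one per column).
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (features.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def pca_project(features, eigvecs, n_components=2):
    """Project centered features onto the leading principal directions."""
    centered = features - features.mean(axis=0, keepdims=True)
    return centered @ eigvecs[:, :n_components]


# Synthetic stand-in for network representations (assumption: 200 examples, 8 dims).
rng = np.random.default_rng(0)
scales = np.array([5.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
features = rng.normal(size=(200, 8)) * scales

eigvals, eigvecs = covariance_eigenvalues(features)
projected = pca_project(features, eigvecs)  # 2-D points suitable for a scatter plot
```

Plotting `eigvals` for the train, validation, and test representations side by side gives the kind of eigenvalue comparison the slides describe, and `projected` gives the 2-D view for visualisation.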
[Figure: Covariance eigenvalue analysis. Panels: Train, Validation, Test. Conditions: No mismatch, Mismatched.]
[Figure: Visualisation of the VGG representations. Panels: Train, Validation, Test. Conditions: No mismatch, Mismatched.]
Within-Class Covariance Normalisation (WCCN)
Within-Class Covariance Normalization 1,2:
● Proposed for speaker recognition to reduce false positives and false negatives
● Used to reduce the within-class variability in features such as GMM supervectors or i-vectors
1) Hatch, Andrew O., et al. "Within-class covariance normalization for SVM-based speaker recognition." Ninth International Conference on Spoken Language Processing. 2006.
2) Hatch, Andrew O., et al. "Generalized linear kernels for one-versus-all classification: application to speaker recognition." 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006) Proceedings. Vol. 5. IEEE, 2006.
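As a rough illustration of how WCCN reduces within-class variability, the sketch below uses the common formulation in which a projection B is taken from the Cholesky factor of the inverse within-class covariance, so that B Bᵀ = S_w⁻¹. The data is synthetic, and the function names and the small regulariser `eps` are assumptions made for this example, not details taken from the slides.

```python
import numpy as np


def within_class_covariance(x, y):
    """Average of the per-class covariance matrices of x (rows are examples)."""
    classes = np.unique(y)
    dim = x.shape[1]
    s_w = np.zeros((dim, dim))
    for c in classes:
        xc = x[y == c]
        centered = xc - xc.mean(axis=0, keepdims=True)
        s_w += centered.T @ centered / xc.shape[0]
    return s_w / len(classes)


def wccn_projection(x, y, eps=1e-6):
    """Return lower-triangular B with B @ B.T == inv(S_w + eps * I)."""
    s_w = within_class_covariance(x, y)
    s_w = s_w + eps * np.eye(s_w.shape[0])  # keeps the matrix invertible
    return np.linalg.cholesky(np.linalg.inv(s_w))


# Synthetic features: 3 classes, 100 examples each, anisotropic within-class noise.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 100)
class_means = rng.normal(scale=5.0, size=(3, 4))
x = class_means[y] + rng.normal(size=(300, 4)) * np.array([4.0, 2.0, 1.0, 0.5])

B = wccn_projection(x, y)
x_wccn = x @ B  # row-vector form of B^T x
s_w_after = within_class_covariance(x_wccn, y)  # close to the identity matrix
```

After the projection the within-class covariance is approximately the identity, which is the sense in which the within-class variability is normalised.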
Deep Within-Class Covariance Analysis
Deep Within-Class Covariance Analysis (DWCCA):
● A deep-learning-compatible version of WCCN
● A statistical DL layer, trained end-to-end using SGD with minibatches
● Can be placed anywhere in the network to reduce the within-class variability
● In training, B equals the minibatch estimate B_b in the forward pass
● Gradients w.r.t. B are computed and used in the backward pass
● A running average is computed for test time (similar to batch normalisation)
● Compatible with different supervised tasks (classification, detection, metric learning, ...) and data (raw audio, ...)
● Can be used with different supervised losses (CCE, BCE, l2, ...)
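The forward-pass behaviour described in the bullets above might look roughly like the following NumPy sketch. It is an illustration under several assumptions, not the authors' layer: the class name `DWCCASketch`, the `momentum` and `eps` values, and the choice to keep the running average over the projection matrix B itself are all hypothetical. The backward pass is omitted; in practice an autodiff framework would differentiate through the covariance, inverse, and Cholesky steps.

```python
import numpy as np


def within_class_cov(h, y):
    """Average of the per-class covariance matrices of h (rows are examples)."""
    classes = np.unique(y)
    s_w = np.zeros((h.shape[1], h.shape[1]))
    for c in classes:
        hc = h[y == c]
        centered = hc - hc.mean(axis=0, keepdims=True)
        s_w += centered.T @ centered / hc.shape[0]
    return s_w / len(classes)


class DWCCASketch:
    """Forward pass only: minibatch B_b in training, running average at test time."""

    def __init__(self, dim, momentum=0.9, eps=1e-3):
        self.momentum = momentum      # assumed smoothing factor
        self.eps = eps                # assumed regulariser for invertibility
        self.running_b = np.eye(dim)  # projection used at test time

    def forward(self, h, y=None, training=False):
        if training:
            # Estimate B_b from the labelled minibatch: B_b @ B_b.T == inv(S_w).
            s_w = within_class_cov(h, y) + self.eps * np.eye(h.shape[1])
            b = np.linalg.cholesky(np.linalg.inv(s_w))
            # Track a running average, in the spirit of batch normalisation.
            self.running_b = self.momentum * self.running_b + (1 - self.momentum) * b
        else:
            b = self.running_b
        return h @ b


def make_batch(rng, class_means, per_class=16):
    """Synthetic minibatch of hidden activations with known class labels."""
    y = np.repeat(np.arange(class_means.shape[0]), per_class)
    noise = rng.normal(size=(y.size, class_means.shape[1])) * np.array([2.0, 1.0, 0.5, 0.5])
    return class_means[y] + noise, y


rng = np.random.default_rng(0)
class_means = rng.normal(scale=3.0, size=(3, 4))
layer = DWCCASketch(dim=4)
for _ in range(100):
    h, y = make_batch(rng, class_means)
    train_out = layer.forward(h, y, training=True)

h_test, _ = make_batch(rng, class_means)
test_out = layer.forward(h_test)  # no labels needed at test time
```

Keeping a running projection means no class labels are required at inference, which mirrors how batch normalisation switches from batch statistics to stored statistics.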
Results
[Figure: Within-class covariance eigenvalue analysis (without DWCCA). Panels: Train, Validation, Test. Conditions: No mismatch, Mismatched.]
[Figure: Within-class covariance eigenvalue analysis (with DWCCA). Panels: Train, Validation, Test. Conditions: No mismatch, Mismatched.]
[Figure: Eigenvalue analysis (with vs. without DWCCA). Panels: Train, Validation, Test. Conditions: No mismatch, Mismatched.]
[Figure: K-NN classification results on VGG representations. Panels: Validation, Test. Conditions: No mismatch, Mismatched.]
[Table: End-to-end classification, no mismatch vs. mismatched. Footnotes: * single model, single-channel features; multi-channel features; ensemble of various models.]
[Figure: End-to-end class-wise F1, no mismatch vs. mismatched.]
Summary
Summary:
● We analysed the covariance of the representations in a VGG network
● We showed that the greater the mismatch between training and test data, the greater the within-class variability of the representation
● We proposed Deep Within-Class Covariance Analysis (DWCCA), a deep-learning-compatible layer capable of significantly reducing the within-class variability of a network's representation
● We empirically showed that DWCCA improves generalisation when the training and test sets have mismatched distributions
Thank you for your attention!
Come to the poster for more discussion.
hamid.eghbal-zadeh@jku.at
heghbalz

More Related Content

PDF
Using a Manifold Vocoder for Spectral Voice and Style Conversion
PDF
Exploiting Distributional Semantic Models in Question Answering
PDF
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
PDF
Deep Learning, an interactive introduction for NLP-ers
PDF
Multi modal retrieval and generation with deep distributed models
PPTX
Presentation of eQNet to LRE subcommittee october 11, 2011
PDF
BoysTownJobTalk
PPTX
End-to-End Task-Completion Neural Dialogue Systems
Using a Manifold Vocoder for Spectral Voice and Style Conversion
Exploiting Distributional Semantic Models in Question Answering
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
Deep Learning, an interactive introduction for NLP-ers
Multi modal retrieval and generation with deep distributed models
Presentation of eQNet to LRE subcommittee october 11, 2011
BoysTownJobTalk
End-to-End Task-Completion Neural Dialogue Systems

Similar to Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning (13)

PDF
Stable Diffusion path
PDF
Idiom Token Classification using Sentential Distributed Semantics (Giancarlo ...
PDF
Audio and Vision (D2L9 Insight@DCU Machine Learning Workshop 2017)
PDF
Automatic Quality Assessment for Speech and Beyond
PDF
Introduction to deep learning based voice activity detection
PDF
Audio and Vision (D4L6 2017 UPC Deep Learning for Computer Vision)
PPT
Presentation based on the paper ”How Far Are We from Robust Voice Conversion:...
PPTX
Vitaly Bondar: Decoding Stable Diffusion: a journey through key concepts (UA)
PDF
Fudan-NJUST at MediaEval 2014: Violent Scenes Detection Using Deep Neural Net...
PDF
Tiancheng Zhao - 2017 - Learning Discourse-level Diversity for Neural Dialog...
PDF
EMNLP 2014: Opinion Mining with Deep Recurrent Neural Network
PPTX
From Semantics to Self-supervised Learning for Speech and Beyond
PDF
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Stable Diffusion path
Idiom Token Classification using Sentential Distributed Semantics (Giancarlo ...
Audio and Vision (D2L9 Insight@DCU Machine Learning Workshop 2017)
Automatic Quality Assessment for Speech and Beyond
Introduction to deep learning based voice activity detection
Audio and Vision (D4L6 2017 UPC Deep Learning for Computer Vision)
Presentation based on the paper ”How Far Are We from Robust Voice Conversion:...
Vitaly Bondar: Decoding Stable Diffusion: a journey through key concepts (UA)
Fudan-NJUST at MediaEval 2014: Violent Scenes Detection Using Deep Neural Net...
Tiancheng Zhao - 2017 - Learning Discourse-level Diversity for Neural Dialog...
EMNLP 2014: Opinion Mining with Deep Recurrent Neural Network
From Semantics to Self-supervised Learning for Speech and Beyond
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Ad

Recently uploaded (20)

PPTX
7. General Toxicologyfor clinical phrmacy.pptx
PDF
Biophysics 2.pdffffffffffffffffffffffffff
PPTX
Introduction to Cardiovascular system_structure and functions-1
PPTX
Derivatives of integument scales, beaks, horns,.pptx
PPT
protein biochemistry.ppt for university classes
PPTX
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
PPTX
2. Earth - The Living Planet Module 2ELS
PDF
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
PPTX
BIOMOLECULES PPT........................
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PDF
Placing the Near-Earth Object Impact Probability in Context
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PPTX
Comparative Structure of Integument in Vertebrates.pptx
PPTX
2. Earth - The Living Planet earth and life
PDF
The scientific heritage No 166 (166) (2025)
PPTX
INTRODUCTION TO EVS | Concept of sustainability
PPTX
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PDF
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
7. General Toxicologyfor clinical phrmacy.pptx
Biophysics 2.pdffffffffffffffffffffffffff
Introduction to Cardiovascular system_structure and functions-1
Derivatives of integument scales, beaks, horns,.pptx
protein biochemistry.ppt for university classes
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
2. Earth - The Living Planet Module 2ELS
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
BIOMOLECULES PPT........................
The KM-GBF monitoring framework – status & key messages.pptx
Placing the Near-Earth Object Impact Probability in Context
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
Comparative Structure of Integument in Vertebrates.pptx
2. Earth - The Living Planet earth and life
The scientific heritage No 166 (166) (2025)
INTRODUCTION TO EVS | Concept of sustainability
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
Ad

Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning

  • 2. Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Hamid Eghbal-zadeh 1,2 , Matthias Dorfer 1 , Gerhard Widmer 1,2 1 2
  • 3. Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Hamid Eghbal-zadeh 1,2 , Matthias Dorfer 1 , Gerhard Widmer 1,2 1 2
  • 4. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Motivation
  • 5. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning ● Convolutional Neural Networks learn useful features and build good representations
  • 6. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning ● Convolutional Neural Networks learn useful features and build good representations ● CNNs are also known to generalize on the unseen data
  • 7. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning ● Convolutional Neural Networks learn useful features and build good representations ● CNNs are also known to generalize on the unseen data ● Many of the benchmark datasets have similar train/test distributions
  • 8. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning ● Convolutional Neural Networks learn useful features and build good representations ● CNNs are also known to generalize on the unseen data ● Many of the benchmark datasets have similar train/test distributions ● How about a distribution mismatch between training and test?
  • 9. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Distribution mismatch: When the distribution of the data in training and validation sets differ from the test set
  • 10. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Distribution mismatch: When the distribution of the data in training and validation sets differ from the test set ● Speaker Recognition: Training on English, testing on Chinese
  • 11. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Distribution mismatch: When the distribution of the data in training and validation sets differ from the test set ● Speaker Recognition: Training on English, testing on Chinese ● Acoustic Scene Classification: Training on Scenes in one country, testing on scenes of another country, in another period of time
  • 12. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Distribution mismatch: When the distribution of the data in training and validation sets differ from the test set ● Speaker Recognition: Training on English, testing on Chinese ● Acoustic Scene Classification: Training on Scenes in one country, testing on scenes of another country, in another period of time
  • 13. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Performance of end-to-end CNNs (no mismatch vs mismatched): ● We use DCASE2016 (no mismatch) and DCASE2017 (mismatched) datasets1 ● Same training and validation, different test set ● Look at several end-to-end CNNs 1) Detection and Classification of Acoustic Scenes and Events, http://dcase.community
  • 14. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Covariance Analysis of the representation
  • 15. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Covariance Eigenvalue Analysis: ● We train a VGG network on No mismatch and Mismatched using spectrograms
  • 16. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Covariance Eigenvalue Analysis: ● We train a VGG network on No mismatch and Mismatched using spectrograms ● We analyse the internal representation of the VGG
  • 17. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Covariance Eigenvalue Analysis: ● We train a VGG network on No mismatch and Mismatched using spectrograms ● We analyse the internal representation of the VGG ● We use covariance analysis ○ Eigen-values of the covariances matrix ○ Visualisation of the representations projected via PCA
  • 18. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Nomismatch Covariance Eigenvalue Analysis: Train Test Mismatched Validation
  • 19. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning NomismatchVisualisation of the VGG representations: Train Validation Test Mismatched
  • 20. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Within-Class Covariance Normalisation (WCCN)
  • 21. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Within-Class Covariance Normalization1,2 : ● Proposed for Speaker Recognition to reduce the false positive/negatives 1) Hatch, Andrew O., et al. "Within-class covariance normalization for SVM-based speaker recognition." Ninth international conference on spoken language processing. 2006. 2) Hatch, Andrew O., et al. "Generalized linear kernels for one-versus-all classification: application to speaker recognition." Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on. Vol. 5. IEEE, 2006.
  • 22. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Within-Class Covariance Normalization1,2 : ● Proposed for Speaker Recognition to reduce the false positive/negatives ● Used to reduce the within-class variability in features such as GMM supervectors or i-vector features 1) Hatch, Andrew O., et al. "Within-class covariance normalization for SVM-based speaker recognition." Ninth international conference on spoken language processing. 2006. 2) Hatch, Andrew O., et al. "Generalized linear kernels for one-versus-all classification: application to speaker recognition." Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on. Vol. 5. IEEE, 2006.
  • 23. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Within-Class Covariance Normalization1,2 : 1) Hatch, Andrew O., et al. "Within-class covariance normalization for SVM-based speaker recognition." Ninth international conference on spoken language processing. 2006. 2) Hatch, Andrew O., et al. "Generalized linear kernels for one-versus-all classification: application to speaker recognition." Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on. Vol. 5. IEEE, 2006.
  • 24. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Deep Within-Class Covariance Analysis
  • 25. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Deep Within-Class Covariance Analysis (DWCCA): ● A deep learning compatible version of WCCN
  • 32. Deep Within-Class Covariance Analysis (DWCCA): ● A deep-learning-compatible version of WCCN ● A statistical deep-learning layer, trained end-to-end using SGD with minibatches ● Can be placed anywhere in the network to reduce within-class variability ● During training, the forward pass sets B equal to B_b, its minibatch estimate ● Gradients with respect to B are computed and used in the backward pass ● A running average is computed for use at test time (similar to batch normalisation) ● Compatible with different supervised tasks (classification, detection, metric learning, ...) and data (raw audio, ...) ● Can be used with different supervised losses (CCE, BCE, l2, ...)
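The forward pass described on this slide can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the class name `DWCCASketch`, the helper names, the `eps` regulariser, the `momentum` value, and the choice to keep the running average over the covariance matrix (rather than over B itself) are all hypothetical. NumPy has no automatic differentiation, so the gradient step mentioned on the slide is omitted; an autodiff framework would be needed for end-to-end training.

```python
import numpy as np


def within_class_covariance(x, y, eps=1e-3):
    """Mean of the per-class covariance matrices of x (N, D), integer labels y (N,).

    eps * I is added so the result stays invertible (an assumed regulariser).
    """
    classes = np.unique(y)
    dim = x.shape[1]
    s_w = np.zeros((dim, dim))
    for c in classes:
        centred = x[y == c] - x[y == c].mean(axis=0, keepdims=True)
        s_w += centred.T @ centred / len(centred)
    return s_w / len(classes) + eps * np.eye(dim)


def wccn_projection(s_w):
    """Return the lower-triangular B with B @ B.T == inv(s_w) (Cholesky factor)."""
    return np.linalg.cholesky(np.linalg.inv(s_w))


class DWCCASketch:
    """Hypothetical forward-only layer illustrating the slide's description."""

    def __init__(self, dim, momentum=0.9):
        self.momentum = momentum        # assumed smoothing factor
        self.running_s_w = np.eye(dim)  # running within-class covariance

    def forward(self, x, y=None, training=True):
        if training:
            # Training: estimate the covariance on the current minibatch (B_b).
            s_b = within_class_covariance(x, y)
            self.running_s_w = (self.momentum * self.running_s_w
                                + (1.0 - self.momentum) * s_b)
            b = wccn_projection(s_b)
        else:
            # Test time: no labels available, so use the running average.
            b = wccn_projection(self.running_s_w)
        # Each row x_i is mapped to B^T x_i.
        return x @ b


# Usage on synthetic data: 3 classes, 100 samples each, 4 feature dimensions.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 100)
feats = rng.normal(size=(300, 4)) * np.array([1.0, 2.0, 3.0, 4.0]) + labels[:, None]
layer = DWCCASketch(dim=4)
projected = layer.forward(feats, labels, training=True)
```

After the training-mode projection, the within-class covariance of `projected` should be close to the identity matrix, which is the normalising effect WCCN aims for. Averaging the covariance rather than the projection is only one plausible reading of "running average" on the slide.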
  • 33. Results
  • 34. Within-Class Covariance Eigenvalue Analysis (without DWCCA). Figure labels: no mismatch, mismatched; train, validation, test.
  • 35. Within-Class Covariance Eigenvalue Analysis (with DWCCA). Figure labels: no mismatch, mismatched; train, validation, test.
  • 36. Eigenvalue Analysis (with vs. without DWCCA). Figure labels: no mismatch, mismatched; train, validation, test.
  • 37. K-NN classification results on VGG representations. Figure labels: no mismatch, mismatched; validation, test.
  • 38. End-to-end classification. Conditions: no mismatch, mismatched. Legend: * single model, single-channel features; multi-channel features; ensemble of various models.
  • 43. End-to-end class-wise F1. Conditions: no mismatch, mismatched.
  • 45. Summary
  • 49. Summary: ● We analysed the covariance of the representations in a VGG network ● We showed that the greater the mismatch between training and test data, the greater the within-class variability of the representation ● We proposed Deep Within-Class Covariance Analysis (DWCCA), a deep-learning-compatible layer capable of significantly reducing the within-class variability of a network's representation ● We empirically showed that DWCCA improves generalisation when the training and test data have mismatched distributions.
  • 50. Thank you for your attention! Come to the poster for more discussion. hamid.eghbal-zadeh@jku.at heghbalz