Deep Learning of High-Level Representations

Hamid Eghbal-zadeh
hamid.eghbal-zadeh@jku.at
Workshop on the Application of Deep Learning for SV/CNV Annotation
March 2018, Labdia, Vienna

Outline
1. Motivation
2. Supervised Learning
a. Convolutional Layers
b. Multimodal Learning
c. Statistical Layers
d. Attention Mechanism
3. Unsupervised Learning
a. Objectives
b. Generative Adversarial Networks

Motivation
• Representation Learning
– The features
– Factors and Causes
– High-level concepts

Motivation
• Classic ML
– Raw data → Handcrafted features → Machine Learning
• Modern ML
– Raw data → Neural Networks
[Alom 2018]

Motivation
• A Good Representation [Bengio 2013]
– Captures posterior belief about explanatory causes
– Disentangles the factors of variation

Supervised Learning
• Training with labeled data
– Feature learning (Encoder)
– Classifier (Decoder)
– End-to-end training

Deep Neural Networks with Convolutional Layers

DNNs with Convolutional Layers
• Convolutional Layers [LeCun 1995]
[Figure: input image → feature maps → pooling (down-sampling)]
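
The two operations in the figure can be sketched in NumPy. This is a minimal illustration, assuming a single-channel input, a single filter, no padding, and non-overlapping max pooling; the function names conv2d_valid and max_pool2d are made up for this example and do not come from the talk.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over a 2-D image (cross-correlation, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Down-sample by taking the max over non-overlapping size x size blocks."""
    h = feature_map.shape[0] // size * size   # drop rows that do not fill a block
    w = feature_map.shape[1] // size * size   # drop columns that do not fill a block
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy "input image"
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # toy 2x2 filter
feature_map = conv2d_valid(image, kernel)          # shape (5, 5)
pooled = max_pool2d(feature_map)                   # shape (2, 2)
```

A real convolutional layer learns many such kernels and adds a bias and a nonlinearity; those parts are left out here to keep the sketch short.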

• CNNs for Acoustic Scene Classification (ASC)
– IEEE DCASE-2016 challenge (http://dcase.community)

• CNNs for ASC [Eghbal-zadeh 2016, arXiv:1706.06525]
[Figure: input spectrograms → feature maps → pooling (down-sampling) → global average pooling → softmax output probabilities]
– Trained with categorical cross-entropy (CCE) loss, optimized with SGD
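
The output stage named on this slide (global average pooling, softmax, CCE loss) can be sketched as follows. This is a minimal illustration, not the authors' network: it assumes the last convolutional layer emits one feature map per class, and the tensor sizes and the class count of 15 are toy values chosen for the example.

```python
import numpy as np

def global_average_pool(feature_maps):
    """(batch, classes, freq, time) -> (batch, classes): mean over each map."""
    return feature_maps.mean(axis=(2, 3))

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_cross_entropy(probs, labels):
    """Mean negative log-probability of the true class (integer labels)."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + 1e-12))

rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(4, 15, 8, 10))  # 4 clips, 15 classes (assumed sizes)
labels = np.array([0, 3, 7, 14])                # toy ground-truth scene labels
probs = softmax(global_average_pool(feature_maps))
loss = categorical_cross_entropy(probs, labels)
```

During training, SGD would update the network weights along the negative gradient of this loss; the gradient computation is omitted here.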

• Hybrid: CNNs + Factor Analysis for ASC [Eghbal-zadeh 2016, arXiv:1706.06525]
– Feature learning: spectrograms → deep CNN → predicted probabilities
– Feature engineering: engineered features (MFCCs) → i-vectors (unsupervised representation learning with factor analysis) → predicted probabilities
– Late fusion (linear regression) of the two sets of predicted probabilities
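
Late fusion by linear regression can be sketched as a least-squares fit from the two systems' stacked probabilities to one-hot targets. This is only an illustration under assumptions: the probabilities below are synthetic stand-ins for the CNN and i-vector outputs, and the helper names (stack_with_bias, fit_linear_fusion, fuse) are hypothetical.

```python
import numpy as np

def stack_with_bias(p_a, p_b):
    """Concatenate two models' class probabilities and a constant bias column."""
    return np.hstack([p_a, p_b, np.ones((len(p_a), 1))])

def fit_linear_fusion(p_a, p_b, targets_onehot):
    """Least-squares weights mapping the stacked outputs to one-hot targets."""
    W, *_ = np.linalg.lstsq(stack_with_bias(p_a, p_b), targets_onehot, rcond=None)
    return W

def fuse(p_a, p_b, W):
    """Fused class decision: argmax of the regressed scores."""
    return (stack_with_bias(p_a, p_b) @ W).argmax(axis=1)

rng = np.random.default_rng(0)
n, c = 200, 3
labels = rng.integers(0, c, size=n)
onehot = np.eye(c)[labels]
# Synthetic stand-ins for the two systems: a sharper and a noisier model.
p_cnn = 0.7 * onehot + 0.3 * rng.dirichlet(np.ones(c), size=n)
p_ivec = 0.4 * onehot + 0.6 * rng.dirichlet(np.ones(c), size=n)
W = fit_linear_fusion(p_cnn, p_ivec, onehot)
fused = fuse(p_cnn, p_ivec, W)
accuracy = np.mean(fused == labels)
```

In practice the fusion weights would be fitted on held-out predictions rather than on the data used to train the two systems.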

• CNNs + Factor Analysis Hybrid for ASC

Multimodal Learning
• Audio-Sheet Music Correspondences [Dorfer 2017]
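
One common way to use learned cross-modal correspondences is retrieval in a shared embedding space. The sketch below assumes that audio excerpts and sheet-music snippets have already been mapped to vectors of the same size by two networks; the embeddings here are random stand-ins, and the setup is an assumption for illustration rather than a description of the model in [Dorfer 2017].

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def retrieve(audio_emb, sheet_emb):
    """For each audio excerpt, the index of the most similar sheet-music snippet."""
    sims = l2_normalize(audio_emb) @ l2_normalize(sheet_emb).T
    return sims.argmax(axis=1)

rng = np.random.default_rng(0)
sheet_emb = rng.normal(size=(5, 32))                      # 5 snippets, 32-d (toy sizes)
audio_emb = sheet_emb + 0.01 * rng.normal(size=(5, 32))   # matching audio, slightly perturbed
matches = retrieve(audio_emb, sheet_emb)                  # one snippet index per excerpt
```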

Statistical Layers for Deep Neural Networks

Statistical Layers for DNNs
• Apply specialized processing methods in an end-to-end fashion
[Figure label: statistical layer]

• Deep Within-Class Covariance Analysis (DWCCA)

DWCCA
B =
The computational graph for B is differentiable, so the DWCCA layer can be trained end-to-end with SGD and backpropagation.
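
The definition of B did not survive on this slide, so the following is only a sketch of classic within-class covariance normalization (WCCN), on which a layer of this kind could plausibly build. It assumes B is the Cholesky factor of the inverse within-class covariance matrix; this is an assumption and may differ from the DWCCA layer of [Eghbal-zadeh 2017]. The data and function names are illustrative.

```python
import numpy as np

def within_class_cov(X, y):
    """Average of the per-class covariance matrices."""
    classes = np.unique(y)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        centered = Xc - Xc.mean(axis=0)
        S_w += centered.T @ centered / len(Xc)
    return S_w / len(classes)

def wccn_projection(X, y, eps=1e-6):
    """B with B @ B.T = inv(S_w), via Cholesky (eps * I keeps S_w invertible)."""
    S_w = within_class_cov(X, y) + eps * np.eye(X.shape[1])
    return np.linalg.cholesky(np.linalg.inv(S_w))

rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 100)                       # 3 classes, 100 samples each
X = rng.normal(size=(300, 4)) * np.array([5.0, 1.0, 0.5, 2.0]) + y[:, None]
B = wccn_projection(X, y)
X_proj = X @ B                                         # each row is B.T @ x
S_w_after = within_class_cov(X_proj, y)                # close to the identity matrix
```

Every step is a matrix product, inverse, or Cholesky factorization, all of which have well-defined gradients; that is what makes end-to-end training through such a layer possible in an automatic-differentiation framework.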

[Figure: eigenvalues of the covariance matrix, without DWCCA vs. with DWCCA]

Attention Mechanism
The attention mechanism allows the decoder to attend to different parts of the learned representation.
[Figure: sequence-to-sequence autoencoders]
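
A minimal sketch of how a decoder can attend to encoder outputs, assuming dot-product scoring (one of several possible scoring functions; the slide does not say which is used). The encoder states are toy one-hot vectors so the result is easy to check by hand.

```python
import numpy as np

def softmax(scores):
    exp = np.exp(scores - scores.max())   # subtract max for numerical stability
    return exp / exp.sum()

def attend(decoder_state, encoder_states):
    """Attention weights over encoder time steps and the resulting context vector."""
    scores = encoder_states @ decoder_state   # (T,) dot-product scores
    weights = softmax(scores)                 # non-negative, sum to 1
    context = weights @ encoder_states        # (d,) weighted average of states
    return weights, context

encoder_states = np.eye(6, 8)                 # T=6 time steps, d=8 (toy one-hot states)
decoder_state = 3.0 * encoder_states[2]       # a query aligned with time step 2
weights, context = attend(decoder_state, encoder_states)
```

Here the largest weight falls on time step 2, the state the query is aligned with, and the context vector is dominated by that state.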

Attention Mechanism
Predicting strong labels from weak labels in acoustic event detection [DCASE-2017/2018 Task 4].
[Figure: training vs. testing]

Attention Mechanism
Predicting strong labels from weak labels in acoustic event detection [Xu 2017].
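
The general idea of attention pooling for weak labels can be sketched as below: frame-level (strong) predictions are averaged with learned attention weights to give a clip-level (weak) prediction that can be trained against clip labels. This is a simplified illustration under assumptions, not the exact model of [Xu 2017]; the logits are random stand-ins for network outputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_pool(frame_logits, attention_logits):
    """Combine frame-level event probabilities into clip-level probabilities.

    Both inputs have shape (time, classes).
    """
    frame_probs = sigmoid(frame_logits)                        # per-frame (strong) predictions
    att = np.exp(attention_logits - attention_logits.max(axis=0))
    att = att / att.sum(axis=0)                                # softmax over time, per class
    clip_probs = (att * frame_probs).sum(axis=0)               # clip-level (weak) predictions
    return clip_probs, frame_probs

rng = np.random.default_rng(0)
frame_logits = rng.normal(size=(20, 4))       # 20 frames, 4 event classes (toy sizes)
attention_logits = rng.normal(size=(20, 4))
clip_probs, frame_probs = attention_pool(frame_logits, attention_logits)
```

At training time only clip_probs is compared with the weak labels; at test time frame_probs can be thresholded to localize events in time.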

Unsupervised Learning
• Objectives for unsupervised learning
– Compression-Reconstruction (Autoencoders, ...)
– Local similarity (between adjacent frames)
• Defined distance
– L1, L2
– Wasserstein
• Learn the distance
– Adversarial training
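
The defined distances listed above can be written down directly. The sketch below computes L1 and L2 reconstruction errors for a toy vector, and the Wasserstein-1 distance for the special case of two equal-size one-dimensional samples, where it reduces to comparing sorted values. The input values are made up for illustration.

```python
import numpy as np

def l1_distance(x, y):
    return np.sum(np.abs(x - y))

def l2_distance(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def wasserstein_1d(samples_p, samples_q):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    return np.mean(np.abs(np.sort(samples_p) - np.sort(samples_q)))

x = np.array([0.0, 1.0, 2.0])
x_hat = np.array([0.0, 3.0, 2.0])   # e.g. an autoencoder reconstruction of x
rec_l1 = l1_distance(x, x_hat)      # 2.0
rec_l2 = l2_distance(x, x_hat)      # 2.0
w1 = wasserstein_1d(np.array([0.0, 1.0, 2.0]), np.array([1.0, 2.0, 3.0]))  # 1.0
```

L1 and L2 compare individual samples, while the Wasserstein distance compares whole distributions; adversarial training replaces all of these with a distance that is itself learned.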

• Generative Adversarial Networks [Goodfellow 2014]
[Figure: random vector → Generator → fake; real and fake → Discriminator → Real/Fake?]
– Generator: how to fool the Discriminator
– Discriminator: how to catch the Generator
– The Generator learns to generate images that the Discriminator cannot distinguish from real images
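
The two competing objectives can be sketched as loss functions of the Discriminator's outputs. The sketch assumes the Discriminator outputs a probability that its input is real, and uses the non-saturating generator loss suggested in [Goodfellow 2014]; the networks themselves are omitted and the probabilities are toy values.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real samples labeled 1, generated samples labeled 0."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: push D's output on fakes towards 1."""
    return -np.mean(np.log(d_fake + eps))

# D's outputs when it separates real from fake well ...
d_loss_good = discriminator_loss(np.array([0.9, 0.8]), np.array([0.1, 0.2]))
# ... and when it cannot tell them apart (outputs 0.5 everywhere).
d_loss_fooled = discriminator_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
g_loss_caught = generator_loss(np.array([0.1, 0.2]))   # fakes detected: high loss
g_loss_fooling = generator_loss(np.array([0.9, 0.8]))  # fakes pass as real: low loss
```

Training alternates gradient steps that lower the discriminator loss with steps that lower the generator loss.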

• Generative Adversarial Networks [Goodfellow 2014]
– Learn a Generator for data augmentation [Antoniou 2017]
– Learn image features in the discriminator [Radford 2016]
– Design new adversarial objectives for unsupervised/semi-supervised learning (Bi-directional GANs [Donahue 2016])

• Probabilistic Generative Adversarial Networks
– We integrate a probabilistic model, a GMM, inside the discriminator
– Uses Gaussian likelihood instead of a classifier
– We tackle the mode-collapse problem
• Mode collapse: the generator generates only some of the classes (modes) in the data

– Creates clusters in the discriminator
• Compares real clusters vs. fake clusters
– Draws fake clusters towards real clusters
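
Scoring with a Gaussian likelihood instead of a classifier can be illustrated with the log-likelihood of embeddings under a Gaussian mixture model. The sketch assumes diagonal covariances and hand-picked toy parameters; it shows only the likelihood computation, not how the Probabilistic GAN objective uses it, which is not given on these slides.

```python
import numpy as np

def gmm_log_likelihood(z, means, variances, weights):
    """Log-likelihood of each row of z under a diagonal-covariance GMM.

    z: (n, d); means, variances: (k, d); weights: (k,).
    """
    diff = z[:, None, :] - means[None, :, :]                         # (n, k, d)
    log_gauss = -0.5 * np.sum(diff ** 2 / variances
                              + np.log(2 * np.pi * variances), axis=2)
    log_joint = log_gauss + np.log(weights)                          # (n, k)
    m = log_joint.max(axis=1, keepdims=True)                         # log-sum-exp trick
    return (m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))).ravel()

means = np.array([[0.0, 0.0], [5.0, 5.0]])   # two clusters (modes), toy values
variances = np.ones((2, 2))
weights = np.array([0.5, 0.5])
z_real = np.array([[0.0, 0.0], [5.0, 5.0]])  # embeddings sitting on the clusters
z_fake = np.array([[10.0, -10.0]])           # an embedding far from both clusters
ll_real = gmm_log_likelihood(z_real, means, variances, weights)
ll_fake = gmm_log_likelihood(z_fake, means, variances, weights)
```

Embeddings near a cluster receive a high likelihood and distant ones a low likelihood, which gives a signal for drawing fake clusters towards real ones.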

[Figure panels: CIFAR-10, CelebA, Fashion-MNIST]

References
[1] Bengio, Yoshua. "Deep Learning of Representations." AAAI 2013 Tutorial.
[2] Cover, Thomas M., and Joy A. Thomas. "Elements of information theory 2nd edition." (2006).
[3] Shwartz-Ziv, Ravid, and Naftali Tishby. "Opening the black box of deep neural networks via information." arXiv preprint arXiv:1703.00810 (2017).
[4] LeCun, Yann, and Yoshua Bengio. "Convolutional networks for images, speech, and time series." The handbook of brain theory and neural networks 3361.10 (1995): 1995.
[5] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
[6] Eghbal-Zadeh, Hamid, et al. "CP-JKU submissions for DCASE-2016: A hybrid approach using binaural i-vectors and deep convolutional neural networks." IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) (2016).
[7] Mesaros, Annamaria, Toni Heittola, and Tuomas Virtanen. "TUT database for acoustic scene classification and sound event detection." Signal Processing Conference (EUSIPCO), 2016 24th European. IEEE, 2016.
[8] Eghbal-zadeh, Hamid, Matthias Dorfer, and Gerhard Widmer. "Deep Within-Class Covariance Analysis for Acoustic Scene Classification." arXiv preprint arXiv:1711.04022 (2017).
[9] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
[10] Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
[11] Donahue, Jeff, Philipp Krähenbühl, and Trevor Darrell. "Adversarial feature learning." arXiv preprint arXiv:1605.09782 (2016).
[12] Dumoulin, Vincent, et al. "Adversarially learned inference." arXiv preprint arXiv:1606.00704 (2016).
[13] Eghbal-zadeh, Hamid, and Gerhard Widmer. "Probabilistic Generative Adversarial Networks." arXiv preprint arXiv:1708.01886 (2017).
[14] Antoniou, Antreas, Amos Storkey, and Harrison Edwards. "Data Augmentation Generative Adversarial Networks." arXiv preprint arXiv:1711.04340 (2017).
[15] Xu, Yong, et al. "Attention and localization based on a deep convolutional recurrent model for weakly supervised audio tagging." arXiv preprint arXiv:1703.06052 (2017).
[16] Alom, Md Zahangir, et al. "The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches." arXiv preprint arXiv:1803.01164 (2018).
[17] Dorfer, Matthias, Andreas Arzt, and Gerhard Widmer. "Learning Audio-Sheet Music Correspondences for Score Identification and Offline Alignment." arXiv preprint arXiv:1707.09887 (2017).
