Deep Learning of High-Level Representations
Hamid Eghbal-zadeh
hamid.eghbal-zadeh@jku.at
Workshop on the Application of
Deep Learning for SV/CNV annotation
March, 2018, Labdia, Vienna
Outline
1. Motivation
2. Supervised Learning
a. Convolutional Layers
b. Multimodal Learning
c. Statistical Layers
d. Attention Mechanism
3. Unsupervised Learning
a. Objectives
b. Generative Adversarial Networks
2/36
Motivation
3/36
Motivation
• Representation Learning
– The features
– Factors and Causes
– High-level concepts
4/36
Motivation
• Classic ML
– Raw data → Handcrafted features → Machine Learning
• Modern ML [Alom 2018]
– Raw data → Neural Networks
5/36
Motivation
• A Good Representation [Bengio 2013]
– Captures posterior belief about explanatory causes
– Disentangles the factors of variation
6/36
Supervised Learning
7/36
Supervised Learning
• Training with labeled data
– Feature learning (Encoder)
– Classifier (Decoder)
– End-to-end training
8/36
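A minimal sketch of end-to-end supervised training, assuming a toy two-class dataset and a tiny one-hidden-layer network trained with full-batch gradient descent for brevity. The function name `train_end_to_end`, the layer sizes, and the data are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_end_to_end(x, y, hidden=8, steps=500, lr=0.5, seed=0):
    """Train the encoder (x -> h) and the classifier (h -> probs) jointly."""
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    w_enc = rng.normal(0, 0.5, (x.shape[1], hidden))  # encoder weights
    b_enc = np.zeros(hidden)
    w_cls = rng.normal(0, 0.5, (hidden, n_classes))   # classifier weights
    b_cls = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        h = np.tanh(x @ w_enc + b_enc)   # learned features
        p = softmax(h @ w_cls + b_cls)   # class probabilities
        # Gradient of the mean cross-entropy with respect to the logits.
        g_logits = (p - onehot) / len(x)
        # Backpropagate through the classifier and the tanh non-linearity.
        g_h = (g_logits @ w_cls.T) * (1 - h ** 2)
        w_cls -= lr * h.T @ g_logits
        b_cls -= lr * g_logits.sum(axis=0)
        w_enc -= lr * x.T @ g_h
        b_enc -= lr * g_h.sum(axis=0)

    def predict(x_new):
        h_new = np.tanh(x_new @ w_enc + b_enc)
        return softmax(h_new @ w_cls + b_cls).argmax(axis=1)

    return predict

# Toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
predict = train_end_to_end(x, y)
accuracy = (predict(x) == y).mean()
```

The point of the sketch is that a single loss drives the updates of both the feature extractor and the classifier, rather than the features being fixed beforehand.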
Deep Neural Networks with Convolutional Layers
9/36
DNNs with Convolutional Layers
• Convolutional Layers [LeCun 1995]
– Input image → Feature maps → Pooling (Down-sampling)
10/36
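A small sketch of the two operations named on this slide, written with plain NumPy loops for readability. The 2x2 filter and the 5x5 input are made-up values; real CNN layers learn many filters and operate on batches.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (cross-correlation, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Down-sample by taking the maximum over non-overlapping windows."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # hypothetical 2x2 filter
feature_map = conv2d_valid(image, kernel)      # shape (4, 4)
pooled = max_pool2d(feature_map)               # shape (2, 2)
```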
DNNs with Convolutional Layers
• CNNs for Acoustic Scene Classification
IEEE DCASE-2016 challenge
(http://dcase.community)
11/36
DNNs with Convolutional Layers
• CNNs for ASC [Eghbal-zadeh 2016, arXiv:1706.06525]
– Input spectrograms → Feature maps → Pooling (Down-sampling) → Global Average Pooling → Softmax output probabilities
– Trained with categorical cross-entropy (CCE) loss, optimized with SGD
12/36
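The output stage of such a network can be sketched as follows. This assumes, for illustration only, that the last convolutional layer emits one feature map per scene class; the array values and the three-class setup are invented.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average each feature map over both axes: (C, H, W) -> (C,)."""
    return feature_maps.mean(axis=(1, 2))

def softmax(logits):
    e = np.exp(logits - logits.max())  # subtract the max for stability
    return e / e.sum()

def categorical_cross_entropy(probs, true_class):
    """CCE for a single example with an integer class label."""
    return -np.log(probs[true_class])

# Hypothetical final-layer output: one 4x6 feature map per class.
feature_maps = np.stack([np.full((4, 6), v) for v in (2.0, 0.0, -1.0)])
probs = softmax(global_average_pool(feature_maps))
loss = categorical_cross_entropy(probs, true_class=0)
```

Global average pooling removes the dependence on the spectrogram length, so the same network can score excerpts of different durations.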
DNNs with Convolutional Layers
• Hybrid: CNNs + Factor Analysis for ASC
[Eghbal-zadeh 2016, arXiv:1706.06525]
– Feature learning: Spectrograms → Deep CNN → Predicted probabilities
– Feature engineering: Engineered features (MFCCs) → I-Vectors (Unsupervised Representation Learning with Factor Analysis) → Predicted probabilities
– Late Fusion (Linear Regression) of the two sets of predicted probabilities
13/36
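A hedged sketch of late fusion by linear regression: the class probabilities of the two systems are stacked and mapped to one-hot targets with least squares. The probability values, the bias column, and the helper names `fit_late_fusion` and `fuse` are assumptions for illustration; the exact fusion setup of the cited system is not given on the slide.

```python
import numpy as np

def stack_outputs(p_cnn, p_ivec):
    """Concatenate both models' probabilities and append a bias column."""
    return np.hstack([p_cnn, p_ivec, np.ones((len(p_cnn), 1))])

def fit_late_fusion(p_cnn, p_ivec, y_onehot):
    """Least-squares weights mapping the stacked outputs to the targets."""
    weights, *_ = np.linalg.lstsq(stack_outputs(p_cnn, p_ivec), y_onehot,
                                  rcond=None)
    return weights

def fuse(p_cnn, p_ivec, weights):
    """Predict the class with the highest fused score."""
    return (stack_outputs(p_cnn, p_ivec) @ weights).argmax(axis=1)

# Made-up validation predictions for 4 clips and 2 classes.
p_cnn = np.array([[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.2, 0.8]])
p_ivec = np.array([[0.7, 0.3], [0.3, 0.7], [0.6, 0.4], [0.1, 0.9]])
y = np.array([0, 0, 1, 1])
weights = fit_late_fusion(p_cnn, p_ivec, np.eye(2)[y])
fused_labels = fuse(p_cnn, p_ivec, weights)
```

In practice the fusion weights would be fitted on held-out predictions and applied to unseen clips; fitting and evaluating on the same four clips here only keeps the example short.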
DNNs with Convolutional Layers
• CNNs + Factor Analysis Hybrid for ASC
[Eghbal-zadeh 2016, arXiv:1706.06525]
14/36
Multimodal Learning
15/36
Multimodal Learning
• Audio-Sheet Music Correspondences [Dorfer 2017]
16/36
Statistical Layers for Deep Neural Networks
17/36
Statistical Layers for DNNs
• Apply specialized processing methods in an end-to-end fashion
[Figure label: statistical layer]
18/36
Statistical Layers for DNNs
• Deep Within-Class Covariance Analysis
[Eghbal-zadeh 2017, arXiv:1711.04022]
DWCCA
19/36
Statistical Layers for DNNs
• Deep Within-Class Covariance Analysis
[Eghbal-zadeh 2017, arXiv:1711.04022]
B =
The computational graph for B is differentiable; therefore, the DWCCA layer can be trained end-to-end with SGD and backpropagation.
DWCCA
20/36
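The slide's own equation for B is not reproduced here. As a rough illustration of the underlying idea, the sketch below implements classic within-class covariance normalization on fixed embeddings: it estimates the within-class covariance S_w and builds a projection B with B Bᵀ = S_w⁻¹ via a Cholesky factorization. This is an assumption about the general technique, not the DWCCA layer's exact formulation, and NumPy does not provide the gradients that a layer inside a network would need.

```python
import numpy as np

def within_class_covariance(x, y):
    """Average of the per-class covariance matrices of the embeddings x."""
    classes = np.unique(y)
    s_w = np.zeros((x.shape[1], x.shape[1]))
    for c in classes:
        centred = x[y == c] - x[y == c].mean(axis=0)
        s_w += centred.T @ centred / len(centred)
    return s_w / len(classes)

def wccn_projection(s_w, eps=1e-6):
    """Return B with B @ B.T = inv(S_w), using a Cholesky factorisation."""
    regularised = s_w + eps * np.eye(len(s_w))  # keep the matrix invertible
    return np.linalg.cholesky(np.linalg.inv(regularised))

rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 200)
# Toy embeddings with strongly unequal within-class variance per dimension.
x = rng.normal(0, 1, (600, 3)) * np.array([5.0, 1.0, 0.2]) + y[:, None]
b = wccn_projection(within_class_covariance(x, y))
x_projected = x @ b
s_w_after = within_class_covariance(x_projected, y)  # close to identity
```

After the projection, the within-class covariance is approximately the identity matrix, so no single direction dominates the within-class variability.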
Statistical Layers for DNNs
• Deep Within-Class Covariance Analysis
[Eghbal-zadeh 2017, arXiv:1711.04022]
[Figure: Eigenvalues of Cov, without DWCCA vs. with DWCCA]
21/36
Attention Mechanism
22/36
Attention Mechanism
The attention mechanism allows the decoder to attend to different parts of the learned representation.
Sequence-to-sequence autoencoders:
23/36
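One common way to realise this is dot-product attention, sketched below. The scoring function, the state dimensions, and the values are assumptions; the talk does not specify which attention variant is used.

```python
import numpy as np

def attend(query, encoder_states):
    """Dot-product attention: weight encoder states by similarity to query."""
    scores = encoder_states @ query        # one score per time step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over time steps
    context = weights @ encoder_states     # weighted sum of the states
    return context, weights

# Hypothetical encoder output: 4 time steps, 3-dimensional states.
encoder_states = np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0],
                           [0.0, 0.0, 1.0],
                           [1.0, 1.0, 0.0]])
decoder_state = np.array([0.0, 4.0, 0.0])
context, weights = attend(decoder_state, encoder_states)
```

Here the decoder state matches the second and fourth encoder states, so those two time steps receive almost all of the attention weight.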
Attention Mechanism
Predicting strong labels from weak labels in acoustic
event detection [DCASE-2017/2018 Task4].
[Figure panels: Training, Testing]
24/36
Attention Mechanism
Predicting strong labels from weak labels in acoustic
event detection [Xu 2017].
25/36
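A sketch of how attention pooling can connect weak and strong labels: frame-level probabilities are combined into one clip-level probability per class, which can be trained against weak (clip-level) labels, while the frame-level probabilities serve as strong predictions at test time. The logits below are invented, and the exact model of [Xu 2017] may differ from this simplification.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def clip_and_frame_predictions(class_logits, attention_logits):
    """Frame-level logits (T, K) -> frame and clip probabilities."""
    frame_probs = sigmoid(class_logits)                 # strong (per frame)
    attention = np.exp(attention_logits - attention_logits.max(axis=0))
    attention /= attention.sum(axis=0)                  # normalise over time
    clip_probs = (attention * frame_probs).sum(axis=0)  # weak (per clip)
    return frame_probs, clip_probs

# Made-up network outputs: 5 frames, 2 event classes.
class_logits = np.array([[-4, -4], [5, -4], [5, -4], [-4, -4], [-4, 3]],
                        dtype=float)
attention_logits = np.array([[0, 0], [3, 0], [3, 0], [0, 0], [0, 3]],
                            dtype=float)
frame_probs, clip_probs = clip_and_frame_predictions(class_logits,
                                                     attention_logits)
active_frames = frame_probs > 0.5  # estimated event activity per frame
```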
Unsupervised Learning
26/36
Unsupervised Learning
• Objectives for unsupervised learning
– Compression-Reconstruction
(Autoencoders, ...)
– Local similarity (between adjacent frames)
• Defined distance
– L1, L2
– Wasserstein
• Learn the distance
– Adversarial training
27/36
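The defined distances listed above can be computed directly. The sketch below uses toy vectors and, for the Wasserstein case, the closed form that holds for two equal-sized one-dimensional samples (mean absolute difference of the sorted values); the general multi-dimensional case needs an optimal-transport solver.

```python
import numpy as np

def l1_distance(a, b):
    return np.abs(a - b).sum()

def l2_distance(a, b):
    return np.sqrt(((a - b) ** 2).sum())

def wasserstein_1d(samples_a, samples_b):
    """1-Wasserstein distance between two equal-sized 1-D samples."""
    return np.abs(np.sort(samples_a) - np.sort(samples_b)).mean()

x = np.array([0.0, 1.0, 2.0, 3.0])
x_reconstructed = np.array([0.0, 1.0, 2.0, 5.0])  # hypothetical autoencoder output
print(l1_distance(x, x_reconstructed))   # 2.0
print(l2_distance(x, x_reconstructed))   # 2.0
print(wasserstein_1d(x, x + 1.0))        # 1.0
```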
Unsupervised Learning
• Generative Adversarial Networks [Goodfellow 2014]
– Generator: maps a random vector to fake samples ("How to fool the Discriminator...")
– Discriminator: receives real and fake samples and decides Real/Fake? ("How to catch the Generator...")
– The Generator learns to generate images that the Discriminator cannot distinguish from real images
28/36
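The two competing objectives can be written down in a few lines. The sketch uses the non-saturating generator loss suggested in [Goodfellow 2014]; the discriminator outputs are invented numbers rather than the result of training real networks.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples labelled 1, generated samples 0."""
    return -(np.log(d_real).mean() + np.log(1.0 - d_fake).mean())

def generator_loss(d_fake):
    """Non-saturating generator loss: push D to label fakes as real."""
    return -np.log(d_fake).mean()

# Hypothetical discriminator outputs (probability that a sample is real).
d_real = np.array([0.9, 0.8])
d_fake_early = np.array([0.1, 0.2])  # D easily catches the generator
d_fake_late = np.array([0.5, 0.5])   # D can no longer tell the difference

loss_g_early = generator_loss(d_fake_early)
loss_g_late = generator_loss(d_fake_late)
loss_d = discriminator_loss(d_real, d_fake_early)
```

The generator loss drops as the discriminator becomes less certain that generated samples are fake, which is the behaviour the slide describes.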
Unsupervised Learning
• Generative Adversarial Networks [Goodfellow 2014]
– Learn a Generator for data augmentation [Antoniou 2017]
– Learn image features in the discriminator [Radford 2016]
– Design new adversarial objectives for unsupervised/semi-supervised learning (Bi-directional GANs [Donahue 2016])
29/36
Unsupervised Learning
• Probabilistic Generative Adversarial Networks
[Eghbal-zadeh 2017, arXiv:1708.01886]
– We integrate a probabilistic model (a GMM) inside the discriminator
– Uses a Gaussian likelihood instead of a classifier
– We tackle the mode-collapse problem
• Mode collapse: the generator generates only some of the classes (modes) in the data
30/36
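To make the "Gaussian likelihood instead of a classifier" idea concrete, the sketch below evaluates the log-likelihood of embeddings under a Gaussian mixture model with diagonal covariances. The mixture parameters and the 2-D embeddings are hypothetical, and how the likelihood enters the adversarial objective is specified in the cited paper, not here.

```python
import numpy as np

def gmm_log_likelihood(z, weights, means, variances):
    """Log-likelihood of embeddings z (N, D) under a diagonal-covariance GMM."""
    diff = z[:, None, :] - means[None, :, :]                     # (N, K, D)
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)  # (K,)
    log_comp = log_norm - 0.5 * (diff ** 2 / variances).sum(axis=2)
    log_joint = np.log(weights) + log_comp                       # (N, K)
    # Log-sum-exp over the components, computed stably.
    top = log_joint.max(axis=1, keepdims=True)
    total = top + np.log(np.exp(log_joint - top).sum(axis=1, keepdims=True))
    return total[:, 0]

# Hypothetical 2-component GMM over 2-D discriminator embeddings.
weights = np.array([0.5, 0.5])
means = np.array([[-3.0, 0.0], [3.0, 0.0]])
variances = np.ones((2, 2))
near_mode = gmm_log_likelihood(np.array([[3.0, 0.0]]), weights, means, variances)
between_modes = gmm_log_likelihood(np.array([[0.0, 0.0]]), weights, means, variances)
```

An embedding that sits on a mixture component scores a much higher likelihood than one that falls between components, which gives a graded signal instead of a binary real/fake decision.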
Unsupervised Learning
• Probabilistic Generative Adversarial Networks
[Eghbal-zadeh 2017, arXiv:1708.01886]
– Creates clusters in the discriminator
• compares real clusters vs fake clusters
– Draws fake clusters towards real clusters
31/36
Unsupervised Learning
• Probabilistic Generative Adversarial Networks
[Eghbal-zadeh 2017, arXiv:1708.01886]
32/36
Unsupervised Learning
• Probabilistic Generative Adversarial Networks
[Eghbal-zadeh 2017, arXiv:1708.01886]
[Figure panels: CIFAR-10, CelebA, Fashion-MNIST]
33/36
Thank you!
34/36
References
[1] Bengio, Yoshua. "Deep Learning of Representations." AAAI 2013 Tutorial.
[2] Cover, Thomas M., and Joy A. Thomas. "Elements of information theory 2nd edition." (2006).
[3] Shwartz-Ziv, Ravid, and Naftali Tishby. "Opening the black box of deep neural networks via information." arXiv preprint arXiv:1703.00810 (2017).
[4] LeCun, Yann, and Yoshua Bengio. "Convolutional networks for images, speech, and time series." The handbook of brain theory and neural networks (1995).
[5] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
[6] Eghbal-Zadeh, Hamid, et al. "CP-JKU submissions for DCASE-2016: A hybrid approach using binaural i-vectors and deep convolutional neural networks." IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) (2016).
[7] Mesaros, Annamaria, Toni Heittola, and Tuomas Virtanen. "TUT database for acoustic scene classification and sound event detection." Signal Processing Conference (EUSIPCO), 2016 24th European. IEEE, 2016.
[8] Eghbal-zadeh, Hamid, Matthias Dorfer, and Gerhard Widmer. "Deep Within-Class Covariance Analysis for Acoustic Scene Classification." arXiv preprint arXiv:1711.04022 (2017).
[9] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
35/36
References
[10] Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
[11] Donahue, Jeff, Philipp Krähenbühl, and Trevor Darrell. "Adversarial feature learning." arXiv preprint arXiv:1605.09782 (2016).
[12] Dumoulin, Vincent, et al. "Adversarially learned inference." arXiv preprint arXiv:1606.00704 (2016).
[13] Eghbal-zadeh, Hamid, and Gerhard Widmer. "Probabilistic Generative Adversarial Networks." arXiv preprint arXiv:1708.01886 (2017).
[14] Antoniou, Antreas, Amos Storkey, and Harrison Edwards. "Data Augmentation Generative Adversarial Networks." arXiv preprint arXiv:1711.04340 (2017).
[15] Xu, Yong, et al. "Attention and localization based on a deep convolutional recurrent model for weakly supervised audio tagging." arXiv preprint arXiv:1703.06052 (2017).
[16] Alom, Md Zahangir, et al. "The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches." arXiv preprint arXiv:1803.01164 (2018).
[17] Dorfer, Matthias, Andreas Arzt, and Gerhard Widmer. "Learning Audio-Sheet Music Correspondences for Score Identification and Offline Alignment." arXiv preprint arXiv:1707.09887 (2017).
36/36
