This document outlines recent advances in deep learning of high-level representations from unlabeled data. It discusses how convolutional neural networks can learn representations from images and audio, and covers statistical layers and attention mechanisms. Generative adversarial networks are introduced for unsupervised representation learning, including a probabilistic GAN model designed to address mode collapse. The document motivates these approaches with examples from tasks such as acoustic scene classification, audio-sheet music alignment, and weakly supervised audio tagging.