Automatic Tagging Using Deep Convolutional Neural Networks
Keunwoo Choi (Keunwoo.Choi@qmul.ac.uk, @keunwoochoi)
Centre for Digital Music, Queen Mary University of London, UK
11 Aug 2016, ISMIR 2016, NY
1 Problem definition
What is auto-tagging?
2 The proposed architectures
But why?
3 Experiments
MagnaTagATune
MSD: Reported (and incorrect) results
MSD: Correct results
Conclusions
Problem definition
What is auto-tagging?
Tags
Descriptive keywords that people (just) put on music
Multi-label in nature, e.g. {rock, guitar, drive, 90’s}
Music tags include genres (rock, pop, alternative, indie),
instruments (vocalists, guitar, violin), emotions (mellow,
chill), activities (party, drive), and eras (00’s, 90’s, 80’s).
Collaboratively created (Last.fm) → noisy and ill-defined (of course):
  false negatives (a missing tag does not mean the tag does not apply)
  synonyms (vocal/vocals/vocalist/vocalists/voice/voices, guitar/guitars)
  popularity bias
  typos (harpsicord)
  irrelevant tags (abcd, ilikeit, fav)
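A minimal sketch of how a noisy tag set could be turned into a multi-label target. The synonym map, the vocabulary, and the helper names are hypothetical and chosen only to mirror the examples on this slide; they are not the preprocessing used for the experiments.

```python
# Hypothetical synonym map: collapses spelling variants onto one canonical tag.
SYNONYMS = {
    "vocals": "vocal", "vocalist": "vocal", "vocalists": "vocal",
    "voice": "vocal", "voices": "vocal", "guitars": "guitar",
}

# Hypothetical tag vocabulary; real systems usually keep the top-N frequent tags.
VOCAB = ["rock", "guitar", "vocal", "drive", "90s"]


def normalise(tag):
    """Lower-case, strip whitespace, and map known synonyms to one form."""
    tag = tag.strip().lower()
    return SYNONYMS.get(tag, tag)


def multi_hot(tags, vocab=VOCAB):
    """Return a 0/1 vector with one entry per vocabulary tag.

    Tags outside the vocabulary (e.g. 'ilikeit') are silently dropped.
    """
    present = {normalise(t) for t in tags}
    return [1 if v in present else 0 for v in vocab]


example = multi_hot(["Rock", "Guitars", "drive", "90s", "ilikeit"])
```

Several entries of the vector can be 1 at once, which is what makes the task multi-label rather than multi-class.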
Problem definition
What is auto-tagging?
Multi-label classification
Criterion: AUC-ROC (Area Under the ROC Curve)
0.5 (chance level) <= AUC-ROC <= 1.0 (perfect ranking) for any useful classifier
Robust to unbalanced datasets
Higher with a lower false positive rate
Higher with a higher true positive rate
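A small sketch of the metric, assuming NumPy is available. It uses the ranking interpretation of AUC-ROC (the probability that a random positive is scored above a random negative, counting ties as one half). Averaging over tags is one common choice for multi-label evaluation; the slides do not say which averaging was used, so `mean_auc` is an assumption.

```python
import numpy as np


def auc_roc(y_true, y_score):
    """AUC-ROC for one tag, computed by comparing all positive/negative pairs."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    diff = pos[:, None] - neg[None, :]          # every positive vs every negative
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)


def mean_auc(Y_true, Y_score):
    """Average the per-tag AUC over the columns of (n_tracks, n_tags) arrays."""
    Y_true, Y_score = np.asarray(Y_true), np.asarray(Y_score)
    return float(np.mean([auc_roc(Y_true[:, j], Y_score[:, j])
                          for j in range(Y_true.shape[1])]))


score = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because only the ordering of scores matters, the value does not depend on how many positives there are, which is why the metric copes with unbalanced tags. A score below 0.5 is possible and means the ranking is worse than chance.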
The proposed architectures
1×96×1366 melgram → convolution and pooling layers → 2048×1×1
All activations are ReLU
All convolutions are 3×3
2048 feature maps at the end
Depths tried: 3, 4, 5, 6, and 7 layers
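The shape arithmetic behind "96×1366 → 1×1" can be sketched as below. The pooling sizes in `ASSUMED_POOLS` are an assumption for a 4-layer variant, picked so that the arithmetic reaches 1×1; they do not appear on the slide. The sketch also assumes that each 3×3 convolution pads its input so that only pooling changes the frequency and time sizes.

```python
def fcn_shapes(input_hw, pools):
    """Track (frequency, time) size through conv + max-pooling stages.

    Each stage is a 3x3 convolution that keeps the size (assumed 'same'
    padding) followed by non-overlapping max-pooling, which floor-divides.
    """
    h, w = input_hw
    shapes = []
    for pool_h, pool_w in pools:
        h, w = h // pool_h, w // pool_w
        shapes.append((h, w))
    return shapes


# Assumed pooling sizes (frequency, time) for a 4-layer network.
ASSUMED_POOLS = [(2, 4), (4, 5), (3, 8), (4, 8)]

shapes = fcn_shapes((96, 1366), ASSUMED_POOLS)
for stage, (h, w) in enumerate(shapes, start=1):
    print(f"after stage {stage}: {h} x {w}")
```

With these sizes the last stage leaves a single position per feature map, so 2048 feature maps give the 2048×1×1 output with no fully connected layer needed to flatten it.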
Assumptions
Why (I think) would it work?
The structure is conv-MP-conv-MP-conv-MP...
N × M convolution: there are useful patterns in the input and
feature maps that are local, location-invariant, and no larger
than N × M.
L × K max-pooling: we are generous up to L × K, so we tolerate
variations within this range.
Which means:
We see the big picture, i.e. some macroscopic patterns,
...assuming/hoping that they are related to the tags.
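The "generous up to L × K" idea can be illustrated with a toy max-pooling function, assuming NumPy. The function name and the tiny arrays are made up for the illustration: an activation that moves inside one pooling window gives the same pooled output, while one that crosses into the next window does not.

```python
import numpy as np


def max_pool_2d(x, L, K):
    """Non-overlapping L x K max-pooling; trailing partial windows are dropped."""
    h, w = x.shape[0] // L, x.shape[1] // K
    blocks = x[: h * L, : w * K].reshape(h, L, w, K)
    return blocks.max(axis=(1, 3))


a = np.zeros((4, 8)); a[0, 0] = 1.0   # pattern detected at the window's corner
b = np.zeros((4, 8)); b[1, 3] = 1.0   # same pattern, shifted inside the same 2x4 window
c = np.zeros((4, 8)); c[1, 4] = 1.0   # shifted one step further, into the next window

same_window = np.array_equal(max_pool_2d(a, 2, 4), max_pool_2d(b, 2, 4))
next_window = np.array_equal(max_pool_2d(a, 2, 4), max_pool_2d(c, 2, 4))
```

Stacking several conv-MP stages multiplies the tolerated range, which is how the deeper layers end up describing macroscopic patterns.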
Experiments and discussions
             MTT            MSD
# tracks     25k            214K (out of total 1M)
# songs      5-6k           214K (out of total 1M)
Length       29.1s          30-60s
Benchmarks   10+            0
Labels       Tags, genres   Tags, genres, EchoNest features, bag-of-word lyrics,...
Experiments and discussions
For                    Dataset  Specifications
Input representation   MTT      STFT/MFCC/Melgram
# Layers               MTT      3/4/5/6/7
Benchmark              MTT      FCN-4 vs 5 previous methods
# Layers (1)           MSD      3/4/5
# Layers (2)           MSD      3/4/5, narrower structure
(1) Different from the paper
(2) Not in the paper
Experiments and discussions
MagnaTagATune - Input representations
Same depth (l=4): melgram > MFCC > STFT
melgram: 96 mel-frequency bins
STFT: 128 frequency bins
MFCC: 90 (30 MFCC, 30 MFCCd, 30 MFCCdd)

Methods                   AUC
FCN-3, mel-spectrogram    .852
FCN-4, mel-spectrogram    .894
FCN-5, mel-spectrogram    .890
FCN-4, STFT               .846
FCN-4, MFCC               .862

With more data, a ConvNet on STFT may still learn a better frequency
aggregation than the mel scale. But not yet.
The ConvNet on melgram outperformed the one on MFCC.
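How the 90-dimensional MFCC input is assembled (30 coefficients plus first and second differences) can be sketched as follows, assuming NumPy. `np.gradient` is used here as a simple stand-in for the delta computation; the actual delta filter used in the experiments is not given on the slide, and the random matrix is only placeholder data.

```python
import numpy as np


def stack_deltas(mfcc):
    """Stack MFCC, delta (MFCCd) and delta-delta (MFCCdd) along the feature axis.

    mfcc has shape (n_coefficients, n_frames); differences run along time.
    """
    d1 = np.gradient(mfcc, axis=1)   # first-order change over time
    d2 = np.gradient(d1, axis=1)     # second-order change over time
    return np.concatenate([mfcc, d1, d2], axis=0)


rng = np.random.default_rng(0)
mfcc = rng.standard_normal((30, 100))   # placeholder: 30 coefficients, 100 frames
features = stack_deltas(mfcc)
print(features.shape)
```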
Experiments and discussions
MagnaTagATune - Number of layers
Methods                   AUC
FCN-3, mel-spectrogram    .852
FCN-4, mel-spectrogram    .894
FCN-5, mel-spectrogram    .890
FCN-4, STFT               .846
FCN-4, MFCC               .862

FCN-4 > FCN-3: depth worked!
FCN-4 > FCN-5 by .004
  A deeper model might catch up after ages of training
  Deeper models require more data
  Deeper models take more time to train (deep residual network [4])
Are 4 layers enough, or is it a matter of data size?
Experiments and discussions
MagnaTagATune
Methods                                AUC
The proposed system, FCN-4             .894
2015, Bag of features and RBM [5]      .888
2014, 1-D convolutions [2]             .882
2014, Transferred learning [6]         .88
2012, Multi-scale approach [1]         .898
2011, Pooling MFCC [3]                 .861
All deep and NN approaches are around .88-.89
Are we touching the glass ceiling?
Perhaps due to the noise of MTT, but tricky to prove it
26K tracks are not enough for millions of parameters
Experiments and discussions
Million Song Dataset - as reported in the paper
Methods                   AUC
FCN-3, mel-spectrogram    .786
FCN-4, mel-spectrogram    .808
FCN-5, mel-spectrogram    .848
FCN-6, mel-spectrogram    .851
FCN-7, mel-spectrogram    .845
WARNING!
The MSD results reported in the paper could not be reproduced.
I suspect incorrect learning-rate control,
and this is why we shouldn’t rush before a deadline...
I ran the experiments again
without the weird learning-rate control
and with more epochs (240→480).
Experiments and discussions
Million Song Dataset - re-run
Methods                   AUC
FCN-3, mel-spectrogram    .839
FCN-4, mel-spectrogram    .852
FCN-5, mel-spectrogram    .855

[Figure: AUC (0.50 to 0.90) vs. number of epochs (log scale, 10^0 to 10^2) on MSD; curves for FCN3, FCN4, FCN5]
Smaller (narrower) convnet
No. of feature maps: [128@1 – 2048@5] → [32@1 – 256@5],
i.e. a narrower network, tried because there is almost no difference between
FCN-4 and FCN-5.
[Figure: AUC (0.50 to 0.90) vs. number of epochs (log scale, 10^0 to 10^2) on MSD with the small convnet; curves for FCN3_small, FCN4_small, FCN5_small]
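To get a feel for how much narrowing shrinks the model, the weights of a stack of 3×3 convolutions can be counted as below. Only the first and last widths (128 and 2048, 32 and 256) come from the slide; the intermediate widths in `WIDE` and `NARROW` are hypothetical, and batch normalisation and the output layer are ignored.

```python
def conv_params(in_channels, widths, kernel=(3, 3)):
    """Count weights and biases of consecutive conv layers with the given widths."""
    total = 0
    for out_channels in widths:
        weights = in_channels * out_channels * kernel[0] * kernel[1]
        total += weights + out_channels      # one bias per output feature map
        in_channels = out_channels
    return total


# Hypothetical 5-layer width schedules; only the endpoints are from the slide.
WIDE = [128, 256, 512, 1024, 2048]
NARROW = [32, 64, 128, 192, 256]

wide_params = conv_params(1, WIDE)       # 1 input channel: the melgram
narrow_params = conv_params(1, NARROW)
print(f"wide: {wide_params:,}  narrow: {narrow_params:,}")
```

Since the count grows with the product of adjacent widths, the last and widest layers dominate, so shrinking them removes most of the parameters.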
Conclusions
The assumption about the macroscopic view seems fine
In general, the behaviour agrees with findings from the computer vision
community:
  the deeper, the better (or equal)
  the wider, the better (or equal), but width matters less than depth
Melgram + feature learning > MFCC
Melgram > STFT
At some point, we will argue STFT + learning > melgram
MTT is too small; even MSD might be small
Future work: more investigation, variable input length,
better datasets, re-thinking the problem...
Thank you for listening and...
You can plug-and-predict
The pre-trained weights and model are open!
https://github.com/keunwoochoi/music-auto_tagging-keras
References I
[1] Dieleman, S., Schrauwen, B.: Multiscale approaches to music audio feature learning. In: ISMIR. pp. 3–8 (2013)
[2] Dieleman, S., Schrauwen, B.: End-to-end learning for music audio. In: Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. pp. 6964–6968. IEEE (2014)
[3] Hamel, P., Lemieux, S., Bengio, Y., Eck, D.: Temporal pooling and multiscale learning for automatic annotation and ranking of music audio. In: ISMIR. pp. 729–734 (2011)
[4] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)
References II
[5] Nam, J., Herrera, J., Lee, K.: A deep bag-of-features model for music auto-tagging. arXiv preprint arXiv:1508.04999 (2015)
[6] Van Den Oord, A., Dieleman, S., Schrauwen, B.: Transfer learning by supervised pre-training for audio-based music classification. In: Conference of the International Society for Music Information Retrieval (ISMIR 2014) (2014)