Emotion and Theme Recognition in Music
Using Jamendo
Dmitry Bogdanov, Alastair Porter, Philip Tovstogan, Minz Won
Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
Multimedia Evaluation Benchmark 2020 Workshop
14-15 December 2020, Online
Emotions and Themes in Music
● Same format as Emotions and Themes in Music 2019
● A popular task in Music Information Retrieval: Music Emotion Recognition (MER)
● Emotion in Music Task at MediaEval: 2013 - 2015 (Aljanaki et al. 2015)
○ Per-second arousal/valence annotations
● Audio mood classification task at MIREX (since 2007)
○ 600 tracks, 5 emotion clusters from tags
● Theme recognition remains underexplored (Bischoff et al. 2009)
○ Epic, dark, christmas, etc.
Hu, X., Downie, J. S., Laurier, C., Bay, M., & Ehmann, A. F. (2008). The 2007 MIREX audio mood classification task: Lessons learned. In Proc. 9th Int. Conf. Music Inf. Retrieval (pp. 462-467).
Aljanaki, A., Yang, Y. H., & Soleymani, M. (2015, October). Emotion in Music Task at MediaEval 2015. In MediaEval.
Bischoff, K., Firan, C. S., Paiu, R., Nejdl, W., Laurier, C., & Sordo, M. (2009, October). Music mood and theme classification-a hybrid approach. In ISMIR (pp. 657-662).
Jamendo.com
MTG-Jamendo Dataset
● Creative Commons license
● Quality audio and labels (curated by Jamendo)
● Tag categories and subsets (top50, toy)
● Tag pre-processing (“sadness” → “sad”)
● Five standardized splits with no artist overlap between partitions (see the sketch below)
● Reproducible pre-processing and baseline
Bogdanov, D., Won, M., Tovstogan, P., Porter, A. & Serra, X. The MTG-Jamendo dataset for automatic music tagging. In Proceedings of the Machine Learning for Music Discovery
Workshop, 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA (2019)
github.com/MTG/mtg-jamendo-dataset
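A minimal sketch of how artist-disjoint folds can be constructed with scikit-learn, similar in spirit to the dataset's standardized splits (the official split files ship with the dataset; the file name and the artist_id column are assumptions):

import pandas as pd
from sklearn.model_selection import GroupKFold

tracks = pd.read_csv("autotagging_moodtheme.tsv", sep="\t")   # hypothetical metadata file
gkf = GroupKFold(n_splits=5)                                   # five standardized splits
for fold, (train_idx, test_idx) in enumerate(gkf.split(tracks, groups=tracks["artist_id"])):
    train_artists = set(tracks.iloc[train_idx]["artist_id"])
    test_artists = set(tracks.iloc[test_idx]["artist_id"])
    assert not train_artists & test_artists                    # no artist overlap between partitions
    print(f"split-{fold}: {len(train_idx)} train / {len(test_idx)} test tracks")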
Moods and themes
● Mood/theme subset: 56 tags (after applying the splits); the task uses the fixed split-0
● Wide spectrum of interesting tags
○ 224 tags before keeping only those used by at least 50 different artists
○ E.g. ethereal (23 artists, 25 tracks), water (28, 43), suspense (32, 108), halloween (24, 101)
● What is the difference between mood and theme?
○ Deep? Soft vs calm? Dark?
○ No distinction in this task
Emotions and Themes in Music Task
Content-based emotion and theme recognition in music
Goal: Automatically recognize the emotions and themes conveyed in a music
recording using machine learning algorithms.
Task introduced in 2019:
● New category of tags: themes
● New open dataset with higher-quality tags and audio
○ Multiple categories of tags including mood/theme
● Audio is available under CC licenses
Examples
Emotions? Moods? Themes?
● Example 1: Veaceslav Draganov - Motivation
○ commercial, corporate, happy, motivational
● Example 2: XCYRIL - Wandering Heart
○ documentary, emotional, space
● Example 3: Bendjamin Lambert - Everyone’s Disease
○ melancholic, melodic, sad
Data
Available: 18,486 tracks with 56 tags
● 320 kbps MP3 (152 GB)
● Compressed NPY files with spectrograms (68 GB)
● Essentia features (0.4 GB)
Pre-processing (sketched below):
● Keep only tracks longer than 30 s
● Merge similar tags (stemming, translation)
● Keep only tags used by at least 50 different artists
D. Bogdanov, N. Wack, E. Gómez, S. Gulati, P. Herrera, O. Mayor, G. Roma, J. Salamon, J. R. Zapata, and X. Serra. 2013. Essentia: An Audio Analysis
Library for Music Information Retrieval. In International Society for Music Information Retrieval Conference. Curitiba, Brazil
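A minimal sketch of this kind of filtering with pandas (the column names, input file, and any merges beyond "sadness" → "sad" are assumptions; the actual pre-processing scripts live in the GitHub repository):

import pandas as pd

meta = pd.read_csv("raw_tracks.tsv", sep="\t")                       # hypothetical raw metadata
meta = meta[meta["duration_s"] > 30]                                 # keep tracks longer than 30 s

pairs = meta.assign(tag=meta["tags"].str.split(",")).explode("tag")  # one row per (track, tag)
merge_map = {"sadness": "sad"}                                       # example merge from the slides
pairs["tag"] = pairs["tag"].replace(merge_map)

artists_per_tag = pairs.groupby("tag")["artist_id"].nunique()        # distinct artists per tag
kept_tags = artists_per_tag[artists_per_tag >= 50].index
pairs = pairs[pairs["tag"].isin(kept_tags)]                          # drop tags used by < 50 artists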
Submission and evaluation metrics
Submissions (on the test partition):
● Tag activation values
● Binary tag predictions (optional)
● Basic script provided to generate binary predictions from activations
Evaluation metrics:
● Macro ROC-AUC and PR-AUC on tag activations
● Micro- and macro-averaged precision, recall and F-score for binary predictions
● Main leaderboard metric: Macro PR-AUC
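A minimal, self-contained sketch of the metrics with scikit-learn and dummy data (the 0.5 threshold is an assumption; the task provided its own scripts for binarization and evaluation):

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score, precision_recall_fscore_support

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(1000, 56))        # dummy ground truth, 56 mood/theme tags
activations = rng.random((1000, 56))                # dummy submitted tag activations

macro_pr_auc = average_precision_score(y_true, activations, average="macro")   # main leaderboard metric
macro_roc_auc = roc_auc_score(y_true, activations, average="macro")

y_pred = (activations >= 0.5).astype(int)           # naive binarization of activations
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro", zero_division=0)
print(macro_pr_auc, macro_roc_auc, f1)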
VGG-ish Baseline
● 5 CNN layers + dense
Keunwoo Choi, George Fazekas, and Mark Sandler. 2016. Automatic tagging using deep
convolutional neural networks. arXiv preprint arXiv:1606.00298 (2016)
● Uses only a centered 29.1 s audio segment
● Mel-spectrograms (12 kHz sample rate, 96 bands, 21 ms hop)
● ADAM / SGD, optimizing for ROC-AUC
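A minimal sketch of the baseline's input pipeline using librosa (the exact baseline code is in the task repository; the 256-sample hop approximates a 21 ms hop at 12 kHz, and the file name is hypothetical):

import librosa

SR, N_MELS, HOP = 12000, 96, 256                      # ~21 ms hop at 12 kHz
SEGMENT_SECONDS = 29.1

y, _ = librosa.load("track.mp3", sr=SR, mono=True)    # hypothetical input track
n = int(SEGMENT_SECONDS * SR)
start = max(0, (len(y) - n) // 2)
y = y[start:start + n]                                # centered 29.1 s segment

mel = librosa.feature.melspectrogram(y=y, sr=SR, n_mels=N_MELS, hop_length=HOP)
log_mel = librosa.power_to_db(mel)                    # (96, ~1365) input to the 5-layer CNN
print(log_mel.shape)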
VGG-ish Baseline: best and worst tags
Top 10 PR-AUC # tracks*
deep 0.5761 429
summer 0.4466 261
film 0.3441 746
corporate 0.3373 349
epic 0.308 601
happy 0.2534 927
advertising 0.2477 363
dark 0.2183 620
motivational 0.2012 372
meditative 0.1669 374
Bottom 10 PR-AUC # tracks*
travel 0.0097 89
holiday 0.0153 98
cool 0.0215 170
groovy 0.0238 78
sexy 0.0238 59
retro 0.0247 139
movie 0.025 207
drama 0.0253 448
fast 0.0266 62
funny 0.0279 109
* in training set
Proposed solutions
● 21 runs by 6 teams
● All submissions use deep learning (expected from the task design)
Compared to 2019:
● Attention everywhere
● SpecAugment is very popular (sketch below)
● Late fusion is even more frequent
Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, and
Quoc V. Le. 2019. SpecAugment: A simple data augmentation method for automatic speech
recognition. arXiv preprint arXiv:1904.08779 (2019).
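A minimal sketch of SpecAugment-style masking (frequency and time masking only, no time warping); the mask sizes are arbitrary assumptions, not the teams' actual settings:

import numpy as np

def spec_augment(spec, n_freq_masks=2, n_time_masks=2, max_f=12, max_t=40, rng=None):
    # spec: (n_mels, n_frames) log-mel spectrogram; masked bands/frames are zeroed out
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_mels, n_frames = out.shape
    for _ in range(n_freq_masks):
        f = int(rng.integers(0, max_f + 1))            # mask width in mel bands
        f0 = int(rng.integers(0, n_mels - f + 1))
        out[f0:f0 + f, :] = 0.0
    for _ in range(n_time_masks):
        t = int(rng.integers(0, max_t + 1))            # mask width in frames
        t0 = int(rng.integers(0, n_frames - t + 1))
        out[:, t0:t0 + t] = 0.0
    return out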
Results (best run per team)
Rank Team PR-AUC ROC-AUC F-Score Approach, Focus, External datasets
1 SAIL-MiM-USC 0.1609 0.7812 0.2203 VGGish (MSD, Music4All), Mixup, experimenting w/ losses
4 SAIL-MiM-USC 0.1421 0.7625 0.1976 Same, without external data
5 HCMUS 0.1414 0.7663 0.0594 EfficientNet, WaveNet (NSynth), MobileNetV2, SpecAug
8 AugsBurger 0.1313 0.7533 0.1901 CBAM, Self-Attention + RNNs
9 UAI-CNRL 0.1275 0.7360 0.1883 ResNet, Self-Attention, Mixup, SpecAug
12 AUGment 0.1178 0.7353 0.1738 VGGish, Self-Attention, AReLU, smaller nets
15 baseline-vgg 0.1077 0.7258 0.1656 VGGish
19 UIBK-DBIS 0.0965 0.7043 0.1040 CRNN, pre-processing, moods vs themes
24 baseline-pop 0.0319 0.5 0.0026 -
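Mixup, reported by several of the teams above, blends pairs of training examples and their multi-hot labels; a minimal PyTorch sketch (the alpha value is an assumption):

import torch

def mixup(x, y, alpha=0.2):
    # x: (batch, ...) spectrogram batch, y: (batch, n_tags) multi-hot labels
    lam = torch.distributions.Beta(alpha, alpha).sample()   # mixing coefficient in (0, 1)
    perm = torch.randperm(x.size(0))                        # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix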
Results: Precision vs Recall
Results: PR-AUC and ROC-AUC
Rank Run PR-AUC ROC-AUC F-Score Approach
1 Best 2020 0.1609 0.7812 0.2203 VGGish (MSD, Music4All), Mixup, experimenting w/ losses
3 Best 2019 0.1546 0.7729 0.2124 Shake-FA-ResNet + FA-ResNet
15 Baseline VGG 0.1077 0.7258 0.1656 VGG
Improvement of Best 2020 over Best 2019: +0.0063 +0.0083 +0.0079
Conclusions: The Task is Challenging!
● Improvements are present, but are they significant?
● Is it about having more data or having a better approach?
● The dataset is not large, so data augmentation is commonly used
● The winning team, SAIL-MiM-USC, used an ensemble with different losses designed to compensate for the unbalanced tag representation; maybe that is the direction to follow? (see the weighting sketch below)
● Team AUGment focused on more lightweight models with fewer FLOPs
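One common way to compensate for unbalanced tag representation is a per-tag positive weight in the binary cross-entropy loss; this is only an illustrative sketch (not necessarily the winning team's losses), using four tag counts from the baseline table above:

import torch

# Positive training-track counts for a few tags (from the baseline table): happy, dark, travel, sexy
tag_counts = torch.tensor([927., 620., 89., 59.])
pos_weight = tag_counts.max() / tag_counts             # rarer tags get larger positive weights

criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
logits = torch.randn(8, 4)                             # dummy model outputs for a batch of 8 tracks
targets = torch.randint(0, 2, (8, 4)).float()          # dummy multi-hot labels
loss = criterion(logits, targets)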
Future directions
● Considering the 2021 edition of the task
● Several datasets?
● Improved tag balance of the dataset?
● Individual tag analysis?
● Additional tag metadata (genre, instruments)
Reproducibility
● Open auto-tagging dataset: MTG-Jamendo
● Pre-processing, download, and baseline scripts available on GitHub
● Audio and metadata available under Creative Commons licenses
● Open source code for the baseline and submissions
○ 4 out of 6 teams published their source code
All info is available on the website: tinyurl.com/mediaeval2020emotions
Questions?
Thank you!
Multimedia Evaluation Benchmark 2020 Workshop
14-15 December 2020, Online
