Automatically Estimating Emotion in Music with
Deep Long-Short Term Memory Recurrent Neural
Networks
George Trigeorgis, Eduardo Coutinho, Stefanos Zafeiriou,
Björn Schuller
Department of Computing
Imperial College London
MediaEval 2015, Wurzen, Germany
Method: feature sets
Feature Set 1 (FS1):
2013 INTERSPEECH Computational Paralinguistics Challenge
65 low-level descriptors (LLDs; energy, spectrum and voice-related)
plus their first-order derivatives, covering a broad set of
descriptors from the fields of speech processing, Music Information
Retrieval, and general sound analysis
We computed the mean and standard deviation functionals of
each feature over 1s time windows with 50% overlap
Final set: 260 features extracted at a rate of 2 Hz.
All features were extracted with openSMILE
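
The windowed functionals can be sketched as follows. This is a minimal illustration, not the openSMILE configuration used by the authors: it assumes the LLDs are already available as a NumPy array at 100 frames per second (a hypothetical frame rate that the slides do not state), and the name `window_functionals` and the synthetic input are made up for the example. With 65 LLDs plus 65 derivatives and two functionals each, the output has 260 columns, and a 1 s window with 50% overlap yields one row every 0.5 s (2 Hz).

```python
import numpy as np

def window_functionals(llds, frame_rate_hz=100, win_s=1.0, overlap=0.5):
    """Mean and standard deviation of each column over sliding windows."""
    win = int(round(win_s * frame_rate_hz))   # frames per window
    hop = int(round(win * (1.0 - overlap)))   # frames between window starts
    rows = []
    for start in range(0, llds.shape[0] - win + 1, hop):
        seg = llds[start:start + win]
        # Concatenate the two functionals: [means..., stds...]
        rows.append(np.concatenate([seg.mean(axis=0), seg.std(axis=0)]))
    return np.asarray(rows)

# Synthetic stand-in for 10 s of audio: 65 LLDs + 65 first-order derivatives.
rng = np.random.default_rng(0)
llds = rng.normal(size=(1000, 130))
feats = window_functionals(llds)
print(feats.shape)
```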
Method: feature sets
Feature Set 2 (FS2):
FS1 plus four new features
Roughness (R) and Sensory Dissonance (SDiss)
Tempo (T) and Event Density (ED).
These correspond to two psychoacoustic dimensions consistently
associated with the communication of emotion in music and
speech: Roughness and Duration (Coutinho & Dibben, 2013)
The four features were extracted with the MIR Toolbox:
mirroughness: SDiss (Sethares formula) and R (Vassilakis algorithm)
mirtempo: T
mireventdensity: ED
Method: regressor
Given the importance of temporal context to the
perception of emotion in music, we consider temporal models.
Deep Recurrent Neural Networks: RNNs are neural nets that
operate in the time domain as well, instead of only in the
spatial domain.
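
To make the role of temporal context concrete, here is a minimal sketch of a plain recurrent layer in NumPy. It is a simplified stand-in, not the LSTM architecture used in this work: the function name `rnn_forward`, the layer sizes, and the weight range are assumptions chosen only for illustration. The point is that the hidden state at time t depends on the state at time t-1, so earlier inputs influence later outputs.

```python
import numpy as np

def rnn_forward(x, w_in, w_rec, b):
    """Run a simple tanh recurrent layer over a sequence x of shape (T, D)."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x_t in x:
        # The previous hidden state h feeds back into the current step.
        h = np.tanh(w_in @ x_t + w_rec @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, D, H = 5, 3, 4                      # hypothetical sizes
x = rng.normal(size=(T, D))
w_in = rng.uniform(-0.5, 0.5, size=(H, D))
w_rec = rng.uniform(-0.5, 0.5, size=(H, H))
b = np.zeros(H)

states = rnn_forward(x, w_in, w_rec, b)

# Perturbing only the first input changes the state at the next time step.
x_perturbed = x.copy()
x_perturbed[0] += 1.0
states_perturbed = rnn_forward(x_perturbed, w_in, w_rec, b)
```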
Model training
Joint learning of Arousal and Valence time-continuous values
(multitask).
Cross-validation
1. The fold subdivision followed a modulus-based scheme:
instance ID modulo 11.
2. The instances yielding a remainder of 10 were left out to
create a small test set for performance estimation.
3. On the remaining instances, a 10-fold cross-validation was
performed.
4. We ran 4 trials of the same model, each with
randomized initial weights in the range [-0.1, 0.1].
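
The fold scheme can be sketched in a few lines. This is one plausible reading, not the authors' code: it assumes that the remainders 0 to 9 each define one cross-validation fold, which the slides do not spell out, and the function names and the example ID range are hypothetical.

```python
def assign_folds(instance_ids, modulus=11):
    """Split IDs into a held-out test set and (modulus - 1) CV folds."""
    held_out = modulus - 1                      # remainder 10 when modulus is 11
    test_ids = [i for i in instance_ids if i % modulus == held_out]
    folds = {k: [] for k in range(held_out)}    # assumed: fold k = remainder k
    for i in instance_ids:
        r = i % modulus
        if r != held_out:
            folds[r].append(i)
    return test_ids, folds

def cv_splits(folds):
    """Yield (train_ids, validation_ids) pairs, one per fold."""
    for k in sorted(folds):
        train = [i for j, ids in folds.items() if j != k for i in ids]
        yield train, folds[k]

# Hypothetical instance IDs 1..44.
test_ids, folds = assign_folds(range(1, 45))
splits = list(cv_splits(folds))
print(test_ids, len(splits))
```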
Model training (cont.)
Basic architecture: Deep LSTM-RNN (2 hidden layers)
Optimised parameters:
number of LSTM blocks in each hidden layer
learning rate
standard deviation of the Gaussian noise applied to the input
activations (used to alleviate the effects of over-fitting)
A momentum of 0.9 was used for all tests
Early stopping strategy (to avoid overfitting the training data):
training was stopped after 20 iterations without improvement
of the validation set performance
For each fold, instances were presented in random order
The input (acoustic features) and output (emotion features) data
were standardised to zero mean and unit variance (using statistics
computed on the corresponding training set of each cross-validation fold)
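
Two of the steps above, standardisation with training-fold statistics and early stopping with a patience of 20, can be sketched as follows. This is an illustrative sketch rather than the authors' training code: the helper names, the synthetic data, and the sequence of validation errors are hypothetical, and "iterations" is assumed here to mean training epochs.

```python
import numpy as np

def fit_standardiser(train):
    """Per-feature mean and std estimated on the training fold only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return mean, std

def standardise(x, mean, std):
    return (x - mean) / std

class EarlyStopping:
    """Signal a stop after `patience` epochs without validation improvement."""
    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_error):
        if val_error < self.best:
            self.best = val_error
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Standardise validation data with the training-fold statistics.
rng = np.random.default_rng(0)
train = rng.normal(5.0, 2.0, size=(200, 3))
val = rng.normal(5.0, 2.0, size=(50, 3))
mean, std = fit_standardiser(train)
train_z = standardise(train, mean, std)
val_z = standardise(val, mean, std)

# Made-up validation errors: improvement for 3 epochs, then a plateau.
stopper = EarlyStopping(patience=20)
stopped_at = None
for epoch, err in enumerate([1.0, 0.8, 0.7] + [0.75] * 25):
    if stopper.step(err):
        stopped_at = epoch
        break
```

Fitting the statistics on the training fold alone keeps information from the validation instances out of the preprocessing.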
Pretraining using denoising autoencoders
The unsupervised pre-training strategy consisted of denoising
LSTM-RNN auto-encoders.
We first created an LSTM-RNN with a single hidden layer
trained to predict the input features (y(t) = x(t)).
In order to avoid over-fitting, in each training epoch and
timestep t, we added a noise vector n to x(t), sampled from a
Gaussian distribution with zero mean and variance σ².
Both the development and test set instances were used to
train the DAE.
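
The input corruption used for the denoising auto-encoder can be sketched as follows. Only the data preparation is shown, not the LSTM-RNN itself. The function name `noisy_inputs`, the sequence length, and the value sigma = 0.1 are assumptions for illustration; the slides do not give the noise level here.

```python
import numpy as np

def noisy_inputs(x, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to x."""
    return x + rng.normal(0.0, sigma, size=x.shape)

rng = np.random.default_rng(0)
x = rng.normal(size=(60, 260))   # one sequence: 60 timesteps, 260 features

for epoch in range(3):
    # Fresh noise is drawn in every epoch and for every timestep.
    x_noisy = noisy_inputs(x, sigma=0.1, rng=rng)
    # The target stays the clean input, i.e. y(t) = x(t).
    target = x
    # A training step on (x_noisy, target) would go here.
```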
Results
Results on the official test set.
CB: challenge baseline features
Metric  Run  Arousal        Valence
RMSE    2    0.242±0.116    0.373±0.195
RMSE    3    0.234±0.114    0.372±0.190
RMSE    4    0.236±0.114    0.375±0.191
RMSE    CB   0.270±0.110    0.366±0.180
r       2    0.611±0.254    0.004±0.505
r       3    0.599±0.287    0.017±0.492
r       4    0.613±0.278    0.026±0.500
r       CB   0.360±0.260    0.010±0.380
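
For reference, the two metrics in the table can be computed as in the sketch below. It assumes that r denotes the Pearson correlation coefficient and that each "mean±deviation" entry summarises per-song scores; the slides state neither explicitly. The function names and the toy values are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between two time series."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two time series."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def summarise(per_song_scores):
    """Mean and standard deviation over per-song scores."""
    s = np.asarray(per_song_scores, dtype=float)
    return float(s.mean()), float(s.std())

# Toy example: a prediction with a constant offset of 0.1.
y_true = np.array([0.0, 0.5, 1.0])
y_pred = y_true + 0.1
song_rmse = rmse(y_true, y_pred)
song_r = pearson_r(y_true, y_pred)
mean_score, std_score = summarise([0.2, 0.4])
```

A constant offset leaves r at 1 while RMSE is non-zero, which is why the two metrics are reported side by side.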
Questions.
