1
Exploring the Relationship
Between Multi-Modal Emotion
Semantics of Music
Ju-Chiang Wang, Yi-Hsuan Yang, Kaichun Chang, Hsin-Min Wang, and Shyh-Kang Jeng
Academia Sinica,
National Taiwan University,
Taipei, Taiwan
2
Outline
• Introduction and Potentiality
• Methodology
– The ATB and AEG models
– Framework to combine the two models
• Evaluation and Result
• Conclusion
• In this presentation, "mood" and "emotion" are used interchangeably
3
Introduction – Tag and Valence-Arousal (VA)
• Music emotion modeling takes two approaches: categorical (mood tags) and dimensional (the VA space)
• Both share a unified goal: understanding the emotion semantics of music
• (Arbitrary) mood tags can be mapped into the VA space in an unsupervised, content-based manner, without any training ground truth for the semantic mapping
• Automatically generate a semantically structured tag cloud in the VA space
[Figure: the Valence-Arousal plane, with arousal (low to high) on the vertical axis, valence (negative to positive) on the horizontal axis, and the quadrants numbered 1–4]
4
Visualization of Music Mood (Laurier et al.)
Generated by a self-organizing map (SOM)
5
Potentiality (Clarifying the Debate)
• A novice user may be unfamiliar with the VA model; it would be helpful to display mood tags in the VA space
• Facilitates applications such as tag-based music search and browsing interfaces
• Dimension reduction for tag visualization may yield dimensions that do not conform to valence and arousal
• The VA values of some affective terms can be found in existing word norms, but they were not elicited from music
• Affective terms are not cross-lingual and do not always have exact translations in different languages
• Culture-dependent, corpus-dependent
6
Taxonomy of Music Mood (Xiao Hu et al.), each English tag paired with its Chinese gloss
Aggressive 侵略的;好鬥
Amiable 和藹可親的;厚道的
Autumnal 秋的;像秋天的
Bittersweet 苦樂參半的
Boisterous 喧鬧的;狂暴的
Brooding 徘徊不去的;沈思的
Calm 冷靜;鎮定
Campy 裝模作樣;
Cheerful 興高采烈的;情緒好的
Confident 有信心的,自負的
Dreamy 夢幻般的;愛作白日夢的;
Fiery (感情)激烈的,熱烈的
Fun 有趣的
Humorous 幽默的;滑稽的
Intense 強烈的;熱情的
Literate 有文化修養的
Nostalgic 鄉愁的
Passionate 熱情的;熱烈的;易怒的
Poignant 深刻的;辛酸的
Quirky 詭詐的;多變的;古怪的
Relaxed 鬆懈的;放鬆的
Rollicking 嬉耍的;愉快的
Rousing 使覺醒的;使奮起的
Rowdy 粗暴的;喧鬧的
Silly 愚蠢的;糊塗的;無聊的
Soothing 慰藉的;使人寬心的
Sweet 甜的;悅耳的
Tense 緊張的;引起緊張的
Visceral 出自內心深處的
Volatile 易發作的;輕浮的;飛逝的
Whimsical 想入非非的,怪誕的,古怪的
Wistful 渴望的;想往的;留戀的
Witty 機智的;說話風趣的
Wry 歪斜的;曲解的;堅持錯誤的
[The slide highlights the semantic GAP between the English mood terms and their Chinese translations]
7
Potentiality (Clarifying the Debate)
Machine Learning is necessary for such a task
8
Methodology of the Framework
• A probabilistic framework with two component models, Acoustic Tag Bernoullis (ATB) and Acoustic Emotion Gaussians (AEG)
– They computationally model the generative processes from acoustic features to a mood tag and to a VA value, respectively
• Built on the same acoustic feature space, the ATB and AEG models can share and transfer semantic information with each other
• Bridged by the acoustic feature space, we can align one emotion modality to the other
• The first attempt to establish a joint model for exploring the relationship between discrete mood categories and the continuous emotion space
9
Construct Feature Reference Model
[Figure: frame-based features are extracted from the music tracks and audio signals of a universal music database; a global set of frame vectors, randomly sampled from each track, is used to train an acoustic GMM (components A1 … AK) via EM, yielding a global GMM for acoustic feature encoding]
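As a rough illustration, the feature reference model above can be sketched with scikit-learn's EM-based `GaussianMixture`. The random frame vectors, the choice K = 8, and the diagonal covariances are stand-ins for this sketch, not the paper's actual data or settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in for the global set of frame vectors randomly sampled from each
# track of a universal music database (the real features are 70-dim frame
# descriptors; random data is used here purely for illustration).
rng = np.random.default_rng(0)
frames = rng.normal(size=(2000, 70))

# Fit a K-component global "acoustic GMM" with EM; K = 8 is an arbitrary
# choice for the sketch, not the value used in the paper.
acoustic_gmm = GaussianMixture(n_components=8, covariance_type="diag",
                               random_state=0).fit(frames)
```

Once trained, this global GMM stays fixed and serves only as a reference codebook for encoding songs.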
10
Represent a Song in a Probabilistic Space
[Figure: a song's frame feature vectors are encoded as posterior probabilities over the components A1 … AK of the acoustic GMM; averaging them yields a histogram, the acoustic GMM posterior, in which each dimension corresponds to a specific acoustic pattern]
11
Acoustic Tag Bernoullis (ATB)
• Given a mood-tagged music dataset with binary labels for a mood tag
• Learn an ATB model that describes the generative process from acoustic features to the mood tag for each song in the dataset
• Won (in AUC-Clip) the Mood Tag Classification task (MIREX 2009, 2010)
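The slide does not spell out ATB's exact parameterization, but its spirit — per-component Bernoulli parameters for a tag, mixed by a clip's acoustic posterior — can be sketched as follows. `beta`, `theta`, and `K` are all illustrative placeholders, not parameters learned from real data.

```python
import numpy as np

# Minimal ATB-style scoring sketch: suppose each acoustic component k carries
# a Bernoulli parameter beta[k] = P(tag | component k), learned by EM from
# the binary tag labels in the actual model; a song's tag probability then
# mixes these parameters by its posterior histogram theta.
K = 8
rng = np.random.default_rng(2)
beta = rng.uniform(size=K)           # per-component Bernoulli parameters
theta = rng.dirichlet(np.ones(K))    # clip-level acoustic GMM posterior

p_tag = float(theta @ beta)          # P(tag = 1 | song)
```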
12
Acoustic Emotion Gaussians (AEG)
• Given a VA-annotated music dataset
• Learn an AEG model that describes the generative process from acoustic features to the VA space for each song in the dataset
• Presented in OS2; superior to its rivals, SVR and MLR
13
The Learning of VA GMM on MER60
14
Multi-Modal Emotion Semantic Mapping
• Three models are aligned: ATB, the acoustic GMM, and AEG
• Transfer the weights from a mood tag to the VA GMM
• The semantic mapping process is transparent and easy to observe and interpret
Mapping a tag into a VA Gaussian distribution
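A hedged sketch of the mapping step: AEG attaches a 2-D Gaussian in the VA plane to each acoustic component, and ATB supplies tag-conditional weights over the components; collapsing the weighted mixture by moment matching gives one VA Gaussian per tag. `mu`, `Sigma`, and `w` below are made-up placeholders for the learned AEG parameters and transferred tag weights.

```python
import numpy as np

K = 8
rng = np.random.default_rng(3)
mu = rng.normal(size=(K, 2))             # per-component VA means (illustrative)
Sigma = np.stack([np.eye(2) * 0.1] * K)  # per-component VA covariances
w = rng.dirichlet(np.ones(K))            # tag-derived weights over components

# Moment-match the mixture sum_k w_k N(mu_k, Sigma_k) to a single Gaussian.
mu_tag = w @ mu                          # mixture mean
diff = mu - mu_tag
Sigma_tag = (np.einsum("k,kij->ij", w, Sigma) +
             np.einsum("k,ki,kj->ij", w, diff, diff))  # mixture covariance
```

`mu_tag` places the tag in the VA plane; `Sigma_tag` governs its spread (and hence, per the Discussion slide, its rendered font size).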
15
Evaluation – Corpora and Settings
• Two corpora: MER60 and AMG1644
• MER60: a jointly annotated corpus (MER60-alone setting)
– 60 music clips, each 30 seconds long
– 99 subjects in total; each clip annotated by 40 subjects
– VA values are entered by clicking on the emotion space shown on a computer display
– Last.fm is queried and the top 50 mood tags are kept for the 60 songs
• AMG1644: used for the separately annotated corpora (AMG1644-MER60 setting)
– Crawl the audio of the “top songs” for 33 mood tags (AMG); most of the tags are used in the MIREX mood classification task
– Yielding 1,644 clips, each about 30 seconds long
16
Acoustic Features
• Adopt the bag-of-frames representation
• Extract frame-based musical features from audio using MIRToolbox 1.3
• All the frames of a clip are aggregated into the acoustic GMM posterior, so emotion is analyzed at the clip level rather than the frame level
• Frame-based features
– Dynamic, spectral, timbre, and tonal
– 70-dim concatenated feature vector per frame
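The paper extracts its 70-dim features with MIRToolbox (a MATLAB toolbox); as a self-contained stand-in, the bag-of-frames idea can be sketched in Python with two toy per-frame descriptors (spectral centroid and RMS energy) on a synthetic tone — these are illustrative, not the actual feature set.

```python
import numpy as np

# One second of a 440 Hz tone as a stand-in audio signal.
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)

frame_len, hop = 1024, 512
n_frames = 1 + (len(y) - frame_len) // hop
feats = []
for i in range(n_frames):
    frame = y[i * hop : i * hop + frame_len]
    mag = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
    freqs = np.fft.rfftfreq(frame_len, 1 / sr)
    centroid = (freqs * mag).sum() / (mag.sum() + 1e-9)  # spectral descriptor
    rms = np.sqrt((frame ** 2).mean())                   # dynamics descriptor
    feats.append([centroid, rms])
feats = np.array(feats)   # (n_frames, 2): one feature vector per frame
```

The bag of such frame vectors is what gets encoded into the clip-level acoustic GMM posterior.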
17
Result for the MER60-Alone Setting
• Graphviz for visualization, with a Voronoi-diagram-based heuristic to avoid tag overlap
18
Result for the AMG-MER Setting
• Graphviz for visualization, with a Voronoi-diagram-based heuristic to avoid tag overlap
19
Comparison with Psychologists
• Quantitative comparison
– Refer to the VA values of 30 affective terms proposed by Whissell and Plutchik (WP) and by the Affective Norms for English Words (ANEW)
– For each tag, measure the Euclidean distance between the generated VA value and the psychologists’ one
• Baseline
– Set the generated VA value of each tag to the origin
– Represents a non-informative tag-to-VA mapping
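The evaluation metric can be sketched as follows; the tag names and VA coordinates are invented for illustration, not taken from WP or ANEW.

```python
import numpy as np

# Hypothetical generated VA points and psychologists' reference points.
generated = {"happy": np.array([0.6, 0.5]), "sad": np.array([-0.5, -0.3])}
reference = {"happy": np.array([0.8, 0.5]), "sad": np.array([-0.6, -0.4])}

# Mean per-tag Euclidean distance between generated and reference VA values.
dist = np.mean([np.linalg.norm(generated[t] - reference[t]) for t in reference])

# Baseline: every generated point collapsed to the origin.
baseline = np.mean([np.linalg.norm(reference[t]) for t in reference])
```

A mapping is useful to the extent that `dist` falls below `baseline`.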
20
Discussion
• The result is not sensitive to K
• Such a learning-based framework is scalable and can do better as more annotated data becomes available
• Automatic discovery
– For instance, construct a balanced music audio corpus and have Chinese speakers label Chinese mood tags
– Generate a Chinese mood tag cloud
• Inverse correlation between the VA intensity and the covariance of a tag
– Tags lying on the outer circle would have larger font sizes
21
Result for the MER60-Alone Setting
22
Conclusion
• A novel framework that unifies the categorical and dimensional emotion semantics of music
• Demonstrated how to map a mood tag to a 2-D VA Gaussian and generate the corresponding tag cloud; this can be further extended to arbitrary tags
• Can verify whether an arbitrary tag is mood-related or not
• We will conduct user studies on the results
• Further investigate acoustic feature representations for better generalization of the emotion modeling
23
Arbitrary Tag (MajorMiner): Not Mood-related
24
Arbitrary Tag (MajorMiner): Mood-related