1
Exploring the Relationship
Between Multi-Modal Emotion
Semantics of Music
Ju-Chiang Wang, Yi-Hsuan Yang, Kaichun Chang, Hsin-Min Wang, and Shyh-Kang Jeng
Academia Sinica,
National Taiwan University,
Taipei, Taiwan
2
Outline
• Introduction and Potentiality
• Methodology
– The ATB and AEG models
– Framework to combine the two models
• Evaluation and Result
• Conclusion
• In this presentation, "mood" and "emotion" are used interchangeably
3
Introduction – Tag and Valence-Arousal (VA)
• Music emotion modeling takes two approaches: categorical (mood tags) and dimensional (the VA space)
• Both share a unified goal: understanding the emotion semantics of music
• (Arbitrary) mood tags can be mapped into the VA space in an unsupervised, content-based manner, without any training ground truth for the semantic mapping
• Automatically generate a semantically structured tag cloud in the VA space
[Figure: the Valence-Arousal plane, with arousal (low to high) on the vertical axis, valence (negative to positive) on the horizontal axis, and the quadrants numbered 1–4]
4
Visualization of Music Mood (Laurier et al.)
Generated by a self-organizing map (SOM)
5
Potentiality (Clarifying the Debate)
• A novice user may be unfamiliar with the VA model; it would be helpful to display mood tags in the VA space
• Facilitates applications such as tag-based music search and browsing interfaces
• Dimension reduction for tag visualization may yield dimensions that do not conform to valence and arousal
• The VA values of some affective terms can be found in existing word norms, but they were not elicited from music
• Affective terms are not cross-lingual and do not always have exact translations in different languages
• Culture-dependent, corpus-dependent
6
Taxonomy of Music Mood (Xiao Hu et al.), each English tag paired with its Chinese gloss
Aggressive 侵略的;好鬥
Amiable 和藹可親的;厚道的
Autumnal 秋的;像秋天的
Bittersweet 苦樂參半的
Boisterous 喧鬧的;狂暴的
Brooding 徘徊不去的;沈思的
Calm 冷靜;鎮定
Campy 裝模作樣;
Cheerful 興高采烈的;情緒好的
Confident 有信心的,自負的
Dreamy 夢幻般的;愛作白日夢的;
Fiery (感情)激烈的,熱烈的
Fun 有趣的
Humorous 幽默的;滑稽的
Intense 強烈的;熱情的
Literate 有文化修養的
Nostalgic 鄉愁的
Passionate 熱情的;熱烈的;易怒的
Poignant 深刻的;辛酸的
Quirky 詭詐的;多變的;古怪的
Relaxed 鬆懈的;放鬆的
Rollicking 嬉耍的;愉快的
Rousing 使覺醒的;使奮起的
Rowdy 粗暴的;喧鬧的
Silly 愚蠢的;糊塗的;無聊的
Soothing 慰藉的;使人寬心的
Sweet 甜的;悅耳的
Tense 緊張的;引起緊張的
Visceral 出自內心深處的
Volatile 易發作的;輕浮的;飛逝的
Whimsical 想入非非的,怪誕的,古怪的
Wistful 渴望的;想往的;留戀的
Witty 機智的;說話風趣的
Wry 歪斜的;曲解的;堅持錯誤的
[The slide highlights the semantic GAP between the English mood terms and their Chinese translations]
7
Potentiality (Clarifying the Debate)
Machine Learning is necessary for such a task
8
Methodology of the Framework
• A probabilistic framework with two component models, Acoustic Tag Bernoullis (ATB) and Acoustic Emotion Gaussians (AEG)
– They computationally model the generative processes from acoustic features to a mood tag and to a VA value, respectively
• Built on the same acoustic feature space, the ATB and AEG models can share and transfer semantic information with each other
• Bridged by the acoustic feature space, we can align one emotion modality to the other
• The first attempt to establish a joint model for exploring the relationship between discrete mood categories and the continuous emotion space
9
Construct Feature Reference Model
[Figure: frame-based features are extracted from the music tracks and audio signals of a universal music database; a global set of frame vectors, randomly sampled from each track, is used to train an acoustic GMM (components A1 … AK) via EM, yielding a global GMM for acoustic feature encoding]
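As a rough illustration, the feature reference model above can be sketched with scikit-learn's EM-based `GaussianMixture`. The random frame vectors, the choice K = 8, and the diagonal covariances are stand-ins for this sketch, not the paper's actual data or settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in for the global set of frame vectors randomly sampled from each
# track of a universal music database (the real features are 70-dim frame
# descriptors; random data is used here purely for illustration).
rng = np.random.default_rng(0)
frames = rng.normal(size=(2000, 70))

# Fit a K-component global "acoustic GMM" with EM; K = 8 is an arbitrary
# choice for the sketch, not the value used in the paper.
acoustic_gmm = GaussianMixture(n_components=8, covariance_type="diag",
                               random_state=0).fit(frames)
```

Once trained, this global GMM stays fixed and serves only as a reference codebook for encoding songs.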
10
Represent a Song in a Probabilistic Space
[Figure: a song's frame feature vectors are encoded as posterior probabilities over the components A1 … AK of the acoustic GMM; averaging them yields a histogram, the acoustic GMM posterior, in which each dimension corresponds to a specific acoustic pattern]
11
Acoustic Tag Bernoullis (ATB)
• Given a mood-tagged music dataset with binary labels for a mood tag
• Learn an ATB model that describes the generative process from acoustic features to the mood tag for each song in the dataset
• Won (in AUC-Clip) the Mood Tag Classification task (MIREX 2009, 2010)
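The slide does not spell out ATB's exact parameterization, but its spirit — per-component Bernoulli parameters for a tag, mixed by a clip's acoustic posterior — can be sketched as follows. `beta`, `theta`, and `K` are all illustrative placeholders, not parameters learned from real data.

```python
import numpy as np

# Minimal ATB-style scoring sketch: suppose each acoustic component k carries
# a Bernoulli parameter beta[k] = P(tag | component k), learned by EM from
# the binary tag labels in the actual model; a song's tag probability then
# mixes these parameters by its posterior histogram theta.
K = 8
rng = np.random.default_rng(2)
beta = rng.uniform(size=K)           # per-component Bernoulli parameters
theta = rng.dirichlet(np.ones(K))    # clip-level acoustic GMM posterior

p_tag = float(theta @ beta)          # P(tag = 1 | song)
```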
12
Acoustic Emotion Gaussians (AEG)
• Given a VA-annotated music dataset
• Learn an AEG model that describes the generative process from acoustic features to the VA space for each song in the dataset
• Presented in OS2; superior to its rivals, SVR and MLR
13
The Learning of VA GMM on MER60
14
Multi-Modal Emotion Semantic Mapping
• Three models are aligned: ATB, the acoustic GMM, and AEG
• Transfer the weights from a mood tag to the VA GMM
• The semantic mapping process is transparent and easy to observe and interpret
Mapping a tag into a VA Gaussian distribution
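A hedged sketch of the mapping step: AEG attaches a 2-D Gaussian in the VA plane to each acoustic component, and ATB supplies tag-conditional weights over the components; collapsing the weighted mixture by moment matching gives one VA Gaussian per tag. `mu`, `Sigma`, and `w` below are made-up placeholders for the learned AEG parameters and transferred tag weights.

```python
import numpy as np

K = 8
rng = np.random.default_rng(3)
mu = rng.normal(size=(K, 2))             # per-component VA means (illustrative)
Sigma = np.stack([np.eye(2) * 0.1] * K)  # per-component VA covariances
w = rng.dirichlet(np.ones(K))            # tag-derived weights over components

# Moment-match the mixture sum_k w_k N(mu_k, Sigma_k) to a single Gaussian.
mu_tag = w @ mu                          # mixture mean
diff = mu - mu_tag
Sigma_tag = (np.einsum("k,kij->ij", w, Sigma) +
             np.einsum("k,ki,kj->ij", w, diff, diff))  # mixture covariance
```

`mu_tag` places the tag in the VA plane; `Sigma_tag` governs its spread (and hence, per the Discussion slide, its rendered font size).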
15
Evaluation – Corpora and Settings
• Two corpora: MER60 and AMG1644
• MER60: a jointly annotated corpus (MER60-alone setting)
– 60 music clips, each 30 seconds long
– 99 subjects in total; each clip annotated by 40 subjects
– VA values are entered by clicking on the emotion space shown on a computer display
– Last.fm is queried and the top 50 mood tags are kept for the 60 songs
• AMG1644: used for the separately annotated corpora (AMG1644-MER60 setting)
– Crawl the audio of the “top songs” for 33 mood tags (AMG); most of the tags are used in the MIREX mood classification task
– Yielding 1,644 clips, each about 30 seconds long
16
Acoustic Features
• Adopt the bag-of-frames representation
• Extract frame-based musical features from audio using MIRToolbox 1.3
• All the frames of a clip are aggregated into the acoustic GMM posterior, so emotion is analyzed at the clip level rather than the frame level
• Frame-based features
– Dynamic, spectral, timbre, and tonal
– 70-dim concatenated feature vector per frame
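The paper extracts its 70-dim features with MIRToolbox (a MATLAB toolbox); as a self-contained stand-in, the bag-of-frames idea can be sketched in Python with two toy per-frame descriptors (spectral centroid and RMS energy) on a synthetic tone — these are illustrative, not the actual feature set.

```python
import numpy as np

# One second of a 440 Hz tone as a stand-in audio signal.
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)

frame_len, hop = 1024, 512
n_frames = 1 + (len(y) - frame_len) // hop
feats = []
for i in range(n_frames):
    frame = y[i * hop : i * hop + frame_len]
    mag = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
    freqs = np.fft.rfftfreq(frame_len, 1 / sr)
    centroid = (freqs * mag).sum() / (mag.sum() + 1e-9)  # spectral descriptor
    rms = np.sqrt((frame ** 2).mean())                   # dynamics descriptor
    feats.append([centroid, rms])
feats = np.array(feats)   # (n_frames, 2): one feature vector per frame
```

The bag of such frame vectors is what gets encoded into the clip-level acoustic GMM posterior.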
17
Result for the MER60-Alone Setting
• Graphviz for visualization, with a Voronoi-diagram-based heuristic to avoid tag overlap
18
Result for the AMG-MER Setting
• Graphviz for visualization, with a Voronoi-diagram-based heuristic to avoid tag overlap
19
Comparison with Psychologists
• Quantitative comparison
– Refer to the VA values of 30 affective terms proposed by Whissell and Plutchik (WP) and by the Affective Norms for English Words (ANEW)
– For each tag, measure the Euclidean distance between the generated VA value and the psychologists’ one
• Baseline
– Set the generated VA value of each tag to the origin
– Represents a non-informative tag-to-VA mapping
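The evaluation metric can be sketched as follows; the tag names and VA coordinates are invented for illustration, not taken from WP or ANEW.

```python
import numpy as np

# Hypothetical generated VA points and psychologists' reference points.
generated = {"happy": np.array([0.6, 0.5]), "sad": np.array([-0.5, -0.3])}
reference = {"happy": np.array([0.8, 0.5]), "sad": np.array([-0.6, -0.4])}

# Mean per-tag Euclidean distance between generated and reference VA values.
dist = np.mean([np.linalg.norm(generated[t] - reference[t]) for t in reference])

# Baseline: every generated point collapsed to the origin.
baseline = np.mean([np.linalg.norm(reference[t]) for t in reference])
```

A mapping is useful to the extent that `dist` falls below `baseline`.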
20
Discussion
• The result is not sensitive to K
• Such a learning-based framework is scalable and can do better as more annotated data becomes available
• Automatic discovery
– For instance, construct a balanced music audio corpus and have Chinese speakers label Chinese mood tags
– Generate a Chinese mood tag cloud
• Inverse correlation between the VA intensity and the covariance of a tag
– Tags lying on the outer circle would have larger font sizes
21
Result for the MER60-Alone Setting
22
Conclusion
• A novel framework that unifies the categorical and dimensional emotion semantics of music
• Demonstrated how to map a mood tag to a 2-D VA Gaussian and generate the corresponding tag cloud; this can be further extended to arbitrary tags
• Can verify whether an arbitrary tag is mood-related or not
• We will conduct user studies on the results
• Further investigate acoustic feature representations for better generalization of the emotion modeling
23
Arbitrary Tag (MajorMiner): Not Mood-related
24
Arbitrary Tag (MajorMiner): Mood-related