Kamelia Aryafar: Musical Genre Classification Using Sparsity-Eager Support Vector Machines and Extended Semantic Analysis

Outline
Problem Formulation
Motivation
Proposed Method
Experimental Results
Future Work

Music Genre Classification Using Explicit
Semantic Analysis and Sparsity-Eager Support
Vector Machines

Kamelia Aryafar

Drexel University
Computer Science Department

February 18, 2012


1 Problem Formulation
2 Motivation
    Challenges
    Related Work
3 Proposed Method
    Feature Selection
    Fractional TF-IDF
    Sparsity-Eager SVM Genre Classification
4 Experimental Results
    Benchmark Data set
    Results
5 Future Work



Motivation

Many systems must handle high-dimensional data, e.g.,
images, image sequences, and even scalar signals.
High-dimensional data can also be multimodal.



Motivation

(Multimodal Mixture)

(Source I) (Source II)



BSS Illustration
Artificial Gaussian mixture of two audio sources:

(Violin mixture)

(I)

(II)



Motivation

The problem of genre classification:

(Violin playing)

Genre Label: Classical Music/Violin



Music Genre Classification

Goal
Music genre classification is the problem of assigning a piece
of music to its corresponding categorical labels. The goal of
automatic music genre classification is to estimate genre
labels for test audio sequences in large data sets.

Motivation
Exponential growth in available music data sets
Cost reduction
Extension to similar tasks



Challenges

Robust representation of audio signals in terms of
low-level features or high-level audio keywords.
Construction of an automatic learning scheme to
classify these representative features into music genres.


Proposed Method

Abstract layer to represent features in terms of concepts
Invariant to feature selection



TF-IDF Representation

Goal
Create a high-level abstraction of low-level audio features
(codewords of MFCCs) to enhance music genre classification.



ESA Model
Explicit semantic analysis (ESA) uses term frequency (tf) and
inverse document frequency (idf) weighting schemes to
represent low-level textual information in terms of concepts in
a higher-dimensional space.




$E_{C,D}[i, j] = \mathrm{tfidf}(C_i, \delta_j)$.

TF-IDF
The relationship between a codeword and a concept (document)
is captured by the tf-idf value of the codeword-concept pair.



Mel Frequency Cepstral Coefficients

MFCCs represent the short-term power spectrum of a sound and are
known to be effective features for music classification systems.

Pre-processing
For a large data set, k-means clustering of the MFCCs under the
cosine distance measure creates the audio codebook,
$D = \{\delta_1, \ldots, \delta_k\}$, which reduces the
complexity of the feature space.
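
The codebook step can be sketched in a few lines. The slides do not say which k-means variant is used, so the block below assumes spherical k-means (unit-length vectors, assignment by highest cosine similarity); the function names `build_codebook` and `quantize` and the toy 2-D frames are illustrative only.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(u, v):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))

def build_codebook(frames, k, iters=20, seed=0):
    """Assumed variant: spherical k-means. Returns k unit-length codewords."""
    rng = random.Random(seed)
    data = [normalize(f) for f in frames]
    centroids = rng.sample(data, k)
    for _ in range(iters):
        # Assign every frame to its most similar codeword.
        clusters = [[] for _ in range(k)]
        for x in data:
            j = max(range(k), key=lambda c: cosine(x, centroids[c]))
            clusters[j].append(x)
        # Re-estimate each codeword as the normalized mean of its members.
        for j, members in enumerate(clusters):
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return centroids

def quantize(frame, codebook):
    """Index of the codeword with the highest cosine similarity to a frame."""
    x = normalize(frame)
    return max(range(len(codebook)), key=lambda c: cosine(x, codebook[c]))

# Toy 2-D stand-ins for MFCC frames, grouped around two directions.
frames = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1],
          [0.1, 1.0], [0.0, 0.8], [-0.1, 1.0]]
codebook = build_codebook(frames, k=2)
labels = [quantize(f, codebook) for f in frames]
print(labels)
```

Each audio sequence then becomes a sequence of codeword indices, which plays the role of "words" in the tf-idf encoding.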



Fractional TF-IDF [2]

$\mathrm{tfidf}(C, \delta) = \mathrm{tf}(C, \delta) \times \mathrm{idf}_{\delta}$
$E_{C,D}[i, j] = \mathrm{tfidf}(C_i, \delta_j)$
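
A minimal sketch of how the matrix $E_{C,D}$ could be filled in is given below. The slides do not define the fractional tf or the idf term (see [2] for the authors' definitions), so both formulas in the code are assumptions: tf is taken as the mean non-negative cosine similarity between a concept's frames and a codeword, and idf as the log of the number of concepts over the summed tf. The names `fractional_tf` and `esa_matrix` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def fractional_tf(concept_frames, codeword):
    """ASSUMED soft term frequency: mean clipped cosine similarity to the codeword."""
    sims = [max(0.0, cosine(x, codeword)) for x in concept_frames]
    return sum(sims) / len(sims)

def esa_matrix(concepts, codebook):
    """E[i][j] = tf(C_i, delta_j) * idf_{delta_j} for concepts C_i and codewords delta_j."""
    n = len(concepts)
    tf = [[fractional_tf(c, d) for d in codebook] for c in concepts]
    # ASSUMED soft idf: log(N / sum_i tf(C_i, delta_j)). Since every tf <= 1,
    # the sum is at most N and the idf is never negative.
    idf = []
    for j in range(len(codebook)):
        df = sum(tf[i][j] for i in range(n))
        idf.append(math.log(n / df) if df > 0 else 0.0)
    return [[tf[i][j] * idf[j] for j in range(len(codebook))] for i in range(n)]

# Two toy concepts (lists of 2-D frames) and three toy codewords.
concepts = [[[1.0, 0.0], [0.9, 0.1]],
            [[0.0, 1.0], [0.1, 0.9]]]
codebook = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
E = esa_matrix(concepts, codebook)
for row in E:
    print([round(v, 3) for v in row])
```

In this toy case the third codeword is equally similar to both concepts, so it receives the same weight in both rows and does not help tell them apart, while the first two codewords each favor one concept.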


Concept-based Representation of Audio Features



Training the Classifier

ESA representation of the training set
The set $E(T)$ of (ESA-vector, label) pairs is provided as the
training data to a supervised classification algorithm.

Outcome
The outcome of training on $E(T)$ is the set of hyperplanes that
define the gaps between genres.



Genre Classification

Classifier selection
A sparsity-eager support vector machine ($\ell_1$-SVM) is used to
assign samples to their genre categories.

$\ell_1$-SVM
In contrast to the original $\ell_2$-SVM, only a small subset of the
training examples contributes to the formation of the final
classifier.



Sparsity-Eager SVM [1]

Classification
Given a set of M training examples, we aim to find a sample
subset such that (i) the subset is sufficiently sparse, and (ii) the
classifier has a sufficiently low empirical loss and therefore a
sufficiently large separating margin.

Why $\ell_1$-SVM
(i) obtaining higher generalization accuracy on new (test)
examples, (ii) increasing the robustness against overfitting to
the training examples, and (iii) providing scalability in terms of
the classification complexity.
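
The effect of an $\ell_1$ penalty on per-example coefficients can be illustrated with a small binary toy problem. This is not the algorithm of [1]: the squared hinge loss, the proximal gradient solver, the linear kernel, the missing bias term, and the names `train_sparse_svm` and `predict` are all assumptions made to keep the sketch short, and the step size is chosen for this toy data only.

```python
def dot(u, v):
    """Inner product, used here as a linear kernel."""
    return sum(a * b for a, b in zip(u, v))

def train_sparse_svm(X, y, lam=0.1, lr=0.02, epochs=8000):
    """Proximal gradient descent on mean squared hinge loss + lam * sum(alpha), alpha >= 0.

    Decision function (no bias term, for brevity): f(x) = sum_i alpha_i * y_i * <x_i, x>.
    """
    m = len(X)
    K = [[dot(X[i], X[j]) for j in range(m)] for i in range(m)]
    alpha = [0.0] * m
    for _ in range(epochs):
        # Margin shortfall of each training example under the current classifier.
        slack = []
        for j in range(m):
            f = sum(alpha[i] * y[i] * K[i][j] for i in range(m))
            slack.append(max(0.0, 1.0 - y[j] * f))
        for i in range(m):
            grad = -2.0 / m * sum(slack[j] * y[j] * y[i] * K[i][j] for j in range(m))
            # Gradient step, then shrink by lr * lam and clip at zero (l1 proximal step).
            alpha[i] = max(0.0, alpha[i] - lr * grad - lr * lam)
    return alpha

def predict(alpha, X, y, x):
    """Sign of the kernel expansion over the training examples."""
    f = sum(a * yi * dot(xi, x) for a, yi, xi in zip(alpha, y, X))
    return 1 if f >= 0 else -1

# Toy, linearly separable data: three points per class, mirrored through the origin.
X = [[1.0, 1.0], [0.6, 0.9], [1.1, 0.7],
     [-1.0, -1.0], [-0.6, -0.9], [-1.1, -0.7]]
y = [1, 1, 1, -1, -1, -1]
alpha = train_sparse_svm(X, y)
support = [i for i, a in enumerate(alpha) if a > 0.0]
predictions = [predict(alpha, X, y, x) for x in X]
print("examples with non-zero weight:", support)
print("training predictions:", predictions)
```

Only the examples listed in `support` are needed at prediction time, which is the scalability argument in item (iii). A multi-genre classifier would need an extension such as one-vs-rest, which this sketch does not cover.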



Data set Description

Data set:
We use the publicly available benchmark dataset for audio
classification and clustering proposed by Homburg et al. [3].
The dataset contains samples of 1886 songs obtained from the
Garageband site.

Genre            Samples
alternative      145
blues            120
electronic       113
folk-country     222
funk soul/R&B    47
jazz             319
pop              116
rap/hip-hop      300
rock             504
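
As a quick check on the table, the per-genre counts can be summed and the class imbalance quantified. The majority-class baseline computed below is not reported in the slides; it is added here only as a reference point for reading the accuracy figures.

```python
# Per-genre sample counts as listed in the data set description.
genre_counts = {
    "alternative": 145,
    "blues": 120,
    "electronic": 113,
    "folk-country": 222,
    "funk soul/R&B": 47,
    "jazz": 319,
    "pop": 116,
    "rap/hip-hop": 300,
    "rock": 504,
}

total = sum(genre_counts.values())
majority_genre = max(genre_counts, key=genre_counts.get)
# Accuracy of a classifier that always predicts the largest genre.
majority_baseline = 100.0 * genre_counts[majority_genre] / total

print(total, majority_genre, round(majority_baseline, 2))
```

The counts add up to the stated 1886 songs, and the largest genre alone accounts for a little over a quarter of the data.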



Experimental Setup

Parameter setup
Validation method: 10-fold cross-validation
Performance measure: classification accuracy rate
Similarity measure: cosine distance

Comparative features
Aggregation of MFCC features (AM)
Temporal, spectral, and phase features (TSPS)
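
The validation protocol can be sketched as follows. The slides do not say how folds are drawn, so the shuffled, unstratified split is an assumption, and a 1-nearest-neighbor rule under cosine distance stands in for the actual classifiers; `k_fold_indices`, `nn_predict`, and `cross_validate` are illustrative names and the data is a toy set.

```python
import math
import random

def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def nn_predict(train_X, train_y, x):
    """Label of the training vector closest to x in cosine distance."""
    best = min(range(len(train_X)), key=lambda i: cosine_distance(train_X[i], x))
    return train_y[best]

def cross_validate(X, y, k=10, seed=0):
    """Classification accuracy rate (%) pooled over the k held-out folds."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(nn_predict(train_X, train_y, X[i]) == y[i] for i in fold)
    return 100.0 * correct / len(X)

# Toy feature vectors: two well-separated groups of directions.
X = [[1.0, 0.05 * i] for i in range(10)] + [[0.05 * i, 1.0] for i in range(10)]
y = ["jazz"] * 10 + ["rock"] * 10
folds = k_fold_indices(len(X))
accuracy = cross_validate(X, y)
print(accuracy)
```

Every example is held out exactly once, so the reported rate is computed only on data the classifier did not see during training.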



Genre Classification Accuracy Results

Classification accuracy rate (%):

Classifier   AM      TSPS    ESA (k = 1000)   ESA (k = 5000)
Random       22.39   21.68   29.51            25.40
k-NN         35.83   47.40   48.59            51.88
SVM          40.81   51.81   53.76            57.81

Comparison
Aggregation of MFCC features (AM) and temporal, spectral and
phase (TSPS) features are compared to the ESA
representation of MFCC features.



True Positive Accuracy Rate

Figure: True positive genre classification rate. The plot shows the classification accuracy rate (%) per genre for $\ell_1$-SVM, log-regression, $\ell_2$-SVM, and $\ell_1$-regression over eight genres: Alternative, Blues, Electronic, Folk-Country, Jazz, Pop, Rock, and Rap/Hip-hop.



Classifier Convergence Time

Figure: Classifier convergence time



Classification Accuracy vs. Training Samples

Figure: Accuracy rate for different numbers of training samples



Future Work

Figure: Audio signals (MFCC representation, then ESA encoding into concepts) and lyrics data (TF representation, then TF-IDF representation into concepts) are joined through CCA into a common CCA space.



Future Work...

Figure: Audio query, MFCC representation, ESA encoding, CCA space, and paired textual data (lyrics).



Questions?

Thank you!
[1] Kamelia Aryafar, Sina Jafarpour, and Ali Shokoufandeh.
Automatic musical genre classification using sparsity-eager support vector machines.
In NIME’12, 2012.
[2] Kamelia Aryafar and Ali Shokoufandeh.
Music genre classification using explicit semantic analysis.
In Proceedings of the 1st International ACM Workshop on Music Information Retrieval with User-Centered and
Multimodal Strategies, MIRUM ’11, pages 33–38, New York, NY, USA, 2011. ACM.
[3] Helge Homburg, Ingo Mierswa, Bülent Möller, Katharina Morik, and Michael Wurst.
A benchmark dataset for audio classification and clustering.
In ISMIR, pages 528–531, 2005.

Acknowledgments

This work was funded in part by the Office of Naval Research (ONR) grant N00014-04-1-0363 and United States
National Science Foundation grant N0803670.

