International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 10 | Oct 2018 www.irjet.net p-ISSN: 2395-0072
Music Genre Classification using GMM
R. Thiruvengatanadhan1
1Assistant Professor/Lecturer (on Deputation), Department of Computer Science and Engineering,
Annamalai University, Annamalainagar, Tamil Nadu, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Automatic music genre classification is very useful in music indexing. The tempogram is a feature extraction method for musical genre classification that is based on the temporal structure of music signals. Searching and organizing are the main functions of present-day music genre classification systems. This paper describes a technique that uses Gaussian mixture models to classify songs: the models learn from training data and assign music audio to its respective genre class. The proposed feature extraction and classification models result in better accuracy in music genre classification.
Key Words: Feature Extraction, Tempogram and Gaussian
mixture model (GMM).
1. INTRODUCTION
Musical genres have no strict definitions and boundaries, as they arise through a complex interaction between the public, marketing, historical, and cultural factors. This observation has led some researchers to suggest defining a new genre classification scheme purely for the purposes of music information retrieval [1]. In addition, advances in digital signal processing and data mining techniques have led to intensive study of music signal analysis tasks such as content-based music retrieval, music genre classification, duet analysis, music transcription, music information retrieval, and musical instrument detection and classification. Musical instrument detection techniques have many potential applications, such as detecting and analyzing solo passages, audio and video retrieval, music transcription, playlist generation, acoustic environment classification, and video scene analysis and annotation.
Automatically extracting music information is gaining importance as a way to structure and organize the increasingly large number of music files available digitally on the Web. It is very likely that in the near future all recorded music in human history will be available on the Web, and automatic music analysis will be one of the services that music content distribution vendors use to attract customers. Improvements in internet services and network bandwidth have also increased the number of people using audio libraries. With large music databases, however, the warehouses require exhausting and time-consuming work, particularly when audio genres are categorized manually. Music is also divided into genres and sub-genres not only on the basis of the music itself but also of the lyrics [2], which makes classification harder. To complicate matters further, the definition of a music genre may well have changed over time [3]. For instance, rock songs that were made fifty years ago are different from the rock songs we have today.
2. ACOUSTIC FEATURES FOR AUDIO CLASSIFICATION
An important objective of feature extraction is to compress the audio signal into a vector that represents the meaningful information it is intended to characterize. In this work, tempogram features are extracted as the music features.
2.1 Tempogram
Rhythm is the element that gives music its shape in the temporal dimension: it arranges sounds and silences in time and induces a predominant pulse, called the beat, which serves as the basis for the temporal structure of music. The tempogram captures the local tempo and beat characteristics of music signals. Fourier tempograms are used in this work.
Fig -1: Novelty Curve Computations.
Humans perceive rhythm as a regular pattern of pulses arising from moments of musical stress. Abrupt changes in loudness, timbre and harmony cause musical accents [4]. In instruments such as the piano, percussion instruments and the guitar, sudden changes in signal energy occur, accompanied by very sharp attacks. The novelty curve is based on this observation and is computed to extract meaningful information about note onsets, e.g. in pieces dominated by such instruments [5]. In the pre-processing stage, short segmented frames are extracted and windowed. The novelty curve computed in this way exhibits peaks which represent note onset positions [6]. A Hamming window function is applied as smoothing to avoid boundary problems [7]. The novelty curve computation is shown in Fig. 1. From the novelty curve, the Fourier tempogram is calculated; tempo, in a musical context, is a measure of beats per minute. Finally, a histogram is computed for each frame, resulting in 12-dimensional feature vectors.
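As a concrete illustration, the sketch below computes a novelty curve and Fourier tempogram with librosa and collapses each tempogram frame into a 12-bin vector. The 12-bin reduction, parameter values and helper name are assumptions for illustration; the paper does not publish its implementation.

import numpy as np
import librosa

def tempogram_features(path, sr=22050, n_bins=12):
    # Hypothetical helper: per-frame 12-dimensional tempogram features.
    y, _ = librosa.load(path, sr=sr)                 # mono audio at 22.05 kHz (Section 4.1 uses 22 kHz)
    hop = int(0.010 * sr)                            # 10 ms hop, i.e. 50% overlap of 20 ms frames
    novelty = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop)        # novelty curve
    tgram = np.abs(librosa.feature.fourier_tempogram(onset_envelope=novelty,
                                                     sr=sr, hop_length=hop))  # Fourier tempogram
    # Collapse the tempo axis of each frame into a coarse 12-bin histogram (assumed reduction).
    splits = np.array_split(tgram, n_bins, axis=0)
    return np.vstack([band.mean(axis=0) for band in splits]).T               # shape (n_frames, 12)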
3. CLASSIFICATION MODEL
3.1 Gaussian Mixture Models
The distribution of feature vectors can be modeled by parametric or non-parametric methods. Parametric models assume the shape of the probability density function [8], whereas non-parametric modeling makes only minimal or no assumptions about the probability density function of the feature vectors [9]. The Gaussian mixture model (GMM) is used here to classify the different audio classes. The Gaussian classifier is an example of a parametric classifier; it is an intuitive approach in which the model consists of several Gaussian components, each of which can be seen as modeling a group of acoustic features. In classification, each class is represented by its own GMM. Once the GMMs are trained, they can be used to predict the class to which a new sample most probably belongs [10].
The motivation for using Gaussian densities to represent audio features is the potential of GMMs to model an underlying set of acoustic classes with individual Gaussian components, in which the spectral shape of each acoustic class is parameterized by a mean vector and a covariance matrix [12]. GMMs also have the ability to form a smooth approximation to arbitrarily shaped observation densities in the absence of other information [11]. With GMMs, each sound is modeled as a mixture of several Gaussian clusters in the feature space; the underlying assumption is that the distribution of feature vectors extracted from a class can be modeled by a mixture of Gaussian densities [13].
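For reference, the mixture density implied by this description can be written in the standard form (the paper does not state it explicitly). With a class model \lambda = \{w_i, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i\}_{i=1}^{M}, a feature vector \mathbf{x} is assigned the likelihood

\[
p(\mathbf{x}\mid\lambda)=\sum_{i=1}^{M} w_i\,\mathcal{N}(\mathbf{x};\,\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i),
\qquad \sum_{i=1}^{M} w_i = 1,
\]

where \mathcal{N} denotes the multivariate Gaussian density and M is the number of mixtures (2, 5 or 10 in Section 4).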
A variety of approaches to the problem of mixture
decomposition have been proposed, many of which focus on
maximum likelihood methods such as expectation
maximization (EM) or maximum a posteriori estimation
(MAP). Generally these methods consider separately the
question of parameter estimation and system identification,
that is to say a distinction is made between the
determination of the number and functional form of
components within a mixture and the estimation of the
corresponding parameter values.
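A minimal sketch of this separation, assuming scikit-learn's GaussianMixture (whose fit() runs EM internally) and BIC as one possible criterion for choosing the model order; the paper itself simply evaluates 2, 5 and 10 mixtures empirically, and the helper name is hypothetical.

from sklearn.mixture import GaussianMixture

def fit_gmm_with_order_selection(X, candidate_orders=(2, 5, 10), seed=0):
    # Hypothetical helper: run EM for each candidate order and keep the lowest-BIC model.
    best = None
    for m in candidate_orders:
        gmm = GaussianMixture(n_components=m, covariance_type='diag',
                              max_iter=200, random_state=seed).fit(X)   # fit() runs EM internally
        if best is None or gmm.bic(X) < best.bic(X):
            best = gmm
    return best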
4. IMPLEMENTATION
4.1 Dataset Collection
The music data is collected from music channels using a TV tuner card. A total of 100 different songs is recorded, sampled at 22 kHz and encoded at 16 bits per sample. In order to make the training results statistically significant, the training data should be sufficient and cover various genres of music.
4.2 Feature Extraction
In this work, fixed-length frames of 20 ms duration with 50% overlap (i.e., a 10 ms hop) are used. Each input wav file is passed to the feature extraction stage, and 12-dimensional tempogram feature values are calculated for it. The above process is repeated for all 100 wav files.
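A minimal sketch of the 20 ms / 10 ms framing described above, using a Hamming window as in Section 2.1; the helper name and exact windowing details are assumptions.

import numpy as np

def frame_signal(y, sr=22050, frame_ms=20, hop_ms=10):
    # Hypothetical helper: split a signal (at least one frame long) into
    # overlapping, Hamming-windowed frames.
    frame_len = int(sr * frame_ms / 1000)            # about 441 samples at 22.05 kHz
    hop_len = int(sr * hop_ms / 1000)                # about 220 samples -> 50% overlap
    n_frames = 1 + (len(y) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return y[idx] * np.hamming(frame_len)            # shape (n_frames, frame_len)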
4.3 Classification
Once the feature extraction process is done, the audio is classified into its genre. The extracted feature vectors are used to assign each clip to one of the genre classes. A mean vector is calculated for the whole audio clip and is compared either with results from the training data or with predefined thresholds. We select 75 music samples as training data, comprising 25 classical, 25 pop and 25 rock pieces. The remaining 25 samples are used as the test set.
Gaussian mixtures for the three classes are modeled from the extracted features. For classification, the feature vectors are extracted and each feature vector is given as input to the GMMs, which capture the distribution of the acoustic features. Models with 2, 5 and 10 mixture components are evaluated. The class to which an audio sample belongs is decided on the basis of the highest model output.
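A minimal sketch of this per-class modeling and maximum-likelihood decision, assuming scikit-learn and a dictionary of per-genre training feature matrices (frames x 12); names and settings are illustrative only.

from sklearn.mixture import GaussianMixture

def train_class_gmms(features_by_genre, n_components=10, seed=0):
    # Hypothetical helper: one GMM per genre, e.g. {'classic': X1, 'pop': X2, 'rock': X3},
    # where each X is a (frames x 12) matrix of tempogram features.
    return {genre: GaussianMixture(n_components=n_components, covariance_type='diag',
                                   random_state=seed).fit(X)
            for genre, X in features_by_genre.items()}

def classify_clip(models, X_clip):
    # Sum per-frame log-likelihoods under each class model; the highest-scoring class wins.
    scores = {genre: gmm.score_samples(X_clip).sum() for genre, gmm in models.items()}
    return max(scores, key=scores.get)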
The performance of the system for 2, 5 and 10 Gaussian mixtures is shown in Table 1. The class to which a music sample belongs is decided on the basis of the highest output. Table 1 shows the performance of the GMM for genre classification as a function of the number of mixtures.
Table -1: Performance of GMM for different numbers of mixtures.

Genre     2 mixtures   5 mixtures   10 mixtures
Classic   94%          93%          94%
Pop       89%          87%          87%
Rock      90%          91%          93%
Chart -1: Performance of audio classification for different durations of music clips.
Audio classification using the GMM gives an accuracy of 94.9%. The performance of the GMM for different clip durations is shown in Chart 1. When the number of mixtures was increased from 5 to 10, there was no considerable increase in performance; with the GMM, the best performance was achieved with 10 Gaussian mixtures.
5. CONCLUSION
In this paper, we have proposed an automatic music genre classification system using GMMs. Tempogram features are calculated to characterize the audio content. The GMM learning algorithm is used to classify music into genre classes by learning from training data. The proposed classification method uses the EM algorithm to fit the GMM parameters for classification between classical, pop and rock. Experimental results show that the proposed GMM-based method performs well in musical genre classification, with an accuracy rate of 94%.
REFERENCES
[1] F. Pachet and D. Cazaly, "A classification of musical genre," in Proc. RIAO Content-Based Multimedia Information Access Conf., Paris, France, Mar. 2000.
[2] M. Serwach and B. Stasiak, "GA-based parameterization and feature selection for automatic music genre recognition," in Proc. 17th International Conference on Computational Problems of Electrical Engineering (CPEE), 2016.
[3] L. van Dijk, "Finding musical genre similarity using machine learning techniques," Bachelor Thesis, Information Science, Radboud Universiteit Nijmegen, pp. 1-25, 2014.
[4] Mi Tian, György Fazekas, Dawn A. A. Black, and Mark Sandler, "On the use of the tempogram to describe audio content and its application to music structural segmentation," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
[5] Venkatesh Kulkarni, "Towards automatic audio segmentation of Indian Carnatic music," Master Thesis, Friedrich-Alexander University, 2014.
[6] A. Eronen and A. Klapuri, "Music tempo estimation with k-NN regression," IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 1, pp. 50-57, 2010.
[7] Wenming Gui, Yao Sun, Yuting Tao, Yanping Li, Lun Meng, and Jinglan Zhang, "A novel tempogram generating algorithm based on matching pursuit," Applied Sciences, 2018.
[8] H. Tang, S. M. Chu, M. Hasegawa-Johnson, and T. S. Huang, "Partially supervised speaker clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 959-971, 2012.
[9] Chunhui Wang, Qianqian Zhu, Zhenyu Shan, Yingjie Xia, and Yuncai Liu, "Fusing heterogeneous traffic data by Kalman filters and Gaussian mixture models," in Proc. IEEE International Conference on Intelligent Transportation Systems, pp. 276-281, 2014.
[10] Sourabh Ravindran, Kristopher Schlemmer, and David V. Anderson, "A physiologically inspired method for audio classification," Journal on Applied Signal Processing, vol. 9, pp. 1374-1381, 2005.
[11] Menaka Rajapakse and Lonce Wyse, "Generic audio classification using a hybrid model based on GMMs and HMMs," in Proc. IEEE International Conference on Multimedia Modeling, February 2005, pp. 1550-1555.
[12] Poonam Sharma and Anjali Garg, "Feature extraction and recognition of Hindi spoken words using neural networks," International Journal of Computer Applications, vol. 142, no. 7, pp. 12-17, May 2016.
[13] Sujay G. Kakodkar and Samarth Borkar, "Speech emotion recognition of Sanskrit language using machine learning," International Journal of Computer Applications, vol. 179, no. 51, pp. 23-28, June 2018.