Audio Art Authentication and Classification with Wavelet
Statistics
Joel Martin jrm2107@columbia.edu
Columbia University
Department of Electrical Engineering
New York, NY, 10027, USA
Abstract
An experimental computational technique for audio art authentication is presented. Specifically, the
computational techniques used for painting/drawing art authentication are transformed from two-
dimensional (image) into one-dimensional (audio) methods. The statistical model consists of first
and higher-order wavelet statistics. Classification is performed with a multi-dimensional scaled
3D visual model. The results from the analyses of music/silence discrimination, audio art
authentication, genre classification, and audio fingerprinting are demonstrated.
Keywords: Audio Classification, Feature Extraction, Musical Genre Classification, Wavelets.
1. INTRODUCTION
The detection of forgery and authenticity of fine art paintings has been in demand for centuries.
In the past, the intuitive trained eye of an art professional was relied upon to detect fakes with
good, relative success. However, the digital age has spawned computer-imaging techniques that
use advanced statistical algorithms for art content analysis.
In particular, the intriguing work of Lyu, Rockmore, and Farid [1] of Dartmouth College has
propelled the art authentication industry into the 21st century. Various works of Flemish artist
Pieter Bruegel the Elder and Italian painter Pietro di Cristoforo Vannucci were analyzed for
authenticity using statistical wavelet techniques. The results were quite remarkable, discerning
forgeries and the works of assistants in the paintings fairly accurately.
These techniques could also be used to verify music authenticity and to distinguish between
musical genres. Although a fake Beethoven recording cannot be purchased on the internet
(since Beethoven did not record any music), the idea is still intriguing. For example, statistical
components of an audio file could be used to describe the finger-picking style of a folk guitarist
or the vocal tendencies of a pop singer. These results could be used for commercial authentication,
classification purposes, or simply as music-theory analysis for the aspiring musician.
2. RELATED WORK
The demand for audio classification has initiated some very innovative research. Speech/music
discrimination [2], audio violence detection [3], audio search and retrieval systems [4], and music
genre classification [5] are a few examples of audio content analysis research. The basic
principle behind each topic is the pertinent feature extraction from the audio data, i.e. extracting
an N-length vector that sufficiently summarizes the content of the audio data.
Differentiating between music, speech, and silence/environmental sounds has proven fairly
effective using a variety of speech-processing techniques [6]. However, the classification
of musical genres has remained quite difficult [7]. Previous research has applied a variety of
digital music analyses. Digital MIDI data has been used to classify song melodies [8]. Irish, German, and
Austrian folk music have been distinguished using hidden Markov models [9]. Even neural
networks have been used to differentiate between pop and classical music [10].
Wavelet statistics are widely regarded as an excellent basis for audio analysis, because
music is accurately described using a combination of wavelets [11]. Lambrou, Kudumakis,
Speller, Sandler, and Linney used wavelet statistics for genre classification of rock, piano, and
jazz styles. First-order statistics, such as mean, variance, skewness, and kurtosis, and second-
order statistics, such as angular second moment, correlation, and entropy, were used to build a
feature vector. Classification produced excellent accuracy using either the minimum
distance classifier or the least squares minimum distance classifier [12].
This paper attempts to use wavelet statistics for genre classification, music/silence differentiation,
audio art authentication, and audio fingerprinting. The same wavelet-based techniques used for
two-dimensional (image) painting/drawing analysis are transformed into one-dimensional (audio)
methods. As stated, the feature extraction is done using wavelet-statistics, and classification is
performed using multi-dimensional scaling [1].
3. FEATURE SELECTION
The music data is conditioned before the feature vectors are extracted. If needed, a digital filter is
used to isolate the frequency range of a specific musical instrument. For example, the guitar
frequencies can be separated from those of the other instruments in a five-piece band. The data is
then read from wave files and auto-scaled to fill the full intensity range (0, 255).
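A minimal conditioning sketch is given below. The SciPy-based Butterworth filter, its fourth order, and the function name condition are illustrative assumptions; the paper does not specify the filter design.

```python
# Hypothetical conditioning sketch (filter design assumed, not from the paper).
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def condition(path, band=None):
    """Read a wave file, optionally band-pass it, and auto-scale to (0, 255)."""
    fs, x = wavfile.read(path)
    x = x.astype(np.float64)
    if x.ndim > 1:
        x = x.mean(axis=1)                    # mix stereo down to mono
    if band is not None:                      # e.g. band=(80, 5000) isolates guitar
        sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
        x = sosfiltfilt(sos, x)
    span = x.max() - x.min()                  # auto-scale to the full (0, 255) range
    x = 255.0 * (x - x.min()) / span if span > 0 else np.zeros_like(x)
    return fs, x
```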
The pertinent musical features are extracted after decomposing the audio file into basis wavelets.
Each audio file is transformed using a five-level, wavelet-like decomposition. As illustrated in
Fig. 1, this decomposition splits the frequency space into multiple scales and two orientations,
whereas a two-dimensional image would be split into four orientations (lowpass, vertical,
horizontal, and diagonal subbands) [1]. Subsequent scales are created by subsampling the
lowpass basis by a factor of two and recursively filtering. The highpass subband at scale i =
1,…,n is denoted as D_i(x). Fig. 2 shows a three-level decomposition of an audio sample.
The feature vector is composed of two parts. The first part is a statistical model composed of the
mean, variance, skewness, and kurtosis of the highpass subband coefficients at each scale i =
1,…, n – 2. However, whereas the feature vector for a 2D image results in 12(n – 2) values, the
1D audio vector consists of 4(n – 2) values [1].
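A sketch of this first half is given below; the helper name coeff_stats and SciPy's default (Fisher) kurtosis are assumptions, since the paper does not state its estimator conventions.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def coeff_stats(coeffs):
    """First half of the feature vector: mean, variance, skewness, and
    kurtosis of the highpass coefficients at scales i = 1, ..., n-2."""
    # coeffs = [A_n, D_n, ..., D_1]; keep the n-2 finest highpass subbands
    feats = []
    for d in coeffs[3:]:
        feats += [np.mean(d), np.var(d), skew(d), kurtosis(d)]
    return np.array(feats)  # 4(n-2) = 12 values for a five-level decomposition
```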
The second half of the feature vector consists of 4(n – 2) statistical values based on the errors of
an optimal linear predictor of coefficient magnitude. For the highpass subband D_i(x), a linear
predictor for the magnitude of these coefficients from a subset of all possible neighbors may be
given by (1).
$$|D_i(x)| = w_1|D_i(x-1)| + w_2|D_i(x+1)| + w_3|D_{i+1}(x/2-1)| + w_4|D_{i+1}(x/2)| + w_5|D_{i+1}(x/2+1)| + w_6|D_{i+2}(x/4-1)| + w_7|D_{i+2}(x/4)| + w_8|D_{i+2}(x/4+1)| \quad (1)$$

where D_{i+1} and D_{i+2} denote the subbands one and two scales coarser than D_i.
The linear predictor takes the form (2), in which the coefficient magnitudes are stacked in the
vector $\vec{D}$ and their neighbor magnitudes are arranged in the rows of a matrix Q. Next, the
predictor weights $\vec{w}$ are found as the least-squares solution (3). Lastly, the log error in the
linear predictor is calculated from (4).

$$\vec{D} = Q\vec{w}, \quad (2)$$
$$\vec{w} = (Q^T Q)^{-1} Q^T \vec{D}. \quad (3)$$

$$\vec{E} = \log_2(\vec{D}) - \log_2(|Q\vec{w}|). \quad (4)$$
The mean, variance, skewness, and kurtosis of the highpass predictor log error are found at each
scale i = 1,…, n – 2, resulting in 4(n – 2) values [1].
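The following sketch realizes this second half on the decomposition from the earlier sketch. The neighbor layout follows the reconstruction of (1), while the index bounds and the small epsilon guarding the logarithms are implementation assumptions.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def predictor_error_stats(coeffs, eps=1e-8):
    """Second half of the feature vector: mean, variance, skewness, and
    kurtosis of the predictor log error (Eqs. (1)-(4)) at scales i = 1..n-2."""
    feats = []
    # coeffs = [A_n, D_n, ..., D_1]: index k-1 holds the next coarser
    # subband D_{i+1}, and index k-2 holds D_{i+2}.
    for k in range(3, len(coeffs)):
        d  = np.abs(coeffs[k])      # |D_i|, the subband being predicted
        p  = np.abs(coeffs[k - 1])  # |D_{i+1}|, parent scale
        gp = np.abs(coeffs[k - 2])  # |D_{i+2}|, grandparent scale
        # positions where every neighbor in Eq. (1) is defined
        hi = min(len(d) - 1, 2 * len(p) - 4, 4 * len(gp) - 8)
        xs = np.arange(4, hi)
        Q = np.column_stack([
            d[xs - 1], d[xs + 1],                           # same-scale neighbors
            p[xs // 2 - 1], p[xs // 2], p[xs // 2 + 1],     # parent neighbors
            gp[xs // 4 - 1], gp[xs // 4], gp[xs // 4 + 1],  # grandparent neighbors
        ])
        D = d[xs]
        w = np.linalg.lstsq(Q, D, rcond=None)[0]             # Eq. (3)
        E = np.log2(D + eps) - np.log2(np.abs(Q @ w) + eps)  # Eq. (4)
        feats += [np.mean(E), np.var(E), skew(E), kurtosis(E)]
    return np.array(feats)  # 4(n-2) = 12 values
```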
Combining both sets of elements under a five-level frequency-domain decomposition yields a 24-
length feature vector. Our assumption is that the wavelet-domain statistics are indicative of the
inherent characteristics of the musician. For example, the finger-picking style or favorite notes of
a musician may be extracted from a frequency-based analysis. This assumption proved to be
suitable for the painting art analysis of Lyu, Rockmore, and Farid [1].
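Composing the hypothetical helpers sketched above gives the full extraction:

```python
import numpy as np

def feature_vector(path, band=None):
    """End-to-end sketch: condition the file, decompose it, and concatenate
    the two 12-value halves into the 24-length feature vector."""
    _, x = condition(path, band=band)
    coeffs = decompose(x, levels=5)
    return np.concatenate([coeff_stats(coeffs), predictor_error_stats(coeffs)])
```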
FIGURE 1: Multi-scale and orientation decomposition of frequency space for a one-dimensional signal. Shown, from
top to bottom, are levels 0, 1, and 2, and from left to right, are the lowpass and highpass subbands.
FIGURE 2: Time-domain (a) and frequency-domain (b) decomposition of a 16 kbps wave file. Shown from top to
bottom, are levels 0, 1, and 2, and from left to right are the lowpass and highpass subbands.
4. CLASSIFICATION
The 24-length feature vector is computed for each of the N songs. Next, the Hausdorff distance
is computed between all sets of feature vectors to search for similarities between the audio
samples [1]. Finally, the resulting N×N distance matrix is transformed into 3D space using
classical multidimensional scaling [13]. A visual 3D model can show the clustering and
separation between the audio files.
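A sketch of this pipeline follows. Because each song here contributes a single 24-length vector, the Hausdorff distance between two singleton sets reduces to the Euclidean distance; the set form is kept for generality (e.g., one vector per song segment). The rows of the returned N × 3 matrix are the coordinates plotted in Figs. 3 and 4.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two sets of feature vectors,
    each passed as an (m, 24) array."""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

def distance_matrix(sets):
    """N x N matrix of pairwise Hausdorff distances between the songs."""
    n = len(sets)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = hausdorff(sets[i], sets[j])
    return dist

def classical_mds(dist, k=3):
    """Classical (Torgerson) multidimensional scaling into k dimensions."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J        # double-centered inner-product matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]      # keep the k largest
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```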
5. EXPERIMENTAL RESULTS
5.1 Music/Silence Discrimination
Four analyses were performed. The first consisted of three song recordings from a
musician and three silence recordings. All files were 16 kHz, 8-bit wave files.
The Hausdorff distance between all sets of vectors produced 100% accurate results. That is, the
smallest distances (besides 0) correctly identified the source of the content (i.e., whether
the data corresponded to the musician or to silence). The multi-dimensional scaled (MDS)
vectors were plotted in a 3D domain, shown in Fig. 3a.
The silence recordings clustered very closely together, while the song vectors were sparsely
located. The distances of songs A, B, and C from silences 1, 2, and 3 were all near 1.15. The simple plot
illustrates the apparent differences between the two sets. The scattering of the musical data
could have been due to low sound quality. Therefore, a higher bit rate should be used during
recording.
FIGURE 3: 3D MDS for (a) Music/Silence, (b) Authentication, (c) Genre Classification.
5.2 Audio Art Authentication
The second analysis consisted of three songs recorded by two different rock artists, a respected
artist and an amateur (cover band) artist. All data files were 44.1 kHz, 16 bit wave files. The
guitar signals of the files were extracted by band-passing the data from 80 Hz to 5 kHz.
The Hausdorff distance was computed between all sets of vectors and produced 50% accuracy
in identifying the correct musician. Fig. 3b shows the multi-dimensional scaled (MDS) vectors.
Musician B had songs A and C cluster together, but most of the results were sparsely located.
The distances of Musician A (songs A, B, C) and Musician B (songs A, B) from Musician B (song
C) were 1.19, 0.98, 1.22, 0.17, and 0.82, respectively. Thus, the songs for Musician B were
indeed closer together, while Musician A’s vectors remained scattered.
The less-than-satisfactory results for the second experiment may have been due to a range of
factors. For example, the professional recording may have undergone a compression technique
that would have altered the frequency spectrum of the wave file.
5.3 Genre Identification
The third analysis consisted of three songs from each of the genres: jazz, rock, rap, and classical.
Again, all files were 44.1 kHz, 16-bit wave files. For this analysis, no bandpass filtering was
performed on any of the song files.
The Hausdorff distances resulted in only 33% accurate genre classification. However, the 3D plot
in Fig. 3c shows an interesting outcome. The jazz and classical songs all clustered very
closely together, while the rock and rap songs were scattered in one primary direction. The
distances of the rock 1, 2, 3 and rap 1, 2, 3 recordings from the cluster were approximately 0.48,
0.85, 0.78, 1.05, 0.26, and 1.05, respectively. The jazz and classical songs were all within ~0.06
distance from each other.
These results are quite remarkable. That is, the jazz and classical songs produced very similar
wavelet domain statistics, while the rock and rap songs were independent of each other. The
similar instrumentation and timing arrangements of jazz and classical music may have caused the
clustering. On the other hand, the distortion and spontaneity of rap and rock music may have
caused the scattering. Nevertheless, the separation of jazz and classical music from rock and
rap music is quite apparent in the results.
5.4 Audio Fingerprinting
The final analysis consisted of eight pop or rock songs performed by nineteen
different artists. The Hausdorff distance was computed to test whether renditions of the same song
correlated with one another; e.g., the Hausdorff distance between four artists performing the same song.
The results, shown in Fig. 4, were mediocre. In the graph, each song is represented as a shape.
As can be seen, almost all of the shapes clustered together.
Next, 10% white noise distortion was added to each song, and the Hausdorff distance was
computed between each of the undistorted and distorted songs. The fingerprint should match the
distorted song with its correct undistorted version. The results were better, matching five of the
eight songs (63% accuracy). The distortion in each of the five songs was very noticeable, but did
not alter the fingerprint enough to disassociate it from its original form.
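The paper does not define "10% white noise distortion" precisely; one plausible reading, noise at 10% of the signal's RMS level, is sketched below.

```python
import numpy as np

def add_noise(x, fraction=0.10, seed=0):
    """Add white noise scaled to a fraction of the signal's RMS level
    (an assumed interpretation of '10% white noise distortion')."""
    rng = np.random.default_rng(seed)
    rms = np.sqrt(np.mean(x ** 2))
    return x + fraction * rms * rng.standard_normal(len(x))
```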
FIGURE 4: 3D MDS for Audio Fingerprinting
6. CONCLUDING REMARKS
A wavelet-based digital technique for audio art identification was presented. The presumption of
the technique was that the statistical features of the frequency domain provide sufficient data to
identify and classify audio art data. As shown in the paper, the technique was very effective in
discriminating music from silence. Sufficient results were also found for classifying two sets of recordings from a
respected band and a corresponding amateur (cover) band. Additionally, the fingerprinting
proved 63% correct with 10% added distortion. These results could be used for commercial
authentication purposes or simply as music-theory analysis for the aspiring musician.
7. ACKNOWLEDGEMENT
Songs by Led Zeppelin and tribute band Led-By-Zeppelin were used for the authentication test.
Songs by Miles Davis, Esquivel, Angelo Badalamenti, Bob Dylan, The Rolling Stones, Snoop
Dogg, Ice Cube, Ugly Duckling, Erik Satie, and Felix Mendelssohn were used for the genre
classification test. Songs by the Beatles, Ray Charles, James Taylor, Boyz II Men, Pink Floyd,
Creedence Clearwater Revival, Bonnie Tyler, the Animals, Madonna, Kylie Minogue, Tina Turner,
Grand Funk Railroad, Bob Dylan, Bob Marley, and Eric Clapton were used for the audio
fingerprinting test.
8. REFERENCES
1. S. Lyu, D. Rockmore, H. Farid. “A Digital Technique for Art Authentication.” Proc Natl Acad Sci
USA. 101(49): 17006-10. 2004.
2. E. S. Parris, M. J. Carey, H. Lloyd-Thomas, “A comparison of features for speech, music
discrimination.” Proc. IEEE International Conference on Acoustics, Speech, and Signal
Processing. 1:149-152. 1999.
3. S. Pfeiffer, S. Fischer, W. Effelsberg, “Automatic audio content analysis.” Proc. of 4th ACM
Multimedia Conference. 1:21-30. 1996.
4. E. Wold, T. Blum, D. Keislar, J. Wheaton. “Content-Based Classification, Search, and Retrieval
of Audio.” IEEE Multimedia. 3(3):27-36. 1996.
5. G. Tzanetakis, P. Cook. “Music genre classification of audio signals.” IEEE Transactions on
Speech and Audio Processing. 10(5):293-302. 2002.
6. L. Lu, H. Jiang, H. J. Zhang. “A robust audio classification and segmentation method.” Proc.
9th ACM Int. Conf. Multimedia. 1:203-211. 2001.
7. T. Li, G. Tzanetakis. “Factors in automatic musical genre classification of audio signals.” Proc.
of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
2003.
8. M. Shan, F. Kuo. “Music style mining and classification by melody.” Proc. IEEE International
Conference on Multimedia and Expo. 1:97-100. 2002.
9. W. Chai, B. Vercoe. “Folk music classification using hidden Markov models.” Proc.
International Conference on Artificial Intelligence. 2001.
10. B. Matityaho, M. Furst. “Neural network based model for classification of music type.” Proc.
18th Conv. Electrical and Electronic Engineers in Israel. 4.3.4/1-5. 1995.
11. L. Endelt, A. Cour-Harbo. “Wavelets for sparse representation of music.” Proc. of
Wedelmusic. 2004.
12. T. Lambrou, P. Kudumakis, R. Speller, M. Sandler, A. Linney. “Classification of audio signals
using statistical features on time and wavelet transform domains.” Proc. IEEE ICASSP. 6:3621-
3624. 1998.
13. T. Cox, M. Cox. Multidimensional Scaling. Chapman & Hall, London. 1994.