USING SEARCH ENGINES FOR CLASSIFICATION: DOES IT STILL WORK?
Sten Govaerts, Nik Corthaut, Erik Duval
• Our problem

• Classification using search engines

• The setup

• The evaluation

• Conclusion
TUNIFY
HOW DOES IT WORK?

• manually annotated metadata

• 5 music experts at Aristo Music and various consultants

• almost 80,000 songs

• but not enough...
PROBLEMS

• satisfying the music choice of all customers

  • retail and catering differ from you and me!

• new markets

• reacting quickly to emerging music trends

• adding the full Belgian library catalog
GENERATE THE METADATA

• from different sources:

  • the audio signal
  • web sources
  • the Aristo database
  • attention metadata

• using our metadata generation framework: SamgI
GENRE...

• our master's thesis looked at different ways to generate genre...
ONE APPROACH...

• M. Schedl, T. Pohle, P. Knees, G. Widmer, "Assigning and Visualizing Music Genres by Web-based Co-occurrence Analysis", Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 260-265.

• G. Geleijnse, J. Korst, "Web-based Artist Categorization", Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 266-271.
CLASSIFICATION WITH
  SEARCH ENGINES
     using co-occurrence

   Artist + Genre + Schema
Rock:     0.013      Jazz:   0.013
Blues:    0.009      Pop:    0.015
Country:  0.009      Metal:  0.005
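
The co-occurrence idea behind these scores can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the query layout, the schema keywords `"music genre"`, the normalization by the artist's own hit count, and the injected `page_count` function are all assumptions made for the example. No real search API is called.

```python
from typing import Callable, Dict, Iterable

# Assumed query schema: extra keywords appended to every query.
SCHEMA = "music genre"


def build_query(artist: str, genre: str, schema: str = SCHEMA) -> str:
    """Combine artist, genre and schema into one search query string."""
    return f'"{artist}" "{genre}" {schema}'


def genre_scores(
    artist: str,
    genres: Iterable[str],
    page_count: Callable[[str], int],
    schema: str = SCHEMA,
) -> Dict[str, float]:
    """Relative co-occurrence of the artist with each genre.

    page_count is any function mapping a query string to a hit count.
    It is injected so the sketch does not depend on a real search API.
    """
    artist_hits = page_count(f'"{artist}" {schema}')
    scores = {}
    for genre in genres:
        both = page_count(build_query(artist, genre, schema))
        # Assumed normalization: share of the artist's pages that mention the genre.
        scores[genre] = both / artist_hits if artist_hits else 0.0
    return scores


def classify(scores: Dict[str, float]) -> str:
    """Pick the genre with the highest score."""
    return max(scores, key=scores.get)
```

Applied to the six scores on the slide, `classify` would return `"Pop"`, since 0.015 is the largest value.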
RESULTS

• the master's thesis student's results were much worse

• what happened?

  • did Google's search result counts change?

  • does the Google Search API return different results?

  • is the student's implementation correct?
HOW TO EVALUATE THIS?

• re-run the original experiment

  • evaluate on the same data set: 1,995 artists and 9 genres

• different search engines: Google, Yahoo! and Live! Search

• over time: 8 times over a period of 36 days
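
An evaluation of this kind comes down to one accuracy figure per search engine and per run. The sketch below shows one way to compute it. The record layout and the function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per classification: (engine, run index, artist, predicted genre).
Record = Tuple[str, int, str, str]


def accuracy_per_run(
    records: Iterable[Record], truth: Dict[str, str]
) -> Dict[Tuple[str, int], float]:
    """Fraction of correctly classified artists for each (engine, run) pair."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for engine, run, artist, predicted in records:
        key = (engine, run)
        total[key] += 1
        if truth.get(artist) == predicted:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}
```

Plotting these values per engine against the run index gives the kind of accuracy-over-time comparison the experiment is after.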
THE DATA SET

Blues, Country, Electronic, Folk, Jazz, Metal, Rap, Reggae, RnB

[Pie chart of the genre distribution. The extracted slice sizes are 10%, 9%, 3%, 2%, 12%, 13%, 5%, 4% and 41%; the genre label for each slice did not survive extraction.]
MOTION CHART

• http://hmdb.cs.kuleuven.be/muzik/gapminder.html
MORE FINE-GRAINED...

• 18 artists

• more search engines: Google.co.uk/.fr/.be, uk/fr.search.yahoo.com

• twice a day for 53 days

• 250,000 queries!
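
A back-of-the-envelope estimate shows how a setup like this reaches a query count of that size. The sketch assumes one query per artist, genre, schema and engine in every run, two query schemas, and eight engines (Google, Yahoo! and Live! Search plus three regional Googles and two regional Yahoo! sites). None of these counts are stated on the slide, so the result is only an order-of-magnitude check.

```python
def estimate_queries(
    artists: int,
    genres: int,
    schemas: int,
    engines: int,
    runs_per_day: int,
    days: int,
) -> int:
    """Queries issued if every artist/genre/schema combination is sent to every engine in every run."""
    return artists * genres * schemas * engines * runs_per_day * days


# 18 artists and 53 days at two runs a day come from the slide, 9 genres from the data set.
# schemas=2 and engines=8 are assumptions made for this estimate.
total = estimate_queries(artists=18, genres=9, schemas=2, engines=8, runs_per_day=2, days=53)
print(total)  # 274752
```

Under these assumptions the estimate lands in the same range as the 250,000 queries on the slide. Failed runs or a different query layout could account for the gap.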
Artist             Genre
2 Pac              Rap
Alan Lomax         Folk
Art Pepper         Jazz
Cradle of Filth    Metal
David Parsons      Electronic
Desmond Dekker     Reggae
Downpour           Metal
IceT               Rap
Jerry Butler       RnB
Joy Lynn White     Country
Louisiana Red      Blues
Lou Rawls          RnB
LTJ Bukem          Electronic
Peter Tosh         Reggae
Pinetop Smith      Jazz
Robert Johnson     Blues
Roy Rogers         Country
Steeleye Span      Folk
MAIN SEARCH ENGINE RESULTS
REGIONAL GOOGLES
WHAT TO USE?

• use Google when it is stable; otherwise rely on Yahoo!

• when is it stable? test with a small set

  • some artists get classified incorrectly on bad days

  • compare the accuracy achieved on the test set with the average accuracy
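
The stability check described above could look roughly like this. It is a sketch only: the `tolerance` threshold and the function names are assumptions, since the slides do not say how far below the average an accuracy may fall before Google counts as unstable.

```python
from statistics import mean
from typing import Dict, Sequence


def accuracy(predicted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Share of test-set artists whose predicted genre matches the truth (truth must be non-empty)."""
    return sum(predicted.get(artist) == genre for artist, genre in truth.items()) / len(truth)


def choose_engine(
    todays_accuracy: float,
    past_accuracies: Sequence[float],
    tolerance: float = 0.05,  # hypothetical threshold, not taken from the slides
) -> str:
    """Use Google if today's test-set accuracy is close to its average, otherwise Yahoo!."""
    if todays_accuracy >= mean(past_accuracies) - tolerance:
        return "Google"
    return "Yahoo!"
```

With a small test set, such as the 18 artists used in the fine-grained experiment, the check costs only a few queries before the full classification run starts.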
CONCLUSION

• still works after 3 years

• ranking from best to worst: Google -> Yahoo! -> Live! Search

• why does Google fluctuate?

• a generic version of an all-purpose classifier is implemented in our metadata generation framework
FUTURE WORK

• understand the performance differences of regional search engines

• use alternative search engines

• tweak the genre taxonomy depending on the search engine
Q & A.
DEMO METADATA GENERATION

• http://ariadne.cs.kuleuven.be/samgi-service/

Editor's Notes

  • #8: NOT the Southern African Media and Gender Institute.
  • #22:
    1. MG is better than MS; a possible explanation is that "style" is a broader term than "genre" for music
    2. Google outperforms Yahoo! and Live!
    3. results fluctuate over time
    4. technical issues with Yahoo!: only a fraction of the artists are retrieved
  • #28:
    1. the accuracy is not exactly the same as for the large data set, but the overall trends are similar
    2. the MG schema is still more accurate
    3. Yahoo! MG is very stable
    4. Live! is still the worst and Google the best!
  • #29:
    1. Yahoo! is very stable
    2. Live is the worst, Google the best!
    3. no noticeable differences between Live and Bing. Bing was launched on 3 June.
    4. on 29 July, collaboration between Bing and Yahoo
  • #30:
    1. .com performs best, followed by .co.uk, .fr and .be
    2. .fr and .be are worse, maybe because the genres are in English
    3. one could also check whether local artists are classified better
  • #31: correct: light, incorrect: dark
    1. Yahoo! is the most stable
    2. Google changes most often
    3. changing from correct to incorrect occurs most, but there is no clear pattern
    4. Live seems to struggle with the same artists: one time it classifies them correctly, the next time wrongly
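
The observation about artists flipping between correct and incorrect can be quantified by counting the transitions in each artist's history. The sketch below assumes one boolean per run (True for a correct classification); the function name and the data layout are hypothetical.

```python
from collections import Counter
from typing import Sequence


def count_flips(timeline: Sequence[bool]) -> Counter:
    """Count correct->incorrect and incorrect->correct changes between consecutive runs."""
    flips = Counter()
    for before, after in zip(timeline, timeline[1:]):
        if before and not after:
            flips["correct->incorrect"] += 1
        elif after and not before:
            flips["incorrect->correct"] += 1
    return flips
```

Summing these counters over all artists for one engine gives a simple stability measure: the fewer flips, the more stable the engine.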