Audio fingerprinting and metadata
     correction with Python

           Alastair Porter


         November 21, 2011
Me

     Background in Computer Science
     Master's in Music Technology, McGill
     Online
         http://github.com/alastair (20 of 28 repositories music-related; 11 in Python)
         http://twitter.com/alastairporter
Python as a go-to language

     Quick for prototyping
     Use the same code in a production release
     Very handy for API access (thin wrapper around urllib2)
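To illustrate the "thin wrapper" point, here is a minimal sketch in Python 3, where urllib2's functionality now lives in urllib.request and urllib.parse. It assumes the Last.fm 2.0 web service root (http://ws.audioscrobbler.com/2.0/) and its album.getTopTags method; the API key is a placeholder, and build_url and call are hypothetical helper names, not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Last.fm 2.0 web service root; any REST API with GET + query string works the same way.
API_ROOT = "http://ws.audioscrobbler.com/2.0/"


def build_url(method, api_key, **params):
    """Build the GET URL for one API call, asking for a JSON response."""
    query = {"method": method, "api_key": api_key, "format": "json"}
    query.update(params)
    return API_ROOT + "?" + urlencode(query)


def call(method, api_key, **params):
    """Fetch and decode the response (needs network access and a real API key)."""
    with urlopen(build_url(method, api_key, **params)) as response:
        return json.load(response)
```

For example, call("album.getTopTags", "YOUR_API_KEY", artist="Radiohead", album="OK Computer") would return the decoded tag list. Everything beyond URL building and JSON decoding is left to the service, which is what keeps such wrappers short.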
Music and Metadata

  The problem:
      People are really bad at naming music
      Naming is inconsistent across releases


  The solution:
      Crowdsourcing
      Get info from as many trusted sources as possible
      Make renaming take no effort
MusicBrainz
Amazon
Amazon (Cover art)
Last.fm
Last.fm (Genre tags)
MusicBrainz
albumidentify

  http://github.com/albumidentify/albumidentify
MP3, FLAC, Ogg, CDs
Identification strategy

      If there’s a CD TOC (table of contents), use it for a MusicBrainz lookup
      If no match, use audio fingerprinting
      If no match, do a text lookup (artist/album)
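The cascade can be sketched as a list of lookup functions tried in order. The TOC step assumes the table of contents is reduced to a MusicBrainz disc ID as described in MusicBrainz's disc ID documentation (SHA-1 over the hex-formatted TOC, then base64 with `+`, `/`, `=` replaced by `.`, `_`, `-`). The three lookup functions are hypothetical stubs that do no network access; the talk does not show albumidentify's own code.

```python
import base64
import hashlib


def musicbrainz_disc_id(first_track, last_track, leadout, offsets):
    """Compute a MusicBrainz disc ID from a CD table of contents.

    offsets: start sector of each track first_track..last_track;
    leadout: lead-out sector. Both include the usual 150-sector lead-in.
    """
    slots = [0] * 100  # slot 0 = lead-out, slot n = offset of track n
    slots[0] = leadout
    for track_number, offset in enumerate(offsets, start=first_track):
        slots[track_number] = offset
    sha = hashlib.sha1()
    sha.update(b"%02X%02X" % (first_track, last_track))
    for value in slots:
        sha.update(b"%08X" % value)
    digest = base64.b64encode(sha.digest()).decode("ascii")
    # URL-safe alphabet: '+' -> '.', '/' -> '_', '=' -> '-'
    return digest.replace("+", ".").replace("/", "_").replace("=", "-")


def identify(album, strategies):
    """Try each (name, lookup) pair in order; stop at the first match."""
    for name, lookup in strategies:
        match = lookup(album)
        if match is not None:
            return name, match
    return None, None


# Hypothetical stand-ins for the real TOC, fingerprint and artist/album lookups.
def toc_lookup(album):
    toc = album.get("toc")
    return None if toc is None else {"discid": musicbrainz_disc_id(*toc)}


def fingerprint_lookup(album):
    return None  # pretend no fingerprint matched


def text_lookup(album):
    return {"artist": album["artist"], "album": album["album"]}


STRATEGIES = [("toc", toc_lookup), ("fingerprint", fingerprint_lookup), ("text", text_lookup)]
```

Ordering the strategies from most to least reliable means the cheap, unambiguous TOC match wins whenever a CD is present, and the fuzzy text search is only a last resort.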
Fingerprinting

     Converts an audio signal to a short sequence of numbers
     Much smaller, and so faster to compare, than an entire file
     Uses perceptual features rather than byte comparison (works
     across different encodings of the same recording)
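A toy illustration of what "perceptual features" means, using only the standard library. It loosely follows the Philips scheme of Haitsma and Kalker (one bit per pair of adjacent frequency bands: the sign of how their energy difference changed since the previous frame). It is not the fingerprinter used by albumidentify, MusicBrainz, AcoustID or Echoprint; the frame size, band count, naive DFT and function names are illustrative assumptions, and halving the volume stands in for a different encoding.

```python
import cmath
import math
import random


def band_energies(frame, n_bands):
    """Spectral energy of one frame in n_bands equal-width bands (naive DFT, DC skipped)."""
    n = len(frame)
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    width = len(power) // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def fingerprint(samples, frame_size=64, n_bands=9):
    """One (n_bands - 1)-bit integer per frame after the first."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    energies = [band_energies(f, n_bands) for f in frames]
    subprints = []
    for prev, cur in zip(energies, energies[1:]):
        bits = 0
        for m in range(n_bands - 1):
            # Only the sign is kept, so overall loudness changes do not flip bits.
            delta = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits = (bits << 1) | int(delta > 0)
        subprints.append(bits)
    return subprints


def bit_error_rate(a, b, bits_per_frame=8):
    """Fraction of differing bits; bits_per_frame must equal n_bands - 1."""
    errors = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return errors / (min(len(a), len(b)) * bits_per_frame)


# Synthetic "audio": a noise clip, the same clip at half volume, and an unrelated clip.
clip_rng, other_rng = random.Random(1), random.Random(2)
clip = [clip_rng.uniform(-1, 1) for _ in range(2048)]
quieter = [0.5 * x for x in clip]
other = [other_rng.uniform(-1, 1) for _ in range(2048)]
print(bit_error_rate(fingerprint(clip), fingerprint(quieter)))
print(bit_error_rate(fingerprint(clip), fingerprint(other)))
```

The half-volume copy should give a bit error rate close to 0, while the unrelated clip should land near 0.5. Comparing a few hundred bits like this is far cheaper than comparing whole files byte by byte, and it survives changes that would make a byte comparison fail.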
Identification strategy

      Fingerprinting gives us a set of candidate tracks
      A track could be on many albums (original release, best of,
      mix album)
      Keep a list of which tracks we have for each album
      Once we fill all the slots for an album, success!
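The slot-filling idea can be sketched with a dict of dicts. The input shapes (fingerprint matches per file as (album_id, track_number) pairs, and a track count per album) and the function name complete_albums are assumptions made for illustration.

```python
from collections import defaultdict


def complete_albums(candidates, album_sizes):
    """Albums whose every track slot is filled by one of our files.

    candidates:  file name -> (album_id, track_number) pairs its fingerprint matched
    album_sizes: album_id -> number of tracks on that release
    """
    slots = defaultdict(dict)  # album_id -> {track_number: file name}
    for filename, matches in candidates.items():
        for album_id, track_number in matches:
            slots[album_id].setdefault(track_number, filename)
    return sorted(
        album_id
        for album_id, filled in slots.items()
        if all(n in filled for n in range(1, album_sizes[album_id] + 1))
    )


# Example: three files from one album; two of the tracks also appear elsewhere.
candidates = {
    "01.mp3": [("original", 1), ("best-of", 7)],
    "02.mp3": [("original", 2)],
    "03.mp3": [("original", 3), ("dj-mix", 12)],
}
album_sizes = {"original": 3, "best-of": 14, "dj-mix": 20}
print(complete_albums(candidates, album_sizes))  # ['original']
```

A file that matches several releases fills a slot on each, so the best-of and mix releases stay incomplete while the original is identified. A fuller version would probably also check that no files are left over and that one file never fills two slots on the same release.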
Metadata strategy

     Text information from MusicBrainz
     Genre from Last.fm
     Image from Amazon (or folder.jpg)
     MusicBrainz tells us where these are (no need to search)
     Save in every file (text is cheap)
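A sketch of how the per-album metadata might be assembled. It assumes the MusicBrainz release has already been fetched into a dict, that Last.fm tags arrive as (name, count) pairs, and that a linked cover URL, if any, sits under a hypothetical "cover art" key. The list of non-genre tags and all function names are illustrative, not taken from albumidentify.

```python
import os

# Illustrative tags that describe the listener rather than the music.
NOT_GENRES = {"seen live", "favorites", "albums i own"}


def choose_genre(tags):
    """tags: (name, count) pairs, e.g. parsed from a Last.fm top-tags response."""
    usable = [(count, name) for name, count in tags if name.lower() not in NOT_GENRES]
    return max(usable)[1] if usable else None


def choose_cover(release_links, album_dir):
    """Prefer art linked from the release; fall back to a local folder.jpg."""
    if "cover art" in release_links:  # hypothetical key for the linked image URL
        return release_links["cover art"]
    local = os.path.join(album_dir, "folder.jpg")
    return local if os.path.exists(local) else None


def album_metadata(release, tags, release_links, album_dir):
    """One dict per album; the same text is then written into every file."""
    meta = dict(release)  # artist, album, year, ... from MusicBrainz
    meta["genre"] = choose_genre(tags)
    meta["cover"] = choose_cover(release_links, album_dir)
    return meta
```

Because the release already carries links to the other sources, each source is queried directly rather than searched, and the result is small enough to copy into every file.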
Writing it all out

      Custom MP3/ID3 writer
      Ogg meta tags
      FLAC meta tags
      Name files
          Artist/Artist - Year - Album/01 - Artist - Track
      ReplayGain!
      Be a good citizen: submit fingerprints to MusicBrainz
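A minimal sketch of two of these steps. It assumes ID3v2.3 text frames (TIT2 title, TPE1 artist, TALB album, TYER year, TRCK track number) and uses the file-naming scheme from the slide. It is not the author's custom writer: it only builds the tag bytes (no padding, no handling of an existing tag), and the helper names and the set of characters treated as unsafe in file names are assumptions.

```python
import os
import re
import struct


def syncsafe(n):
    """Encode a size below 2**28 as four 7-bit bytes, as the ID3v2 header requires."""
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def text_frame(frame_id, text):
    """An ID3v2.3 text frame; encoding byte 0x01 selects UTF-16 with a byte-order mark."""
    body = b"\x01" + text.encode("utf-16")
    # Frame header: 4-byte ID, 4-byte big-endian size (not syncsafe in v2.3), 2 flag bytes.
    return frame_id.encode("ascii") + struct.pack(">I", len(body)) + b"\x00\x00" + body


def id3v23_tag(fields):
    """Header ("ID3", version 3.0, no flags, syncsafe size) followed by the frames."""
    frames = b"".join(text_frame(fid, value) for fid, value in fields.items())
    return b"ID3" + bytes([3, 0, 0]) + syncsafe(len(frames)) + frames


def safe(name):
    """Replace path separators and characters that common filesystems reject."""
    return re.sub(r'[/\\:*?"<>|]', "_", name).strip()


def track_path(root, artist, year, album, number, title, ext):
    """Artist/Artist - Year - Album/01 - Artist - Track.ext"""
    return os.path.join(
        root,
        safe(artist),
        f"{safe(artist)} - {year} - {safe(album)}",
        f"{number:02d} - {safe(artist)} - {safe(title)}{ext}",
    )
```

Writing the file would mean replacing any existing tag at the start of the MP3 with these bytes. Ogg and FLAC instead store Vorbis-comment-style key=value fields, which this sketch does not cover.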
What’s next

     New version of MusicBrainz
     New fingerprinter
     More metadata
     More metadata
Thanks

  More information:
      MusicBrainz: http://musicbrainz.org
      albumidentify:
      http://github.com/albumidentify/albumidentify
      More fingerprinting: http://acoustid.org,
      http://echoprint.me
      Last.fm

Mp25: Audio Fingerprinting and metadata correction with Python