Hello Cleveland!
Linked Data Publication of Live Music Archives

Sean Bechhofer*, Kevin Page+, David De Roure+
*School of Computer Science, University of Manchester
+Oxford eResearch Centre, University of Oxford

@seanbechhofer

DMRN+7, QMUL, December 2012

The Proposition
๏ Publication of structured metadata describing an audio
  collection

๏ Links to external resources provide additional context
  and information

๏ Rich query to allow the extraction of “interesting”
  subcollections




The Players
• The Internet Archive Live Music Archive
  ✦ Community-contributed live audio recordings
• Semantic Technologies
  ✦ RDF, Ontologies, SPARQL and Linked Data
• Additional resources
  ✦ Artist DBs, geographical information, venue information, etc.
• Some Ruby scripts...


The etree Collection
• Internet Archive Live Music Archive
• Community-contributed live performance recordings
  ✦ “Legal bootlegs”
• Approx. 4,000 artists, 100,000 performances
• Why is it interesting?
  ✦ Audio available in various formats
    ✤ mp3, ogg, shn, flac...
  ✦ Multiple performances by artists
  ✦ Cover versions


Semantic Technologies
• Semantic Technologies aim to provide structured, machine-readable
  representations of content
  ✦ Unified frameworks for (meta)data
• RDF: Resource Description Framework
  ✦ Triple-based representation of information
• OWL/SKOS: Ontologies & Vocabularies for content description
  ✦ Shared vocabularies plus definitional capabilities
• SPARQL
  ✦ A query language for RDF data
  ✦ A generic API
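
To make the triple idea concrete, here is a minimal sketch in plain Python. It is not a real RDF library and not the scripts used for this work; the namespace, resource URIs and property names are all hypothetical placeholders. Each fact is a (subject, predicate, object) tuple, and a query is a pattern in which `None` plays the role of a SPARQL variable.

```python
# Minimal sketch of triple-based data and pattern matching.
# Not a real RDF store; all URIs and property names are hypothetical.

EX = "http://example.org/"  # hypothetical namespace

# Each fact is one (subject, predicate, object) triple.
triples = {
    (EX + "performance/1", EX + "performer", EX + "artist/a"),
    (EX + "performance/1", EX + "venue", EX + "venue/v"),
    (EX + "performance/2", EX + "performer", EX + "artist/a"),
    (EX + "artist/a", EX + "name", "Example Band"),
}


def match(graph, s=None, p=None, o=None):
    """Yield triples matching the pattern; None is a wildcard,
    playing the role of a variable in a SPARQL triple pattern."""
    for triple in graph:
        if s is not None and triple[0] != s:
            continue
        if p is not None and triple[1] != p:
            continue
        if o is not None and triple[2] != o:
            continue
        yield triple


# "Which performances feature artist/a?"
performances = sorted(
    t[0] for t in match(triples, p=EX + "performer", o=EX + "artist/a")
)
```

The same `match` function answers any question that can be phrased as a triple pattern, which is the sense in which a query language over triples acts as a generic API.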

Semantic Technologies
RDF
• Triple-based representation
• Common data model
• Identification via URIs
• Easy integration
  ✦ Graph merging
• Query via SPARQL
  ✦ A flexible, generic API

OWL/SKOS
• Shared vocabularies for content description
  ✦ Facilitating interoperation and exchange
  ✦ Everybody talks the same language
• OWL allows for rich expressions and definitions
• SKOS supports simpler thesauri/controlled vocabularies
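
Graph merging can be illustrated with a small sketch, again in plain Python with hypothetical URIs and property names. If two sources use the same URI for a resource, merging their graphs is just a set union, and the facts about that resource line up without any schema mapping.

```python
# Sketch: merging two triple sets that share a URI (all names hypothetical).

EX = "http://example.org/"

# Facts from the archive's own metadata.
archive = {(EX + "artist/a", EX + "name", "Example Band")}

# Facts from an external source that uses the same artist URI.
external = {(EX + "artist/a", EX + "origin", EX + "place/cleveland")}

# Merging graphs is a set union of their triples.
merged = archive | external

# Everything now known about the artist, from both sources.
about_artist = {(p, o) for (s, p, o) in merged if s == EX + "artist/a"}
```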
Linked Data
• A set of common principles for data publication

    1.   Use URIs for identification
    2.   Use HTTP URIs (that will dereference)
    3.   Return useful information when dereferenced
    4.   Include links in that information

• Common infrastructure facilitates construction of applications.
• Use of content negotiation to supply “appropriate”
  representations
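
Content negotiation can be sketched as follows. This is a simplified illustration in plain Python, not the behaviour of any particular server: it ranks the media types in an HTTP `Accept` header by their `q` values and picks the first one the server can supply. The set of available types and the HTML fallback are assumptions, and partial wildcards such as `text/*` are ignored.

```python
# Sketch of server-side content negotiation (simplified; assumptions noted).

# Hypothetical set of representations a Linked Data server might offer.
AVAILABLE = {"text/html", "text/turtle", "application/rdf+xml"}


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda pref: -pref[1])


def negotiate(header, available=AVAILABLE, default="text/html"):
    """Pick a representation for the client.

    Simplification: falls back to `default` when nothing matches,
    where a real server might answer 406 Not Acceptable instead.
    """
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
    return default


for_rdf_client = negotiate("application/rdf+xml;q=0.9, text/turtle")
for_browser = negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8")
```

The same URI can therefore yield an HTML page for a browser and RDF for a data client.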

Linked Data Resources
• MusicBrainz
  ✦ RDF conversions of MusicBrainz data
• Geonames
  ✦ Information about locations
• DBpedia
  ✦ Structured representation of Wikipedia content
• BBC
  ✦ Programme information, artist information




Data mangling
• Download of etree metadata files
• Simple data conversion
  ✦ XML to RDF
  ✦ etree data model
• Alignments
  ✦ String matching plus bespoke methods for locations
  ✦ Explicit capture of alignments
• Publication Infrastructure
  ✦ Fuseki server + Pubby front end
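
The XML-to-RDF step might look roughly like the sketch below. The actual conversion was done with Ruby scripts that are not shown in these slides, so this Python version is only an illustration: the sample record, its element names, the base URI and the property names are all assumptions.

```python
# Sketch: turning one XML metadata record into triples.
# Element names, base URI and property names are hypothetical.
import xml.etree.ElementTree as ET

SAMPLE_XML = """<metadata>
  <identifier>example2012-12-18</identifier>
  <creator>Example Band</creator>
  <venue>Example Hall</venue>
  <coverage>Cleveland, OH</coverage>
  <date>2012-12-18</date>
</metadata>"""

BASE = "http://example.org/"

# Which XML elements become which (hypothetical) RDF properties.
FIELD_TO_PROPERTY = {
    "creator": BASE + "vocab/performerName",
    "venue": BASE + "vocab/venueName",
    "coverage": BASE + "vocab/locationName",
    "date": BASE + "vocab/date",
}


def xml_to_triples(xml_text):
    """Convert one metadata record into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext("identifier")
    if identifier is None:
        raise ValueError("record has no <identifier> element")
    # Mint an HTTP URI for the performance from its identifier.
    subject = BASE + "performance/" + identifier.strip()
    triples = []
    for child in root:
        prop = FIELD_TO_PROPERTY.get(child.tag)
        if prop is not None and child.text:
            triples.append((subject, prop, child.text.strip()))
    return triples


record_triples = xml_to_triples(SAMPLE_XML)
```

Keeping the venue and location as plain strings at this stage leaves the source metadata untouched; linking them to external resources is a separate alignment step.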



Modelling




[Diagram not reproduced. Labels: Music Ontology, Event Ontology]
Data Alignment
• MusicBrainz
  ✦ Artist alignment via simple name queries
• Geographical Locations
  ✦ Query against Geonames
  ✦ Query against last.fm
  ✦ Combination of string matching and lat/long
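
One way to combine string matching with lat/long is sketched below. It is an illustration under stated assumptions, not the method actually used: the candidate records, the similarity measure (Python's `difflib`), and both thresholds are hypothetical. The point is that a name alone cannot separate two places called Cleveland, but a rough coordinate can.

```python
# Sketch: aligning a location by name similarity plus geographic distance.
# Candidate data, thresholds and the similarity measure are assumptions.
import difflib
import math


def normalise(name):
    """Lower-case and collapse whitespace before comparing names."""
    return " ".join(name.lower().split())


def name_similarity(a, b):
    """Similarity in [0, 1] between two normalised names."""
    return difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def best_location(name, lat, lon, candidates, min_sim=0.8, max_km=50.0):
    """Return the id of the best candidate that is both similarly
    named and nearby, or None if no candidate passes both tests."""
    best = None
    for cand in candidates:
        sim = name_similarity(name, cand["name"])
        dist = haversine_km(lat, lon, cand["lat"], cand["lon"])
        if sim >= min_sim and dist <= max_km:
            if best is None or sim > best[0]:
                best = (sim, cand["id"])
    return best[1] if best else None


# Hypothetical candidates, as a gazetteer lookup might return them.
CANDIDATES = [
    {"id": "place/cleveland-oh", "name": "Cleveland", "lat": 41.4993, "lon": -81.6944},
    {"id": "place/cleveland-tn", "name": "Cleveland", "lat": 35.1595, "lon": -84.8766},
]

aligned = best_location("cleveland ", 41.5, -81.7, CANDIDATES)
```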




Layering
• Alignments are captured in an additional layer of data on top of
  the underlying source facts
• Preserving original metadata
  ✦ Allows clients to make their own judgements
  ✦ Preserves subjectivity
• Explicitly exposing the source of the mappings
  ✦ Use of Provenance vocabularies
[Diagram label: sameAs]
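
The layering idea can be sketched as two separate data structures: the source facts, which are never modified, and an alignment layer in which every mapping records how it was produced. This is a minimal illustration in plain Python; the record fields, the confidence score and all URIs are assumptions, not the vocabulary actually used.

```python
# Sketch: keeping alignments in a separate layer with provenance.
# Field names, confidence values and URIs are hypothetical.
from dataclasses import dataclass

EX = "http://example.org/"


@dataclass(frozen=True)
class Alignment:
    subject: str       # resource in the archive's own data
    target: str        # resource in an external dataset
    method: str        # provenance: how the mapping was produced
    confidence: float  # hypothetical score attached by the matcher


# Layer 1: source facts, left exactly as converted.
source_facts = frozenset({
    (EX + "artist/a", EX + "name", "Example Band"),
})

# Layer 2: candidate mappings, kept apart from the source facts.
alignment_layer = [
    Alignment(EX + "artist/a", "http://external.example/artist/123", "name-query", 0.9),
    Alignment(EX + "artist/a", "http://external.example/artist/456", "name-query", 0.4),
]


def accepted_links(layer, min_confidence):
    """Let the client decide which mappings to trust."""
    return [(a.subject, a.target) for a in layer if a.confidence >= min_confidence]


strict_view = accepted_links(alignment_layer, 0.5)
lenient_view = accepted_links(alignment_layer, 0.0)
```

Two clients can apply different thresholds to the same layer and reach different conclusions, while the underlying metadata stays identical for both.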




Modelling



[Diagram not reproduced. Label: Similarity Ontology]




Big Picture




Discussion
• So far entirely metadata based
  ✦ No processing of underlying audio
• Alignment is a little messy
  ✦ But has to be automated
• Dataset itself is an interesting artefact
  ✦ Contrasts with some other LD activities
• Is this actually useful?

Do artists really get a better reception when they play in their home town?

The Future
• Better alignment
  ✦ Beyond simple string queries
• More alignment
  ✦ Adding in, e.g., MusicBrainz track/work resources
  ✦ Other collections?
  ✦ Modelling questions
• Characterising Alignments
• Audio Fingerprinting
  ✦ Identifying further track-level matches
• Crowdsourcing corrections
• Extracting subcollections
  ✦ What would you want?
Thanks! You’ve been a great audience!

http://etree.linkedmusic.org