Freebase
A socially managed semantic database



Jamie Taylor
SemTech 2010 Data Camp
Freebase - Semantic Technologies 2010 Code Camp
Freebase has Many Types of Things
12 Million Topics
A Multiplicity of Strong Identifiers

http://rdf.freebase.com/ns/en.berlin_wall

http://www.ellerdale.com/topics/view/0080-6ba0

http://www.bbc.co.uk/music/artists/7f347782-eb14-40c3-98e2-17b6e1bfe56c

http://musicbrainz.org/artist/7f347782-eb14-40c3-98e2-17b6e1bfe56c

http://rdf.freebase.com/ns/authority.musicbrainz.7f347782-eb14-40c3-98e2-17b6e1bfe56c
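The RDF URIs on this slide appear to follow a simple pattern: a Freebase id with its slashes turned into dots, appended to the `rdf.freebase.com/ns/` namespace. A minimal Python sketch of that mapping follows. The function names are hypothetical, and the id forms `/en/berlin_wall` and `/authority/musicbrainz/<uuid>` are inferred from the URIs shown rather than stated on the slide.

```python
RDF_NAMESPACE = "http://rdf.freebase.com/ns/"


def freebase_id_to_rdf_uri(freebase_id: str) -> str:
    """Turn an id such as '/en/berlin_wall' into an RDF-service URI.

    Assumption: the mapping is "drop the leading slash, replace '/' with '.'",
    as suggested by the URIs on the slide.
    """
    if not freebase_id.startswith("/"):
        raise ValueError("expected an id starting with '/'")
    return RDF_NAMESPACE + freebase_id[1:].replace("/", ".")


def musicbrainz_rdf_uri(mbid: str) -> str:
    """Build the URI for a topic keyed by its MusicBrainz identifier."""
    return freebase_id_to_rdf_uri("/authority/musicbrainz/" + mbid)


if __name__ == "__main__":
    print(freebase_id_to_rdf_uri("/en/berlin_wall"))
    print(musicbrainz_rdf_uri("7f347782-eb14-40c3-98e2-17b6e1bfe56c"))
```

The point of the slide survives in the sketch: the same topic can be reached through a human-readable key or through an external authority's identifier.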
Relations
400 Million
Example relation labels: contains, contained-by, event, label, albums, member-of, nationality, education
What’s in Freebase?
http://www.bestbuy.com/site/She+Wolf…

http://www.daylife.com/topic/Shakira

http://twitter.com/shakira

http://www.facebook.com/shakira

http://www.myspace.com/shakira

http://www.last.fm/music/Shakira

http://www.netflix.com/RoleDisplay/Shakira/20046629

http://www.guardian.co.uk/music/shakira
99% pure

All data undergoes rigorous QA before load
Major focus is reconciliation
Use sampling to assure 99% accuracy
Data that does not meet 99% accuracy is not loaded
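The slide does not say how the sampling is done. As one possible reading, a load gate could draw a random sample from a candidate data set, have each sampled record checked, and refuse the load when the observed accuracy falls below 99%. Everything in this Python sketch (the function name, the `is_correct` callback, the sample size) is a hypothetical illustration, not Freebase's actual QA procedure.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def passes_quality_gate(
    records: Sequence[T],
    is_correct: Callable[[T], bool],
    sample_size: int,
    threshold: float = 0.99,
    seed: int = 0,
) -> bool:
    """Return True if a random sample of records meets the accuracy threshold.

    is_correct stands in for whatever review step judges a single record;
    here it is just a callback supplied by the caller.
    """
    if not records or sample_size <= 0:
        raise ValueError("need at least one record and a positive sample size")
    rng = random.Random(seed)          # fixed seed keeps the sketch repeatable
    n = min(sample_size, len(records))
    sample = rng.sample(list(records), n)
    correct = sum(1 for record in sample if is_correct(record))
    return correct / n >= threshold


if __name__ == "__main__":
    batch = list(range(100))
    # Pretend records 0..4 are wrong: 95% accuracy, so the batch is rejected.
    print(passes_quality_gate(batch, lambda r: r >= 5, sample_size=100))
```

A real gate would also have to choose a sample size large enough for the estimate to be meaningful; the sketch leaves that decision to the caller.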
What's been built on Freebase?
Up to 100,000 Queries a Day




 Quarterly dumps of graph
    http://download.freebase.com
Users contribute data




Users extend the data model
The Freebase Commons
                      Top-level domains
                      ·American football       ·Internet
                      ·Anime/Manga             ·Language
                      ·Architecture            ·Law
                      ·Astronomy               ·Library
                      ·Automotive              ·Location
                      ·Aviation                ·Martial Arts
                      ·Awards                  ·Measurement Unit
                      ·Baseball                ·Media Common
                      ·Basketball              ·Medicine
                      ·Bicycles                ·Metaweb Types
                      ·Biology                 ·Meteorology
                      ·Boats                   ·Military
                      ·Broadcast               ·Music
                      ·Business                ·Olympics
                      ·Celebrities             ·Opera
                      ·Chemistry               ·Organization
                      ·Comics                  ·People
                      ·Common                  ·Geography
                      ·Computers               ·Projects
                      ·Conferences             ·Protected Places
                      ·Cricket                 ·Publishing
                      ·Data World              ·Radio
                      ·Digicams                ·Rail
                      ·Education               ·Religion
                      ·Engineering             ·Royalty
                      ·Event                   ·Soccer
                      ·Clothing and Textiles   ·Spaceflight
                      ·Fictional Universes     ·Sports
                      ·Film                    ·Symbols
                      ·Food & Drink            ·Tennis
                      ·Freebase                ·Theater
                      ·Games                   ·Time
                      ·Geology                 ·Transportation
                      ·Government              ·Travel
                      ·Hobbies and Interests   ·TV
                      ·Ice Hockey              ·Video Games
                      ·Influence               ·Visual Art

schema = vocabulary
The Scope of Schema
   10,448 Properties
      describing
     4,936 Types*
     organized into
     641 Domains
     (77 Commons)
            *types with 10 or more instances
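The domain / type / property hierarchy is visible in the ids themselves: an id such as `/film/film/director` names the property `director` of the type `/film/film` in the domain `/film`. A short Python sketch that splits a property id this way is given here. The helper name is hypothetical, and it assumes the three-segment form used by Commons schema; ids in user domains can be longer and are not handled.

```python
from typing import NamedTuple


class PropertyId(NamedTuple):
    domain: str   # e.g. "/film"
    type: str     # e.g. "/film/film"
    name: str     # e.g. "director"


def split_property_id(property_id: str) -> PropertyId:
    """Split a three-segment property id into domain, type and property name."""
    parts = property_id.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected /domain/type/property, got {property_id!r}")
    domain, type_key, name = parts
    return PropertyId("/" + domain, f"/{domain}/{type_key}", name)


if __name__ == "__main__":
    print(split_property_id("/film/film/director"))
    print(split_property_id("/type/object/name"))
```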
Strength through Exemplars
[Chart: Type Instances. Number of instances per type (log scale, 1 to 100,000,000) plotted against type rank (0 to 11,000). Annotations: 4936 types with >10 instances; 1424 Commons.]
Metaweb Query Language
      [{
           "name" : null,
           "type" : "/film/film"
      }]




               MQL
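An MQL query is plain JSON, so it can be built as an ordinary data structure, with `None` standing for the `null` slots the service is asked to fill in. The Python sketch below serializes the query above into a request URL. It assumes the read service takes the query wrapped in a `{"query": ...}` envelope in a `query` URL parameter; the endpoint is a placeholder and the helper names are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder endpoint: substitute the real mqlread service URL.
MQLREAD_ENDPOINT = "http://example.invalid/api/service/mqlread"


def mqlread_url(query, endpoint: str = MQLREAD_ENDPOINT) -> str:
    """Wrap an MQL query in an envelope and encode it as a GET URL."""
    envelope = {"query": query}          # assumed envelope shape
    return endpoint + "?" + urlencode({"query": json.dumps(envelope)})


def query_from_url(url: str):
    """Inverse of mqlread_url, handy for checking what would be sent."""
    raw = parse_qs(urlparse(url).query)["query"][0]
    return json.loads(raw)["query"]


# "Give me the name of every film": null (None) marks the value to return.
films_query = [{"name": None, "type": "/film/film"}]

if __name__ == "__main__":
    print(mqlread_url(films_query))
```

The outer list in the query asks for all matches rather than exactly one.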
[{
     "name" : null,
     "type" : "/film/film",
     "directed_by":{"id":"/en/george_lucas"},
     "starring":[{
            "actor":{"id":"/en/harrison_ford"}
         }]
}]




                      MQL
[{
      "name" : null,
      "type" : "/film/film",
      "directed_by":{"id":"/en/george_lucas"},
      "starring": [{
          "actor": {
             "name": null,
             "film": [{
                 "film": {"id": "/en/the_great_escape"}
             }]
          }
     }]
}]


                     Donald Pleasence
                        THX 1138
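The query-by-example idea behind these slides can be illustrated with a toy matcher: literal values constrain, nested objects constrain recursively, `[{...}]` requires at least one matching element, and `null` asks for the stored value back. The Python sketch below runs the query above against two hand-written records. It is not how the Freebase graph store evaluates MQL, and the toy data (including the id `/en/thx_1138`) is illustrative.

```python
def match(pattern: dict, node: dict):
    """Return a filled-in copy of pattern if node satisfies it, else None."""
    result = {}
    for key, want in pattern.items():
        have = node.get(key)
        if want is None:                      # null: report the stored value
            result[key] = have
        elif isinstance(want, dict):          # nested object: match recursively
            sub = match(want, have) if isinstance(have, dict) else None
            if sub is None:
                return None
            result[key] = sub
        elif isinstance(want, list):          # [{...}]: keep elements that match
            hits = []
            for item in have or []:
                sub = match(want[0], item)
                if sub is not None:
                    hits.append(sub)
            if not hits:
                return None
            result[key] = hits
        elif have != want:                    # literal: must be equal
            return None
        else:
            result[key] = have
    return result


# Hand-made toy records, not Freebase data dumps.
FILMS = [
    {"name": "THX 1138", "type": "/film/film",
     "directed_by": {"id": "/en/george_lucas"},
     "starring": [
         {"actor": {"name": "Donald Pleasence",
                    "film": [{"film": {"id": "/en/the_great_escape"}},
                             {"film": {"id": "/en/thx_1138"}}]}},
         {"actor": {"name": "Robert Duvall",
                    "film": [{"film": {"id": "/en/thx_1138"}}]}},
     ]},
    {"name": "Blade Runner", "type": "/film/film",
     "directed_by": {"id": "/en/ridley_scott"},
     "starring": [{"actor": {"name": "Harrison Ford", "film": []}}]},
]

QUERY = [{
    "name": None,
    "type": "/film/film",
    "directed_by": {"id": "/en/george_lucas"},
    "starring": [{"actor": {"name": None,
                            "film": [{"film": {"id": "/en/the_great_escape"}}]}}],
}]

results = [r for r in (match(QUERY[0], film) for film in FILMS) if r is not None]

if __name__ == "__main__":
    for r in results:
        print(r["name"], [s["actor"]["name"] for s in r["starring"]])
```

On this toy data the only match is THX 1138 with Donald Pleasence, the same pair shown on the slide.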
Freebase Suggest
Reconciliation
{
    "/type/object/name": "Blade Runner",
    "/type/object/type": "/film/film",
    "/film/film/starring/actor": ["Harrison Ford", "Rutger Hauer"],
    "/film/film/director": "Ridley Scott",
    "/film/film/release_date_s": "1981"
}
[{
    "id": "/guid/9202a8c04000641f8000000000009e89",
    "name": ["Blade Runner", "Bladerunner"],
    "score": 1.4320519,
    "match": true,
    "type": ["/common/topic", "/film/film", "/media_common/adapted_work", "/award/award_winning_work"]
},
{
    "id": "/guid/9202a8c04000641f80000000002643d0",
    "name": ["Blade"],
    "score": 0.48852453,
    "match": false,
    "type": ["/common/topic", "/film/film", "/award/award_winning_work", "/award/award_nominated_work"]
}]

               http://data.labs.freebase.com/recon/
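A client consuming a response like the one above mostly needs to decide which candidate, if any, to accept. One conservative policy, sketched in Python below, is to accept only candidates the service flagged with `"match": true` and take the highest-scoring of those. The response is abbreviated to the fields the policy uses, and the function name and policy are assumptions for illustration, not part of the reconciliation service.

```python
import json

# The two candidates from the slide, trimmed to the fields used below.
RESPONSE = json.loads("""
[
  {"id": "/guid/9202a8c04000641f8000000000009e89",
   "name": ["Blade Runner", "Bladerunner"],
   "score": 1.4320519,
   "match": true},
  {"id": "/guid/9202a8c04000641f80000000002643d0",
   "name": ["Blade"],
   "score": 0.48852453,
   "match": false}
]
""")


def best_match(candidates):
    """Return the highest-scoring candidate flagged as a match, or None."""
    matched = [c for c in candidates if c.get("match")]
    return max(matched, key=lambda c: c["score"], default=None)


if __name__ == "__main__":
    chosen = best_match(RESPONSE)
    print(chosen["id"] if chosen else "no confident match; needs human review")
```

Returning `None` instead of the best unflagged candidate keeps uncertain records out of an automated load, in the spirit of the "99% pure" slide.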
Topic Blocks
Topic API
         Shortcut to building Topic displays
         Two forms:
             basic (names, types, description)
             standard (basic + keys, properties)




http://www.freebase.com/experimental/topic/standard?id=/en/ncis
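The URL above suggests the form is selected by a path segment and the topic by an `id` parameter. The Python sketch below builds such URLs. That the basic form lives at `.../topic/basic` is an assumption by analogy with the `standard` URL shown; the helper name is hypothetical.

```python
from urllib.parse import quote

TOPIC_BASE = "http://www.freebase.com/experimental/topic"


def topic_url(topic_id: str, form: str = "standard") -> str:
    """Build a Topic API URL for the given topic id and form."""
    if form not in ("basic", "standard"):
        raise ValueError("form must be 'basic' or 'standard'")
    # Keep '/' unescaped so the id reads as it does on the slide.
    return f"{TOPIC_BASE}/{form}?id={quote(topic_id, safe='/')}"


if __name__ == "__main__":
    print(topic_url("/en/ncis"))
    print(topic_url("/en/ncis", form="basic"))  # assumed path for the basic form
```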
Geo Search API



Semantic              Spatial              Semantic




      http://www.freebase.com/docs/geosearch
Gridworks
Acre Development Environment
Getting Started++
•   Freebase Documentation Hub
    •   http://www.freebase.com/docs
•   Developer Mailing List
    •   http://lists.freebase.com/mailman/listinfo/freebase-discuss
    •   http://freebase.markmail.org
•   Real Time help on IRC
    •   Freenode #freebase
•   Freebase Happenings
    •   http://blog.freebase.com
•   About the Graph Store
    •   Google: "ACM SIGMOD schema last tuple store"

