It's not what you said,
             it's how you said it.
                         Jamie Taylor, Ph.D.




  Text Analytic Summit
      Boston 2010
What do y'all mean
  "Semantics"



                  The Web!
                  Now with
                 Better Flavor!
Text Analytic Summit 2010
Tim Berners-Lee, James Hendler
           and Ora Lassila   




May 2001
The Semantic Web?




   The Cake
      taken from http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/layerCake-4.png
Linked Open Data
The Real Web




               http://en.wikipedia.org/wiki/File:Internet_map_1024.jpg
Wish it were real
Might be real
Is real, but don't believe it
Is currently useful
Entities
Identifiers        Side Step Polysemy




       Bono, A.K.A. Paul David Hewson
http://rdf.freebase.com/ns/en.paul_david_hewson
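A minimal Python sketch of why identifiers side-step polysemy: a bare name can point at several entities, while an identifier points at exactly one. Only the first key below is the Freebase identifier for Bono from the slide; the second entry and the `candidates` helper are hypothetical, added purely for illustration.

```python
# Strong identifiers as dictionary keys: one key, one entity.
ENTITIES = {
    # Identifier taken from the slide.
    "http://rdf.freebase.com/ns/en.paul_david_hewson": {
        "name": "Bono",
        "aliases": ["Paul David Hewson"],
    },
    # Hypothetical placeholder for some other entity that shares the name.
    "http://example.org/id/other_bono": {
        "name": "Bono",
        "aliases": [],
    },
}


def candidates(name):
    """Return every identifier whose name or alias equals the given string."""
    return sorted(
        ident
        for ident, entity in ENTITIES.items()
        if name == entity["name"] or name in entity["aliases"]
    )


ambiguous = candidates("Bono")             # the name alone matches two entities
unique = candidates("Paul David Hewson")   # the alias matches only one
print(len(ambiguous), unique)
```

Once a document carries the identifier rather than the string, the lookup step disappears and so does the ambiguity.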
Vocabulary

                  Manufactures




http://rdf.freebase.com/ns/automotive.make.model_s
A socially managed semantic database
Freebase has Many Types of Things
Many Strong Identifiers
            http://rdf.freebase.com/ns/en.berlin_wall

            http://www.ellerdale.com/topics/view/0080-6ba0

            http://www.bbc.co.uk/music/artists/7f347782-eb14-40c3-98e2-17b6e1bfe56c

            http://musicbrainz.org/artist/7f347782-eb14-40c3-98e2-17b6e1bfe56c

            http://rdf.freebase.com/ns/authority.musicbrainz.7f347782-eb14-40c3-98e2-17b6e1bfe56c
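The last three URLs on this slide embed the same MusicBrainz identifier, which is what makes them joinable across sites. A short sketch, assuming the shared key can be pulled out with a plain UUID pattern (the `shared_key` helper is hypothetical):

```python
import re

# URLs as listed on the slide (BBC, MusicBrainz, Freebase).
URLS = [
    "http://www.bbc.co.uk/music/artists/7f347782-eb14-40c3-98e2-17b6e1bfe56c",
    "http://musicbrainz.org/artist/7f347782-eb14-40c3-98e2-17b6e1bfe56c",
    "http://rdf.freebase.com/ns/authority.musicbrainz."
    "7f347782-eb14-40c3-98e2-17b6e1bfe56c",
]

# Standard 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def shared_key(url):
    """Return the UUID embedded in the URL, or None if there is none."""
    match = UUID_RE.search(url)
    return match.group(0) if match else None


keys = {shared_key(url) for url in URLS}
# All three sites refer to one entity if exactly one key was found.
same_entity = len(keys) == 1 and None not in keys
print(same_entity, keys)
```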
12 Million Entities
350 Million Relations
Users contribute data




Users extend the data model
schema = vocabulary
1500 types with 500+ instances!!




A range of vocabularies...
Growing Freebase
Reconciliation



   +=
Reconciliation

Relational Learning
            Record Matching
Collective Entity Resolution
                 Equivalence Mining
 Record Linking
                Identity Matching
Reconciliation
                              "Excuse Me"
"Excuse Me"
                                   "Harrison Ford"
          "Harrison Ford"




     "Vanity Fair"
                            "Maytime"
Reconciliation
                            "Fugitive"
"Excuse Me"
                                "Harrison Ford"
          "Harrison Ford"




     "Vanity Fair"
                                "Blade Runner"
A Graph of Entities
Vocabulary
contains

            located
                           performed-at               released-by
                                          created


                        plays-in
                                           plays-in

       nationality

                      education
                                          education

                        located
Reconciliation as "understanding"
   contains

               located
                              performed-at               released-by
                                             created


                           plays-in
                                              plays-in

          nationality

                         education
                                             education

                           located
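One way to read reconciliation as "understanding" is to compare what surrounds a name rather than the name itself. The sketch below reuses the strings from the Harrison Ford slides and scores two records by the overlap of their neighboring films. The Jaccard measure and the 0.25 cut-off are assumptions made for illustration, not the method used by Freebase.

```python
def jaccard(a, b):
    """Overlap of two collections: |A and B| / |A or B| (0.0 if both empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_entity(rec_a, rec_b, min_overlap=0.25):
    """Treat two records as one entity if names match and neighbors overlap.

    min_overlap is an arbitrary illustrative cut-off.
    """
    if rec_a["name"] != rec_b["name"]:
        return False
    return jaccard(rec_a["films"], rec_b["films"]) >= min_overlap


# Records built from the labels shown on the two reconciliation slides.
left = {"name": "Harrison Ford", "films": ["Excuse Me", "Vanity Fair"]}
right_1 = {"name": "Harrison Ford", "films": ["Excuse Me", "Maytime"]}
right_2 = {"name": "Harrison Ford", "films": ["Fugitive", "Blade Runner"]}

print(same_entity(left, right_1))  # shares "Excuse Me"
print(same_entity(left, right_2))  # same name, no shared neighbors
```

The name is identical in all three records; only the graph neighborhood separates the two people.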
Query:

{
    "/type/object/name":"Blade Runner",
    "/type/object/type":"/film/film",
    "/film/film/starring/actor":["Harrison Ford", "Rutger Hauer"],
    "/film/film/director":"Ridley Scott",
    "/film/film/release_date_s":"1981"
}

Response:

[{
   "id":"/guid/9202a8c04000641f8000000000009e89",
   "name":["Blade Runner", "Bladerunner"],
   "score":1.4320519,
   "match":true,
   "type":["/common/topic", "/film/film", "/media_common/adapted_work", "/award/award_winning_work",
     ......
   ]},
 {
   "id":"/guid/9202a8c04000641f80000000002643d0",
   "name":["Blade"],
   "score":0.48852453,
   "match":false,
   "type":["/common/topic", "/film/film", "/award/award_winning_work", "/award/award_nominated_work",
     .......
   ]},
 {
   "id":"/guid/9202a8c04000641f800000000e5daaae",
   "name":["Blade"],
   "score":0.46398318,
   "match":false,
    .....


         http://data.labs.freebase.com/recon/
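A client would typically keep the candidate flagged as a match and fall back to review otherwise. The sketch below works on a response shaped like the one on the slide, abridged to the fields that are fully shown (the elided "type" entries are left out rather than guessed). It makes no network call, and the `best_match` helper is a hypothetical name.

```python
import json

# Candidate list abridged from the slide; elided entries are omitted.
RESPONSE_TEXT = """
[
  {"id": "/guid/9202a8c04000641f8000000000009e89",
   "name": ["Blade Runner", "Bladerunner"],
   "score": 1.4320519, "match": true},
  {"id": "/guid/9202a8c04000641f80000000002643d0",
   "name": ["Blade"],
   "score": 0.48852453, "match": false},
  {"id": "/guid/9202a8c04000641f800000000e5daaae",
   "name": ["Blade"],
   "score": 0.46398318, "match": false}
]
"""


def best_match(candidates):
    """Return the highest-scoring candidate flagged as a match, else None."""
    matches = [c for c in candidates if c.get("match")]
    return max(matches, key=lambda c: c["score"]) if matches else None


candidates = json.loads(RESPONSE_TEXT)
winner = best_match(candidates)
print(winner["id"] if winner else "no confident match; send to review")
```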
Data Everywhere
Wikipedia Features
Wikipedia Features



    X


X

    Error Prone -- Usually <99%
(Machine) Learning Semantics
  2.8M Wikipedia topics -> get types -> 5M type assertions
  5M articles (WEX) -> extract features -> 37M features

  intersect the two sources
    -> calculate feature counts per type (2.4M features, 1400 types)
    -> join feature counts with topics
    -> generate type scores for topics (1.6G scores)
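The pipeline stages can be sketched on toy data: count features per type from already-typed topics, then score an untyped topic against each type. The slides do not give the scoring formula, so the smoothed log-ratio below is an assumption, and every topic, feature, and function name here is made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy stand-ins for the two sources: known type assertions and
# features extracted from articles. "t4" has features but no type yet.
topic_types = {
    "t1": {"/people/person"},
    "t2": {"/film/film"},
    "t3": {"/people/person"},
}
topic_features = {
    "t1": {"born", "category:living_people"},
    "t2": {"directed_by", "category:films"},
    "t3": {"born", "alma_mater"},
    "t4": {"born", "category:living_people"},
}


def feature_counts_per_type(types_by_topic, features_by_topic):
    """Intersect the sources and count how often each feature occurs per type."""
    counts = defaultdict(Counter)
    for topic, types in types_by_topic.items():
        for type_id in types:
            counts[type_id].update(features_by_topic.get(topic, ()))
    return counts


def type_score(features, type_id, counts):
    """Assumed score: sum of add-one-smoothed log ratios, in-type vs. elsewhere."""
    total = 0.0
    for feature in features:
        in_type = counts[type_id][feature]
        elsewhere = sum(c[feature] for t, c in counts.items() if t != type_id)
        total += math.log((in_type + 1) / (elsewhere + 1))
    return total


counts = feature_counts_per_type(topic_types, topic_features)
scores = {
    type_id: type_score(topic_features["t4"], type_id, counts)
    for type_id in list(counts)
}
print(scores)
```

On this toy data the untyped topic scores positively for /people/person and negatively for /film/film, which is the kind of per-topic, per-type score the final stage produces in bulk.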
/people/person distribution
                             untyped topics
                             person topics
                             other topics
                             all topics




                  Data courtesy Viral Shah
RABJ: Humans in the loop
Thresholding Results

          99% threshold at 16.75
/people/person assertions

                threshold




                        53K /people/person
                            assertions
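A threshold like this can be chosen from human-judged samples: rank the scored assertions and find the lowest score at which the accepted set is still at least 99% correct. This is one plausible reading of the slide, not a description of the actual procedure; the sample judgments below are invented, and the function name is hypothetical.

```python
def lowest_threshold(scored, target=0.99):
    """Return the lowest score t such that assertions with score >= t
    are correct at least `target` of the time, or None if no such t exists.

    scored: iterable of (score, is_correct) pairs from human review.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best = None
    correct = 0
    for i, (score, ok) in enumerate(ranked, start=1):
        correct += ok
        # Only evaluate once every item tied at this score has been counted.
        last_of_tie = i == len(ranked) or ranked[i][0] != score
        if last_of_tie and correct / i >= target:
            best = score
    return best


# Invented human judgments: (model score, was the assertion correct?)
judged = [
    (20.0, True),
    (18.5, True),
    (16.75, True),
    (12.0, False),
    (9.0, True),
]
threshold = lowest_threshold(judged, target=0.99)
accepted = [score for score, _ in judged if score >= threshold]
print(threshold, len(accepted))
```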
Training Wheels?
Semantics are Everywhere
A Strong Tag for Food Inc.
   http://movi.es/BVl43
Widgets: Content Tags
Explicit Semantics
Rich Snippets
<div class="post-item restaurant-gen-info hreview-aggregate">
 <div class="item vcard">
  <h1 class="fn org">Taylor's Refresher</h1>
  <div class="address">
   <div class="ratings">
     <ul class="star-rating-2 rating" title="4.0 star rating across 3 ratings">
      <li class="current-rating average" style="width:80%;">4.0 star rating</li>
      <li class="star">&nbsp;</li>
      <li class="star">&nbsp;</li><li class="star">&nbsp;</li>
      <li class="star">&nbsp;</li>
      <li class="star">&nbsp;</li>
     </ul>
     <div class="rating-stats">
     <span class="rating">
       <span class="average">4.0</span>
     </span> rating over
     <span class="count">1</span> review
    </div>
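Because the markup uses hReview-aggregate class names, a consumer can pull out the business name, average rating, and review count without any site-specific scraping rules. A rough sketch with Python's standard `html.parser`, run on an abridged copy of the snippet with its tags closed; the `ClassTextCollector` and `first_number` names are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Abridged from the slide, with the open tags closed.
SNIPPET = """
<div class="post-item restaurant-gen-info hreview-aggregate">
 <div class="item vcard">
  <h1 class="fn org">Taylor's Refresher</h1>
  <ul class="star-rating-2 rating" title="4.0 star rating across 3 ratings">
   <li class="current-rating average" style="width:80%;">4.0 star rating</li>
  </ul>
  <div class="rating-stats">
   <span class="rating"><span class="average">4.0</span></span> rating over
   <span class="count">1</span> review
  </div>
 </div>
</div>
"""


class ClassTextCollector(HTMLParser):
    """Collect the text of elements carrying one of the wanted class names."""

    WANTED = {"fn", "average", "count"}

    def __init__(self):
        super().__init__()
        self.fields = defaultdict(list)
        self._current = set()

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self._current = classes & self.WANTED

    def handle_endtag(self, tag):
        self._current = set()

    def handle_data(self, data):
        text = data.strip()
        if text:
            for name in self._current:
                self.fields[name].append(text)


def first_number(values):
    """Return the first value that parses as a number, or None."""
    for value in values:
        try:
            return float(value)
        except ValueError:
            continue
    return None


parser = ClassTextCollector()
parser.feed(SNIPPET)
parser.close()
summary = {
    "name": parser.fields["fn"][0],
    "average": first_number(parser.fields["average"]),
    "count": int(first_number(parser.fields["count"])),
}
print(summary)
```

The collector only handles leaf elements that contain plain text, which is enough for this snippet; a full microformats parser would also handle nesting.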
RDFa

       microformats


  HTML5 MicroData


Open Graph Protocol
Explicit Semantics in
 Surprising Places
Blog Tags::Entities
Metaweb Topic Block
Widget Microdata


<div class="fb-widget"
     id="fbtb-9a1f44348ad145b5b7d7d7d2376b0420"
     style="border:0; outline:0; padding:0; margin:0; position:relative;"
     itemscope=""
     itemid="http://www.freebase.com/id/en/taylor_swift"
     itemtype="http://www.freebase.com/id/music/artist"> ..... </div>
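Any page that embeds the widget therefore announces which entity it is about. A small sketch of reading that back out with the standard `html.parser`, using a cut-down copy of the widget's opening tag (the elided inner content is left out); `ItemScopeFinder` is a hypothetical name.

```python
from html.parser import HTMLParser

# Cut-down copy of the widget markup; inner content omitted.
WIDGET = (
    '<div class="fb-widget" itemscope="" '
    'itemid="http://www.freebase.com/id/en/taylor_swift" '
    'itemtype="http://www.freebase.com/id/music/artist"></div>'
)


class ItemScopeFinder(HTMLParser):
    """Record (itemid, itemtype) for every element that declares itemscope."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemscope" in attributes:
            self.items.append(
                (attributes.get("itemid"), attributes.get("itemtype"))
            )


finder = ItemScopeFinder()
finder.feed(WIDGET)
finder.close()
print(finder.items)
```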
Thickening the Graph
"Vocabulary" Pattern
             taw    shooter      marksman




              marble   marksman

http://wordnet.freebaseapps.com
                          photo: http://sarabbit.openphoto.net
Review (neighborhood) Pattern
                           Eric Schlosser


                     E. Coli


                          Michael Pollan

                                   Robert Kenner
