Semantic Search Using RDF Metadata
Semantic Technology Conference 2005, 8 March 2005
Bradley P. Allen, Siderean Software, Inc.
Overview Semantic search Motivation  Enterprise adoption Semantic search using RDF Examples Lessons Directions
Problem
  “‘We have to understand what information we have and organize it,’ says [Santa Clara Co. CIO] Ajmani, who estimates that saving each employee an hour a month spent looking for information would save millions of dollars.” [Information Week, 1/19/04]
  “… typical enterprise floundering in a sea of information … too many repositories, each with its own set of applications.” [IDC, 2004]
  “The search capabilities on most company and content-oriented Web sites are as bad now as they were several years ago.” [eWeek, 1/26/04]
Portal-driven demand for a better solution
  “A portal provides an integrated information source for our internal process users or external customers”
  “Now we have to architect the information related to business processes differently to search across multiple repositories”
  But they lack tools and applications that support this
Current solutions
  Enterprise search, portals, knowledge management and content management systems lashed up in ad hoc architectures
    Don’t unify data and content
    Don’t provide context or scope
    Return too many results (requires searching the answer to the original search)
Why semantic search?
  Explicitly represented knowledge can
    Unify access to both content and data
    Create context and frames of reference
  Intellectual contributions that inform the search process must be captured
  The answer should include the question
Semantic search: some definitions
  Search: the process of retrieving objects matching a given query
  Semantic search:
    Search that uses an explicit representation of knowledge to retrieve, organize or display objects matching a query
    Search that transparently renders human insight into the nature of matches
Benefits in the enterprise
  Addresses pervasive frustration with enterprise search
  Lets users
    Find high-value information quickly
    Add more value to it, and
    Share it with others
  Aligns information to business needs
Roots
  Parametric search
  Query by example
  Retrieval by reformulation
    Rabbit, Argon
  Work in existing enterprise search and knowledge management
    Autonomy, Semio
Semantic search requires metadata
  Ontologies
    Specifications of how to represent classes, instances and their properties
    Sometimes called “vocabularies”
  Controlled vocabularies
    Terms for saying what something is about
    Also called “taxonomies” and “thesauri”
  Instances
    Descriptions of resources
  Application profiles
    Specifications of which classes and properties are useful and how they are to be used in an application
Current metadata solutions are costly
  Much custom development done
  Not easy to tag or incorporate content into the desired structures
  No easy way for groups creating the vocabularies to deliver them to production environments
  Perceived lack of tools
    Point solutions not well integrated
    Existing platform solutions closed
Metadata in today’s enterprises
  From thirty interviews conducted with Fortune 1000 organizations during Fall 2004
  Use of metadata not yet widespread but emerging
  Understanding varies widely across enterprises
  Three basic approaches
    Top down, bottom up, and give up
Approach: top down
  CEO says “We must be an information-driven company”
  “Corporate controlled vocabulary that all divisions will use”
    Typically based on Dublin Core
    Used for subject tagging
  The effort is multi-year, ROI is hard to track, and the result may not be implemented or adopted widely
Approach: bottom up
  Groups determine their vocabulary while describing their process
    Often in a collaboration environment
  Light tagging of content when it is created or when the content is published to a portal
  Again, based on Dublin Core and their own controlled vocabularies
Approach: give up
  Assumption: too difficult to create metadata from existing content
    “We can’t ever hope to organize this morass of content, so let’s put in a search appliance like Google”
    “Our internal needs are like the public internet and users are familiar with Google searches”
  But still feel that metadata would improve matters, particularly within business units
Don’t give up!
  RDF can make metadata use easier and less costly
    An open standard for metadata reduces cost and avoids technology and vendor lock-in
    A “universal solvent” for data and content
    A platform for reuse and sharing
Building semantic search systems with RDF
  Define/reuse ontologies expressed in RDF(S)
    Classes for defining instances and controlled vocabularies
    Properties for facets and additional attributes
  Import/transform instances into an RDF representation
    Resources referred to via URIs
    Content and controlled vocabularies
  Write application profiles in terms of RDF
Types of semantic search in RDF
  Searching for RDF
    Swoogle
  Adding value to search using RDF
    TAP, FOAFNaut
  Searching resources using RDF
    Edutella, Seamark
Swoogle: Searching for RDF
  Crawling for SW documents
    Leverages Google indexing
    And structure of key document types
  Searching for ontologies and instance data
  Mostly relevant to people building semantic applications rather than general users
TAP: Adding value to search using RDF
  Layering “related items” on top of traditional Web search
  Arm’s length integration and value-add for traditional Web search
FOAFNaut: Adding value to search using RDF
  Specialized search and visualization over FOAF networks
  Introducing the notion of social aspects of finding information
Edutella: Searching resources using RDF
  P2P architecture federating collections of learning objects
  Work on distributing RDF queries using schema information
  RDF as a more natural representation for learning objects than IEEE LOM
Seamark: Searching resources using RDF
  Using ontologies and taxonomies to define navigation over specific collections
  First implementation of faceted navigation using RDF
Faceted navigation as a type of semantic search
  Metadata may be faceted, i.e., it includes properties whose ranges form a near-orthogonal set of controlled vocabularies
    Creator: Dickens, Charles
    Subject: Arsenic, Antimony
    Location: World > U.S. > California > Venice
  Facets form a frame of reference for information overview, access and discovery
  Other properties serve as landmarks and cues
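As a rough illustration of how facets act as a frame of reference, the following Python sketch filters a small collection by one facet value and then counts the values remaining in each facet. It is not Seamark's implementation; the record layout, the `select` and `facet_counts` helpers, and the "Austen, Jane" record are assumptions made for this example, while the Dickens, Arsenic and Antimony values echo the slide.

```python
from collections import Counter

# Hypothetical records: each facet holds a set of controlled-vocabulary terms.
ITEMS = [
    {"id": "doc1", "creator": {"Dickens, Charles"}, "subject": {"Arsenic", "Antimony"}},
    {"id": "doc2", "creator": {"Dickens, Charles"}, "subject": {"Arsenic"}},
    {"id": "doc3", "creator": {"Austen, Jane"}, "subject": {"Antimony"}},
]
FACETS = ("creator", "subject")


def select(items, constraints):
    """Keep the items that carry the required value in every constrained facet."""
    return [
        item for item in items
        if all(value in item.get(facet, ()) for facet, value in constraints.items())
    ]


def facet_counts(items):
    """Count, per facet, how many of the given items carry each value."""
    return {
        facet: Counter(value for item in items for value in item.get(facet, ()))
        for facet in FACETS
    }


hits = select(ITEMS, {"creator": "Dickens, Charles"})
counts = facet_counts(hits)
```

The counts computed over the current hits are what a faceted interface would show next to each term, so the user sees how a further selection would narrow the result set.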
Case study: DC 2003 Online Proceedings
  Further the goals of the Dublin Core Metadata Initiative (DCMI) by providing DC-centric faceted navigation of online proceedings
Project timeline
  July 2003
    Initial experiment using DC 2002 site
  August 2003
    Initial proposal to DCMI
    Iterative prototyping involving
      Selection and development of ontologies
      Generation of instance metadata
      Specification of application profile
    Conversion of DC2003 dataset into navigable RDF
      Elapsed time to implement: 1 day
  September 2003
    Design and editing of controlled vocabulary
    Final iterations on site pages
    Launch at conference
Ontology
  Reused ontologies and metadata vocabularies
    Papers and posters: Dublin Core
    Creators: Friend Of A Friend (FOAF)
    Subjects: Thesaurus Interchange Format (TIF)
  Added relatively few properties and classes in a conference ontology
    Events
    Tracks
Ontology for conferences
  <s:Class rdf:about="&dcconf;Event">
    <s:label>Presentation</s:label>
  </s:Class>
  <s:Class rdf:about="&dcconf;Paper">
    <s:label>Paper</s:label>
    <s:subClassOf rdf:resource="&dcconf;Event"/>
  </s:Class>
  <s:Class rdf:about="&dcconf;Track">
    <s:label>Conference Track</s:label>
  </s:Class>
  <rdf:Property rdf:about="&dcconf;track">
    <s:label>Track</s:label>
    <s:comment>The track that the given paper is in.</s:comment>
    <s:domain rdf:resource="&dcconf;Event" />
    <s:range rdf:resource="&dcconf;Track" />
  </rdf:Property>
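The subClassOf statement in this ontology means that anything typed as a Paper can also be treated as an Event. A minimal Python sketch of that inference follows, assuming a plain dictionary stands in for the RDF store; the `ex:paper-22` identifier and the helper names are hypothetical.

```python
# Class hierarchy taken from the slide: Paper is a subclass of Event.
SUBCLASS_OF = {"dcconf:Paper": "dcconf:Event"}

# Hypothetical instance data: one resource typed as a Paper.
TYPES = {"ex:paper-22": "dcconf:Paper"}


def superclasses(cls):
    """Walk subClassOf links upward and return every ancestor class."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls in found:  # guard against accidental cycles
            break
        found.append(cls)
    return found


def instances_of(cls):
    """Return resources typed as cls directly or through a subclass."""
    return sorted(
        resource for resource, rdf_type in TYPES.items()
        if rdf_type == cls or cls in superclasses(rdf_type)
    )
```

A store without this kind of inference needs the broader type asserted explicitly on each instance, which is one plausible reading of why the paper records in this deck carry an rdf:type of Event alongside their Paper typing.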
Controlled vocabulary
  Author-assigned keywords used as source materials
  Combined author-assigned keywords with editorial judgment about the CV terms and structure
Seed thesaurus
Wrapping author-assigned keywords
  <tif:Term rdf:about="&dcconf2003;Relational_Database">
    <tif:value>Relational Database</tif:value>
    <tifs:USE rdf:resource="&dcconf2003;Relational_Databases" />
  </tif:Term>
  <tif:Term rdf:about="&dcconf2003;relationship_metadata">
    <tif:value>Relationship metadata</tif:value>
    <tifs:BT rdf:resource="&dcconf2003;Domain_Metadata" />
  </tif:Term>
  <tif:Term rdf:about="&dcconf2003;requirements">
    <tif:value>Requirements</tif:value>
  </tif:Term>
  <tif:Term rdf:about="&dcconf2003;resource_discovery">
    <tif:value>Resource discovery</tif:value>
    <tifs:BT rdf:resource="&dcconf2003;Discovery" />
  </tif:Term>
  <tif:Term rdf:about="&dcconf2003;resource-level_metadata">
    <tif:value>Resource-level metadata</tif:value>
    <tifs:BT rdf:resource="&dcconf2003;Domain_Metadata" />
  </tif:Term>
  <tif:Term rdf:about="&dcconf2003;SCORM">
    <tif:value>SCORM</tif:value>
    <tifs:USE rdf:resource="&dcconf2003;Sharable_Content_Object_Reference_Model_SCORM" />
  </tif:Term>
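The USE links above point a non-preferred author keyword at the term the editors chose instead. The sketch below shows one way such links could be resolved at indexing time; the dictionary layout and the `preferred` helper are assumptions, and only the term identifiers come from the slide.

```python
# USE links copied from the slide: non-preferred term -> preferred term.
USE = {
    "Relational_Database": "Relational_Databases",
    "SCORM": "Sharable_Content_Object_Reference_Model_SCORM",
}


def preferred(term):
    """Follow USE links until a term with no USE link is reached."""
    seen = set()
    while term in USE and term not in seen:  # 'seen' guards against loops
        seen.add(term)
        term = USE[term]
    return term


def normalize(keywords):
    """Map author keywords to preferred terms, dropping duplicates in order."""
    result = []
    for keyword in keywords:
        term = preferred(keyword)
        if term not in result:
            result.append(term)
    return result
```

Terms without a USE link, such as requirements, pass through unchanged, so author keywords that the editors accepted as-is need no extra handling.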
Adding editorial control
  <tif:Term rdf:about="&dcconf2003;Domain_Metadata">
    <tif:value>Domain Metadata</tif:value>
    <tifs:BT rdf:resource="&dcconf2003;Applications" />
  </tif:Term>
  <tif:Term rdf:about="&dcconf2003;Governments">
    <tif:value>Governments</tif:value>
    <tifs:BT rdf:resource="&dcconf2003;Organizations_and_Domains" />
  </tif:Term>
  <tif:Term rdf:about="&dcconf2003;Federal_Geographic_Data_Committee_Metadata">
    <tif:value>Federal Geographic Data Committee Metadata</tif:value>
    <tifs:BT rdf:resource="&dcconf2003;Domain_Metadata" />
    <tifs:RT rdf:resource="&dcconf2003;Governments" />
  </tif:Term>
  <tif:Term rdf:about="&dcconf2003;Geospatial_Metadata">
    <tif:value>Geospatial Metadata</tif:value>
    <tifs:BT rdf:resource="&dcconf2003;Domain_Metadata" />
    <tifs:RT rdf:resource="&dcconf2003;Organizations_and_Domains" />
  </tif:Term>
  <tif:Term rdf:about="&dcconf2003;Government_Agency_Metadata">
    <tif:value>Government Agency Metadata</tif:value>
    <tifs:BT rdf:resource="&dcconf2003;Domain_Metadata" />
    <tifs:RT rdf:resource="&dcconf2003;Governments" />
  </tif:Term>
Instance metadata
  Paper and poster metadata automatically extracted from author submissions
    Ad hoc Perl script
    Manual review and cleanup of generated RDF
    Mostly Dublin Core with some application-specific properties
  Creator and organization metadata manually collated from paper and poster metadata
    Represented in FOAF (but not in the manner in which FOAF is typically used)
Papers and posters
  <dcconf:Paper rdf:about="http://www.siderean.com/dc2003/103_paper-22.pdf">
    <seamark:texturl>http://www.siderean.com/dc2003/103_paper-22.pdf</seamark:texturl>
    <rdf:type rdf:resource="&dcconf;Event"/>
    <dcconf:track rdf:resource="&dcconf;Interoperability" />
    <dc:title>Two Paths to Interoperable Metadata</dc:title>
    <dc:creator rdf:resource="&dcconf;Godby_Carol" />
    <dc:creator rdf:resource="&dcconf;Smith_Devon" />
    <dc:creator rdf:resource="&dcconf;Childress_Eric" />
    <dc:description> This paper describes a prototype for a Web service that translates between pairs of metadata schemas. Despite a current trend toward encoding in XML and XSLT, we present arguments for a design that features a more distinct separation of syntax from semantics. The result is a system that automates routine processes, has a well-defined place for human input, and achieves a clean separation of the document data model, the document translations, and the machinery of the application. </dc:description>
    <dc:subject rdf:resource="&dcconf2003;metadata_schema_translation" />
    <dcconf:authorKeyword rdf:resource="&dcconf2003;metadata_schema_translation" />
    <dc:subject rdf:resource="&dcconf2003;Web_services" />
    <dcconf:authorKeyword rdf:resource="&dcconf2003;Web_services" />
    <dc:subject rdf:resource="&dcconf2003;communities_of_practice" />
    <dcconf:authorKeyword rdf:resource="&dcconf2003;communities_of_practice" />
  </dcconf:Paper>
Creators and organizations
  <foaf:Person rdf:about="&dcconf;Greenberg_Jane">
    <foaf:name>Greenberg, Jane</foaf:name>
    <foaf:mbox rdf:resource="mailto:janeg@ils.unc.edu" />
    <foaf:memberOf rdf:resource="&dcconf;University_of_North_Carolina_at_Chapel_Hill" />
    <foaf:publication rdf:resource="http://www.siderean.com/dc2003/202_Paper82-color-NEW.pdf" />
  </foaf:Person>
  <foaf:Organization rdf:about="&dcconf;University_of_North_Carolina_at_Chapel_Hill">
    <foaf:name>University of North Carolina at Chapel Hill, USA</foaf:name>
    <foaf:member rdf:resource="&dcconf;Greenberg_Jane" />
    <foaf:member rdf:resource="&dcconf;Crystal_Abe" />
  </foaf:Organization>
Application profile
  Expressed in XRBR (XML For Retrieval By Reformulation)
  Specifies a view over (possibly heterogeneous) RDF schemas with hints as to its interpretation and use for faceted navigation
  Provides a language for query reformulation and refinement in the context of navigation
    Query: “give me all resources where…” + advice
    Response: result set + suggested query refinements + original query
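The query/response cycle described here can be sketched in a few lines of Python. This is not the XRBR engine: the `answer` function, the response keys, and all resource records except paper-22's track and subject are assumptions chosen only to show a response that carries the result set, suggested refinements, and the original query together.

```python
from collections import Counter

# Hypothetical resources; only paper-22's values echo the deck.
RESOURCES = {
    "paper-22": {"track": "Interoperability", "subject": "Web_services"},
    "paper-82": {"track": "Interoperability", "subject": "metadata_generation"},
    "poster-5": {"track": "Posters", "subject": "Web_services"},
}


def answer(query, max_suggestions=7):
    """Return matches plus narrower queries the user could issue next."""
    results = sorted(
        rid for rid, props in RESOURCES.items()
        if all(props.get(key) == value for key, value in query.items())
    )
    # Tally property values among the matches that the query does not fix yet.
    tallies = Counter(
        (key, value)
        for rid in results
        for key, value in RESOURCES[rid].items()
        if key not in query
    )
    refinements = [
        {**query, key: value}
        for (key, value), _count in tallies.most_common(max_suggestions)
    ]
    return {"query": query, "results": results, "refinements": refinements}


response = answer({"track": "Interoperability"})
```

Because each refinement is itself a complete query, the client can send it straight back, which is the sense in which the answer includes the question.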
Application profile: specifying facets
  <xrbr:query xmlns:xrbr="http://www.siderean.com/2001/10/xrbr/"
      item-type="http://www.dcmi.org/dcconf/objects#Event"
      sort-dimension="title" >
    <xrbr:hint flattenresults="yes" startpagecolumns="4"/>
    <xrbr:dimensions>
      <xrbr:dimension name="title"
          predicate="http://purl.org/dc/elements/1.1/title">
        <xrbr:hint textsearch="yes" label="Title" function="itemlabel"/>
        <xrbr:return />
      </xrbr:dimension>
      <xrbr:dimension name="description"
          predicate="http://purl.org/dc/elements/1.1/description">
        <xrbr:hint textsearch="yes" label="Description"
            function="itemdescription"/>
        <xrbr:return />
      </xrbr:dimension>
      …
    </xrbr:dimensions>
  </xrbr:query>
Application profile: specifying hierarchical facets
  …
  <xrbr:dimension name="BT1"
      predicate="http://purl.org/dc/elements/1.1/subject"
      display-predicate="http://www.w3c.rl.ac.uk/2003/07/31-tif#value"
      root-resource="http://www.dcmi.org/dcconf/2003#Organizations_and_Domains"
      ancestor-predicate="http://www.w3c.rl.ac.uk/2003/07/31-tif-simple#BT" >
    <xrbr:hint label="Organizations and Domains"
        facet="yes" scopenote="Sectors, languages, special literatures or communities that use metadata" />
    <xrbr:suggestions count="7" />
  </xrbr:dimension>
  …
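A hierarchical facet of this kind is rooted at one term and follows the ancestor predicate (BT, broader term) to decide which subjects fall under it. A minimal Python sketch of that membership test follows, assuming dictionaries in place of the RDF store; the BT links come from the thesaurus slides, while the paper identifiers and helper names are hypothetical.

```python
# Broader-term links taken from the thesaurus slides.
BT = {
    "Governments": "Organizations_and_Domains",
    "Domain_Metadata": "Applications",
    "Geospatial_Metadata": "Domain_Metadata",
}

# Hypothetical subject assignments for two papers.
SUBJECTS = {
    "paper-a": ["Governments"],
    "paper-b": ["Geospatial_Metadata"],
}


def ancestors(term):
    """Return every broader term reachable from the given term."""
    chain = []
    while term in BT and BT[term] not in chain:
        term = BT[term]
        chain.append(term)
    return chain


def facet_members(root):
    """Return items with a subject equal to, or narrower than, the root term."""
    return sorted(
        item for item, terms in SUBJECTS.items()
        if any(t == root or root in ancestors(t) for t in terms)
    )
```

Rooting each facet at a different top term lets one subject property drive several independent hierarchies in the interface.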
Application profile: flattening graphs
  …
  <xrbr:structure name="creator"
      predicate="http://purl.org/dc/elements/1.1/creator">
    <xrbr:dimension name="creatorname"
        predicate="http://xmlns.com/foaf/0.1/#name">
      <xrbr:hint label="Author" textsearch="yes"/>
      <xrbr:suggestions count="7" />
      <xrbr:return />
    </xrbr:dimension>
    <xrbr:dimension name="creatororg"
        predicate="http://xmlns.com/foaf/0.1/#memberOf"
        display-predicate="http://xmlns.com/foaf/0.1/#name">
      <xrbr:hint label="Author Affiliation" />
      <xrbr:suggestions count="7" />
      <xrbr:return />
    </xrbr:dimension>
  </xrbr:structure>
  …
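Flattening here means turning a multi-step path through the graph (paper to creator to name, or paper to creator to organization to name) into a single field on the paper. The Python sketch below illustrates the idea with a list of triples; the `paper-82` shorthand, the dc:creator triple linking it to the author, and the helper names are assumptions, while the person and organization values come from the FOAF slide.

```python
# A tiny triple store as (subject, predicate, object) tuples.
TRIPLES = [
    ("paper-82", "dc:creator", "dcconf:Greenberg_Jane"),  # assumed link
    ("dcconf:Greenberg_Jane", "foaf:name", "Greenberg, Jane"),
    ("dcconf:Greenberg_Jane", "foaf:memberOf",
     "dcconf:University_of_North_Carolina_at_Chapel_Hill"),
    ("dcconf:University_of_North_Carolina_at_Chapel_Hill", "foaf:name",
     "University of North Carolina at Chapel Hill, USA"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def follow(start, path):
    """Follow a chain of predicates from start and return the end nodes."""
    nodes = [start]
    for predicate in path:
        nodes = [o for node in nodes for o in objects(node, predicate)]
    return nodes


def flatten(item):
    """Project nested creator data onto flat, facet-ready fields."""
    return {
        "creatorname": follow(item, ["dc:creator", "foaf:name"]),
        "creatororg": follow(item, ["dc:creator", "foaf:memberOf", "foaf:name"]),
    }
```

The field names mirror the dimension names in the profile, so each flattened field can be offered as its own facet (Author, Author Affiliation).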
Automatically generated interface
Alternate view: creators
Alternate view: subjects
Site start page
Site drilldown
Case study: Environmental Health News
  Aggregating news stories from the Web
  Semi-automated metadata creation by a team of subject matter experts and editors
  Semantic search to design custom feeds
Case study: Gateway to Educational Materials
  Aggregating learning objects from members of the GEM Consortium
  Embedding semantic search into a portal
Case study: NASA JPL
  Project information aggregated from content and data repositories
  Using and extending taxonomies
  Exploiting document type/genre
Related work in RDF OCLC Metadata Switch MIT Simile Longwell Haystack Aduna Sesame Ontoprise OntoSeek Nature Publishing Group Urchin
Issues
  Scale: must be commensurate with expectations and requirements from traditional web and enterprise search
    Number of objects, feeds: 10^6 to 10^9
    Ingest rates: ~10^3 to 10^4 triples/sec, how many per resource?
    Tagging: where and when?
    Latency: < 0.5 sec user time regardless of application
  Retrieval algorithms: many alternatives still being explored
    Federated services vs. centralized servers
    Relationship to relevance ranking
    Support for aggregate and text search operators in RDF query
  Usability: lots of work to be done to validate benefits
    Navigation
    Precision and recall
    Visualization
  Security, trust and provenance: just beginning to understand
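To get a feel for these scale figures, the back-of-the-envelope Python sketch below turns object counts and ingest rates into load times. The slide leaves the number of triples per resource as an open question, so the value of 10 used here is purely an assumption for illustration.

```python
def ingest_seconds(resources, triples_per_resource, triples_per_sec):
    """Estimate the time needed to load a collection at a steady ingest rate."""
    return resources * triples_per_resource / triples_per_sec


# Assumed for illustration only: 10 triples per resource.
ASSUMED_TRIPLES_PER_RESOURCE = 10

# Small collection at the fast rate: 10^6 objects at 10^4 triples/sec.
best_case = ingest_seconds(10**6, ASSUMED_TRIPLES_PER_RESOURCE, 10**4)

# Large collection at the slow rate: 10^9 objects at 10^3 triples/sec.
worst_case = ingest_seconds(10**9, ASSUMED_TRIPLES_PER_RESOURCE, 10**3)

worst_case_days = worst_case / 86400  # 86400 seconds in a day
```

Under that assumption the small end of the range loads in about a quarter of an hour, while the large end takes on the order of months, which suggests why the per-resource triple count matters so much.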
Lessons
  Balanced incremental approach
    Leverage metadata and indices at hand
    Exploit statistics where desirable
    But layer a framework on top to structure the statistics
  Significant mileage from very simple frameworks
Lessons: ontologies
  Don’t do: assume you have to build elaborate OWL ontologies
    Don’t have to boil the ocean to get the benefits
    OWL DL and OWL Full are overkill for this class of application
  Do: Tiny Ontologies Stitched Together
    RDF Schema with a smattering of RDF/OWL properties (e.g., owl:inverseOf)
    Start with DC + SKOS + FOAF
Lessons: controlled vocabularies
  Don’t do: huge monolithic taxonomies
    Unless they are ready at hand and can be reused largely without modification
  Do: bite-sized controlled vocabularies that exploit faceted approaches
    4 facets x 10 terms per facet versus 10^4 terms in a single taxonomy
    Start with flat term lists
    Add BT/NT/RT relationships over time
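The arithmetic behind the "4 facets x 10 terms" comparison can be checked directly. The short Python sketch below uses made-up facet and term names; only the counts (4 facets, 10 terms each) come from the slide.

```python
from itertools import product

# Hypothetical vocabularies: 4 facets with 10 placeholder terms each.
FACETS = {
    name: [f"{name}-{i}" for i in range(10)]
    for name in ("creator", "subject", "location", "genre")
}

# Terms an editor has to create and maintain under the faceted approach.
terms_to_maintain = sum(len(terms) for terms in FACETS.values())

# Distinct combinations those terms can express, one term per facet.
combinations = sum(1 for _ in product(*FACETS.values()))
```

Forty maintained terms cover the same 10,000 distinctions that a single pre-coordinated taxonomy would need one node apiece to express.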
Lessons: instances
  Manual creation
    Don’t do: exhaustive author creation of metadata
    Do: community annotation and tagging
  (Semi-)automated creation
    Don’t do: assume elaborate information extraction based on NLP, subject tagging and categorization
    Do: quick and dirty NEE or, better yet, stick to readily available asset and relational metadata (date, creator, document type/genre)
  Much of the benefit at a fraction of the effort
Application profiles
  Metadata is increasingly pervasive
  The way to leverage existing information infrastructure
  Exploit “on-demand” information integration feature of RDF
    DB + XML -> XSLT -> RDF(S)
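The slide's pipeline goes from database rows through XML and XSLT to RDF(S). The Python sketch below swaps in a direct row-to-triple mapping instead of XSLT, simply to show how little is needed to expose existing relational metadata as RDF-style statements; the base URI, column names, row values, and the `row_to_triples` helper are all hypothetical.

```python
# Hypothetical mapping from relational columns to Dublin Core predicates.
COLUMN_TO_PREDICATE = {
    "title": "dc:title",
    "author": "dc:creator",
    "created": "dc:date",
    "doctype": "dc:type",
}


def row_to_triples(row, base_uri, key_column, mapping):
    """Turn one database row (a dict) into (subject, predicate, object) triples."""
    subject = f"{base_uri}{row[key_column]}"
    return [
        (subject, predicate, row[column])
        for column, predicate in mapping.items()
        if row.get(column) is not None  # skip NULL or missing columns
    ]


# Hypothetical row, e.g. as returned by a DB cursor with dict rows.
row = {"id": 42, "title": "Status report", "author": "J. Doe",
       "created": "2005-03-08", "doctype": None}

triples = row_to_triples(row, "http://example.org/doc/", "id", COLUMN_TO_PREDICATE)
```

Readily available columns such as date, creator and document type map straight onto facets, in line with the deck's advice to start from metadata already at hand.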
The big question: statistics vs. knowledge
  Statistics can’t deliver everything
    Alan Kay’s puppy analogy
    Vitanyi work on “Google learning”
  On the other hand, knowledge is dearly won
    CYC
  Need a balance that enables adoption without losing the benefits
  Lessons from
    Statistics vs. knowledge in NLP
    Expert systems
Future directions
  User tagging + RDF: the killer SW application?
    The rehabilitation of metadata in the social software community
    The re-emergence of RSS/RDF
    “Folksonomy”-driven collaborative search
      Del.icio.us, Flickr, CiteULike
  Growth of the SW compared to historical growth of the Web: it’s 1994 all over again
Summary
  Semantic search has a role in today’s enterprises
  RDF provides a framework that can ease adoption and encourage innovation in semantic search
  The future for enterprise and consumer use looks bright
 

Semantic Search using RDF Metadata (SemTech 2005)

  • 1. Semantic Search Using RDF Metadata Semantic Technology Conference 2005 8 March 2005 Bradley P. Allen Siderean Software, Inc.
  • 2. Overview Semantic search Motivation Enterprise adoption Semantic search using RDF Examples Lessons Directions
  • 3. Problem “ We have to understand what information we have and organize it,’ says [Santa Clara Co. CIO] Ajmani, who estimates that saving each employee an hour a month spent looking for information would save millions of dollars.” [Information Week, 1/19/04] “… typical enterprise floundering in a sea of information … too many repositories, each with its own set of applications.” [IDC, 2004] “ The search capabilities on most company and content-oriented Web sites are as bad now as they were several years ago.” [eWeek, 1/26/04]
  • 4. Portal-driven demand for a better solution “ A portal provides an integrated information source for our internal process users or external customers” “ Now we have to architect the information related to business processes differently to search across multiple repositories” But they lack tools and applications that support this
  • 5. Current solutions Enterprise search, portals, knowledge management and content management systems lashed up in ad hoc architectures Doesn’t unify data and content Doesn’t provide context or scope Too many results (requires searching the answer to the original search)
  • 6. Why semantic search? Explicitly represented knowledge can Unify access to both content and data Create context and frames of reference Intellectual contributions that inform the search process must be captured The answer should include the question
  • 7. Semantic search – some definitions Search: the process of retrieving objects matching a given query Semantic search: Search that uses an explicit representation of knowledge to retrieve, organize or display objects matching a query Search that transparently renders human insight into the nature of matches
  • 8. Benefits in the enterprise Addresses pervasive frustration with enterprise search Let users Find high-value information quickly Add more value to it, and Share it with others Aligns information to business needs
  • 9. Roots Parametric search Query by example Retrieval by reformulation Rabbit, Argon Work in existing enterprise search and knowledge management Autonomy, Semio
  • 10. Semantic search requires metadata Ontologies Specifications of how to represent classes, instances and their properties Sometimes called “vocabularies” Controlled vocabularies Terms for saying what something is about Also called “taxonomies” and “thesauri” Instances Descriptions of resources Application profiles Specifications of which classes and properties are useful and how they are to be used in an application
  • 11. Current metadata solutions are costly Much custom development done Not easy to tag or incorporate content into the desired structures No easy way for groups creating the vocabularies to deliver them to production environments Perceived lack of tools Point solutions not well integrated Existing platform solutions closed
  • 12. Metadata in today’s enterprises From thirty interviews conducted with Fortune 1000 organizations during Fall 2004 Use of metadata not yet widespread but emerging Understanding varies widely across enterprises Three basic approaches Top down, bottom up, and give up
  • 13. Approach: top down CEO says “We must be an information-driven company” “ Corporate controlled vocabulary that all divisions will use” Typically based on Dublin Core Used for subject tagging The effort is multi-year, ROI hard to track, and may not be implemented or adopted widely
  • 14. Approach: bottom up Groups determine their vocabulary while describing their process Often in a collaboration environment Light tagging of content when it is created or when the content is published to a portal Again, based on Dublin Core and their own controlled vocabularies
  • 15. Approach: give up Assumption: too difficult to create metadata from existing content “We can’t ever hope to organize this morass of content, so let’s put in a search appliance like Google” “Our internal needs are like the public internet and users are familiar with Google searches” But still feel that metadata would improve matters, particularly within business units
  • 16. Don’t give up! RDF can make metadata use easier and less costly An open standard for metadata reduces cost and avoids technology and vendor lock-in A “universal solvent” for data and content A platform for reuse and sharing
  • 17. Building semantic search systems with RDF Define/reuse ontologies expressed in RDF(S) Classes for defining instances and controlled vocabularies Properties for facets and additional attributes Import/transform instances into an RDF representation Resources referred to via URIs Content and controlled vocabularies Write application profiles in terms of RDF
  • 18. Types of semantic search in RDF Searching for RDF Swoogle Adding value to search using RDF TAP, FOAFNaut Searching resources using RDF Edutella, Seamark
  • 19. Swoogle: Searching for RDF Crawling for SW documents Leverages Google indexing And structure of key document types Searching for ontologies and instance data Mostly relevant to people building semantic applications rather than general users
  • 20. TAP: Adding value to search using RDF Layering “related items” on top of traditional Web search Arm’s length integration and value-add for traditional Web search
  • 21. FOAFNaut: Adding value to search using RDF Specialized search and visualization over FOAF networks Introducing the notion of social aspects of finding information
  • 22. Edutella: Searching resources using RDF P2P architecture federating collections of learning objects Work on distributing RDF queries using schema information RDF as a more natural representation for learning objects than IEEE LOM
  • 23. Seamark: Searching resources using RDF Using ontologies and taxonomies to define navigation over specific collections First implementation of faceted navigation using RDF
  • 24. Faceted navigation as a type of semantic search Metadata may be faceted, i.e., includes properties whose ranges form a near-orthogonal set of controlled vocabularies Creator: Dickens, Charles Subject: Arsenic, Antimony Location: World > U.S. > California > Venice Facets form a frame of reference for information overview, access and discovery Other properties serve as landmarks and cues
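The idea of facets as a frame of reference can be sketched in a few lines of Python. This is a minimal illustration, not Seamark's implementation: the records are hypothetical (the Dickens, Arsenic, Antimony and Venice values echo the slide; the second author and the London location are invented for contrast), and `select` and `facet_counts` are names made up for this sketch.

```python
from collections import Counter

# Hypothetical records. Facet names and some values echo the slide;
# "Example, Author" and the London location are invented for contrast.
ITEMS = [
    {"creator": "Dickens, Charles", "subject": "Arsenic",
     "location": "World > U.S. > California > Venice"},
    {"creator": "Dickens, Charles", "subject": "Antimony",
     "location": "World > U.S. > California > Venice"},
    {"creator": "Example, Author", "subject": "Arsenic",
     "location": "World > U.K. > London"},
]

def select(items, **constraints):
    """Keep the items whose facet values match every constraint."""
    return [it for it in items
            if all(it.get(f) == v for f, v in constraints.items())]

def facet_counts(items, facet):
    """Count how many items carry each value of one facet."""
    return Counter(it[facet] for it in items)

# Narrow by one facet, then summarize what remains along another.
arsenic = select(ITEMS, subject="Arsenic")
print(facet_counts(arsenic, "creator"))
```

Because the facets are near-orthogonal, a constraint on one facet still leaves the others free to summarize the remaining items, which is what gives the user an overview at every step.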
  • 25. Case study: DC 2003 Online Proceedings Further the goals of the Dublin Core Metadata Initiative (DCMI) by providing DC-centric faceted navigation of online proceedings
  • 26. Project timeline July 2003 Initial experiment using DC 2002 site August 2003 Initial proposal to DCMI Iterative prototyping involving Selection and development of ontologies Generation of instance metadata Specification of application profile Conversion of DC2003 dataset into navigable RDF Elapsed time to implement: 1 day September 2003 Design and editing of controlled vocabulary Final iterations on site pages Launch at conference
  • 27. Ontology Reused ontologies and metadata vocabularies Papers and posters: Dublin Core Creators: Friend Of A Friend (FOAF) Subjects: Thesaurus Interchange Format (TIF) Added relatively few properties and classes in a conference ontology Events Tracks
  • 28. Ontology for conferences <s:Class rdf:about="&dcconf;Event"> <s:label>Presentation</s:label> </s:Class> <s:Class rdf:about="&dcconf;Paper"> <s:label>Paper</s:label> <s:subClassOf rdf:resource="&dcconf;Event"/> </s:Class> <s:Class rdf:about="&dcconf;Track"> <s:label>Conference Track</s:label> </s:Class> <rdf:Property rdf:about="&dcconf;track"> <s:label>Track</s:label> <s:comment>The track that the given paper is in.</s:comment> <s:domain rdf:resource="&dcconf;Event" /> <s:range rdf:resource="&dcconf;Track" /> </rdf:Property>
  • 29. Controlled vocabulary Author-assigned keywords used as source materials Combined author-assigned with editorial judgment about the CV terms and structure
  • 31. Wrapping author-assigned keywords <tif:Term rdf:about="&dcconf2003;Relational_Database"> <tif:value>Relational Database</tif:value> <tifs:USE rdf:resource="&dcconf2003;Relational_Databases" /> </tif:Term> <tif:Term rdf:about="&dcconf2003;relationship_metadata"> <tif:value>Relationship metadata</tif:value> <tifs:BT rdf:resource="&dcconf2003;Domain_Metadata" /> </tif:Term> <tif:Term rdf:about="&dcconf2003;requirements"> <tif:value>Requirements</tif:value> </tif:Term> <tif:Term rdf:about="&dcconf2003;resource_discovery"> <tif:value>Resource discovery</tif:value> <tifs:BT rdf:resource="&dcconf2003;Discovery" /> </tif:Term> <tif:Term rdf:about="&dcconf2003;resource-level_metadata"> <tif:value>Resource-level metadata</tif:value> <tifs:BT rdf:resource="&dcconf2003;Domain_Metadata" /> </tif:Term> <tif:Term rdf:about="&dcconf2003;SCORM"> <tif:value>SCORM</tif:value> <tifs:USE rdf:resource="&dcconf2003;Sharable_Content_Object_Reference_Model_SCORM" /> </tif:Term>
  • 32. Adding editorial control <tif:Term rdf:about="&dcconf2003;Domain_Metadata"> <tif:value>Domain Metadata</tif:value> <tifs:BT rdf:resource="&dcconf2003;Applications" /> </tif:Term> <tif:Term rdf:about="&dcconf2003;Governments"> <tif:value>Governments</tif:value> <tifs:BT rdf:resource="&dcconf2003;Organizations_and_Domains" /> </tif:Term> <tif:Term rdf:about="&dcconf2003;Federal_Geographic_Data_Committee_Metadata"> <tif:value>Federal Geographic Data Committee Metadata</tif:value> <tifs:BT rdf:resource="&dcconf2003;Domain_Metadata" /> <tifs:RT rdf:resource="&dcconf2003;Governments" /> </tif:Term> <tif:Term rdf:about="&dcconf2003;Geospatial_Metadata"> <tif:value>Geospatial Metadata</tif:value> <tifs:BT rdf:resource="&dcconf2003;Domain_Metadata" /> <tifs:RT rdf:resource="&dcconf2003;Organizations_and_Domains" /> </tif:Term> <tif:Term rdf:about="&dcconf2003;Government_Agency_Metadata"> <tif:value>Government Agency Metadata</tif:value> <tifs:BT rdf:resource="&dcconf2003;Domain_Metadata" /> <tifs:RT rdf:resource="&dcconf2003;Governments" /> </tif:Term>
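How the USE and BT relations on the two controlled-vocabulary slides might be consumed can be sketched as follows. The term identifiers are copied from the slides with the `&dcconf2003;` prefix dropped; the dictionaries and the `preferred` and `ancestors` helpers are assumptions made for illustration, not part of TIF or of the DC 2003 site.

```python
# USE points a non-preferred term at its preferred form;
# BT points a term at its broader term. Values are taken from the slides.
USE = {
    "Relational_Database": "Relational_Databases",
    "SCORM": "Sharable_Content_Object_Reference_Model_SCORM",
}
BT = {
    "relationship_metadata": "Domain_Metadata",
    "resource-level_metadata": "Domain_Metadata",
    "resource_discovery": "Discovery",
    "Domain_Metadata": "Applications",
    "Governments": "Organizations_and_Domains",
    "Geospatial_Metadata": "Domain_Metadata",
}

def preferred(term):
    """Follow USE links until a preferred term is reached."""
    seen = set()
    while term in USE and term not in seen:
        seen.add(term)
        term = USE[term]
    return term

def ancestors(term):
    """Return the chain of broader terms, nearest first."""
    chain = []
    term = preferred(term)
    # Stop at a root term, or if a broader term repeats (cycle guard).
    while term in BT and BT[term] not in chain:
        term = BT[term]
        chain.append(term)
    return chain

print(preferred("SCORM"))
print(ancestors("relationship_metadata"))
```

An author keyword such as `relationship_metadata` can then be shown under `Domain_Metadata` and `Applications` in a hierarchical facet, while a flat keyword such as `requirements` simply has no ancestors.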
  • 33. Instance metadata Paper and poster metadata automatically extracted from author submissions Ad hoc Perl script Manual review and cleanup of generated RDF Mostly Dublin Core with some application-specific properties Creator and organization metadata manually collated from paper and poster metadata Represented in FOAF (but not in the manner in which FOAF is typically used)
  • 34. Papers and posters <dcconf:Paper rdf:about="http://www.siderean.com/dc2003/103_paper-22.pdf"> <seamark:texturl>http://www.siderean.com/dc2003/103_paper-22.pdf</seamark:texturl> <rdf:type rdf:resource="&dcconf;Event"/> <dcconf:track rdf:resource="&dcconf;Interoperability" /> <dc:title>Two Paths to Interoperable Metadata</dc:title> <dc:creator rdf:resource="&dcconf;Godby_Carol" /> <dc:creator rdf:resource="&dcconf;Smith_Devon" /> <dc:creator rdf:resource="&dcconf;Childress_Eric" /> <dc:description> This paper describes a prototype for a Web service that translates between pairs of metadata schemas. Despite a current trend toward encoding in XML and XSLT, we present arguments for a design that features a more distinct separation of syntax from semantics. The result is a system that automates routine processes, has a well-defined place for human input, and achieves a clean separation of the document data model, the document translations, and the machinery of the application. </dc:description> <dc:subject rdf:resource="&dcconf2003;metadata_schema_translation" /> <dcconf:authorKeyword rdf:resource="&dcconf2003;metadata_schema_translation" /> <dc:subject rdf:resource="&dcconf2003;Web_services" /> <dcconf:authorKeyword rdf:resource="&dcconf2003;Web_services" /> <dc:subject rdf:resource="&dcconf2003;communities_of_practice" /> <dcconf:authorKeyword rdf:resource="&dcconf2003;communities_of_practice" /> </dcconf:Paper>
  • 35. Creators and organizations <foaf:Person rdf:about="&dcconf;Greenberg_Jane"> <foaf:name>Greenberg, Jane</foaf:name> <foaf:mbox rdf:resource="mailto:janeg@ils.unc.edu" /> <foaf:memberOf rdf:resource="&dcconf;University_of_North_Carolina_at_Chapel_Hill" /> <foaf:publication rdf:resource="http://www.siderean.com/dc2003/202_Paper82-color-NEW.pdf" /> </foaf:Person> <foaf:Organization rdf:about="&dcconf;University_of_North_Carolina_at_Chapel_Hill"> <foaf:name>University of North Carolina at Chapel Hill, USA</foaf:name> <foaf:member rdf:resource="&dcconf;Greenberg_Jane" /> <foaf:member rdf:resource="&dcconf;Crystal_Abe" /> </foaf:Organization>
  • 36. Application profile Expressed in XRBR (XML For Retrieval By Reformulation) Specifies a view over (possibly heterogeneous) RDF schemas with hints as to its interpretation and use for faceted navigation Provides a language for query reformulation and refinement in the context of navigation Query: “give me all resources where…” + advice Response: result set + suggested query refinements + original query
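The query and response shape described on this slide (result set, suggested refinements, original query) can be sketched in Python. This is not XRBR and does not reproduce Seamark's behavior: the items, the facet names, the `respond` function and its default of seven suggestions per facet are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical items; titles and facet names are illustrative only.
ITEMS = [
    {"title": "Paper A", "track": "Interoperability", "subject": "Web_services"},
    {"title": "Paper B", "track": "Interoperability", "subject": "resource_discovery"},
    {"title": "Poster C", "track": "Posters", "subject": "Web_services"},
]

def respond(query, items, facets, count=7):
    """Answer a query (facet -> value) with results plus refinements."""
    results = [it for it in items
               if all(it.get(f) == v for f, v in query.items())]
    suggestions = {}
    for facet in facets:
        if facet in query:
            continue  # already constrained, nothing left to refine
        top = Counter(it[facet] for it in results).most_common(count)
        # Each suggestion is the original query plus one more constraint,
        # paired with the number of results it would return.
        suggestions[facet] = [({**query, facet: value}, n) for value, n in top]
    # The original query travels back with the answer.
    return {"query": query, "results": results, "suggestions": suggestions}

response = respond({"track": "Interoperability"}, ITEMS, ["track", "subject"])
print(response["suggestions"]["subject"])
```

Returning the original query alongside the refinements means a client can navigate by submitting one of the suggested queries unchanged, which is the reformulation loop the slide describes.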
  • 37. Application profile: specifying facets <xrbr:query xmlns:xrbr="http://www.siderean.com/2001/10/xrbr/" item-type="http://www.dcmi.org/dcconf/objects#Event" sort-dimension="title" > <xrbr:hint flattenresults="yes" startpagecolumns="4"/> <xrbr:dimensions> <xrbr:dimension name="title" predicate="http://purl.org/dc/elements/1.1/title"> <xrbr:hint textsearch="yes" label="Title" function="itemlabel"/> <xrbr:return /> </xrbr:dimension> <xrbr:dimension name="description" predicate="http://purl.org/dc/elements/1.1/description"> <xrbr:hint textsearch="yes" label="Description" function="itemdescription"/> <xrbr:return /> </xrbr:dimension> … </xrbr:dimensions> </xrbr:query>
  • 38. Application profile: specifying hierarchical facets … <xrbr:dimension name="BT1" predicate="http://purl.org/dc/elements/1.1/subject" display-predicate="http://www.w3c.rl.ac.uk/2003/07/31-tif#value" root-resource="http://www.dcmi.org/dcconf/2003#Organizations_and_Domains" ancestor-predicate="http://www.w3c.rl.ac.uk/2003/07/31-tif-simple#BT" > <xrbr:hint label="Organizations and Domains" facet="yes" scopenote="Sectors, languages, special literatures or communities that use metadata" /> <xrbr:suggestions count="7" /> </xrbr:dimension> …
  • 39. Application profile: flattening graphs … <xrbr:structure name="creator" predicate="http://purl.org/dc/elements/1.1/creator"> <xrbr:dimension name="creatorname" predicate="http://xmlns.com/foaf/0.1/#name"> <xrbr:hint label="Author" textsearch="yes"/> <xrbr:suggestions count="7" /> <xrbr:return /> </xrbr:dimension> <xrbr:dimension name="creatororg" predicate="http://xmlns.com/foaf/0.1/#memberOf" display-predicate="http://xmlns.com/foaf/0.1/#name"> <xrbr:hint label="Author Affiliation" /> <xrbr:suggestions count="7" /> <xrbr:return /> </xrbr:dimension> </xrbr:structure> …
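What "flattening" means here can be illustrated with a small Python sketch: properties reached through the creator resource are pulled up so they behave like facets of the paper itself. The triples reuse identifiers from the creators slide, but the `dc:creator` link from the paper to Jane Greenberg is an assumption (the slide shows only the reverse `foaf:publication` link), the paper key is shortened, and `objects` and `flatten` are hypothetical helpers, not XRBR semantics.

```python
# A tiny graph as (subject, predicate, object) triples.
TRIPLES = [
    ("202_Paper82", "dc:creator", "Greenberg_Jane"),  # assumed link
    ("Greenberg_Jane", "foaf:name", "Greenberg, Jane"),
    ("Greenberg_Jane", "foaf:memberOf",
     "University_of_North_Carolina_at_Chapel_Hill"),
    ("University_of_North_Carolina_at_Chapel_Hill", "foaf:name",
     "University of North Carolina at Chapel Hill, USA"),
]

def objects(subject, predicate):
    """All objects of the triples matching a subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def flatten(item):
    """Pull author names and affiliations up onto the item itself."""
    row = {"creatorname": [], "creatororg": []}
    for person in objects(item, "dc:creator"):
        row["creatorname"] += objects(person, "foaf:name")
        # The affiliation is displayed by the organization's name,
        # mirroring the display-predicate idea on the slide.
        for org in objects(person, "foaf:memberOf"):
            row["creatororg"] += objects(org, "foaf:name")
    return row

print(flatten("202_Paper82"))
```

The flattened row has the same dimension names as the slide (`creatorname`, `creatororg`), so author and affiliation can be offered as facets without the user ever seeing the intermediate person resource.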
  • 45. Case study: Environmental Health News Aggregating news stories from the Web Semi-automated metadata creation by a team of subject matter experts and editors Semantic search to design custom feeds
  • 46. Case study: Gateway to Educational Materials Aggregating learning objects from members of the GEM Consortium Embedding semantic search into a portal
  • 47. Case study: NASA JPL Project information aggregated from content and data repositories Using and extending taxonomies Exploiting document type/genre
  • 48. Related work in RDF OCLC Metadata Switch MIT Simile Longwell Haystack Aduna Sesame Ontoprise OntoSeek Nature Publishing Group Urchin
  • 49. Issues Scale: must be commensurate with expectations and requirements from traditional web and enterprise search Number of objects, feeds: 10^6 to 10^9 Ingest rates: ~ 10^3 – 10^4 triples/sec, how many per resource? Tagging: where and when? Latency: < 0.5 sec user time regardless of application Retrieval algorithms: many alternatives still being explored Federated services vs. centralized servers Relationship to relevance ranking Support for aggregate and text search operators in RDF query Usability: lots of work to be done to validate benefits Navigation Precision and recall Visualization Security, trust and provenance: just beginning to understand
  • 50. Lessons Balanced incremental approach Leverage metadata and indices at hand Exploit statistics where desirable But layer a framework on top to structure the statistics Significant mileage from very simple frameworks
  • 51. Lessons: ontologies Don’t do: assume you have to build elaborate OWL ontologies Don’t have to boil the ocean to get the benefits OWL DL and OWL Full are overkill for this class of application Do: Tiny Ontologies Stitched Together RDF Schema with a smattering of RDF/OWL properties (e.g., owl:inverseOf) Start with DC + SKOS + FOAF
  • 52. Lessons: controlled vocabularies Don’t do: huge monolithic taxonomies Unless they are ready at hand and can be reused largely without modification Do: bite-sized controlled vocabularies that exploit faceted approaches 4 facets x 10 terms per facet versus 10^4 terms in a single taxonomy Start with flat term lists Add BT/NT/RT relationships over time
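The arithmetic behind the facet comparison on this slide can be checked directly. The sketch assumes each resource takes exactly one term from each facet, which is a simplification of real tagging practice.

```python
facets = 4
terms_per_facet = 10

# Terms an editor has to create and maintain across all facets.
terms_to_maintain = facets * terms_per_facet

# Distinct descriptions those terms can express, one term per facet.
combinations = terms_per_facet ** facets

print(terms_to_maintain, combinations)  # 40 10000
```

Forty maintained terms cover the same 10,000 distinctions that a single pre-coordinated taxonomy would need a separate term for.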
  • 53. Lessons: instances Manual creation Don’t do: exhaustive author creation of metadata Do: community annotation and tagging (Semi-)automated creation Don’t do: assume elaborate information extraction based on NLP, subject tagging and categorization Do: quick and dirty NEE or better yet, stick to readily available asset and relational metadata (date, creator, document type/genre) Much of the benefit at a fraction of the effort
  • 54. Application profiles Metadata is increasingly pervasive The way to leverage existing information infrastructure Exploit “on-demand” information integration feature of RDF DB + XML -> XSLT -> RDF(S)
  • 55. The big question: statistics vs. knowledge Statistics can’t deliver everything Alan Kay’s puppy analogy Vitanyi work on “Google learning” On the other hand, knowledge is dearly won CYC Need a balance that enables adoption without losing the benefits Lessons from Statistics vs. knowledge in NLP Expert systems
  • 56. Future directions User tagging + RDF: the killer SW application? The rehabilitation of metadata in the social software community The re-emergence of RSS/RDF “Folksonomy”-driven collaborative search Del.icio.us, Flickr, CiteULike Growth of the SW compared to historical growth of the Web: it’s 1994 all over again
  • 57. Summary Semantic search has a role in today’s enterprises RDF provides a framework that can ease adoption and encourage innovation in semantic search The future for enterprise and consumer use looks bright