Lessons and requirements from a decade of deployed Semantic Web apps

Digital Enterprise Research Institute www.deri.ie

Lessons and requirements from
a decade of deployed
Semantic Web apps
Benjamin Heitmann, Richard Cyganiak,
Conor Hayes, Stefan Decker
Funded by Science Foundation Ireland under
Grant No. SFI/08/CE/I1380 (Líon-2)
© Copyright 2011 Digital Enterprise Research Institute. All rights reserved.

Enabling Networked Knowledge

Input for this workshop

 LEDP workshop CfP calls for:
 requirements
 patterns
 gaps in Linked Data standards + guidelines

 Where should this input come from?


The Semantic Web:
a decade is a long time

2001 2011

Choice of methodology?

 Goal:
 patterns, requirements and gaps
regarding LD
 Data:
 10 years of Semantic Web research

 Which scientific approach fits ?
 Empirical software engineering

 Full IEEE transactions journal paper:
http://tinyurl.com/semweblessons


Overview

Empirical survey
 Architecture: arch. pattern
 LD standards: gaps
 Software Eng. process: shortcomings
Software engineering solutions


Empirical survey

 Sources: 124 apps total
 Semantic Web Challenge
(ISWC): 2003-2009,
101 apps
 Scripting for SemWeb
Challenge (ESWC), 2006-2009,
23 apps
 includes industry & research
apps
 Checklist (12 questions)
 Data collection:
1. own analysis of paper
2. validation by email


Empirical survey results

 widespread support for SemWeb specific
features
 clear difference to database-driven apps
 big uptake of Linked Data principles and
eco-system
 integration requires human intervention
 top 3 standards: RDF, OWL, SPARQL
 top 3 vocabularies: FOAF, DC, SIOC


Conceptual architecture

 Conceptual architecture:
 describes major design elements of
a system (+ relations)
 domain specific (e.g. the Semantic Web)
 provides architectural pattern
 documents community consensus


Components of conceptual
architecture

starting point: decouple + specialise

 RDF data handling:
 Graph access layer (100%)
 RDF store (88%)
 Graph query language service (77%)
 Data integration:
 Data homogenisation service (74%)
 Data discovery service (30%)
 User interface:
 Graph-based navigation interface (91%)
 Structured data authoring interface (29%)
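
One way to read the components above is as a set of narrow interfaces that an application wires together. The following is a minimal Python sketch under that assumption: every class and method name is hypothetical, the in-memory store only stands in for a real RDF store, and only three of the seven components are shown.

```python
from typing import Iterable, List, Optional, Protocol, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


class RDFStore(Protocol):
    """Persistence component (hypothetical interface)."""

    def add(self, triple: Triple) -> None: ...
    def triples(self) -> Iterable[Triple]: ...


class InMemoryStore:
    """Toy stand-in for an RDF store; keeps triples in a set."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def triples(self) -> Iterable[Triple]:
        return iter(self._triples)


class GraphAccessLayer:
    """Single entry point through which other components reach the graph."""

    def __init__(self, store: RDFStore) -> None:
        self._store = store

    def add(self, s: str, p: str, o: str) -> None:
        self._store.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        pattern = (s, p, o)
        return sorted(
            t for t in self._store.triples()
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


class NavigationInterface:
    """Graph-based navigation: list everything known about one resource."""

    def __init__(self, graph: GraphAccessLayer) -> None:
        self._graph = graph

    def describe(self, resource: str) -> List[Tuple[str, str]]:
        return [(p, o) for _, p, o in self._graph.match(s=resource)]


# Usage with made-up example data.
graph = GraphAccessLayer(InMemoryStore())
graph.add("ex:alice", "foaf:name", "Alice")
graph.add("ex:alice", "foaf:knows", "ex:bob")
graph.add("ex:bob", "foaf:name", "Bob")
print(NavigationInterface(graph).describe("ex:alice"))
```

Because the navigation interface only talks to the graph access layer, the store behind it could be swapped without touching the user-facing component, which is one reading of "decouple + specialise".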


LD gaps:
publishing/consuming

 all applications consume RDF
 73% import API, 69% export API
 but: incompatible
implementations
 LD principles in 2006 led to
consolidation

 embedding RDF:
 web for humans vs. web for machines
 2008: introduction of RDFa
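
A common way to serve the "web for machines" from the same URI as the "web for humans" is HTTP content negotiation. The sketch below only builds such a request with the Python standard library and does not send it; the resource URI and the q-value are illustrative assumptions, not taken from the surveyed applications.

```python
import urllib.request

# RDF serialisations to ask for, in order of preference (q-value is illustrative).
ACCEPT_RDF = "text/turtle, application/rdf+xml;q=0.8"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for RDF instead of HTML."""
    return urllib.request.Request(uri, headers={"Accept": ACCEPT_RDF}, method="GET")


# Hypothetical resource URI, used only for illustration.
request = build_rdf_request("http://example.org/resource/alice")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; whether a server honours the `Accept` header depends on how the data was published.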


LD gaps: beyond open data

 writing/changing/updating RDF data
is difficult
 71% of apps do not support data
changes

 Writing to remote RDF store:
 draft status in 2011: SPARQL Update
 Restricting access (read/write):
 no standards
 no interoperability
 closest ideas (?): R/W design note, WebID
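
To make "writing to a remote RDF store" concrete, here is a minimal sketch of a SPARQL Update sent over HTTP. It assumes the store accepts an update directly in a POST body with the `application/sparql-update` media type, as described by the SPARQL 1.1 Protocol; the endpoint URL and helper names are hypothetical, and access control is deliberately left out because the slide notes there is no standard for it.

```python
import urllib.request


def insert_data_update(subject: str, predicate: str, literal: str) -> str:
    """Return a SPARQL Update string that inserts one triple with a plain literal.

    Only backslashes and double quotes are escaped; the sketch assumes the
    literal contains no line breaks.
    """
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'INSERT DATA {{ <{subject}> <{predicate}> "{escaped}" . }}'


def build_update_request(endpoint: str, update: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying the update in its body."""
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


# Hypothetical endpoint and resource, used only for illustration.
update = insert_data_update(
    "http://example.org/resource/alice",
    "http://xmlns.com/foaf/0.1/name",
    "Alice",
)
update_request = build_update_request("http://example.org/sparql/update", update)
print(update)
```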


Software Eng. process
shortcomings (1)

 Integrating noisy RDF data:
 60% semi-automatic integration
 this involves human intervention
 only 20% use automatic heuristics
 major part of Semantic Web specific code

 Distribution of application logic:
 multiple components and standards
 queries (41%), rules (52%) or formal vocabularies
 hard to maintain
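
Semi-automatic integration can be pictured as a heuristic that decides the clear cases and queues the rest for a person. The sketch below illustrates that split with plain string similarity from the Python standard library; the thresholds, function names and example labels are assumptions for illustration, not a method reported by the surveyed applications.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(candidates, accept_at=0.95, review_at=0.6):
    """Split candidate label pairs into accepted, needs-review and rejected."""
    accepted, review, rejected = [], [], []
    for left, right in candidates:
        score = similarity(left, right)
        if score >= accept_at:
            accepted.append((left, right))   # automatic heuristic decides
        elif score >= review_at:
            review.append((left, right))     # human intervention required
        else:
            rejected.append((left, right))
    return accepted, review, rejected


# Made-up labels from two data sets that may describe the same person.
pairs = [
    ("Tim Berners-Lee", "tim berners-lee"),
    ("Tim Berners-Lee", "T. Berners-Lee"),
    ("Tim Berners-Lee", "Stefan Decker"),
]
accepted, review, rejected = triage(pairs)
print(len(accepted), len(review), len(rejected))
```

Raising `accept_at` moves more pairs into the review queue, which trades automation for fewer wrong merges.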


Software Eng. process
shortcomings (2)

 Mismatch of data models between components
 graph versus relational or object oriented (90%)
 overhead in communication
 inconsistent round-trip conversion
 3-way ORM needed?
 Data models shown: graph-based, relational, object oriented
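
A small example of how a round trip between the graph model and an object model can become inconsistent: a naive mapping that gives each predicate exactly one attribute loses multi-valued properties. The mapping functions and data below are hypothetical and deliberately simplistic.

```python
def triples_to_object(triples, subject):
    """Naive graph-to-object mapping: one attribute per predicate."""
    obj = {}
    for s, p, o in triples:
        if s == subject:
            obj[p] = o  # a later value silently overwrites an earlier one
    return obj


def object_to_triples(obj, subject):
    """Map the object back to a set of triples."""
    return {(subject, p, o) for p, o in obj.items()}


# Made-up graph in which one predicate has two values.
graph = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
]

round_tripped = object_to_triples(triples_to_object(graph, "ex:alice"), "ex:alice")
lost = set(graph) - round_tripped
print(lost)
```

The second `foaf:knows` value replaces the first, so the triple pointing at `ex:bob` does not survive the conversion.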


Software Eng. solutions (1)

 More guidelines, best
practices and design
patterns:
 current examples:
– Linked Data principles and
publishing guidelines
– guidelines for naming of URIs
– Linked Data patterns collection
 result: more interoperability,
more coherent Web of Data



 More software libraries
(beyond RDF storage!)
 guidelines can be hardcoded in
reusable libraries
 good libraries can make
complicated guidelines easy to
use (See HTTP, SSL, SMTP and
DNS lookups)
 current examples:
– any23, d2r server, Semantic
Web Client Library



 More software factories:
 create complete applications
 requires patterns + libraries
 or: “opinionated software”

 components can be
customised for domain
 Interface, homogenisation
and data discovery usually
made from scratch

https://developers.facebook.com/docs/beta/opengraph/tutorial/


Summary

Empirical survey
 Architecture: arch. pattern
 LD standards: gaps
 Software Eng. process: shortcomings
Software engineering solutions

Full article: http://tinyurl.com/semweblessons


Appendix: threats to validity

 Representativeness:
 only complete applications were part of the challenges (not tools or libraries)
 apps needed to use real-world data
 submission of a paper describing the app was required
 challenges extend over multiple years, which allows trends to be seen
 Number of authors who verified checklist (65%):
 academic email addresses expire quickly
 we manually tried to find new email addresses
 no source code was used:
 source code was not required for challenges due to e.g. IP
issues


Table: Impl. details

 Programming languages:
 2003: Java 60%, C 20%
 2004: Java 56%, JS 12%
 2005: Java 66%
 2006: Java 10%, JS 15%, PHP 26%
 2007: Java 50%, PHP 25%
 2008: Java 43%, PHP 21%
 2009: Java 46%, JS 23%, PHP 23%
 overall: Java 48%, PHP 19%, JS 13%

 RDF libraries:
 2003: -
 2004: Jena 18%, Sesame 12%, Lucene 18%
 2005: -
 2006: RAP 15%, RDFLib 10%
 2007: Sesame 33%, Jena 8%
 2008: Sesame 17%, ARC 17%, Jena 13%
 2009: Sesame 23%
 overall: Sesame 19%, Jena 9%

 SemWeb standards:
 2003: RDF 100%, OWL 30%
 2004: RDF 87%, RDFS 37%, OWL 37%
 2005: RDF 66%, OWL 66%, RDFS 50%
 2006: RDF 89%, OWL 42%, SPARQL 15%
 2007: RDF 100%, SPARQL 50%, OWL 41%
 2008: RDF 100%, SPARQL 17%, OWL 10%
 2009: RDF 100%, SPARQL 69%, OWL 46%
 overall: RDF 96%, OWL 43%, SPARQL 41%

 Schemas/vocabularies/ontologies:
 2003: RSS 20%, FOAF 20%, DC 20%
 2004: DC 12%, SWRC 12%
 2005: -
 2006: FOAF 26%, RSS 15%, Bibtex 10%
 2007: FOAF 41%, DC 20%, SIOC 20%
 2008: FOAF 30%, DC 21%, DBpedia 13%
 2009: FOAF 34%, DC 15%, SKOS 15%
 overall: FOAF 27%, DC 13%, SIOC 7%


Tables: Data integration and
other properties

Data integration            2003  2004  2005  2006  2007  2008  2009
manual                       30%   13%    0%   16%    9%    5%    4%
semi-automatic               70%   31%  100%   47%   58%   65%   61%
automatic                     0%   25%    0%   11%   13%    4%   19%
not needed                    0%   31%    0%   26%   20%   26%   16%

Other properties            2003  2004  2005  2006  2007  2008  2009
Data creation                20%   37%   50%   52%   37%   52%   76%
Data import                  70%   50%   83%   52%   70%   86%   73%
Data export                  70%   56%   83%   68%   79%   86%   73%
Inferencing                  60%   68%   83%   57%   79%   52%   42%
Decentralised sources        90%   75%  100%   57%   41%   95%   96%
Multiple owners              90%   93%  100%   89%   83%   91%   88%
Heterogeneous formats        90%   87%  100%   89%   87%   78%   88%
Data updates                 90%   75%   83%   78%   45%   73%   50%
Linked Data principles        0%    0%    0%    5%   25%   26%   65%


Table: architectural analysis

      number of     graph   RDF    graph-based  data            graph query  structured      data
year  applications  access  store  navigation   homogenisation  language     data authoring  discovery
                    layer          interface    service         service      interface       service
2003  10            100%    80%    90%          90%             80%          20%             50%
2004  16            100%    94%    100%         50%             88%          38%             25%
2005  6             100%    100%   100%         83%             83%          33%             33%
2006  19            100%    95%    89%          63%             68%          37%             16%
2007  24            100%    92%    96%          88%             88%          33%             54%
2008  23            100%    87%    83%          70%             78%          26%             30%
2009  26            100%    77%    88%          80%             65%          19%             15%
total 124           100%    88%    91%          74%             77%          29%             30%
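
The "total" row can be cross-checked against the per-year rows, since it should be the average weighted by the number of applications rather than the plain mean of the yearly percentages. A minimal sketch for one column, assuming the column whose total is 88% is the RDF store (as the component percentages on the architecture slide indicate) and that application counts can be recovered by rounding:

```python
# Per-year figures copied from the table:
# (year, number of applications, % of applications with an RDF store)
rows = [
    (2003, 10, 80), (2004, 16, 94), (2005, 6, 100), (2006, 19, 95),
    (2007, 24, 92), (2008, 23, 87), (2009, 26, 77),
]

total_apps = sum(n for _, n, _ in rows)
# Recover approximate application counts from the rounded percentages.
apps_with_store = sum(round(n * pct / 100) for _, n, pct in rows)
overall_pct = 100 * apps_with_store / total_apps

print(total_apps, apps_with_store, round(overall_pct))  # prints: 124 109 88
```

The result matches the 124 applications and the 88% shown in the "total" row.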

