Semantic Technologies & Triplestores
              for BI
   1st European Business Intelligence Summer School
                      eBISS 2011

              Marin Dimitrov (Ontotext)

                      Jul 2011

Semantic Technologies & Triplestores for BI (eBISS 2011)   Jul 2011   #2
Contents

• Introduction to Semantic Technologies
• Semantic Databases – advantages, features and
  benchmarks
• Semantic Technologies and Triplestores for BI




INTRODUCTION TO SEMANTIC
TECHNOLOGIES



The need for a smarter Web

• "The Semantic Web is an extension of the current
  web in which information is given well-defined
  meaning, better enabling computers and people to
  work in cooperation.” (Tim Berners-Lee, 2001)




The need for a smarter Web (2)

•   “PricewaterhouseCoopers believes a Web of data
    will develop that fully augments the document Web
    of today. You’ll be able to find and take pieces of
    data sets from different places, aggregate them
    without warehousing, and analyze them in a more
    straightforward, powerful way than you can now.”
    (PWC, May 2009)




The Semantic Web vision (W3C)

• Extend principles of the Web from documents to
  data
• Data should be accessed using the general Web
  architecture (e.g., URIs, protocols, …)
• Data should be related to one another just as
  documents are already
• Creation of a common framework that allows:
  – Data to be shared and reused across applications
  – Data to be processed automatically
  – New relationships between pieces of data to be inferred

The Semantic Web stack




                                                           (c) Benjamin Nowack

The Semantic Web timeline
[Timeline figure, 1999 to 2011: RDF, DAML+OIL, OWL, SPARQL, RIF, RDFa, SAWSDL, LOD, SKOS, HCLS, RDB2RDF, GLD, PIL, RDF 2, OWL 2, SPARQL 1.1]


Ontologies as data models on the Semantic Web

• An ontology is a formal specification that provides
  sharable and reusable knowledge representation
   – Examples – taxonomies, thesauri, topic maps, formal
     ontologies
• An ontology specification includes
   – Description of the concepts in some domain and their
     properties
   – Description of the possible relationships between concepts
     and the constraints on how the relationships can be used
   – Sometimes, the individuals (members of concepts)


Resource Description Framework (RDF)

• A simple data model for
   – Formally describing the semantics of information
   – Representing meta-data (data about data)
• A set of representation syntaxes
   – RDF/XML (standard), N-Triples, N3
• Building blocks
   – Resources (with unique identifiers)
   – Literals
   – Named relations between pairs of resources (or a resource
     and a literal)

RDF (2)

• Everything is a triple
   – Subject (resource), Predicate (relation), Object (resource
     or literal)
• The RDF graph is a collection of triples
   Subject                 Predicate        Object
   École Centrale Paris    locatedIn        Paris
   Paris                   hasPopulation    2193031




RDF graph example (3)
[Graph figure: the five triples below drawn as nodes and labelled edges]

Subject                                             Predicate        Object
http://dbpedia.org/resource/Paris                   hasName          “Paris”
http://dbpedia.org/resource/Paris                   hasPopulation    2193031
http://dbpedia.org/resource/École_centrale_Paris    locatedIn        http://dbpedia.org/resource/Paris
http://dbpedia.org/resource/École_centrale_Paris    hasName          “École Centrale Paris”
http://dbpedia.org/resource/École_centrale_Paris    establishedIn    1829
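
The table above can be mirrored in a few lines of Python. This is a minimal sketch, assuming the graph is held as a set of (subject, predicate, object) tuples and written out as N-Triples, one of the syntaxes listed earlier. The slide uses bare predicate names such as hasName, so the `http://example.org/ontology#` namespace is a hypothetical stand-in, as are the `IRI`, `Literal`, `to_ntriples` and `match` helpers.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple, Union

DBR = "http://dbpedia.org/resource/"
EX = "http://example.org/ontology#"  # hypothetical predicate namespace
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: Union[str, int]  # only plain strings and integers in this sketch


Term = Union[IRI, Literal]
Triple = Tuple[Term, Term, Term]

# The RDF graph is simply a collection (here: a set) of triples.
graph: Set[Triple] = {
    (IRI(DBR + "Paris"), IRI(EX + "hasName"), Literal("Paris")),
    (IRI(DBR + "Paris"), IRI(EX + "hasPopulation"), Literal(2193031)),
    (IRI(DBR + "École_centrale_Paris"), IRI(EX + "locatedIn"), IRI(DBR + "Paris")),
    (IRI(DBR + "École_centrale_Paris"), IRI(EX + "hasName"), Literal("École Centrale Paris")),
    (IRI(DBR + "École_centrale_Paris"), IRI(EX + "establishedIn"), Literal(1829)),
}


def term_to_nt(term: Term) -> str:
    """Serialise one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    if isinstance(term.value, int):
        return f'"{term.value}"^^<{XSD_INTEGER}>'
    escaped = term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples: Set[Triple]) -> str:
    """One line per triple, sorted so the output is deterministic."""
    return "\n".join(sorted(" ".join(map(term_to_nt, t)) + " ." for t in triples))


def match(triples: Set[Triple], s: Optional[Term] = None,
          p: Optional[Term] = None, o: Optional[Term] = None) -> Set[Triple]:
    """Triple-pattern lookup; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


print(to_ntriples(graph))
```

Because every resource carries a global identifier, merging this graph with another one that uses the same URIs is just a set union.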


RDF advantages

• Global identifiers of all resources (URIs)
   – Reduces ambiguity
   – Makes incremental data integration easier
• Graph data model
   – Suitable for sparse, unstructured and semi-structured data
• Inference of implicit facts
• Schema agility
   – Lowers the cost of schema evolution




RDF Schema (RDFS)

• RDFS provides means for:
   – Defining Classes and Properties
   – Defining hierarchies (of classes and properties)
   – Domain/range of a property
• Entailment rules (axioms)
   – Infer new triples from existing ones




RDFS entailment rules




RDFS entailment rules (2)

• Class/Property hierarchies
   – R5, R7, R9, R11
   :human rdfs:subClassOf :mammal .
   :man rdfs:subClassOf :human .
   :John a :man .
   :hasSpouse rdfs:subPropertyOf :hasRelative .
   :John :hasSpouse :Merry .
   inferred: :man rdfs:subClassOf :mammal .
   inferred: :John a :human .
   inferred: :John a :mammal .
   inferred: :John :hasRelative :Merry .

• Inferring types (domain/range restrictions)
   – R2, R3
   :hasSpouse rdfs:domain :human ;
              rdfs:range :human .
   :Adam :hasSpouse :Eve .
   inferred: :Adam a :human .
   inferred: :Eve a :human .
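
The six rules cited on these slides (R2, R3, R5, R7, R9, R11, i.e. rdfs2, rdfs3, rdfs5, rdfs7, rdfs9 and rdfs11 in the RDF Semantics numbering) can be applied by naive forward chaining until no new triple appears. This is a minimal sketch, assuming triples are plain tuples of prefixed names; `rdfs_closure` is a hypothetical helper, not any triplestore's API, and it ignores the remaining RDFS rules.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

TYPE = "rdf:type"  # written as "a" in Turtle
SUBCLASS = "rdfs:subClassOf"
SUBPROPERTY = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(explicit: Set[Triple]) -> Set[Triple]:
    """Apply rdfs2, rdfs3, rdfs5, rdfs7, rdfs9 and rdfs11 until a fixpoint."""
    closure = set(explicit)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # rdfs2: x p y . p rdfs:domain C  =>  x rdf:type C
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))
                # rdfs3: x p y . p rdfs:range C  =>  y rdf:type C
                if p2 == RANGE and s2 == p:
                    new.add((o, TYPE, o2))
                # rdfs5: subPropertyOf is transitive
                if p == SUBPROPERTY and p2 == SUBPROPERTY and o == s2:
                    new.add((s, SUBPROPERTY, o2))
                # rdfs7: x p y . p rdfs:subPropertyOf q  =>  x q y
                if p2 == SUBPROPERTY and s2 == p:
                    new.add((s, o2, o))
                # rdfs9: x rdf:type C . C rdfs:subClassOf D  =>  x rdf:type D
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # rdfs11: subClassOf is transitive
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
        if new <= closure:  # nothing new: the closure is complete
            return closure
        closure |= new


explicit = {
    (":human", SUBCLASS, ":mammal"),
    (":man", SUBCLASS, ":human"),
    (":John", TYPE, ":man"),
    (":hasSpouse", SUBPROPERTY, ":hasRelative"),
    (":John", ":hasSpouse", ":Merry"),
    (":hasSpouse", DOMAIN, ":human"),
    (":hasSpouse", RANGE, ":human"),
    (":Adam", ":hasSpouse", ":Eve"),
}

inferred = rdfs_closure(explicit) - explicit
for triple in sorted(inferred):
    print(*triple, ".")
```

The nested loop is quadratic per round; real engines index the triples, but the fixpoint idea is the same.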

Web Ontology Language (OWL)

• More expressive than RDFS
   – Identity equivalence/difference
      • sameAs, differentFrom, equivalentClass/Property

• Complex class expressions
   – Class intersection, union, complement, disjointness
• More expressive property definitions
   – Object/Datatype properties
   – Cardinality restrictions
   – Transitive, functional, symmetric, inverse properties



OWL (2)
• Identity equivalence
      db1:Paris :hasPopulation 2193031 .
      db1:Paris = db2:Paris .
      inferred: db2:Paris :hasPopulation 2193031 .

• Transitive properties
      :locatedIn a owl:TransitiveProperty .
      :ECP :locatedIn :Paris .
      :Paris :locatedIn :France .
      inferred: :ECP :locatedIn :France .

• Symmetric properties
      :hasSpouse a owl:SymmetricProperty .
      :John :hasSpouse :Merry .
      inferred: :Merry :hasSpouse :John .

• Inverse properties
      :hasParent owl:inverseOf :hasChild .
      :John :hasChild :Jane .
      inferred: :Jane :hasParent :John .

• Functional properties
      :hasSpouse a owl:FunctionalProperty .
      :Merry :hasSpouse :John .
      :Merry :hasSpouse :JohnSmith .
      inferred: :JohnSmith = :John .
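
The same forward-chaining idea extends to the OWL constructs on this slide. This is a minimal sketch, assuming triples are tuples of prefixed names and writing owl:sameAs where the slide uses the N3 shorthand `=`; `owl_closure` is a hypothetical helper covering only these five constructs, not a complete OWL reasoner.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"  # the slide writes this as "=" (N3 shorthand)


def _typed(triples: Set[Triple], owl_class: str) -> Set[str]:
    """Properties declared with rdf:type owl_class (Turtle's "a")."""
    return {s for s, p, o in triples if p == "rdf:type" and o == owl_class}


def owl_closure(explicit: Set[Triple]) -> Set[Triple]:
    """Naive fixpoint for transitive, symmetric, inverse and functional
    properties, plus owl:sameAs substitution in subject/object position."""
    closure = set(explicit)
    while True:
        transitive = _typed(closure, "owl:TransitiveProperty")
        symmetric = _typed(closure, "owl:SymmetricProperty")
        functional = _typed(closure, "owl:FunctionalProperty")
        inverses = {(s, o) for s, p, o in closure if p == "owl:inverseOf"}
        new = set()
        for s, p, o in closure:
            if p in symmetric:  # p(x, y) => p(y, x)
                new.add((o, p, s))
            for a, b in inverses:  # a owl:inverseOf b
                if p == a:
                    new.add((o, b, s))
                if p == b:
                    new.add((o, a, s))
            if p == SAME_AS:  # s = o: sameAs is symmetric ...
                new.add((o, SAME_AS, s))
                for s2, p2, o2 in closure:  # ... and statements carry over
                    if s2 == s:
                        new.add((o, p2, o2))
                    if o2 == s:
                        new.add((s2, p2, o))
            for s2, p2, o2 in closure:
                if p2 != p:
                    continue
                if p in transitive and o == s2:  # p(x, y), p(y, z) => p(x, z)
                    new.add((s, p, o2))
                if p in functional and s == s2 and o != o2:  # single value
                    new.add((o, SAME_AS, o2))
        if new <= closure:
            return closure
        closure |= new


facts = {
    ("db1:Paris", ":hasPopulation", "2193031"),
    ("db1:Paris", SAME_AS, "db2:Paris"),
    (":locatedIn", "rdf:type", "owl:TransitiveProperty"),
    (":ECP", ":locatedIn", ":Paris"),
    (":Paris", ":locatedIn", ":France"),
    (":hasSpouse", "rdf:type", "owl:SymmetricProperty"),
    (":John", ":hasSpouse", ":Merry"),
    (":hasParent", "owl:inverseOf", ":hasChild"),
    (":John", ":hasChild", ":Jane"),
}
closed = owl_closure(facts)

functional_demo = owl_closure({
    (":hasSpouse", "rdf:type", "owl:FunctionalProperty"),
    (":Merry", ":hasSpouse", ":John"),
    (":Merry", ":hasSpouse", ":JohnSmith"),
})
print((":JohnSmith", SAME_AS, ":John") in functional_demo)
```

The functional-property case is kept in a separate graph so that its owl:sameAs conclusions do not mix with the symmetric :hasSpouse example.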


OWL sublanguages

• OWL Lite
  –   low expressivity / low formal complexity
  –   Logical decidability & completeness
  –   All RDFS features
  –   sameAs/differentFrom, equivalent class/property
  –   Inverse / symmetric / transitive / functional properties
  –   cardinality restriction (only 0 or 1)
  –   class intersection




OWL sublanguages (2)

• OWL DL
  –   high expressivity / efficient DL reasoning
  –   Logical decidability & completeness
  –   All OWL Lite features
  –   Class disjointness
  –   Complex class expressions
  –   Class union & complement
• OWL Full
  – max expressivity / no efficient reasoning
  – No guarantees for completeness & decidability


OWL 2 profiles

• Goals
  – sublanguages that trade expressiveness for efficiency of
    reasoning
  – Cover specific important application areas
  – Easier to understand by non-experts
• OWL 2 EL
  – Best for large ontologies / small instance data (TBox
    reasoning)
  – Computationally optimal
     • PTime reasoning complexity



OWL 2 profiles (2)

• OWL 2 QL
  – Quite limited expressive power, but very efficient for
    query answering with large instance data
  – Can exploit query rewriting techniques
     • Data storage & query evaluation can be delegated to an RDBMS

• OWL 2 RL
  – Balance between scalable reasoning and expressive power
  – Suitable for rule-based reasoning




OWL 2 profiles (3)




                                                              (c) Axel Polleres



SPARQL Protocol and RDF Query Language (SPARQL)

• SQL-like query language for RDF data
• Simple protocol for querying remote databases over
  HTTP
• Query types
   –   select – query data by complex graph patterns
   –   ask – whether a query returns results (result is true/false)
   –   describe – returns all triples about a particular resource
   –   construct – create new triples based on query results
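
The "simple protocol" amounts to sending the query text to an endpoint over HTTP. This is a minimal sketch using only the Python standard library, assuming the SPARQL Protocol's GET binding with the query carried in a `query` parameter; the endpoint URL and the `sparql_get_request` helper are hypothetical, and the requests are built but never sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical SPARQL endpoint


def sparql_get_request(endpoint: str, query: str, accept: str) -> Request:
    """Build (but do not send) a SPARQL query request using HTTP GET."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept}, method="GET")


# SELECT and ASK return a SPARQL results document;
# CONSTRUCT and DESCRIBE return RDF, so they ask for an RDF syntax instead.
ask = sparql_get_request(ENDPOINT, "ASK { ?s ?p ?o }",
                         "application/sparql-results+json")
construct = sparql_get_request(
    ENDPOINT,
    "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10",
    "text/turtle",
)

# Round trip: the endpoint decodes the parameter back into the query text.
decoded = parse_qs(urlparse(ask.full_url).query)["query"][0]
print(ask.full_url)
print(decoded)
```

Passing either request to `urllib.request.urlopen` against a real endpoint would send it; for an ASK query the JSON body carries a single boolean.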




Graph patterns

    • Whitespace separated list of Subject, Predicate,
      Object
      – ?x dbp-ont:city dbpedia:Paris
      – dbpedia:École_centrale_Paris dbp-ont:city ?y
    • Group Graph Pattern
      – A group of one or more graph patterns in { }
      – FILTERs can constrain the whole group
{
     ?uni a dbpedia:University ;
          dbp-ont:city dbpedia:Paris ;
          dbp-ont:numberOfStudents ?students .
     FILTER (?students > 5000)
}
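
Evaluating a group graph pattern means matching each triple pattern against the data, joining the partial solutions on shared variables, and then applying the FILTER to the whole group. This is a minimal in-memory sketch on hypothetical data (`ex:UniA`, `ex:UniB`, `ex:UniC` and their student counts are invented for illustration); the helper names are not part of any SPARQL engine's API.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Binding = Dict[str, object]
Pattern = Tuple[object, object, object]


def is_var(term) -> bool:
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern: Pattern, triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for pat_term, data_term in zip(pattern, triple):
        if is_var(pat_term):
            if pat_term in extended and extended[pat_term] != data_term:
                return None  # variable already bound to a different value
            extended[pat_term] = data_term
        elif pat_term != data_term:
            return None
    return extended


def eval_group(patterns: List[Pattern], data: Iterable) -> List[Binding]:
    """Join the solutions of each triple pattern on shared variables."""
    data = list(data)
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data; "rdf:type" is what Turtle and SPARQL abbreviate as "a".
data = {
    ("ex:UniA", "rdf:type", "dbpedia:University"),
    ("ex:UniA", "dbp-ont:city", "dbpedia:Paris"),
    ("ex:UniA", "dbp-ont:numberOfStudents", 30000),
    ("ex:UniB", "rdf:type", "dbpedia:University"),
    ("ex:UniB", "dbp-ont:city", "dbpedia:Paris"),
    ("ex:UniB", "dbp-ont:numberOfStudents", 3000),
    ("ex:UniC", "rdf:type", "dbpedia:University"),
    ("ex:UniC", "dbp-ont:city", "dbpedia:Lyon"),
    ("ex:UniC", "dbp-ont:numberOfStudents", 20000),
}

group = [
    ("?uni", "rdf:type", "dbpedia:University"),
    ("?uni", "dbp-ont:city", "dbpedia:Paris"),
    ("?uni", "dbp-ont:numberOfStudents", "?students"),
]
# FILTER (?students > 5000) constrains the solutions of the whole group.
results = [b for b in eval_group(group, data) if b["?students"] > 5000]
print(results)
```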
Graph Patterns (2)

 • Optional Graph Pattern
   – Optional parts of a pattern
   – pattern OPTIONAL {pattern}

SELECT ?uni ?students
WHERE {
  ?uni a dbpedia:University ;
       dbp-ont:city dbpedia:Paris .
  OPTIONAL {
     ?uni dbp-ont:numberOfStudents ?students
  }
}
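
OPTIONAL behaves like a left join: every solution of the required part is kept, and it is extended only where the optional part also matches. This is a minimal sketch on hypothetical data, where `ex:UniB` has no recorded student count; `join` and `left_join` are illustrative helpers, not an engine's API.

```python
from typing import Dict, List

Binding = Dict[str, object]


def match_pattern(pattern, triple, binding: Binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for pat_term, data_term in zip(pattern, triple):
        if isinstance(pat_term, str) and pat_term.startswith("?"):
            # An unbound variable accepts any value; a bound one must agree.
            if extended.get(pat_term, data_term) != data_term:
                return None
            extended[pat_term] = data_term
        elif pat_term != data_term:
            return None
    return extended


def join(solutions: List[Binding], patterns, data) -> List[Binding]:
    """Extend every solution with matches for each triple pattern in turn."""
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


def left_join(solutions: List[Binding], optional_patterns, data) -> List[Binding]:
    """OPTIONAL: keep each solution, extended if the optional part matches."""
    result = []
    for binding in solutions:
        extended = join([binding], optional_patterns, data)
        result.extend(extended if extended else [binding])
    return result


data = [
    ("ex:UniA", "rdf:type", "dbpedia:University"),
    ("ex:UniA", "dbp-ont:city", "dbpedia:Paris"),
    ("ex:UniA", "dbp-ont:numberOfStudents", 30000),
    ("ex:UniB", "rdf:type", "dbpedia:University"),
    ("ex:UniB", "dbp-ont:city", "dbpedia:Paris"),  # no student count recorded
]

required = join([{}], [("?uni", "rdf:type", "dbpedia:University"),
                       ("?uni", "dbp-ont:city", "dbpedia:Paris")], data)
results = left_join(required, [("?uni", "dbp-ont:numberOfStudents", "?students")], data)
for b in results:
    print(b["?uni"], b.get("?students"))  # ?students stays unbound for ex:UniB
```

Without OPTIONAL, the same three patterns would simply drop `ex:UniB` from the results.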


Graph Patterns (3)

 • Alternative Graph Pattern
    – Combine results of several alternative patterns
    – {pattern} UNION {pattern}
SELECT ?uni
WHERE {
  ?uni a dbpedia:University .
  {
     { ?uni dbp-ont:city dbpedia:Paris }
     UNION
     { ?uni dbp-ont:city dbpedia:Lyon }
  }
}
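
UNION concatenates the solutions of its alternatives. Because the outer pattern is joined with the union, each branch can be evaluated starting from the outer pattern's solutions and the results concatenated. This is a minimal sketch on hypothetical data (the `ex:` resources are invented); the helper names are illustrative only.

```python
from typing import Dict, List

Binding = Dict[str, object]


def match_pattern(pattern, triple, binding: Binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for pat_term, data_term in zip(pattern, triple):
        if isinstance(pat_term, str) and pat_term.startswith("?"):
            if extended.get(pat_term, data_term) != data_term:
                return None
            extended[pat_term] = data_term
        elif pat_term != data_term:
            return None
    return extended


def join(solutions: List[Binding], patterns, data) -> List[Binding]:
    """Extend every solution with matches for each triple pattern in turn."""
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


def union(solutions: List[Binding], alternatives, data) -> List[Binding]:
    """{A} UNION {B}: evaluate each alternative and concatenate the solutions."""
    result = []
    for patterns in alternatives:
        result.extend(join(solutions, patterns, data))
    return result


data = [
    ("ex:UniA", "rdf:type", "dbpedia:University"),
    ("ex:UniA", "dbp-ont:city", "dbpedia:Paris"),
    ("ex:UniC", "rdf:type", "dbpedia:University"),
    ("ex:UniC", "dbp-ont:city", "dbpedia:Lyon"),
    ("ex:UniD", "rdf:type", "dbpedia:University"),
    ("ex:UniD", "dbp-ont:city", "dbpedia:Marseille"),
    ("ex:Louvre", "dbp-ont:city", "dbpedia:Paris"),  # in Paris, not a university
]

universities = join([{}], [("?uni", "rdf:type", "dbpedia:University")], data)
results = union(universities, [
    [("?uni", "dbp-ont:city", "dbpedia:Paris")],
    [("?uni", "dbp-ont:city", "dbpedia:Lyon")],
], data)
print([b["?uni"] for b in results])
```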

Anatomy of a SPARQL query

• List of namespace prefixes
   – PREFIX xyz: <URI>
• Query result clause (variables)
   – ?x, $y
• Datasets
• Graph patterns + filters
   – Simple / group / alternative / optional
• Modifiers
   – ORDER BY, DISTINCT, OFFSET/LIMIT
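
SPARQL 1.1 applies the solution modifiers in a fixed order: ORDER BY, projection, DISTINCT (or REDUCED), OFFSET, then LIMIT. This is a minimal sketch of that pipeline over a list of variable bindings; the `apply_modifiers` helper and the sample solutions are hypothetical, and REDUCED is left out.

```python
from typing import Dict, List, Optional

Binding = Dict[str, object]


def apply_modifiers(solutions: List[Binding], order_by: Optional[str] = None,
                    descending: bool = False, project: Optional[List[str]] = None,
                    distinct: bool = False, offset: int = 0,
                    limit: Optional[int] = None) -> List[Binding]:
    """Apply ORDER BY, projection, DISTINCT, OFFSET and LIMIT in that order."""
    rows = list(solutions)
    if order_by is not None:
        rows.sort(key=lambda b: b[order_by], reverse=descending)
    if project is not None:
        rows = [{v: b.get(v) for v in project} for b in rows]
    if distinct:
        seen, unique = set(), []
        for b in rows:
            key = tuple(sorted(b.items()))
            if key not in seen:
                seen.add(key)
                unique.append(b)
        rows = unique
    rows = rows[offset:]
    return rows if limit is None else rows[:limit]


solutions = [
    {"?uni": "ex:UniA", "?city": "dbpedia:Paris", "?students": 30000},
    {"?uni": "ex:UniB", "?city": "dbpedia:Paris", "?students": 3000},
    {"?uni": "ex:UniC", "?city": "dbpedia:Lyon", "?students": 20000},
]

# Roughly: SELECT DISTINCT ?city ... ORDER BY DESC(?students) OFFSET 1 LIMIT 1
page = apply_modifiers(solutions, order_by="?students", descending=True,
                       project=["?city"], distinct=True, offset=1, limit=1)
print(page)
```

Ordering before projection is what allows sorting on a variable (`?students`) that is not part of the result.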


Linked Data

• “To make the Semantic Web a reality, it is necessary to have a
  large volume of data available on the Web in a standard,
  reachable and manageable format. In addition the
  relationships among data also need to be made available. This
  collection of interrelated data on the Web can also be referred
  to as Linked Data. Linked Data lies at the heart of the
  Semantic Web: large scale integration of, and reasoning on,
  data on the Web.” (W3C)
• Linked Data is a set of principles that allows publishing,
  querying and consumption of RDF data, distributed across
  different servers
   • similar to the way HTML is currently published & consumed


Linked Data design principles

1.       Unambiguous identifiers for objects (resources)
     –     Use URIs as names for things
2.       Use the structure of the web
     –     Use HTTP URIs so that people can look up the names
3.       Make it easy to discover information about an
         object (resource)
     –     When someone looks up a URI, provide useful
           information
4.       Link the object (resource) to related objects
     –     Include links to other URIs
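
Principles 2 and 3 mean that an HTTP URI can be looked up like a web page, with the client asking for RDF rather than HTML. This is a minimal sketch using the Python standard library; the request is built but not sent, the preferred formats in `RDF_FORMATS` are an assumption, and how a particular server responds (for example with a redirect to a document describing the resource) varies by publisher.

```python
from urllib.request import Request

RDF_FORMATS = "text/turtle, application/rdf+xml;q=0.9"  # assumed preference


def lookup_request(uri: str, accept: str = RDF_FORMATS) -> Request:
    """Build (but do not send) an HTTP GET that dereferences a Linked Data URI,
    using content negotiation to ask for RDF instead of HTML."""
    return Request(uri, headers={"Accept": accept}, method="GET")


req = lookup_request("http://dbpedia.org/resource/Paris")
print(req.get_method(), req.host, req.get_header("Accept"))
# urllib.request.urlopen(req) would typically follow any redirects and return
# an RDF description whose links (principle 4) lead to further URIs to look up.
```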

Linked Data evolution – Oct 2007




                                                             (c) R. Cyganiak & A. Jentzsch




Linked Data evolution – Sep 2008




                                                             (c) R. Cyganiak & A. Jentzsch




Linked Data evolution – Jul 2010




                                                             (c) R. Cyganiak & A. Jentzsch


Linked Data evolution – Sep 2010




                                                           (c) R. Cyganiak & A. Jentzsch
SEMANTIC DATABASES
(TRIPLESTORES)



Triplestores

• RDF databases
   – Store data according to the RDF data model
   – Provide inference of implicit triples (either at data loading
     time, or at query time)
   – SPARQL as a query language
• Many similarities to traditional DBMS approaches
   – … and many differences too




Triplestores vs. traditional DBMS

                                     Triplestore  OLTP         OLAP         NoSQL        Graph
Update performance                   +/-          +            +/-          ++           +
Complex queries                      +            +            ++           -            +
Inference                            +            -            +/-          -            -
Sparse data                          +            -            +/-          +            +
Semi-structured / unstructured data  +            -            -            +/-          +
Dynamic schema                       +            -            -            +/-          +/-




Triplestore advantages

• Global identifiers of resources (entities)
   – Lowers the cost of data integration
• Inference of implicit facts
• Graph data model
   – Suitable for sparse, semi-structured and unstructured data
• Agile schema
   – New relations between entities may be easily added
• Exploratory queries against unknown schema
   – Query and data vocabulary may differ
• Compliance with standards (RDF, SPARQL)
Design & Implementation

• Storage engine
   – Native
   – on top of an RDBMS
   – on top of a NoSQL engine
• Triple density
   – The in-memory “footprint” per triple may differ by a
     factor of 10 between triplestores
   – Impact on TCO
• Compression
   – Improves I/O on multi-core systems

Design & Implementation (2)

• Reasoning strategy
  – Forward-chaining – at data loading time, start from the
    explicit facts and apply the inference rules until the
    complete closure is inferred
  – Backward-chaining – at runtime, start with a query and
    decompose recursively into smaller requests that can be
    matched to explicit facts
  – Hybrid strategy – partial materialization at data loading
    time + partial query decomposition at runtime
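
Backward chaining can be sketched for a single kind of question, "is x an instance of C?", using only rdf:type and rdfs:subClassOf. Nothing is materialized: the goal is decomposed recursively into sub-goals that are matched against explicit facts. This is a minimal sketch; `is_instance` is a hypothetical helper, and real engines handle many more rules and goal shapes.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def is_instance(x: str, cls: str, facts: Set[Triple],
                visited: Optional[Set[str]] = None) -> bool:
    """Backward chaining for the goal (x rdf:type cls) via rdfs:subClassOf."""
    if (x, "rdf:type", cls) in facts:
        return True  # the goal matches an explicit fact
    visited = set() if visited is None else visited
    if cls in visited:
        return False  # already being tried: stops cycles in the hierarchy
    visited.add(cls)
    # Sub-goals: (x rdf:type sub) for every (sub rdfs:subClassOf cls)
    subclasses = [s for s, p, o in facts if p == "rdfs:subClassOf" and o == cls]
    return any(is_instance(x, sub, facts, visited) for sub in subclasses)


facts = {
    (":human", "rdfs:subClassOf", ":mammal"),
    (":man", "rdfs:subClassOf", ":human"),
    (":John", "rdf:type", ":man"),
}
print(is_instance(":John", ":mammal", facts))
```

The trade-off mirrors the slide: loading stays cheap because nothing is inferred up front, but every query pays for the recursive search.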




Pros and cons of forward-chaining-based materialization

• Relatively slow addition of new facts
   – inferred closure is extended after each transaction
• Deletion of facts is slow
   – facts that are no longer true are removed from the
     inferred closure
• The maintenance of the inferred closure requires
  more resources
• Querying and retrieval are fast
   – no reasoning is required at query time
   – RDBMS-like query evaluation & optimisation techniques
     are applicable
Design & Implementation (3)

• Invalidation strategy
   – Truth maintenance is not trivial
   – huge overhead for keeping meta-data about inference
     dependencies
   – Trivial approach: just re-compute the complete inferred
     closure after a deletion
   – Advanced approach: detect which parts of the inferred
     closure are affected and need to be invalidated as well
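
The trivial invalidation strategy can be sketched in a few lines: additions extend the existing closure, while a deletion throws the closure away and recomputes it from the remaining explicit facts. This is a minimal sketch, assuming only rdfs9 and rdfs11 (the class hierarchy) to keep it short; `closure` and `MaterializedStore` are hypothetical names.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def closure(facts: Set[Triple]) -> Set[Triple]:
    """Forward chaining for rdfs9 and rdfs11 only (class hierarchy)."""
    result = set(facts)
    while True:
        new = {(s, p, o2)
               for s, p, o in result if p in ("rdf:type", "rdfs:subClassOf")
               for s2, p2, o2 in result if p2 == "rdfs:subClassOf" and s2 == o}
        if new <= result:
            return result
        result |= new


class MaterializedStore:
    """Keeps the explicit facts plus their fully inferred closure."""

    def __init__(self, explicit: Set[Triple]):
        self.explicit = set(explicit)
        self.inferred = closure(self.explicit)

    def add(self, triple: Triple) -> None:
        # Additions are monotonic, so the existing closure is only extended.
        self.explicit.add(triple)
        self.inferred = closure(self.inferred | {triple})

    def delete(self, triple: Triple) -> None:
        # Trivial invalidation: drop the whole closure and recompute it.
        self.explicit.discard(triple)
        self.inferred = closure(self.explicit)


store = MaterializedStore({
    (":human", "rdfs:subClassOf", ":mammal"),
    (":man", "rdfs:subClassOf", ":human"),
    (":John", "rdf:type", ":man"),
})
store.delete((":John", "rdf:type", ":man"))
print(sorted(store.inferred))
```

The advanced approach would instead find only the inferred triples that depended on the deleted one. That is harder than it looks, because a fact such as `:John a :mammal` may have more than one derivation and must survive if any of them still holds.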




Design & Implementation (4)

• owl:sameAs optimization
  – It is a transitive, reflexive and symmetric relationship
  – owl:sameAs induced inference can “inflate” the number of
    statements and deteriorate inference/query performance
  – Specific optimizations allow a compact representation
    of equivalent resources to be used
  – Query results can be expanded through backward-chaining
    at query time
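
One way to obtain such a compact representation is a union-find structure: equivalent resources share one representative, each statement is stored once on that representative, and results are expanded to all aliases at query time. This is a minimal sketch of that general idea, not the internal design of any particular triplestore; `SameAsStore` and its methods are hypothetical.

```python
from typing import Dict, Set, Tuple


class SameAsStore:
    """Stores each statement once, on the representative of its owl:sameAs
    group, and expands results to all aliases at query time."""

    def __init__(self) -> None:
        self.parent: Dict[object, object] = {}
        self.triples: Set[Tuple[object, str, object]] = set()

    def find(self, x):
        """Union-find lookup of the group representative (path halving)."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def same_as(self, a, b) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            # Re-key stored statements onto the surviving representative.
            self.triples = {(self.find(s), p, self.find(o)) for s, p, o in self.triples}

    def add(self, s, p: str, o) -> None:
        self.triples.add((self.find(s), p, self.find(o)))

    def aliases(self, x) -> set:
        root = self.find(x)
        return {y for y in list(self.parent) if self.find(y) == root}

    def objects(self, s, p: str) -> set:
        """Answer (s, p, ?o), expanding each object to all of its aliases."""
        root = self.find(s)
        return {alias
                for s2, p2, o in self.triples if s2 == root and p2 == p
                for alias in self.aliases(o)}


store = SameAsStore()
store.add("db1:Paris", ":hasPopulation", 2193031)
store.add(":ECP", ":locatedIn", "db2:Paris")
store.same_as("db1:Paris", "db2:Paris")
print(len(store.triples), store.objects(":ECP", ":locatedIn"))
```

Full materialization would instead copy every statement to every alias and add the reflexive and symmetric owl:sameAs triples, which is the "inflation" the slide warns about.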




Popular triplestores

• 4store
  – http://4store.org
  – Open source, distributed cluster (up to 32 nodes), data
    fully partitioned, no inference (external reasoner,
    backward chaining)
• AllegroGraph
  – http://www.franz.com
  – ACID transactions, RDF and limited OWL reasoning, full-
    text indexing, compression, replication cluster, backward
    chaining



Popular triplestores (2)

• Bigdata
  – http://www.systap.com
  – Open source, data partitioning, hybrid materialization, RDF
    and limited OWL reasoning
• Dydra
  – http://dydra.com
  – SaaS, SPARQL endpoint + REST API, no reasoning
• Jena TDB
  – http://www.openjena.org/TDB
  – Open source, RDF and limited OWL reasoning

Popular triplestores (3)

• Oracle
  – RDF and limited OWL reasoning, data partitioning &
    compression (RAC), owl:sameAs optimization, security &
    versioning; geo-spatial extensions
• OWLIM
  – http://www.ontotext.com
  – Forward-chaining, RDF / OWL 2 RL / OWL 2 QL and limited
    OWL Lite / OWL DL reasoning; replication cluster;
    owl:sameAs optimization; full-text indexing; geo-spatial
    extensions; scalable RDF Rank



Popular triplestores (4)

• Sesame
  – http://www.openrdf.org
  – Open source; pluggable storage & inference layer
• Stardog
  – http://stardog.com
  – backward-chaining; OWL DL and all OWL 2 profiles; full-
    text indexing; compression
• Virtuoso
  – http://virtuoso.openlinksw.com
  – Universal server (RDF, XML, RDBMS); backward chaining;
    geo-spatial extensions; full-text indexing; compression
Benchmarking

• Tasks
  – Data loading
  – Query evaluation
  – Data modification
• Performance factors
  –   Forward-chaining vs. backward-chaining
  –   Data model complexity
  –   Query complexity
  –   Result set size
  –   Number of concurrent clients


Popular benchmarks for triplestores

• LUBM
  – Storing & query performance benchmark
  – 14 predefined queries + data generator tool
• BSBM
  – SPARQL benchmark
  – 3 use cases (12/17/8 distinct queries)
• SP2Bench
  – SPARQL benchmark
  – 12 queries for most common SPARQL constructs & access
    patterns

SEMANTIC TECHNOLOGIES &
TRIPLESTORES FOR BI



Data integration & querying (HCLS)




distributed querying at present




                                                                        distributed querying with RDF and SPARQL


                                                                                              (c) HCLS @ W3C



Data integration & querying (HCLS)




                                                              (c) HCLS @ W3C




Data integration cost (PwC)




                                                           (c) PricewaterhouseCoopers

Semantic Technologies & Triplestores for BI

• Speed-up data integration
   – RDF based ETL is more agile
• Lower the cost of data integration
   – Initial cost of using ontologies is higher
   – But the cost of ad-hoc ETL will be higher in the long term
• Align & integrate legacy data silos
   – Querying & consuming data from disparate sources is
     easier with SPARQL & RDF
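
RDF-based ETL can be illustrated with two small silos that describe the same customer in different shapes. This is a minimal sketch under stated assumptions: the CRM and ERP data, the `http://example.org/customer/` URI scheme and the `ex:` predicates are all hypothetical. Once both mappings mint the same global URI, merging the sources is a set union, and the second source adds new predicates without altering any target schema.

```python
import csv
import io

BASE = "http://example.org/customer/"  # hypothetical shared URI scheme

# Two hypothetical silos describing the same customer in different shapes.
crm_csv = "customer_id,name\n42,ACME Ltd\n"
erp_rows = [{"cust": "42", "open_orders": "3"}]


def crm_to_triples(text):
    """Map each CRM row to triples about the customer's global URI."""
    for row in csv.DictReader(io.StringIO(text)):
        yield (BASE + row["customer_id"], "ex:name", row["name"])


def erp_to_triples(rows):
    """Map each ERP record to triples about the same global URI."""
    for row in rows:
        yield (BASE + row["cust"], "ex:openOrders", int(row["open_orders"]))


# Integration is a set union over triples that share identifiers.
graph = set(crm_to_triples(crm_csv)) | set(erp_to_triples(erp_rows))


def describe(triples, subject):
    """Collect predicate/object pairs for one subject (single-valued here)."""
    return {p: o for s, p, o in triples if s == subject}


print(describe(graph, BASE + "42"))
```

The alignment work moves into the per-source mappings; what the sketch shows is that the merged store needs no schema change when a new source brings new attributes.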




Semantic Technologies & Triplestores for BI (2)

• Infer implicit & hidden knowledge
   – Custom, user-defined rules as well
• Efficiently manage unstructured & semi-structured
  data together
   – graph data model
• Improve the quality of query results
   – Inference of implicit facts
   – SPARQL query vocabulary may differ from data vocabulary
   – Exploratory queries



Q&A




      Questions?
                         @ontotext



PDF
Unlocking AI with Model Context Protocol (MCP)
The Rise and Fall of 3GPP – Time for a Sabbatical?
Encapsulation theory and applications.pdf
Big Data Technologies - Introduction.pptx
Spectral efficient network and resource selection model in 5G networks
Encapsulation_ Review paper, used for researhc scholars
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Reach Out and Touch Someone: Haptics and Empathic Computing
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Programs and apps: productivity, graphics, security and other tools
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Building Integrated photovoltaic BIPV_UPV.pdf
MYSQL Presentation for SQL database connectivity
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
The AUB Centre for AI in Media Proposal.docx
Chapter 3 Spatial Domain Image Processing.pdf
Advanced methodologies resolving dimensionality complications for autism neur...
Unlocking AI with Model Context Protocol (MCP)

Semantic Technologies and Triplestores for Business Intelligence

  • 1. Semantic Technologies & Triplestores for BI 1st European Business Intelligence Summer School eBISS 2011 Marin Dimitrov (Ontotext) Jul 2011
  • 2. eBISS 2011
  • 3. Contents
      • Introduction to Semantic Technologies
      • Semantic Databases – advantages, features and benchmarks
      • Semantic Technologies and Triplestores for BI
  • 4. INTRODUCTION TO SEMANTIC TECHNOLOGIES
  • 5. The need for a smarter Web
      • "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (Tim Berners-Lee, 2001)
  • 6. The need for a smarter Web (2)
      • "PricewaterhouseCoopers believes a Web of data will develop that fully augments the document Web of today. You'll be able to find and take pieces of data sets from different places, aggregate them without warehousing, and analyze them in a more straightforward, powerful way than you can now." (PwC, May 2009)
  • 7. The Semantic Web vision (W3C)
      • Extend principles of the Web from documents to data
      • Data should be accessed using the general Web architecture (e.g., URIs, protocols, …)
      • Data should be related to one another just as documents are already
      • Creation of a common framework that allows:
          – Data to be shared and reused across applications
          – Data to be processed automatically
          – New relationships between pieces of data to be inferred
  • 8. The Semantic Web stack [figure, (c) Benjamin Nowack]
  • 9. The Semantic Web timeline [timeline figure, 1999–2011: RDF, RDF 2, DAML+OIL, OWL, OWL 2, SPARQL, SPARQL 1.1, RIF, RDFa, SAWSDL, LOD, SKOS, HCLS, RDB2RDF, GLD, PIL]
  • 10. Ontologies as data models on the Semantic Web
      • An ontology is a formal specification that provides sharable and reusable knowledge representation
          – Examples: taxonomies, thesauri, topic maps, formal ontologies
      • An ontology specification includes
          – Description of the concepts in some domain and their properties
          – Description of the possible relationships between concepts and the constraints on how the relationships can be used
          – Sometimes, the individuals (members of concepts)
  • 11. Resource Description Framework (RDF)
      • A simple data model for
          – Formally describing the semantics of information
          – Representing meta-data (data about data)
      • A set of representation syntaxes
          – RDF/XML (standard), N-Triples, N3
      • Building blocks
          – Resources (with unique identifiers)
          – Literals
          – Named relations between pairs of resources (or a resource and a literal)
  • 12. RDF (2)
      • Everything is a triple
          – Subject (resource), Predicate (relation), Object (resource or literal)
      • The RDF graph is a collection of triples
          [figure: subject –predicate→ object, e.g. École Centrale –locatedIn→ Paris and Paris –hasPopulation→ 2193031]
  • 13. RDF graph example (3)
      [graph figure of the triples below]
      Subject                                          | Predicate     | Object
      http://dbpedia.org/resource/Paris                | hasName       | "Paris"
      http://dbpedia.org/resource/Paris                | hasPopulation | 2193031
      http://dbpedia.org/resource/École_centrale_Paris | locatedIn     | http://dbpedia.org/resource/Paris
      http://dbpedia.org/resource/École_centrale_Paris | hasName       | "École Centrale Paris"
      http://dbpedia.org/resource/École_centrale_Paris | establishedIn | 1829
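To make the triple table concrete, here is a minimal Python sketch that stores the five example triples as plain tuples and answers a single triple pattern, with None acting as a wildcard. It is an illustration only: the short predicate names (hasName, locatedIn, ...) are the slide's illustrative names rather than real DBpedia properties, and match() is a hypothetical helper, not any triplestore's API.

```python
# Minimal sketch: an RDF graph as a set of (subject, predicate, object) tuples.
# Assumption: URIs and literals are both plain Python values, and the
# predicate names are the slide's illustrative ones, not DBpedia's.
DBR = "http://dbpedia.org/resource/"

graph = {
    (DBR + "Paris", "hasName", "Paris"),
    (DBR + "Paris", "hasPopulation", 2193031),
    (DBR + "École_centrale_Paris", "locatedIn", DBR + "Paris"),
    (DBR + "École_centrale_Paris", "hasName", "École Centrale Paris"),
    (DBR + "École_centrale_Paris", "establishedIn", 1829),
}


def match(g, s=None, p=None, o=None):
    """Return the triples of g that match a pattern; None is a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in g
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


# Which resources are located in Paris?
located_in_paris = [s for s, _, _ in match(graph, p="locatedIn", o=DBR + "Paris")]
```

A real triplestore would answer the same pattern from indexes (e.g. on predicate and object) instead of scanning every triple.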
  • 14. RDF advantages
      • Global identifiers of all resources (URIs)
          – Reduces ambiguity
          – Makes incremental data integration easier
      • Graph data model
          – Suitable for sparse, unstructured and semi-structured data
      • Inference of implicit facts
      • Schema agility
          – Lowers the cost of schema evolution
  • 15. RDF Schema (RDFS)
      • RDFS provides means for:
          – Defining classes and properties
          – Defining hierarchies (of classes and properties)
          – Defining the domain/range of a property
      • Entailment rules (axioms)
          – Infer new triples from existing ones
  • 16. RDFS entailment rules
  • 17. RDFS entailment rules (2)
      • Class/Property hierarchies – rules R5, R7, R9, R11
          :human rdfs:subClassOf :mammal .
          :man rdfs:subClassOf :human .
          :John a :man .
          ⇒ :John a :human .
          ⇒ :man rdfs:subClassOf :mammal .
          ⇒ :John a :mammal .
          :hasSpouse rdfs:subPropertyOf :hasRelative .
          :John :hasSpouse :Merry .
          ⇒ :John :hasRelative :Merry .
      • Inferring types (domain/range restrictions) – rules R2, R3
          :hasSpouse rdfs:domain :human ;
                     rdfs:range :human .
          :Adam :hasSpouse :Eve .
          ⇒ :Adam a :human .
          ⇒ :Eve a :human .
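The entailment rules above can be applied naively by forward chaining: keep firing them over the whole triple set until nothing new appears. The Python sketch below implements only the six rules named on the slide (R2, R3, R5, R7, R9, R11, i.e. rdfs2, rdfs3, rdfs5, rdfs7, rdfs9 and rdfs11 of the RDF Semantics specification), writes "a" as rdf:type, and uses a quadratic loop that is fine for a toy example but is not how a production triplestore would do it.

```python
# Minimal forward-chaining sketch of six RDFS entailment rules.
# Assumption: terms are plain strings in prefixed form; "rdf:type" is Turtle's "a".
TYPE, SUBCLASS, SUBPROP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def rdfs_closure(triples):
    closure = set(triples)
    while True:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if p2 == DOMAIN and s2 == p:                        # rdfs2
                    new.add((s, TYPE, o2))
                if p2 == RANGE and s2 == p:                         # rdfs3
                    new.add((o, TYPE, o2))
                if p == SUBPROP and p2 == SUBPROP and o == s2:      # rdfs5
                    new.add((s, SUBPROP, o2))
                if p2 == SUBPROP and s2 == p:                       # rdfs7
                    new.add((s, o2, o))
                if p == TYPE and p2 == SUBCLASS and o == s2:        # rdfs9
                    new.add((s, TYPE, o2))
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:    # rdfs11
                    new.add((s, SUBCLASS, o2))
        if new <= closure:          # fixpoint: no rule produced anything new
            return closure
        closure |= new


facts = {
    (":human", SUBCLASS, ":mammal"),
    (":man", SUBCLASS, ":human"),
    (":John", TYPE, ":man"),
    (":hasSpouse", SUBPROP, ":hasRelative"),
    (":John", ":hasSpouse", ":Merry"),
    (":hasSpouse", DOMAIN, ":human"),
    (":hasSpouse", RANGE, ":human"),
    (":Adam", ":hasSpouse", ":Eve"),
}
inferred = rdfs_closure(facts)
```

Because all example facts sit in one set, the rules also interact (for instance, :Merry is inferred to be a :mammal through the range of :hasSpouse).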
  • 18. Web Ontology Language (OWL)
      • More expressive than RDFS
          – Identity equivalence/difference: sameAs, differentFrom, equivalentClass/Property
          – Complex class expressions: class intersection, union, complement, disjointness
          – More expressive property definitions: object/datatype properties; cardinality restrictions; transitive, functional, symmetric, inverse properties
  • 19. OWL (2)
      • Identity equivalence (in N3, "=" abbreviates owl:sameAs)
          db1:Paris :hasPopulation 2193031 .
          db1:Paris = db2:Paris .
          ⇒ db2:Paris :hasPopulation 2193031 .
      • Transitive properties
          :locatedIn a owl:TransitiveProperty .
          :ECP :locatedIn :Paris .
          :Paris :locatedIn :France .
          ⇒ :ECP :locatedIn :France .
      • Symmetric properties
          :hasSpouse a owl:SymmetricProperty .
          :John :hasSpouse :Merry .
          ⇒ :Merry :hasSpouse :John .
      • Inverse properties
          :hasParent owl:inverseOf :hasChild .
          :John :hasChild :Jane .
          ⇒ :Jane :hasParent :John .
      • Functional properties
          :hasSpouse a owl:FunctionalProperty .
          :Merry :hasSpouse :John .
          :Merry :hasSpouse :JohnSmith .
          ⇒ :JohnSmith = :John .
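Three of the OWL property characteristics above (transitive, symmetric, inverse) can be sketched with the same fixpoint loop used for RDFS. In this hedged Python illustration the property declarations are passed in as plain Python arguments rather than read from owl:TransitiveProperty / owl:inverseOf triples, and the function name is hypothetical.

```python
# Minimal sketch: forward chaining for owl:TransitiveProperty,
# owl:SymmetricProperty and owl:inverseOf.
# Assumption: property declarations are given as Python sets/pairs.

def owl_property_closure(triples, transitive=(), symmetric=(), inverse_pairs=()):
    closure = set(triples)
    while True:
        new = set()
        for (s, p, o) in closure:
            if p in symmetric:                       # p(s, o) => p(o, s)
                new.add((o, p, s))
            for (p1, p2) in inverse_pairs:           # p1 owl:inverseOf p2
                if p == p1:
                    new.add((o, p2, s))
                if p == p2:
                    new.add((o, p1, s))
            if p in transitive:                      # p(s, o), p(o, x) => p(s, x)
                for (s2, q, x) in closure:
                    if q == p and s2 == o:
                        new.add((s, p, x))
        if new <= closure:                           # fixpoint reached
            return closure
        closure |= new


facts = {
    (":ECP", ":locatedIn", ":Paris"),
    (":Paris", ":locatedIn", ":France"),
    (":John", ":hasSpouse", ":Merry"),
    (":John", ":hasChild", ":Jane"),
}
inferred = owl_property_closure(
    facts,
    transitive={":locatedIn"},
    symmetric={":hasSpouse"},
    inverse_pairs=[(":hasParent", ":hasChild")],
)
```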
  • 20. OWL sublanguages
      • OWL Lite – low expressivity / low formal complexity
          – Logical decidability & completeness
          – All RDFS features
          – sameAs/differentFrom, equivalent class/property
          – Inverse / symmetric / transitive / functional properties
          – Cardinality restrictions (only 0 or 1)
          – Class intersection
  • 21. OWL sublanguages (2)
      • OWL DL – high expressivity / efficient DL reasoning
          – Logical decidability & completeness
          – All OWL Lite features
          – Class disjointness
          – Complex class expressions
          – Class union & complement
      • OWL Full – max expressivity / no efficient reasoning
          – No guarantees for completeness & decidability
  • 22. OWL 2 profiles
      • Goals
          – Sublanguages that trade expressiveness for efficiency of reasoning
          – Cover specific important application areas
          – Easier to understand by non-experts
      • OWL 2 EL
          – Best for large ontologies / small instance data (TBox reasoning)
          – Computationally optimal: PTime reasoning complexity
  • 23. OWL 2 profiles (2)
      • OWL 2 QL
          – Quite limited expressive power, but very efficient for query answering with large instance data
          – Can exploit query rewriting techniques: data storage & query evaluation can be delegated to an RDBMS
      • OWL 2 RL
          – Balance between scalable reasoning and expressive power
          – Suitable for rule-based reasoning
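The query rewriting idea behind OWL 2 QL can be shown in a very reduced form: instead of materializing class memberships, the query "?x a :mammal" is rewritten with the help of the class hierarchy into plain SQL that the RDBMS evaluates directly. This sketch uses Python's built-in sqlite3 module; the class hierarchy (including :woman and :dog), the single triples table and the helper names are hypothetical, and only subclass axioms are handled, whereas real OWL 2 QL rewriting covers many more axiom types.

```python
import sqlite3

# Hypothetical TBox: direct subclass axioms (class -> direct superclasses).
SUBCLASS_OF = {
    ":man": [":human"],
    ":woman": [":human"],
    ":human": [":mammal"],
}


def subclasses(cls):
    """cls plus all of its direct and indirect subclasses."""
    result, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for child, parents in SUBCLASS_OF.items():
            if current in parents and child not in result:
                result.add(child)
                stack.append(child)
    return result


def rewrite_instance_query(cls):
    """Rewrite '?x a cls' into SQL over a plain triples table (no materialization)."""
    classes = sorted(subclasses(cls))
    placeholders = ", ".join("?" for _ in classes)
    sql = ("SELECT DISTINCT s FROM triples "
           f"WHERE p = 'rdf:type' AND o IN ({placeholders}) ORDER BY s")
    return sql, classes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [(":John", "rdf:type", ":man"),
     (":Merry", "rdf:type", ":woman"),
     (":Rex", "rdf:type", ":dog")],   # :dog is not declared a subclass of :mammal
)
sql, params = rewrite_instance_query(":mammal")
rows = [row[0] for row in conn.execute(sql, params)]
```

Only explicit rdf:type triples are stored; the hierarchy is used solely to widen the SQL query.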
  • 24. OWL 2 profiles (3) [figure, (c) Axel Polleres]
  • 25. SPARQL Protocol and RDF Query Language (SPARQL)
      • SQL-like query language for RDF data
      • Simple protocol for querying remote databases over HTTP
      • Query types
          – SELECT – query data by complex graph patterns
          – ASK – whether a query returns results (result is true/false)
          – DESCRIBE – returns an RDF description of a resource (its exact content is implementation-dependent; typically the triples about that resource)
          – CONSTRUCT – create new triples based on query results
  • 26. Graph patterns
      • Triple pattern: whitespace-separated Subject, Predicate, Object
          – ?x dbp-ont:city dbpedia:Paris
          – dbpedia:École_centrale_Paris dbp-ont:city ?y
      • Group graph pattern
          – A group of one or more graph patterns
          – FILTERs can constrain the whole group
          { ?uni a dbp-ont:University ;
                 dbp-ont:city dbpedia:Paris ;
                 dbp-ont:numberOfStudents ?students .
            FILTER (?students > 5000) }
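A group graph pattern is evaluated by matching each triple pattern against the data and joining the variable bindings, then discarding the solutions that fail the FILTER. The Python sketch below shows that idea for the group pattern above; the data (ex:UniA, ex:UniB and their student counts) is invented for illustration, "a" stands for rdf:type, the class is written as dbp-ont:University, and the helper names are hypothetical.

```python
# Minimal sketch of SPARQL basic graph pattern matching plus a FILTER.
# Assumptions: variables are strings starting with "?", and the data is invented.

def match_triple(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            if term in extended and extended[term] != value:
                return None
            extended[term] = value
        elif term != value:
            return None
    return extended


def evaluate_bgp(graph, patterns):
    """Join the triple patterns left to right, starting from one empty solution."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = {
    ("ex:UniA", "a", "dbp-ont:University"),
    ("ex:UniA", "dbp-ont:city", "dbpedia:Paris"),
    ("ex:UniA", "dbp-ont:numberOfStudents", 12000),
    ("ex:UniB", "a", "dbp-ont:University"),
    ("ex:UniB", "dbp-ont:city", "dbpedia:Paris"),
    ("ex:UniB", "dbp-ont:numberOfStudents", 3000),
}
group = [
    ("?uni", "a", "dbp-ont:University"),
    ("?uni", "dbp-ont:city", "dbpedia:Paris"),
    ("?uni", "dbp-ont:numberOfStudents", "?students"),
]
# FILTER (?students > 5000) applied to every solution of the group.
large_paris_unis = [b for b in evaluate_bgp(graph, group) if b["?students"] > 5000]
```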
  • 27. Graph patterns (2)
      • Optional graph pattern
          – Optional parts of a pattern
          – pattern OPTIONAL {pattern}
          SELECT ?uni ?students
          WHERE {
            ?uni a dbp-ont:University ;
                 dbp-ont:city dbpedia:Paris .
            OPTIONAL { ?uni dbp-ont:numberOfStudents ?students }
          }
  • 28. Graph patterns (3)
      • Alternative graph pattern
          – Combine results of several alternative patterns
          – {pattern} UNION {pattern}
          SELECT ?uni
          WHERE {
            ?uni a dbp-ont:University .
            { { ?uni dbp-ont:city dbpedia:Paris }
              UNION
              { ?uni dbp-ont:city dbpedia:Lyon } }
          }
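OPTIONAL behaves like a left join (a solution is kept even when the optional part does not match), while UNION simply combines the solutions of the alternatives. This Python sketch evaluates the two example queries above over invented data (ex:UniA, ex:UniB, ex:UniC); the class is written as dbp-ont:University, "a" stands for rdf:type, and all function names are hypothetical.

```python
# Minimal sketch of OPTIONAL (left join) and UNION over SPARQL solutions.
# Assumptions: variables start with "?", and the data below is invented.

def extend(pattern, graph, binding):
    """All extensions of binding under which one triple pattern matches graph."""
    results = []
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:                       # no break: every position matched
            results.append(candidate)
    return results


def bgp(graph, patterns):
    """Join triple patterns left to right, starting from one empty solution."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for b in extend(pattern, graph, s)]
    return solutions


def optional(graph, solutions, pattern):
    """Left join: keep every solution, extending it where the pattern matches."""
    result = []
    for binding in solutions:
        result.extend(extend(pattern, graph, binding) or [binding])
    return result


def union(left, right):
    """UNION: concatenate the solutions of both alternatives."""
    return left + right


graph = {
    ("ex:UniA", "a", "dbp-ont:University"),
    ("ex:UniA", "dbp-ont:city", "dbpedia:Paris"),
    ("ex:UniA", "dbp-ont:numberOfStudents", 12000),
    ("ex:UniB", "a", "dbp-ont:University"),
    ("ex:UniB", "dbp-ont:city", "dbpedia:Paris"),
    ("ex:UniC", "a", "dbp-ont:University"),
    ("ex:UniC", "dbp-ont:city", "dbpedia:Lyon"),
}
UNI = ("?uni", "a", "dbp-ont:University")

# OPTIONAL query: universities in Paris, with the number of students if known.
paris = bgp(graph, [UNI, ("?uni", "dbp-ont:city", "dbpedia:Paris")])
with_students = optional(graph, paris, ("?uni", "dbp-ont:numberOfStudents", "?students"))

# UNION query: universities in Paris or Lyon.
in_paris_or_lyon = [
    b
    for s in bgp(graph, [UNI])
    for b in union(extend(("?uni", "dbp-ont:city", "dbpedia:Paris"), graph, s),
                   extend(("?uni", "dbp-ont:city", "dbpedia:Lyon"), graph, s))
]
```

ex:UniB has no student count, so it survives the OPTIONAL query with ?students left unbound.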
  • 29. Anatomy of a SPARQL query
      • List of namespace prefixes
          – PREFIX xyz: <URI>
      • Query result clause (variables)
          – ?x, $y
      • Datasets (FROM / FROM NAMED)
      • Graph patterns + filters
          – Simple / group / alternative / optional
      • Modifiers
          – ORDER BY, DISTINCT, OFFSET/LIMIT
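The modifiers are applied to the solution sequence after pattern matching. The Python sketch below follows the evaluation order given in the SPARQL 1.1 specification (ORDER BY, then projection, then DISTINCT, then OFFSET and LIMIT); the function name and its parameters are hypothetical, and it assumes the ORDER BY variable is bound in every solution.

```python
# Minimal sketch of SPARQL solution modifiers over a list of bindings.
# Assumption: the ORDER BY variable is bound in every solution.

def apply_modifiers(solutions, project, order_by=None, distinct=False,
                    offset=0, limit=None):
    rows = list(solutions)
    if order_by is not None:
        rows.sort(key=lambda b: b[order_by])                  # ORDER BY (stable)
    rows = [tuple(b.get(v) for v in project) for b in rows]   # projection
    if distinct:
        rows = list(dict.fromkeys(rows))                      # DISTINCT, keeps order
    end = None if limit is None else offset + limit
    return rows[offset:end]                                   # OFFSET / LIMIT


solutions = [
    {"?uni": "ex:UniB", "?city": "Paris"},
    {"?uni": "ex:UniC", "?city": "Lyon"},
    {"?uni": "ex:UniA", "?city": "Paris"},
]
cities = apply_modifiers(solutions, ["?city"], order_by="?city", distinct=True)
# cities == [("Lyon",), ("Paris",)]
second_page = apply_modifiers(solutions, ["?city"], order_by="?city",
                              distinct=True, offset=1, limit=1)
# second_page == [("Paris",)]
```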
  • 30. Linked Data
      • "To make the Semantic Web a reality, it is necessary to have a large volume of data available on the Web in a standard, reachable and manageable format. In addition the relationships among data also need to be made available. This collection of interrelated data on the Web can also be referred to as Linked Data. Linked Data lies at the heart of the Semantic Web: large scale integration of, and reasoning on, data on the Web." (W3C)
      • Linked Data is a set of principles that allows publishing, querying and consumption of RDF data distributed across different servers
          – similar to the way HTML is currently published & consumed
  • 31. Linked Data design principles
      1. Unambiguous identifiers for objects (resources)
          – Use URIs as names for things
      2. Use the structure of the web
          – Use HTTP URIs so that people can look up the names
      3. Make it easy to discover information about an object (resource)
          – When someone looks up a URI, provide useful information
      4. Link the object (resource) to related objects
          – Include links to other URIs
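Principles 2 and 3 come down to an HTTP GET on the resource URI, usually with content negotiation so that a client can ask for RDF instead of HTML. The Python sketch below only builds such a request with the standard urllib.request.Request class; the helper name and the chosen Accept header value are assumptions, and actually sending the request (commented out) needs network access.

```python
from urllib.request import Request


def linked_data_request(uri, accept="text/turtle, application/rdf+xml;q=0.9"):
    """Build (but do not send) a GET request that dereferences a resource URI."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data principle 2: use HTTP URIs as names")
    # Content negotiation: ask for an RDF serialization rather than HTML.
    return Request(uri, headers={"Accept": accept}, method="GET")


req = linked_data_request("http://dbpedia.org/resource/Paris")
# from urllib.request import urlopen
# with urlopen(req) as response:   # follows HTTP redirects; needs network access
#     print(response.headers.get("Content-Type"))
```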
  • 32. Linked Data evolution – Oct 2007 (c) R. Cyganiak & A. Jentzsch
  • 33. Linked Data evolution – Sep 2008 (c) R. Cyganiak & A. Jentzsch
  • 34. Linked Data evolution – Jul 2010 (c) R. Cyganiak & A. Jentzsch
  • 35. Linked Data evolution – Sep 2010 (c) R. Cyganiak & A. Jentzsch
  • 36. SEMANTIC DATABASES (TRIPLESTORES)
  • 37. Triplestores
      • RDF databases
          – Store data according to the RDF data model
          – Provide inference of implicit triples (either at data loading time, or at query time)
          – SPARQL as a query language
      • Many similarities to traditional DBMS approaches
          – … and many differences too
  • 38. Triplestores vs. traditional DBMS
      Feature                              Triplestore  OLTP  OLAP  NoSQL  Graph
      Update performance                   +/-          +     +/-   ++     +
      Complex queries                      +            +     ++    -      +
      Inference                            +            -     +/-   -      -
      Sparse data                          +            -     +/-   +      +
      Semi-structured / unstructured data  +            -     -     +/-    +
      Dynamic schema                       +            -     -     +/-    +/-
  • 39. Triplestore advantages
      • Global identifiers of resources (entities)
          – Lowers the cost of data integration
      • Inference of implicit facts
      • Graph data model
          – Suitable for sparse, semi-structured and unstructured data
      • Agile schema
          – New relations between entities may be easily added
      • Exploratory queries against unknown schema
          – Query and data vocabulary may differ
      • Compliance with standards (RDF, SPARQL)
  • 40. Design & Implementation
      • Storage engine
          – Native
          – On top of an RDBMS
          – On top of a NoSQL engine
      • Triple density
          – The in-memory "footprint" per triple may differ by a factor of 10 between different triplestores
          – Impact on TCO
      • Compression
          – Improves I/O on multi-core systems
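Much of the per-triple footprint depends on how RDF terms are stored. A common technique, shown here as a toy and not as the design of any particular product, is dictionary encoding: each distinct URI or literal is stored once and triples are kept as tuples of small integer IDs. The class below is hypothetical and ignores the distinction between URIs and literals, so a literal spelled exactly like a URI would share its ID.

```python
# Minimal sketch of dictionary encoding for RDF terms.
# Assumption: URIs and literals are plain strings and are not distinguished.

class EncodedTripleStore:
    """Toy store that keeps each distinct term once and triples as integer IDs."""

    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = []
        self.spo = set()              # encoded (subject, predicate, object) IDs

    def encode(self, term):
        """Return the ID of term, assigning the next free ID on first sight."""
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def add(self, s, p, o):
        self.spo.add((self.encode(s), self.encode(p), self.encode(o)))

    def triples(self):
        """Decode the stored triples, in ID order."""
        for ids in sorted(self.spo):
            yield tuple(self.id_to_term[i] for i in ids)


PARIS = "http://dbpedia.org/resource/Paris"
ECP = "http://dbpedia.org/resource/École_centrale_Paris"
store = EncodedTripleStore()
store.add(ECP, "locatedIn", PARIS)
store.add(PARIS, "hasPopulation", "2193031")
```

The long Paris URI is stored once even though it appears in two triples; the triples themselves are just integer tuples.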
  • 41. Design & Implementation (2)
      • Reasoning strategy
          – Forward-chaining – at data loading time, start from the explicit facts and apply the inference rules until the complete closure is inferred
          – Backward-chaining – at runtime, start with a query and decompose it recursively into smaller requests that can be matched to explicit facts
          – Hybrid strategy – partial materialization at data loading time + partial query decomposition at runtime
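The two main reasoning strategies can be contrasted on a single transitive property. In this Python sketch, forward_chain() materializes the whole closure up front (as at data loading time), while backward_chain() answers one question at query time by decomposing it into subgoals over the explicit facts. The :France :locatedIn :Europe fact and both function names are hypothetical additions for illustration.

```python
# Minimal sketch: forward vs. backward chaining for one transitive property.
FACTS = {
    (":ECP", ":locatedIn", ":Paris"),
    (":Paris", ":locatedIn", ":France"),
    (":France", ":locatedIn", ":Europe"),   # hypothetical extra fact
}
TRANSITIVE = {":locatedIn"}


def forward_chain(facts):
    """Materialize the complete closure once, e.g. at data loading time."""
    closure = set(facts)
    while True:
        new = {
            (s, p, o2)
            for (s, p, o) in closure if p in TRANSITIVE
            for (s2, p2, o2) in closure if p2 == p and s2 == o
        } - closure
        if not new:
            return closure
        closure |= new


def backward_chain(goal, facts, visited=None):
    """Prove goal (s, p, o) at query time by decomposing it into subgoals."""
    visited = set() if visited is None else visited
    if goal in facts:
        return True
    s, p, o = goal
    if p not in TRANSITIVE or goal in visited:
        return False
    visited.add(goal)               # guards against cycles in the data
    # (s p o) holds if (s p m) is explicit and (m p o) can be proven.
    return any(backward_chain((m, p, o), facts, visited)
               for (s2, p2, m) in facts if s2 == s and p2 == p)


closure = forward_chain(FACTS)
```

Forward chaining pays once for the closure and keeps it; backward chaining stores nothing extra but repeats the search for every query.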
  • 42. Pros and cons of forward-chaining based materialization
      • Relatively slow addition of new facts
          – The inferred closure is extended after each transaction
      • Deletion of facts is slow
          – Facts that are no longer true are removed from the inferred closure
      • The maintenance of the inferred closure requires more resources
      • Querying and retrieval are fast
          – No reasoning is required at query time
          – RDBMS-like query evaluation & optimisation techniques are applicable
  • 43. Design & Implementation (3)
      • Invalidation strategy
          – Truth maintenance is not trivial – huge overhead for keeping meta-data about inference dependencies
          – Trivial approach: just re-compute the complete inferred closure after a deletion
          – Advanced approach: detect which parts of the inferred closure are affected and need to be invalidated as well
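Why deletion is hard with materialization, and what the trivial invalidation approach does about it, can be shown in a few lines. In this Python sketch (assuming a single transitive :locatedIn rule and a hypothetical closure_of() helper), simply removing the retracted triple from the materialized closure leaves a stale inference behind, while recomputing the closure from the remaining explicit facts removes it.

```python
# Minimal sketch of the "trivial" invalidation strategy: after a deletion,
# recompute the whole closure from the explicit facts.
# Assumption: one transitive property (:locatedIn) is the only rule.

def closure_of(explicit):
    closure = set(explicit)
    while True:
        new = {
            (s, p, o2)
            for (s, p, o) in closure
            for (s2, p2, o2) in closure
            if p == p2 == ":locatedIn" and o == s2
        } - closure
        if not new:
            return closure
        closure |= new


explicit = {(":ECP", ":locatedIn", ":Paris"), (":Paris", ":locatedIn", ":France")}
materialized = closure_of(explicit)

# Retract one explicit fact.
retracted = (":Paris", ":locatedIn", ":France")
explicit = explicit - {retracted}

# Removing only the retracted triple leaves a stale inferred triple behind.
naive = materialized - {retracted}

# Trivial strategy: rebuild the complete closure from the remaining facts.
recomputed = closure_of(explicit)
```

Recomputing is simple and always correct, but its cost grows with the whole dataset rather than with the size of the change, which is what the advanced approach tries to avoid.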
  • 44. Design & Implementation (4)
      • owl:sameAs optimization
          – owl:sameAs is a transitive, reflexive and symmetric relationship
          – owl:sameAs induced inference can "inflate" the number of statements and deteriorate inference/query performance
          – Specific optimizations allow a compact representation of equivalent resources to be used
          – Query results can be expanded through backward-chaining at query time
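One way to get such a compact representation (an illustrative choice here, not necessarily what any listed product does) is a union-find structure: every identifier maps to one canonical representative, facts are stored once under that representative, and results are expanded to all equivalent identifiers at query time. In this Python sketch db3:Paris, the class and the helper names are hypothetical.

```python
# Minimal sketch of a compact owl:sameAs representation using union-find.
from itertools import product


class SameAsIndex:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Canonical representative of x's equivalence class."""
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:                    # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def same_as(self, a, b):
        """Record a owl:sameAs b (symmetric and transitive by construction)."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)   # deterministic representative

    def members(self, x):
        """All identifiers equivalent to x (reflexive: includes x itself)."""
        if x not in self.parent:
            return [x]
        root = self.find(x)
        return sorted(t for t in list(self.parent) if self.find(t) == root)


index = SameAsIndex()
index.same_as("db1:Paris", "db2:Paris")
index.same_as("db2:Paris", "db3:Paris")

# The fact is stored once, under the canonical identifier.
stored = {(index.find("db2:Paris"), ":hasPopulation", "2193031")}


def expand(triples):
    """Expand stored triples to every equivalent subject/object at query time."""
    for s, p, o in triples:
        for s2, o2 in product(index.members(s), index.members(o)):
            yield (s2, p, o2)
```

Three equivalent identifiers cost one stored triple instead of three; the expansion happens only when results are produced.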
  • 45. Popular triplestores
      • 4store – http://4store.org
          – Open source, distributed cluster (up to 32 nodes), data fully partitioned, no inference (external reasoner, backward chaining)
      • AllegroGraph – http://www.franz.com
          – ACID transactions, RDF and limited OWL reasoning, full-text indexing, compression, replication cluster, backward chaining
  • 46. Popular triplestores (2)
      • Bigdata – http://www.systap.com
          – Open source, data partitioning, hybrid materialization, RDF and limited OWL reasoning
      • Dydra – http://dydra.com
          – SaaS, SPARQL endpoint + REST API, no reasoning
      • Jena TDB – http://www.openjena.org/TDB
          – Open source, RDF and limited OWL reasoning
  • 47. Popular triplestores (3)
      • Oracle
          – RDF and limited OWL reasoning, data partitioning & compression (RAC), owl:sameAs optimization, security & versioning; geo-spatial extensions
      • OWLIM – http://www.ontotext.com
          – Forward-chaining, RDF / OWL 2 RL / OWL 2 QL and limited OWL Lite / OWL DL reasoning; replication cluster; owl:sameAs optimization; full-text indexing; geo-spatial extensions; scalable RDF Rank
  • 48. Popular triplestores (4)
      • Sesame – http://www.openrdf.org
          – Open source; pluggable storage & inference layer
      • Stardog – http://stardog.com
          – Backward-chaining; OWL DL and all OWL 2 profiles; full-text indexing; compression
      • Virtuoso – http://virtuoso.openlinksw.com
          – Universal server (RDF, XML, RDBMS); backward chaining; geo-spatial extensions; full-text indexing; compression
  • 49. Benchmarking
      • Tasks
          – Data loading
          – Query evaluation
          – Data modification
      • Performance factors
          – Forward-chaining vs. backward-chaining
          – Data model complexity
          – Query complexity
          – Result set size
          – Number of concurrent clients
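The "number of concurrent clients" factor is usually measured by replaying a query mix from several workers and recording per-query latency and overall throughput. The Python sketch below is a generic harness built only on the standard library; fake_query() is a hypothetical stand-in that should be replaced by a real call to the store under test, and none of this reproduces the methodology of LUBM, BSBM or SP2Bench.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(execute_query, queries, clients=4):
    """Run queries with `clients` concurrent workers and summarize latencies."""
    def timed(query):
        start = time.perf_counter()
        execute_query(query)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        latencies = list(pool.map(timed, queries))
    wall_time = time.perf_counter() - wall_start
    return {
        "queries": len(latencies),
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": statistics.quantiles(latencies, n=20)[-1],
        "throughput_qps": len(latencies) / wall_time,
    }


def fake_query(query):
    """Hypothetical stand-in for a call to a SPARQL endpoint."""
    time.sleep(0.001)
    return []


report = run_benchmark(fake_query, ["SELECT * WHERE { ?s ?p ?o } LIMIT 10"] * 40, clients=8)
```

Varying `clients` while keeping the query mix fixed shows how throughput and tail latency change under load.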
  • 50. Popular benchmarks for triplestores
      • LUBM
          – Storing & query performance benchmark
          – 14 predefined queries + data generator tool
      • BSBM
          – SPARQL benchmark
          – 3 use cases (12/17/8 distinct queries)
      • SP2Bench
          – SPARQL benchmark
          – 12 queries for most common SPARQL constructs & access patterns
  • 51. SEMANTIC TECHNOLOGIES & TRIPLESTORES FOR BI
  • 52. Data integration & querying (HCLS) [figure: distributed querying at present vs. distributed querying with RDF and SPARQL; (c) HCLS @ W3C]
  • 53. Data integration & querying (HCLS) [figure; (c) HCLS @ W3C]
  • 54. Data integration cost (PwC) [figure; (c) PricewaterhouseCoopers]
  • 55. Semantic Technologies & Triplestores for BI
      • Speed up data integration
          – RDF-based ETL is more agile
      • Lower the cost of data integration
          – The initial cost of using ontologies is higher
          – But the cost of ad-hoc ETL will be higher in the long term
      • Align & integrate legacy data silos
          – Querying & consuming data from disparate sources is easier with SPARQL & RDF
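The silo-alignment argument rests on shared global identifiers: once two sources mint the same URI for the same entity, merging them is a set union and cross-source queries need no key-matching logic. The Python sketch below illustrates this with two invented CSV exports (a CRM and an ERP with different column names), a hypothetical http://example.org/ URI scheme and hypothetical ex: predicates; it is a toy, not a recommended ETL design.

```python
import csv
import io

# Two hypothetical legacy "silos" exported as CSV; their column names differ.
CRM_CSV = "customer_id,name\n42,ACME Corp\n7,Globex\n"
ERP_CSV = "cust,invoice_total\n42,1500\n42,250\n7,900\n"

BASE = "http://example.org/customer/"   # assumed shared URI scheme


def crm_to_rdf(text):
    """Map CRM rows to triples, minting customer URIs from customer_id."""
    for row in csv.DictReader(io.StringIO(text)):
        yield (BASE + row["customer_id"], "ex:name", row["name"])


def erp_to_rdf(text):
    """Map ERP rows to invoice triples that point at the same customer URIs."""
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        invoice = f"http://example.org/invoice/{i}"
        yield (invoice, "ex:billedTo", BASE + row["cust"])
        yield (invoice, "ex:total", int(row["invoice_total"]))


# Integration is a set union, because both sources use the same identifiers.
graph = set(crm_to_rdf(CRM_CSV)) | set(erp_to_rdf(ERP_CSV))


def revenue_by_customer_name(g):
    """A cross-silo query: total invoiced amount per customer name."""
    names = {s: o for (s, p, o) in g if p == "ex:name"}
    totals = {s: o for (s, p, o) in g if p == "ex:total"}
    result = {}
    for (invoice, p, customer) in g:
        if p == "ex:billedTo":
            name = names[customer]
            result[name] = result.get(name, 0) + totals[invoice]
    return result
```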
  • 56. Semantic Technologies & Triplestores for BI (2)
      • Infer implicit & hidden knowledge
          – Custom, user-defined rules as well
      • Efficiently manage unstructured & semi-structured data together
          – Graph data model
      • Improve the quality of query results
          – Inference of implicit facts
          – SPARQL query vocabulary may differ from data vocabulary
          – Exploratory queries
  • 57. Q&A
      • Questions?
      • @ontotext