Enterprise Knowledge Graphs
Sören Auer
The three Big Data "V"s – Variety is often neglected
Source: Gesellschaft für Informatik
Linked Data Principles
Addressing the neglected third V (Variety)
1. Use URIs to identify the “things” in your data
2. Use http:// URIs so people (and machines) can
look them up on the web
3. When a URI is looked up, return a description of
the thing (in RDF format)
4. Include links to related things
http://www.w3.org/DesignIssues/LinkedData.html
[1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013
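Principles 2 and 3 can be illustrated with a small Python sketch that prepares (but does not send) an HTTP lookup asking for an RDF description via content negotiation. The function name `build_lookup_request` and the chosen media types are assumptions for illustration, not part of the original slides.

```python
from urllib.request import Request


def build_lookup_request(uri: str) -> Request:
    """Prepare an HTTP GET for a thing's URI that asks for RDF (not sent here)."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: only http(s) URIs can be looked up on the web.
        raise ValueError("expected an http(s) URI")
    # Principle 3: ask the server for an RDF description; prefer Turtle,
    # fall back to RDF/XML (assumed preference order).
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(uri, headers={"Accept": accept})


request = build_lookup_request("http://dbpedia.org/resource/Bonn")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; whether a given server honours the `Accept` header depends on that server.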
Linked (Open) Data: The RDF Data Model
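RDF = Resource Description Framework
[Figure: example RDF graph]
DHL --full name--> DHL International GmbH
DHL --industry--> Logistics
DHL --headquarters--> Post Tower
Logistics --label--> Logistik
Logistics --label--> 物流
Post Tower --located in--> Bonn
Post Tower --height--> 162.5 m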
RDF Data Model (a bit more technical)
– Graph consists of:
• Resources (identified via URIs)
• Literals: data values with data type (URI) or language (multilinguality integrated)
• Attributes of resources are also URI-identified (from vocabularies)
– Various data sources and vocabularies can be arbitrarily mixed and mashed
– URIs can be shortened with namespace prefixes; e.g. dbp: → http://dbpedia.org/resource/
dbp:DHL_International_GmbH foaf:name "DHL International GmbH"^^xsd:string
dbp:DHL_International_GmbH dbo:industry dbp:Logistics
dbp:DHL_International_GmbH ex:headquarters dbp:Post_Tower
dbp:Logistics rdfs:label "Logistik"@de
dbp:Logistics rdfs:label "物流"@zh
dbp:Post_Tower gn:locatedIn dbp:Bonn
dbp:Post_Tower ex:height [ rdf:value "162.5"^^xsd:decimal ; ex:unit unit:Meter ]
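A minimal Python sketch of this data model, assuming triples are held as plain tuples: prefixed names are expanded to full URIs, and literals carry either a datatype or a language tag. The `Literal` class, the `expand` helper and the `ex:` namespace URI are illustrative assumptions; the `dbp:`, `rdfs:` and `xsd:` namespaces are the standard ones.

```python
from dataclasses import dataclass
from typing import Optional

PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/",  # hypothetical namespace for the ex: terms
}


def expand(name: str) -> str:
    """Expand a prefixed name such as 'dbp:Bonn' into a full URI."""
    prefix, _, local = name.partition(":")
    return PREFIXES[prefix] + local


@dataclass(frozen=True)
class Literal:
    """A data value with an optional datatype URI or language tag."""
    value: str
    datatype: Optional[str] = None
    language: Optional[str] = None


triples = [
    (expand("dbp:DHL_International_GmbH"), expand("ex:headquarters"), expand("dbp:Post_Tower")),
    (expand("dbp:Logistics"), expand("rdfs:label"), Literal("Logistik", language="de")),
    (expand("dbp:Logistics"), expand("rdfs:label"), Literal("物流", language="zh")),
]

# Collect the multilingual labels of dbp:Logistics, keyed by language tag.
labels = {
    o.language: o.value
    for s, p, o in triples
    if s == expand("dbp:Logistics") and p == expand("rdfs:label")
}
print(labels)
```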
RDF mediates between different Data Models &
bridges between Conceptual and Operational Layers
Tabular/Relational Data
Id    Title    Screen
5624  SmartTV  104cm
5627  Tablet   21cm
Prod:5624 rdf:type Electronics
Prod:5624 rdfs:label “SmartTV”
Prod:5624 hasScreenSize “104”^^unit:cm
...
Taxonomic/Tree Data
Electronics
Vehicle
  Car
  Bus
  Truck
Vehicle rdf:type owl:Thing
Car rdfs:subClassOf Vehicle
Bus rdfs:subClassOf Vehicle
...
Logical Axioms / Schema
Male rdfs:subClassOf Human
Female rdfs:subClassOf Human
Male owl:disjointWith Female
...
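The tabular case can be sketched in a few lines of Python: each row becomes a subject and each column a predicate. The function name `row_to_triples` is hypothetical, and typing every row as `Electronics` is an assumption (the slide only shows the triples for product 5624).

```python
rows = [
    {"Id": "5624", "Title": "SmartTV", "Screen": "104cm"},
    {"Id": "5627", "Title": "Tablet", "Screen": "21cm"},
]


def row_to_triples(row):
    """Map one table row to triples in the notation used on the slide."""
    subject = f"Prod:{row['Id']}"
    size = row["Screen"].removesuffix("cm")  # requires Python 3.9+
    return [
        (subject, "rdf:type", "Electronics"),  # assumed type for every row
        (subject, "rdfs:label", f'"{row["Title"]}"'),
        (subject, "hasScreenSize", f'"{size}"^^unit:cm'),
    ]


triples = [t for row in rows for t in row_to_triples(row)]
for s, p, o in triples:
    print(s, p, o)
```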
© Fraunhofer · Seite 7
Vocabularies – Breaking the mold!
Semantic data virtualization allows for continuous expansion and
enhancement of data and metadata across data sources without losing
the overall perspective
Relational data models: 1:1 relation between data model and application
Graph-based data model (Subject, Predicate, Object/Subject, Predicate, Object/Subject): 1:n relation between data model and application
Vocabulary Example
Vocabulary schema, visual representation:
Class: Company
Property     Expected type
inIndustry   Industry
fullName     String
headquarter  Building
Class: Building
Property     Expected type
locatedIn    Industry
height       unit:meter
Vocabulary schema, RDF representation:
Company rdf:type rdfs:Class
Building rdf:type rdfs:Class
inIndustry rdf:type rdf:Property
inIndustry rdfs:domain Company
inIndustry rdfs:range Industry
headquarter rdf:type rdf:Property
headquarter rdfs:domain Company
headquarter rdfs:range Building
Instantiation, visual representation:
DHL --full name--> DHL International GmbH
DHL --industry--> Logistics
DHL --headquarters--> Post Tower
Logistics --label--> Logistik
Logistics --label--> 物流
Post Tower --located in--> Bonn
Post Tower --height--> 162.5 m
Instantiation, RDF representation:
DHL rdf:type Company
DHL fullName "DHL Int. GmbH"
DHL inIndustry Logistics
DHL headquarter PostTower
PostTower rdf:type Building
PostTower locatedIn dbpedia:Bonn
PostTower height "162.5"^^meter
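What the rdfs:domain and rdfs:range statements buy can be shown with a small Python sketch of RDFS-style type inference over the example above. It is a simplification that assumes at most one domain and one range per property; the function name `infer_types` is illustrative.

```python
RDF_TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

schema = {
    ("inIndustry", DOMAIN, "Company"),
    ("inIndustry", RANGE, "Industry"),
    ("headquarter", DOMAIN, "Company"),
    ("headquarter", RANGE, "Building"),
}

data = {
    ("DHL", "inIndustry", "Logistics"),
    ("DHL", "headquarter", "PostTower"),
}


def infer_types(schema, data):
    """Derive rdf:type triples from domain and range declarations."""
    domains = {p: c for p, rel, c in schema if rel == DOMAIN}
    ranges = {p: c for p, rel, c in schema if rel == RANGE}
    inferred = set()
    for s, p, o in data:
        if p in domains:  # the subject is an instance of the domain class
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:  # the object is an instance of the range class
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred


for triple in sorted(infer_types(schema, data)):
    print(*triple)
```

Under these assumptions the explicit statements `DHL rdf:type Company` and `PostTower rdf:type Building` follow from the schema and need not be asserted by hand.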
The Semantic Web Layer Cake 2001
http://www.w3.org/2001/10/03-sww-1/slide7-0.html
• Monolithic, based on XML
• Focus on heavyweight semantics (ontologies, logic, reasoning)
The Semantic Web Layer Cake 2015 –
Bridging between Big & Smart Data
Unicode, URIs
XML, JSON, CSV, RDB, HTML
RDF
RDF/XML, JSON-LD, CSV2RDF, R2RML, RDFa
RDF Data Shapes
RDF-Schema
Vocabularies
Ontologies, SKOS Thesauri
Logic, SWRL Rules
SPARQL
(Access control), Signature, Encryption (HTTPS/CERT/DANE)
• Lingua franca of data integration with many technology interfaces (XML, HTML, JSON, CSV, RDB, …)
• Focus on lightweight vocabularies, rules, thesauri etc.
• Less “invasive”
RDF - the Lingua Franca of Data Integration
• RDF is simple
• We can easily encode and combine all kinds of data models (relational, taxonomic,
graphs, object-oriented, …)
• RDF supports distributed data and schema
• We can seamlessly evolve simple semantic representations (vocabularies) to more
complex ones (e.g. ontologies)
• Small representational units (URI/IRIs, triples) facilitate mixing and mashing
• RDF can be viewed from many perspectives: facts, graphs, ER, logical axioms,
objects
• RDF integrates well with other formalisms - HTML (RDFa), XML (RDF/XML), JSON
(JSON-LD), CSV, …
• Linking and referencing between different knowledge bases, systems and platforms
facilitates the creation of sustainable data ecosystems
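The "mixing and mashing" point can be sketched in Python under the assumption that each source is simply a set of triples: merging is set union, and shared URIs connect the sources without any schema alignment step. The two source sets and the `follow` helper are hypothetical.

```python
# Two independently maintained sources that happen to share a URI.
company_source = {("dbp:DHL_International_GmbH", "ex:headquarters", "dbp:Post_Tower")}
geo_source = {("dbp:Post_Tower", "gn:locatedIn", "dbp:Bonn")}

# Combining RDF data is a plain union of triple sets.
merged = company_source | geo_source


def follow(graph, start, *predicates):
    """Walk a chain of predicates from a start node and return the end nodes."""
    nodes = {start}
    for predicate in predicates:
        nodes = {o for s, p, o in graph if s in nodes and p == predicate}
    return nodes


# Neither source alone can answer where the headquarters is located.
print(follow(merged, "dbp:DHL_International_GmbH", "ex:headquarters", "gn:locatedIn"))
```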
Successful application domains
Linked Data & Semantic Integration
Search Engine Optimization & Web-Commerce
• Schema.org used by >20% of Web sites
• Major search engines exploit semantic descriptions
Pharma, Lifesciences
• Mature, comprehensive vocabularies and ontologies
• Billions of disease, drug, clinical trial descriptions
Digital Libraries
• Many established vocabularies (DublinCore, FRBR, EDM)
• Millions of items aggregated from thousands of memory
institutions in Europeana, German Digital Library
© Fraunhofer-Institut für Intelligente
Analyse- und Informationssysteme IAIS
The Web evolves into a Web of Data
Linked Open Data
Facebook
Open Graph
Knowledge Graphs – A definition
• Fabric of concept, class, property, relationships,
entity descriptions
• Uses a knowledge representation formalism
(typically RDF, RDF-Schema, OWL)
• Holistic knowledge (multi-domain, source, granularity):
• instance data (ground truth),
• open (e.g. DBpedia, WikiData), private (e.g. supply chain data), closed
data (product models),
• derived, aggregated data,
• schema data (vocabularies, ontologies)
• meta-data (e.g. provenance, versioning, documentation licensing)
• comprehensive taxonomies to categorize entities
• links between internal and external data
• mappings to data stored in other systems and databases
Knowledge Graph Challenges & Opportunities
Knowledge graphs typically cover
• Multiple domains
• Various levels of granularity
• Data from multiple sources
• Various degrees of structure
Challenges
• Quality
• Coherence
• Co-evolution
• Update propagation
• Curation & interaction
Opportunities
• Background knowledge for various applications (e.g. question answering, data
integration, machine learning)
• Facilitate intra-organizational data sharing and exchange (data value chains)
Comparison of various enterprise data integration paradigms
Paradigm           Data model  Integr.   Conceptual/  Heterogeneous  Intern./extern.  No. of   Type of   Domain    Semantic
                               strategy  operational  data           data             sources  integr.   coverage  repres.
XML Schema         DOM trees   LaV       operational  yes            yes              medium   both      medium    high
Data Warehouse     relational  GaV       operational  -              partially        medium   physical  small     medium
Data Lake          various     LaV       operational  yes            yes              large    physical  high      medium
MDM                UML         GaV       conceptual   -              -                small    physical  small     medium
PIM / PCS          trees       GaV       operational  partially      partially        -        physical  medium    medium
Enterprise search  document    -         operational  yes            partially        large    virtual   high      low
EKG                RDF         LaV       both         yes            yes              medium   both      high      very high
[1] Michael Galkin, Sören Auer, Simon Scerri: Enterprise Knowledge Graphs: A Survey.
Submitted to 37th International Conference on Information Systems. 2016.
Knowledge Graph Technology
Adding a Semantic Layer to Data Lakes
Management
Accounting
Marketing Sales Support R&D
Semantic Data Lake
• central place for
model, schema and
data historization
• Combination of Scale
Out (cost reduction)
and semantics
(increased control &
flexibility)
• grows incrementally
(pay-as-you-go)
Inbound
Data Sources
Outbound and
Consumption
Inbound Raw Data Store
Data Lake (order of magnitude cheaper scalable data store)
Knowledge Graph for Relationship Definition and Meta Data
Frontend to Access Relationship and KPI Definition
/ Documentation
Frontend to Access (ad hoc) Reports
Outbound Data Delivery to
Target Systems
JSON-LD CSVW R2RML XML2RDF
W3C R2RML – Relational to RDF Mapping
R2RML: RDB to RDF Mapping Language, W3C Recommendation 27 September 2012
Editors: Souripriya Das, Seema Sundara, Richard Cyganiak
http://www.w3.org/TR/r2rml/
Example R2RML Mapping
The mapped data can be accessed in one of two ways:
1. The resulting RDF knowledge base is materialized in a triple store and subsequently queried using SPARQL, or
2. the materialization step is avoided by dynamically mapping an input SPARQL query
into a corresponding SQL query, which yields exactly the same results as the SPARQL
query executed against the materialized RDF dump
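The two options can be contrasted in a rough Python sketch using the standard-library `sqlite3` module. The `MAPPING` dictionary is a hypothetical stand-in for a mapping definition (it is not R2RML syntax), and the single-triple-pattern rewriting is a toy illustration, not the algorithm of any particular tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)", [(5624, "SmartTV"), (5627, "Tablet")]
)

# Hypothetical mapping: predicate -> column of the product table.
MAPPING = {"rdfs:label": "title"}


def materialize(conn):
    """Option 1: dump every mapped cell as a triple."""
    triples = set()
    for predicate, column in MAPPING.items():
        # Column names come from the trusted MAPPING, never from user input.
        for row_id, value in conn.execute(f"SELECT id, {column} FROM product"):
            triples.add((f"Prod:{row_id}", predicate, value))
    return triples


def query_materialized(triples, predicate, obj):
    """Match the pattern (?s predicate obj) against the materialized triples."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def query_virtual(conn, predicate, obj):
    """Option 2: rewrite the pattern (?s predicate obj) into one SQL query."""
    column = MAPPING[predicate]
    sql = f"SELECT id FROM product WHERE {column} = ? ORDER BY id"
    return [f"Prod:{row_id}" for (row_id,) in conn.execute(sql, (obj,))]


dump = materialize(conn)
print(query_materialized(dump, "rdfs:label", "SmartTV"))
print(query_virtual(conn, "rdfs:label", "SmartTV"))
```

Both calls return the same subjects; the virtual route simply never builds the dump.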
SPARQLMap – Mapping RDB 2 RDF
Example: Sparqlify
• Rationale: Exploit existing formalisms
(SQL, SPARQL Construct) as much as
possible
• flexible & versatile mapping language
• translating one SPARQL query into
exactly one efficiently executable SQL
query
• Solid theoretical formalization based
on SPARQL-relational algebra
transformations
• Extremely scalable through an elaborate
view candidate selection mechanism
• Used to publish 20B triples for
LinkedGeoData
[1] Stadler, Unbehauen, Auer, Lehmann: Sparqlify – Very Large Scale Linked Data Publication from Relational Databases.
[2] Unbehauen, Stadler, Auer: Optimizing SPARQL-to-SQL Rewriting. iiWAS 2013
[3] Auer, et al.: Triplify: light-weight linked data publication from relational databases. WWW 2009
SPARQL
Construct
SQL
View
Bridge
Semantified Big Data Architecture Blueprint
[1] Mami, Scerri, Auer, Vidal: Towards the Semantification of Big Data Technology. DEXA 2016
Datasources Ingestion Storage
Semantic Lifting
with Mappings
Queries
Storing of semantic and semantified data
in Apache Parquet files on HDFS
SEBIDA Implementation Architecture
SEBIDA Evaluation Results
• Loads data faster
• Has quite different query performance characteristics: faster in 5 out of 12 queries, similar performance in 2, slower in 5
VOCOL: COLLABORATIVE
VOCABULARY CURATION
ENVIRONMENT
Comprehensive Support for Evolving Vocabularies
Industry 4.0
Semantic Models as Bridge between Shop & Office Floor
Semantic Administrative Shell &
Reference Architecture for Industry 4.0 (RAMI4.0)
The Administrative Shell (Verwaltungsschale) provides a digital identity for arbitrary Industry 4.0 components (e.g. sensors, actuators/robots), exposing data covering the whole life cycle.
The Reference Architecture for Industry 4.0 (RAMI4.0) provides a conceptual framework for implementing comprehensive Industry 4.0 scenarios.
We have implemented both concepts, along with a number of IEC and ISO standards, in a comprehensive information model ready to be deployed in production environments.
VoCol collaborative Development Environment for
Vocabularies
Versioning: Git/Bitbucket
Issue tracking: GitLab/GitHub
Syntax validation
Documentation generation
Authoring: Turtle
Visualization: VOWL
Publishing: LOD/SPARQL
Integrates a number of tools &
services for different aspects of
vocabulary development
Is centered around Git version
control (or Bitbucket), thus
supporting the branching and
merging of vocabularies
Supports the roundtrip between
• Schema/vocabulary development
• Competency questions
(expressed in SPARQL)
• Example data
• Bridges between conceptual
models and executable code
http://eis.iai.uni-bonn.de/Projects/VoCol.html
Development based on
Git – Version Control
Git has become the most widely used version control system.
It is a distributed revision control system with an emphasis on speed, data integrity,
and support for distributed, non-linear workflows.
Git was initially designed and developed in 2005 by Linux kernel developers for
Linux kernel development.
Git is the basis for a variety of open-source and commercial services and products,
such as:
GitHub/Bitbucket - Web-based Git repository hosting services with millions of users
GitLab/Gitolite - open-source Web-based Git repository management platforms
Since Team Foundation Server 2013, Microsoft has provided native support for Git.
Git is easily extensible and can be integrated into arbitrary workflows via Git hooks.
Information Model – Environment
Environment: Dynamic Documentation
Environment: Dynamic Visualization
Environment: Analytics
Environment: Querying
Environment: Evolution
INDUSTRIAL DATA SPACE
Vocabulary-based Integration facilitates Data-driven
Businesses
Vocabulary
The work on the Industrial Data Space is complementary to and closely
interlinked with the Plattform Industrie 4.0
Retail 4.0, Banking 4.0, Insurance 4.0, …
Industrie 4.0
Focus on the manufacturing industry
Smart Services
Transmission, networks
Real-time systems
Industrial Data Space
Focus on data
Data
…
The Industrial Data Space Initiative
Community of >30 large German and European Companies
Pre-competitive, publicly funded innovation project involving 11 Fraunhofer
institutes for developing IDS reference architecture
Current members of the
Industrial Data
Space Association
Semantic Data Linking for Enterprise Data Value Chains
                          Data Lake                  Industrial Data Space                         Pure Internet
                          centralized, monopolistic  federated, secure, "trusted", standard-based  completely decentralized, open, insecure
Data management           Central repository         Decentralized                                 Decentralized
Data ownership            Central                    Decentralized                                 Decentralized
Data linking              Single provider            Federated, on demand                          Missing
Data security             Bilateral                  Certified system                              Bilateral
Market structure          Central provider           Role system                                   Unstructured
Transport infrastructure  Internet                   Internet                                      Internet
Basic principles of the Industrial Data Space
• On Demand Interlinking
• Linked Light Semantics
• Security with Industrial Data Container
• Certified Roles
Industrial Data Space:
On Demand Interlinking
Service A
Service C
Service E
Service B
Service D
Service G
Service F
Enterprise 4
Enterprise 1
Enterprise 6
Enterprise 2
Enterprise 3
Enterprise 5
All data stays with its owners, where it remains controlled and secured. Data is shared only
on request for a service. There is no central platform.
IDS Architecture Overview
[Diagram labels: Company A and Company B with Internal IDS Connectors; External IDS Connectors; Industrial Data Space Broker; Index, Registry, Clearing; Industrial Data Space App Store; Apps, Vocabulary; Third Party Cloud Provider; Internet; Upload / Download / Search]
Big Data is not just Volume and Velocity
Variety (& Veracity) are key challenges
Linked Data helps dealing with both
• The Linked Data life cycle requires integrating
and adapting results from a number of
disciplines
– NLP,
– Machine Learning,
– Knowledge Representation,
– Data Management,
– User Interaction
– …
• Applications in a number of domains
– cultural heritage,
– life sciences,
– industry 4.0 / cyber-physical systems,
– smart cities,
– mobility,
– …
Linked Data links not only data but also:
• Various disciplines
• Applications and Use cases
The Team
Creating Knowledge
out of Interlinked Data
Thanks for your attention!
Sören Auer
http://www.iai.uni-bonn.de/~auer | http://eis.iai.uni-bonn.de
auer@cs.uni-bonn.de
LINKED-DATA-BASED QUESTION
ANSWERING
A Grand Challenge
Question Answering research challenges
Main Goals
• Completeness ⇒ Extension of background knowledge, streams, deduplication
• Flexibility ⇒ Deal with keywords and NL
• Runtime ⇒ New models for query processing, ranking for top-k queries
• Easy use ⇒ Verbalization of queries, entity verbalization, explanation of answers in NL
• Multilinguality ⇒ cover several European languages
Automatic Extension of background knowledge
• 1. Generate query from own data and get answer set A; 2. Add new data set and get answer A’; 3. If info
gain, then iterate; 4. Else terminate
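The four steps above can be sketched as a simple loop in Python. The slide does not define "info gain"; this sketch assumes it means strictly more answers than before. The function name, the `query` callable and the sample film triples are illustrative only.

```python
def extend_background_knowledge(query, own_data, candidate_datasets):
    """Add data sets one by one for as long as the answer set keeps growing."""
    data = set(own_data)
    answers = query(data)  # step 1: answer set A from own data
    for dataset in candidate_datasets:
        extended = data | set(dataset)  # step 2: add a new data set
        new_answers = query(extended)  # ... and compute answer set A'
        if len(new_answers) <= len(answers):
            break  # step 4: no information gain, terminate
        data, answers = extended, new_answers  # step 3: gain, so iterate
    return answers


def films_by_tarantino(data):
    """Toy query: subjects of all (?film, directedBy, Tarantino) triples."""
    return {s for s, p, o in data if p == "directedBy" and o == "Tarantino"}


own = {("film:Pulp_Fiction", "directedBy", "Tarantino")}
candidates = [
    {("film:Reservoir_Dogs", "directedBy", "Tarantino")},  # adds an answer
    {("film:Alien", "directedBy", "Scott")},  # no gain, so the loop stops here
    {("film:Kill_Bill", "directedBy", "Tarantino")},  # never reached
]

print(sorted(extend_background_knowledge(films_by_tarantino, own, candidates)))
```

Stopping at the first data set without gain follows the slide's wording; a real system might instead skip unhelpful data sets and continue.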
Data Streams
• Continuous queries on data streams (update SPARQL results as new information comes in)
• Send novel answers to end user
• Open Information Extraction
Hybrid Search - extension for queries on unstructured data
Ensure Quasi-Completeness
• Fully automatic entity consolidation
• Find links at runtime, e.g., between DBpedia and LinkedMDB to answer “Which films were directed by and
starred Tarantino”?
[1] Shekarpour, Marx, Ngomo, Auer: Semantic query interpretation for question answering on linked data. J. Web Semantics 30 (2015)
[2] Marx, Usbeck, Ngomo, Höffner, Lehmann, Auer: Towards an open question answering architecture. SEMANTICS 2014
[3] Shekarpour, Ngomo, Auer: Question answering on interlinked data. WWW 2013
The approach: An Open QA Architecture
Create an open, extensible architecture for Linked-Data-based Question Answering
• Enable the plugin and competition of different modules for various QA aspects:
• Input: query string / question, voice, brain input; Query Splitting; Disambiguation/Mapping; Query
Construction; Query Execution; Result presentation
• Take context, personalization, feedback into account
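One way to picture the plug-in idea is a pipeline with named stages into which interchangeable modules are registered. The Python sketch below is an assumption about how such an architecture could be organized; the class `QAPipeline`, the stage names and the toy splitting module are hypothetical and not taken from any actual open QA implementation.

```python
from typing import Callable, Dict

Module = Callable[[dict], dict]


class QAPipeline:
    """Runs whichever modules are plugged in, in a fixed stage order."""

    STAGES = ["splitting", "disambiguation", "construction", "execution", "presentation"]

    def __init__(self) -> None:
        self.modules: Dict[str, Module] = {}

    def plug(self, stage: str, module: Module) -> None:
        """Register (or replace) the module responsible for one stage."""
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.modules[stage] = module

    def answer(self, question: str) -> dict:
        """Pass a shared state dictionary through all plugged-in stages."""
        state = {"question": question}
        for stage in self.STAGES:
            module = self.modules.get(stage)
            if module is not None:
                state = module(state)
        return state


def whitespace_splitter(state: dict) -> dict:
    """Toy query-splitting module: tokenize the question on whitespace."""
    tokens = state["question"].rstrip("?").split()
    return {**state, "tokens": tokens}


pipeline = QAPipeline()
pipeline.plug("splitting", whitespace_splitter)
print(pipeline.answer("Where can I find parking?")["tokens"])
```

Because modules only share the state dictionary, competing implementations of one stage can be swapped with a single `plug` call.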
For Whom? Use Cases:
• In-car interaction / Human Vehicle Interaction
Where can I find parking? What are the main sights in Luxembourg?
• Assisting people with disabilities (e.g. vision impaired)
Is there any pharmacy still open? What classical concerts are broadcast next week?
• Medical information retrieval
Which side effects can be caused by Paracetamol? Do Paracetamol and Tamiflu interfere?
•…
[1] The WDAqua Marie Curie ITN: Answering Questions using Web Data. http://wdaqua.informatik.uni-bonn.de

More Related Content

PPTX
Robert Isele | eccenca CorporateMemory - Semantically integrated Enterprise D...
PPTX
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
PDF
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
PDF
Smart Data Applications powered by the Wikidata Knowledge Graph
PDF
The Bounties of Semantic Data Integration for the Enterprise
PPT
The Power of Semantic Technologies to Explore Linked Open Data
PPTX
Vassilios Peristeras | Promoting Semantic Interoperability for European Publi...
PDF
Linked Data Experiences at Springer Nature
Robert Isele | eccenca CorporateMemory - Semantically integrated Enterprise D...
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Smart Data Applications powered by the Wikidata Knowledge Graph
The Bounties of Semantic Data Integration for the Enterprise
The Power of Semantic Technologies to Explore Linked Open Data
Vassilios Peristeras | Promoting Semantic Interoperability for European Publi...
Linked Data Experiences at Springer Nature

What's hot (20)

PDF
Building Knowledge Graphs in 10 steps
KEY
Linking Open, Big Data Using Semantic Web Technologies - An Introduction
PDF
Discovering Related Data Sources in Data Portals
PDF
Choosing the Right Graph Database to Succeed in Your Project
PDF
Open Data and News Analytics Demo
PDF
Querying the Wikidata Knowledge Graph
PPTX
Boost your data analytics with open data and public news content
PDF
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
PDF
Sebastian Hellmann
PPTX
What can linked data do for digital libraries
PDF
Wed roman tut_open_datapub
PDF
Smarter content with a Dynamic Semantic Publishing Platform
PPTX
Semantic Technology in Publishing & Finance
PDF
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
PDF
On demand access to Big Data through Semantic Technologies
PPTX
HDL - Towards A Harmonized Dataset Model for Open Data Portals
PDF
Practical use of Knowledge Graph with Case Studies using Semantic Web Publish...
PPTX
Diving in Panama Papers and Open Data to Discover Emerging News
PDF
Geospatial Big Data: Business Cases from proDataMarket
PDF
Thomas Kaleske | KN(owl)edge – the Linked Data Platform at Kuehne + Nagel
Building Knowledge Graphs in 10 steps
Linking Open, Big Data Using Semantic Web Technologies - An Introduction
Discovering Related Data Sources in Data Portals
Choosing the Right Graph Database to Succeed in Your Project
Open Data and News Analytics Demo
Querying the Wikidata Knowledge Graph
Boost your data analytics with open data and public news content
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
Sebastian Hellmann
What can linked data do for digital libraries
Wed roman tut_open_datapub
Smarter content with a Dynamic Semantic Publishing Platform
Semantic Technology in Publishing & Finance
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
On demand access to Big Data through Semantic Technologies
HDL - Towards A Harmonized Dataset Model for Open Data Portals
Practical use of Knowledge Graph with Case Studies using Semantic Web Publish...
Diving in Panama Papers and Open Data to Discover Emerging News
Geospatial Big Data: Business Cases from proDataMarket
Thomas Kaleske | KN(owl)edge – the Linked Data Platform at Kuehne + Nagel
Ad

Viewers also liked (20)

PPTX
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
PDF
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
PDF
Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
PDF
Victor Charpenay | Standardized Semantics for an Open Web of Things
PPTX
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
PPTX
Kostas Kastrantas | Business Opportunities with Linked Open Data
PPTX
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
PPTX
Thomas Vavra | New Ways of Handling Old Data
PPTX
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...
PDF
Felix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINE
PDF
Christian Opitz | Semantic E-Commerce - Use Cases in Enterprise Web Applications
PPTX
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
PDF
Fajar J. Ekaputra, Marta Sabou, Estefania Serral and Stefan Biffl | Knowledge...
PPTX
Reginald Ford, Grit Denker, Daniel Elenius, Wesley Moore and Elie Abi-Lahoud ...
PDF
Tomas Knap | RDF Data Processing and Integration Tasks in UnifiedViews: Use C...
PPTX
Holger Wollschläger | E-government at its best: Open, transparent and useful
PPTX
Jo Kent | ADA – Opening up the BBC archive with linked data
PDF
GE’s Industrial Data Lake Platform
PDF
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
PDF
Linked data the next 5 years - From Hype to Action
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
Victor Charpenay | Standardized Semantics for an Open Web of Things
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
Kostas Kastrantas | Business Opportunities with Linked Open Data
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
Thomas Vavra | New Ways of Handling Old Data
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...
Felix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINE
Christian Opitz | Semantic E-Commerce - Use Cases in Enterprise Web Applications
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
Fajar J. Ekaputra, Marta Sabou, Estefania Serral and Stefan Biffl | Knowledge...
Reginald Ford, Grit Denker, Daniel Elenius, Wesley Moore and Elie Abi-Lahoud ...
Tomas Knap | RDF Data Processing and Integration Tasks in UnifiedViews: Use C...
Holger Wollschläger | E-government at its best: Open, transparent and useful
Jo Kent | ADA – Opening up the BBC archive with linked data
GE’s Industrial Data Lake Platform
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
Linked data the next 5 years - From Hype to Action
Ad

Similar to Sören Auer | Enterprise Knowledge Graphs (20)

PPTX
Enterprise knowledge graphs
PDF
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
PPTX
Knowledge Graph Introduction
PPT
Linked Data Tutorial
PPTX
Linked data for Enterprise Data Integration
PPTX
Scaling up Linked Data
PPTX
Cognitive data
PPTX
Why I don't use Semantic Web technologies anymore, event if they still influe...
PDF
Vital AI: Big Data Modeling
PPT
Linked Data Driven Data Virtualization for Web-scale Integration
PDF
Introduction to linked data
PPTX
Scaling up Linked Data
PDF
FAIR data: LOUD for all audiences
PPTX
Linked data HHS 2015
PPTX
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
PPT
Structured Dynamics' Semantic Technologies Product Stack
PPTX
One day workshop Linked Data and Semantic Web
PPTX
FAIR Workflows and Research Objects get a Workout
PDF
Llinked open data training for EU institutions
PDF
lodlam summit session browsable linked data
Enterprise knowledge graphs
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
Knowledge Graph Introduction
Linked Data Tutorial
Linked data for Enterprise Data Integration
Scaling up Linked Data
Cognitive data
Why I don't use Semantic Web technologies anymore, event if they still influe...
Vital AI: Big Data Modeling
Linked Data Driven Data Virtualization for Web-scale Integration
Introduction to linked data
Scaling up Linked Data
FAIR data: LOUD for all audiences
Linked data HHS 2015
Linked data demystified:Practical efforts to transform CONTENTDM metadata int...
Structured Dynamics' Semantic Technologies Product Stack
One day workshop Linked Data and Semantic Web
FAIR Workflows and Research Objects get a Workout
Llinked open data training for EU institutions
lodlam summit session browsable linked data

More from semanticsconference (20)

PPTX
Linear books to open world adventure
PDF
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
PDF
Session 4.3 semantic annotation for enhancing collaborative ideation
PDF
Session 1.1 dalicc - data licenses clearance center
PDF
Session 1.3 context information management across smart city knowledge domains
PDF
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
PPTX
Session 0.0 keynote sandeep sacheti - final hi res
PPTX
Session 1.1 linked data applied: a field report from the netherlands
PDF
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
PDF
Session 1.4 connecting information from legislation and datasets using a ca...
PDF
Session 1.4 a distributed network of heritage information
PDF
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
PDF
Session 1.3 semantic asset management in the dutch rail engineering and con...
PPTX
Session 1.3 energy, smart homes & smart grids: towards interoperability...
PDF
Session 1.2 improving access to digital content by semantic enrichment
PPTX
Session 2.3 semantics for safeguarding & security – a police story
PPTX
Session 2.5 semantic similarity based clustering of license excerpts for im...
PDF
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
PDF
Session 1.6 slovak public metadata governance and management based on linke...
PPTX
Session 5.6 towards a semantic outlier detection framework in wireless sens...
Linear books to open world adventure
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
Session 4.3 semantic annotation for enhancing collaborative ideation
Session 1.1 dalicc - data licenses clearance center
Session 1.3 context information management across smart city knowledge domains
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
Session 0.0 keynote sandeep sacheti - final hi res
Session 1.1 linked data applied: a field report from the netherlands
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
Session 1.4 connecting information from legislation and datasets using a ca...
Session 1.4 a distributed network of heritage information
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
Session 1.3 semantic asset management in the dutch rail engineering and con...
Session 1.3 energy, smart homes & smart grids: towards interoperability...
Session 1.2 improving access to digital content by semantic enrichment
Session 2.3 semantics for safeguarding & security – a police story
Session 2.5 semantic similarity based clustering of license excerpts for im...
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
Session 1.6 slovak public metadata governance and management based on linke...
Session 5.6 towards a semantic outlier detection framework in wireless sens...

Recently uploaded (20)

PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
NewMind AI Monthly Chronicles - July 2025
PDF

Sören Auer | Enterprise Knowledge Graphs

  • 2. The three Big Data “V”s: Variety is often neglected. Source: Gesellschaft für Informatik
  • 3. Linked Data Principles: addressing the neglected third V (Variety)
      1. Use URIs to identify the “things” in your data
      2. Use http:// URIs so people (and machines) can look them up on the web
      3. When a URI is looked up, return a description of the thing (in RDF format)
      4. Include links to related things
      http://www.w3.org/DesignIssues/LinkedData.html
      [1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013
  • 4. Linked (Open) Data: The RDF Data Model. RDF = Resource Description Framework.
      [Figure: example graph. DHL has the full name “DHL International GmbH”, the industry Logistics (labels “Logistik” and “物流”) and the headquarters Post Tower; Post Tower is located in Bonn and has a height of 162.5 m.]
  • 5. RDF Data Model (a bit more technical)
      – Graph consists of:
        • Resources (identified via URIs)
        • Literals: data values with a data type (URI) or a language tag (multilinguality is built in)
        • Attributes of resources are also URI-identified (from vocabularies)
      – Various data sources and vocabularies can be arbitrarily mixed and meshed
      – URIs can be shortened with namespace prefixes; e.g. dbp: → http://dbpedia.org/resource/
      [Figure: the example graph with prefixed names; edges as far as they can be recovered from the labels]
        dbp:DHL_International_GmbH foaf:name "DHL International GmbH"^^xsd:string
        dbp:DHL_International_GmbH dbo:industry dbp:Logistics
        dbp:DHL_International_GmbH ex:headquarters dbp:Post_Tower
        dbp:Post_Tower gn:locatedIn dbp:Bonn
        dbp:Post_Tower ex:height [ rdf:value "162.5"^^xsd:decimal ; ex:unit unit:Meter ]
        dbp:Logistics rdfs:label "Logistik"@de , "物流"@zh
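
The data model on this slide can be sketched in a few lines of plain Python: prefixed names expand to full URIs, and literals carry either a datatype or a language tag. This is only an illustration, not a real RDF library. The `Literal` class, the `expand` and `labels` helpers and the `ex:` namespace are assumptions made for the sketch; the `dbp:` expansion is written as the plain DBpedia resource namespace.

```python
from dataclasses import dataclass
from typing import Optional

# Namespace prefixes. "ex" is a hypothetical example namespace.
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/",
}


@dataclass(frozen=True)
class Literal:
    """A data value with either a datatype or a language tag."""
    value: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dbp:Bonn' to a full URI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


# A few triples from the slide: (subject, predicate, object).
GRAPH = [
    ("dbp:DHL_International_GmbH", "foaf:name",
     Literal("DHL International GmbH", datatype="xsd:string")),
    ("dbp:DHL_International_GmbH", "dbo:industry", "dbp:Logistics"),
    ("dbp:Logistics", "rdfs:label", Literal("Logistik", lang="de")),
    ("dbp:Logistics", "rdfs:label", Literal("物流", lang="zh")),
]


def labels(subject: str, lang: str) -> list:
    """Return the rdfs:label values of a subject in one language."""
    return [
        o.value
        for s, p, o in GRAPH
        if s == subject and p == "rdfs:label"
        and isinstance(o, Literal) and o.lang == lang
    ]


print(expand("dbp:Bonn"))
print(labels("dbp:Logistics", "de"))
```

Because every label is tagged with its language, several translations can sit side by side on the same resource without any schema change.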
  • 6. RDF mediates between different data models and bridges between conceptual and operational layers
      Tabular/Relational Data:
        Id   | Title   | Screen
        5624 | SmartTV | 104cm
        5627 | Tablet  | 21cm
        Prod:5624 rdf:type Electronics
        Prod:5624 rdfs:label "SmartTV"
        Prod:5624 hasScreenSize "104"^^unit:cm
        ...
      Taxonomic/Tree Data (tree nodes: Electronics, Vehicle, Car, Bus, Truck):
        Vehicle rdf:type owl:Thing
        Car rdfs:subClassOf Vehicle
        Bus rdfs:subClassOf Vehicle
        ...
      Logical Axioms / Schema:
        Male rdfs:subClassOf Human
        Female rdfs:subClassOf Human
        Male owl:disjointWith Female
        ...
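
How a table row becomes a set of triples can be shown with a minimal sketch. It assumes the product table from this slide, treats every row as an instance of Electronics (as the slide does for product 5624), and uses a hypothetical helper name, `row_to_triples`. Triples are kept as plain strings in the slide's notation rather than as objects of an RDF library.

```python
# The product table from the slide, one dict per row.
ROWS = [
    {"Id": "5624", "Title": "SmartTV", "Screen": "104cm"},
    {"Id": "5627", "Title": "Tablet", "Screen": "21cm"},
]


def row_to_triples(row: dict) -> list:
    """Map one table row to (subject, predicate, object) triples."""
    subject = f"Prod:{row['Id']}"
    screen = row["Screen"]
    # Split the unit suffix off the value, e.g. "104cm" -> "104".
    size = screen[:-2] if screen.endswith("cm") else screen
    return [
        (subject, "rdf:type", "Electronics"),
        (subject, "rdfs:label", f'"{row["Title"]}"'),
        (subject, "hasScreenSize", f'"{size}"^^unit:cm'),
    ]


for row in ROWS:
    for triple in row_to_triples(row):
        print(" ".join(triple))
```

The primary key becomes the subject URI, each column becomes a predicate, and the cell value becomes the object.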
  • 7. © Fraunhofer · Seite 7 Vocabularies – Breaking the mold! Semantic data virtualization allows for continuous expansion and enhancement of data and metadata across data sources without losing the overall perspective.
      Relational data models: 1:1 relation between data model and application.
      Graph-based data model (chains of Subject, Predicate, Object / Subject): 1:n relation between data model and application.
  • 8. Vocabulary Example
      Vocabulary schema (visual representation):
        Class: Company
          Property    | Expected type
          inIndustry  | Industry
          fullName    | String
          headquarter | Building
        Class: Building
          Property    | Expected type
          locatedIn   | Industry
          height      | unit:meter
      Vocabulary schema (RDF representation):
        Company rdf:type rdfs:Class
        Building rdf:type rdfs:Class
        inIndustry rdf:type rdf:Property
        inIndustry rdfs:domain Company
        inIndustry rdfs:range Industry
        headquarter rdf:type rdf:Property
        headquarter rdfs:domain Company
        headquarter rdfs:range Building
      Instantiation (visual representation): the DHL / Post Tower example graph from slide 4.
      Instantiation (RDF representation):
        DHL rdf:type Company
        DHL fullName "DHL Int. GmbH"
        DHL inIndustry Logistics
        DHL headquarter PostTower
        PostTower rdf:type Building
        PostTower locatedIn dbpedia:Bonn
        PostTower height "162.5"^^meter
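
The relation between schema and instance data can be illustrated with a small consistency check. Note the simplification: in RDF Schema, rdfs:domain and rdfs:range are used to infer types, whereas this sketch treats them as closed-world constraints, closer to what a shapes language would do. The dictionaries, the `violations` helper and the typing of Logistics as an Industry are assumptions for the example.

```python
# Schema from the slide: property -> expected subject and object class.
SCHEMA = {
    "inIndustry": {"domain": "Company", "range": "Industry"},
    "headquarter": {"domain": "Company", "range": "Building"},
}

# Instance typing. "Logistics is an Industry" is assumed for the example.
TYPES = {"DHL": "Company", "PostTower": "Building", "Logistics": "Industry"}


def violations(triples: list) -> list:
    """Return (s, p, o, reason) for every triple that breaks the schema."""
    problems = []
    for s, p, o in triples:
        rule = SCHEMA.get(p)
        if rule is None:
            continue  # property not described by the schema
        if TYPES.get(s) != rule["domain"]:
            problems.append((s, p, o, "domain"))
        if TYPES.get(o) != rule["range"]:
            problems.append((s, p, o, "range"))
    return problems


good = [("DHL", "inIndustry", "Logistics"), ("DHL", "headquarter", "PostTower")]
bad = [("PostTower", "inIndustry", "Logistics")]
print(violations(good))
print(violations(bad))
```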
  • 9. The Semantic Web Layer Cake 2001 http://www.w3.org/2001/10/03-sww-1/slide7-0.html • Monolithic, based on XML • Focus on heavyweight semantics (ontologies, logic, reasoning)
  • 10. The Semantic Web Layer Cake 2015 – Bridging between Big & Smart Data
      [Figure: layer labels: Unicode; URIs; XML, JSON, CSV, RDB, HTML; RDF; RDF/XML, JSON-LD, CSV2RDF, R2RML, RDFa; RDF Data Shapes; RDF-Schema; Vocabularies; Ontologies; SKOS thesauri; Logic; SWRL rules; SPARQL; (Access control), Signature, Encryption (HTTPS/CERT/DANE)]
      • Lingua franca of data integration with many technology interfaces (XML, HTML, JSON, CSV, RDB, …)
      • Focus on lightweight vocabularies, rules, thesauri etc.
      • Less “invasive”
  • 11. RDF: the Lingua Franca of Data Integration
      • RDF is simple
      • We can easily encode and combine all kinds of data models (relational, taxonomic, graphs, object-oriented, …)
      • RDF supports distributed data and schema
      • We can seamlessly evolve simple semantic representations (vocabularies) into more complex ones (e.g. ontologies)
      • Small representational units (URIs/IRIs, triples) facilitate mixing and mashing
      • RDF can be viewed from many perspectives: facts, graphs, ER, logical axioms, objects
      • RDF integrates well with other formalisms: HTML (RDFa), XML (RDF/XML), JSON (JSON-LD), CSV, …
      • Linking and referencing between different knowledge bases, systems and platforms facilitates the creation of sustainable data ecosystems
  • 12. Successful application domains of Linked Data & Semantic Integration
      Search Engine Optimization & Web Commerce
        • Schema.org is used by >20% of Web sites
        • Major search engines exploit semantic descriptions
      Pharma, Life Sciences
        • Mature, comprehensive vocabularies and ontologies
        • Billions of disease, drug and clinical trial descriptions
      Digital Libraries
        • Many established vocabularies (DublinCore, FRBR, EDM)
        • Millions of items aggregated from thousands of memory institutions in Europeana and the German Digital Library
  • 13. © Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS The Web evolves into a Web of Data [Figure labels: Linked Open Data, Facebook Open Graph]
  • 14. Knowledge Graphs – A definition
      • Fabric of concept, class, property, relationship and entity descriptions
      • Uses a knowledge representation formalism (typically RDF, RDF-Schema, OWL)
      • Holistic knowledge (multi-domain, multi-source, multi-granularity):
        • instance data (ground truth),
        • open (e.g. DBpedia, Wikidata), private (e.g. supply chain data) and closed data (product models),
        • derived, aggregated data,
        • schema data (vocabularies, ontologies),
        • meta-data (e.g. provenance, versioning, documentation, licensing)
      • Comprehensive taxonomies to categorize entities
      • Links between internal and external data
      • Mappings to data stored in other systems and databases
  • 15. Knowledge Graph Challenges & Opportunities
      Knowledge graphs typically cover: • Multiple domains • Various levels of granularity • Data from multiple sources • Various degrees of structure
      Challenges: • Quality • Coherence • Co-evolution • Update propagation • Curation & interaction
      Opportunities: • Background knowledge for various applications (e.g. question answering, data integration, machine learning) • Facilitate intra-organizational data sharing and exchange (data value chains)
  • 16. Comparison of various enterprise data integration paradigms
        Paradigm          | Data model | Integr. strategy | Conceptual/operational | Heterogeneous data | Intern./extern. data | No. of sources | Type of integr. | Domain coverage | Semantic repres.
        XML Schema        | DOM trees  | LaV              | operational            | yes                | yes                  | medium         | both            | medium          | high
        Data Warehouse    | relational | GaV              | operational            | -                  | partially            | medium         | physical        | small           | medium
        Data Lake         | various    | LaV              | operational            | yes                | yes                  | large          | physical        | high            | medium
        MDM               | UML        | GaV              | conceptual             | -                  | -                    | small          | physical        | small           | medium
        PIM / PCS         | trees      | GaV              | operational            | partially          | partially            | -              | physical        | medium          | medium
        Enterprise search | document   | -                | operational            | yes                | partially            | large          | virtual         | high            | low
        EKG               | RDF        | LaV              | both                   | yes                | yes                  | medium         | both            | high            | very high
      [1] Michael Galkin, Sören Auer, Simon Scerri: Enterprise Knowledge Graphs: A Survey. Submitted to 37th International Conference on Information Systems. 2016.
  • 17. Knowledge Graph Technology
  • 18. Adding a Semantic Layer to Data Lakes
      Semantic Data Lake:
        • central place for model, schema and data historization
        • combination of scale-out (cost reduction) and semantics (increased control & flexibility)
        • grows incrementally (pay-as-you-go)
      [Figure labels: departments (Management, Accounting, Marketing, Sales, Support, R&D); Inbound Data Sources; Inbound Raw Data Store; Data Lake (order of magnitude cheaper scalable data store); Knowledge Graph for Relationship Definition and Meta Data; Frontend to Access Relationship and KPI Definition / Documentation; Frontend to Access (ad hoc) Reports; Outbound and Consumption; Outbound Data Delivery to Target Systems; mapping technologies: JSON-LD, CSVW, R2RML, XML2RDF]
  • 19. W3C R2RML – Relational to RDF Mapping. R2RML: RDB to RDF Mapping Language, W3C Recommendation 27 September 2012. Editors: Souripriya Das, Seema Sundara, Richard Cyganiak. http://www.w3.org/TR/r2rml/
  • 21. SPARQLMap – Mapping RDB 2 RDF
      1. Either the resulting RDF knowledge base is materialized in a triple store and
      2. subsequently queried using SPARQL,
      3. or the materialization step is avoided by dynamically mapping an input SPARQL query into a corresponding SQL query, which returns exactly the same results as the SPARQL query executed against the materialized RDF dump
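
The two strategies on this slide can be contrasted with a toy sketch using Python's built-in `sqlite3` module. It is not the SPARQLMap or Sparqlify algorithm and does not use R2RML syntax: the `MAPPING` dictionary, the `product` table and all function names are assumptions, and the "query" is a single triple pattern `?s <predicate> ?o` rather than real SPARQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(5624, "SmartTV"), (5627, "Tablet")])

# Hypothetical mapping: predicate -> (table, key column, value column).
MAPPING = {"rdfs:label": ("product", "id", "title")}


def materialize(conn):
    """Strategy 1, step 1: dump the mapped table contents as triples."""
    triples = []
    for predicate, (table, key, column) in MAPPING.items():
        sql = f"SELECT {key}, {column} FROM {table} ORDER BY {key}"
        for key_value, value in conn.execute(sql):
            triples.append((f"Prod:{key_value}", predicate, value))
    return triples


def match(triples, predicate):
    """Strategy 1, step 2: answer '?s <predicate> ?o' on the dump."""
    return [(s, o) for s, p, o in triples if p == predicate]


def rewrite(predicate):
    """Strategy 2: translate the same pattern into one SQL query."""
    table, key, column = MAPPING[predicate]
    return f"SELECT 'Prod:' || {key}, {column} FROM {table} ORDER BY {key}"


def virtual(conn, predicate):
    """Run the rewritten SQL directly; no triples are stored."""
    return [tuple(row) for row in conn.execute(rewrite(predicate))]


materialized = match(materialize(conn), "rdfs:label")
rewritten = virtual(conn, "rdfs:label")
print(materialized == rewritten)
```

Both paths return the same bindings; the second avoids keeping a second copy of the data, at the cost of a more involved query translation.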
  • 22. Example: Sparqlify
      • Rationale: exploit existing formalisms (SQL, SPARQL Construct) as much as possible
      • Flexible & versatile mapping language
      • Translates one SPARQL query into exactly one efficiently executable SQL query
      • Solid theoretical formalization based on SPARQL-relational algebra transformations
      • Extremely scalable through an elaborate view candidate selection mechanism
      • Used to publish 20B triples for LinkedGeoData
      [Figure labels: SPARQL Construct, Bridge, SQL View]
      [1] Stadler, Unbehauen, Auer, Lehmann: Sparqlify – Very Large Scale Linked Data Publication from Relational Databases.
      [2] Unbehauen, Stadler, Auer: Optimizing SPARQL-to-SQL Rewriting. iiWAS 2013
      [3] Auer, et al.: Triplify: light-weight linked data publication from relational databases. WWW 2009
  • 23. Semantified Big Data Architecture Blueprint. [Figure labels: Data sources, Ingestion, Storage, Semantic Lifting with Mappings, Queries.] Semantic and semantified data are stored in Apache Parquet files on HDFS. [1] Mami, Scerri, Auer, Vidal: Towards the Semantification of Big Data Technology. DEXA 2016
  • 25. SEBIDA Evaluation Results • Loads data faster • Has quite different query performance characteristics: faster in 5 out of 12 queries, similar performance in 2, slower in 5
  • 26. VoCol: Collaborative Vocabulary Curation Environment. Comprehensive Support for Evolving Vocabularies
  • 27. Industry 4.0: Semantic Models as Bridge between Shop & Office Floor
  • 28. Semantic Administrative Shell & Reference Architecture for Industry 4.0 (RAMI4.0)
      • The Administrative Shell (Verwaltungsschale) provides a digital identity for arbitrary Industry 4.0 components (e.g. sensors, actuators/robots), exposing data covering the whole life-cycle
      • The Reference Architecture for Industry 4.0 (RAMI4.0) provides a conceptual framework for implementing comprehensive Industry 4.0 scenarios
      • We have implemented both concepts, along with a number of IEC and ISO standards, in a comprehensive information model ready to be used in production environments
  • 29. VoCol: Collaborative Development Environment for Vocabularies
      [Figure labels: Versioning (Git/Bitbucket); Issue tracking (GitLab/GitHub); Syntax validation; Documentation generation; Authoring (Turtle); Visualization (VOWL); Publishing (LOD/SPARQL)]
      • Integrates a number of tools & services for different aspects of vocabulary development
      • Is centered around Git version control (or Bitbucket), thus supporting the branching and merging of vocabularies
      • Supports the roundtrip between schema/vocabulary development, competency questions (expressed in SPARQL) and example data
      ⇒ Bridges between conceptual models and executable code
      http://eis.iai.uni-bonn.de/Projects/VoCol.html
  • 30. Development based on Git – Version Control
      • Git has become the most widely used version control system. It is a distributed revision control system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows.
      • Git was initially designed and developed in 2005 by Linux kernel developers for Linux kernel development
      • Git is the basis for a variety of open-source and commercial services and products, such as:
        • GitHub/Bitbucket: Web-based Git repository hosting services with millions of users
        • GitLab/Gitolite: open-source Web-based Git repository management platforms
      • Since the Team Foundation Server 2013 release, Microsoft has provided native support for Git
      • Git is easy to extend and to integrate into arbitrary workflows via Git hooks
  • 31. Information Model – Environment
  • 32. Environment: Dynamic Documentation
  • 33. Environment: Dynamic Documentation
  • 34. Environment: Dynamic Visualization
  • 35. Environment: Analytics
  • 36. Environment: Analytics
  • 37. Environment: Analytics
  • 39. Environment: Querying
  • 40. Environment: Evolution
  • 41. Industrial Data Space
  • 42. Vocabulary-based Integration facilitates Data-driven Businesses [Figure label: Vocabulary]
  • 43. The work on the Industrial Data Space complements and interlocks with the Plattform Industrie 4.0. [Figure labels: Retail 4.0, Banking 4.0, Insurance 4.0, …, Industrie 4.0 (focus on the manufacturing industry); Smart Services; Transmission, Networks; Real-time systems; Industrial Data Space (focus on data); Data …]
  • 44. The Industrial Data Space Initiative
      • Community of >30 large German and European companies
      • Pre-competitive, publicly funded innovation project involving 11 Fraunhofer institutes for developing the IDS reference architecture
      [Figure: current members of the Industrial Data Space Association]
  • 45. Semantic Data Linking for Enterprise Data Value Chains. Images: © Fotolia (Francesco De Paoli, Nmedia, hakandogu)
                                 | Data Lake                 | Industrial Data Space                        | Pure Internet
        Characterization         | centralized, monopolistic | federated, secure, “trusted”, standard-based | completely decentralized, open, insecure
        Data management          | Central repository        | Decentralized                                | Decentralized
        Data ownership           | Central                   | Decentralized                                | Decentralized
        Data linking             | Single provider           | Federated, on demand                         | Missing
        Data security            | Bilateral                 | Certified system                             | Bilateral
        Market structure         | Central provider          | Role system                                  | Unstructured
        Transport infrastructure | Internet                  | Internet                                     | Internet
  • 46. Basic principles of the Industrial Data Space: Linked Light Semantics; Security with Industrial Data Container; Certified Roles; On Demand Interlinking. Images: © Fotolia 77260795 ∙ 73040142 ∙ 58947296 ∙ 68898041
  • 47. Industrial Data Space: On Demand Interlinking. [Figure labels: Services A to G; Enterprises 1 to 6.] All data stays with its owners and remains controlled and secured. Data is shared only on request for a service. No central platform. Image sources: iStockphoto
  • 48. --- CONFIDENTIAL --- IDS Architecture Overview. [Figure labels: Industrial Data Space; Internet; Apps; Vocabulary; Industrial Data Space Broker; Clearing; Registry; Index; Industrial Data Space App Store; Internal IDS Connector and External IDS Connector (Company A); Internal IDS Connector and External IDS Connector (Company B); Third Party Cloud Provider; data flows: Upload, Download, Search]
  • 49. Big Data is not just Volume and Velocity. Variety (& Veracity) are key challenges; Linked Data helps in dealing with both.
      • The Linked Data life-cycle requires integrating and adapting results from a number of disciplines: NLP, Machine Learning, Knowledge Representation, Data Management, User Interaction, …
      • Applications in a number of domains: cultural heritage, life sciences, industry 4.0 / cyber-physical systems, smart cities, mobility, …
      Linked Data links not only data but also: • Various disciplines • Applications and use cases
  • 51. Creating Knowledge out of Interlinked Data Thanks for your attention! Sören Auer http://www.iai.uni-bonn.de/~auer | http://eis.iai.uni-bonn.de auer@cs.uni-bonn.de
  • 53. Question Answering research challenges
      Main goals:
        • Completeness ⇒ extension of background knowledge, streams, deduplication
        • Flexibility ⇒ deal with keywords and natural language (NL)
        • Runtime ⇒ new models for query processing, ranking for top-k queries
        • Ease of use ⇒ verbalization of queries, entity verbalization, explanation of answers in NL
        • Multilinguality ⇒ cover several European languages
      Automatic extension of background knowledge:
        1. Generate a query from own data and get answer set A
        2. Add a new data set and get answer set A’
        3. If there is an information gain, iterate
        4. Else terminate
      Data streams:
        • Continuous queries on data streams (update SPARQL results as new information comes in)
        • Send novel answers to the end user
        • Open Information Extraction
      Hybrid search: extension for queries on unstructured data
      Ensure quasi-completeness:
        • Fully automatic entity consolidation
        • Find links at runtime, e.g. between DBpedia and LinkedMDB, to answer “Which films were directed by and starred Tarantino?”
      [1] Shekarpour, Marx, Ngomo, Auer: Semantic query interpretation for question answering on linked data. J. Web Semantics 30 (2015)
      [2] Marx, Usbeck, Ngomo, Höffner, Lehmann, Auer: Towards an open question answering architecture. SEMANTICS 2014
      [3] Shekarpour, Ngomo, Auer: Question answering on interlinked data. WWW 2013
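
The four-step loop for automatically extending the background knowledge can be sketched as follows. This is one possible reading of the slide, not the authors' implementation: "information gain" is taken to mean that the answer set grows, and the function names, the placeholder identifiers (`film:A`, `person:T`, …) and the toy data sets are all hypothetical.

```python
def directed_and_starred(triples, person="person:T"):
    """Toy query: films that `person` both directed and starred in."""
    directed = {s for s, p, o in triples if p == "director" and o == person}
    starred = {s for s, p, o in triples if p == "starring" and o == person}
    return directed & starred


def extend_background_knowledge(query, own_data, candidate_datasets):
    knowledge = set(own_data)
    answers = query(knowledge)              # step 1: answer set A
    for dataset in candidate_datasets:
        extended = knowledge | set(dataset)
        new_answers = query(extended)       # step 2: answer set A'
        if new_answers - answers:           # step 3: gain, keep and iterate
            knowledge, answers = extended, new_answers
        else:                               # step 4: no gain, terminate
            break
    return answers


# Placeholder data: the first source only knows directors, the others actors.
own = {("film:A", "director", "person:T"), ("film:B", "director", "person:T")}
actors_1 = {("film:A", "starring", "person:T")}
actors_2 = {("film:C", "starring", "person:X")}

result = extend_background_knowledge(directed_and_starred, own,
                                     [actors_1, actors_2])
print(result)
```

With only the first source the answer set is empty; linking in the second source yields one film, and the third adds nothing, so the loop stops.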
  • 54. The approach: An Open QA Architecture
      Create an open, extensible architecture for Linked-Data-based Question Answering:
        • Enable the plug-in and competition of different modules for various QA aspects
        • Input (query string / question, voice, brain input); Query Splitting; Disambiguation/Mapping; Query Construction; Query Execution; Result Presentation
        • Take context, personalization and feedback into account
      For whom? Use cases:
        • In-car interaction / Human Vehicle Interaction: Where can I find parking? What are the main sights in Luxembourg?
        • Assisting people with disabilities (e.g. vision impaired): Is there any pharmacy still open? Which classical concerts are broadcast next week?
        • Medical information retrieval: Which side effects can be caused by Paracetamol? Do Paracetamol and Tamiflu interfere?
        • …
      [1] The WDAqua Marie Curie ITN: Answering Questions using Web Data. http://wdaqua.informatik.uni-bonn.de

Editor's Notes

  • #3: http://www.gi.de/nc/service/informatiklexikon/detailansicht/article/big-data.html
  • #19: A Data Lake is a storage repository for big-data-scale raw data in its original formats. It takes a late-binding approach to schema: “Let us decide when we need it.” It uses a scale-out architecture on commodity infrastructure, mostly with HDFS/Hadoop/Spark, which gives a large cost advantage, about a factor of 10 compared to data warehouses. Semantic Data Lake = Data Lake + Knowledge Graph: the management of structure (vocabularies/schemas, KPI trees, metadata, …) on top of the Data Lake is performed in a knowledge graph, a complex data fabric representing all kinds of things and how they relate to each other. A knowledge graph is unique regarding flexibility, multiple views and metadata capabilities. It is based on the Resource Description Framework (RDF) standard and Linked Data principles.
  • #48: The platform offers a secure space for interlinking. Data stays with the enterprises and is interlinked only when needed. Market-oriented model without dependence on individual providers. Value creation and services remain with the enterprise. Financing through services, not through advertising or the sale of data. No central data kraken like Google; control over the data remains with the data owners. The customer (end user) is not the product but sovereign over their data. The whole is more than the sum of its parts (end-to-end services based on data from several parties offer disproportionately higher added value). No central data pot, but a network of healthy, secure data. Governance is federated, not monopolistic.
  • #50: The Linked Data approach can help to establish data value chains. The Linked Data life-cycle requires integrating and adapting results from a number of disciplines (NLP, Machine Learning, Knowledge Representation, Data Management). Applications exist in a number of domains (cultural heritage, life sciences, industry 4.0 / cyber-physical systems, smart cities, mobility, …).