Scaling the (evolving) web data
–at low cost-
Javier D. Fernández
QuWeDa 2017: Querying the Web of Data
Kosice, 29/05/2017
A good recipe for a WS Keynote
Ingredients (for approx. 20 persons)
• A motivated speaker
• Some knowledge in the area
• An engaged audience
• Slides (number at your convenience)
Method
• Present yourself
• Set the context, give an overall picture of the area
• Touch some of the topics of the event
• Focus the discussion; sell your work
• Devise future developments in the area
• Mix everything with jokes
About me:
• Since 2015 @WU, Inst. for Information Business
• Research interests: Semantic Web, Open Data, Big (Semantic) Data Management, Databases, Data Compression, Privacy and Security
• https://www.wu.ac.at/en/infobiz/team/fernandez/
(Map of places and people: Madrid, Valladolid, Santiago, Rome, Vienna; Óscar Corcho, Pablo de la Fuente, Miguel A. Martínez-Prieto, Claudio Gutiérrez, Maurizio Lenzerini, Axel Polleres)
The Web of Data Eco System
• First, we had better know what we can offer…
• What is the Semantic Web/Web of Data/Linked Data?
• Who are we? What have we done so far?
• What haven’t we done so far?
(Diagram: Linked Data, Semantic Web, Open Data, Big Data)
(Big Semantic Data: Linked Data vs. Big Data)
• Overlaps:
  • LD as a whole is big (38B-150B triples)
  • No rigid (e.g., relational) data model
  • Big Data technologies (e.g., Hadoop) are used to handle LD
  • LD can represent knowledge extracted from big unstructured data (especially to deal with variety)
• Key differences:
  • Individual Linked Data sets are typically not "big" per se (e.g., the English DBpedia dump (zip) is currently < 5 GB)
  • LD is structured and has a single data model (RDF); "big data lakes" are typically neither
  • Big Data is based on distributed data infrastructures within an organization (e.g., Hadoop clusters); LD creates a decentralized, globally distributed data infrastructure
Let’s study the community…
Survey practitioner needs, technological challenges, and open research questions on the use of Linked Data
• Austrian FFG ICT of the Future project (exploratory study)
• Consortium: IDC Austria, Technical University of Vienna, Vienna University of Economics and Business (WU), Semantic Web Company
• Project ended in Dec 2016: https://www.linked-data.at/
Study strands: Requirements, Standards*, Literature research*
* Special kudos to Sabrina Kirrane and Axel Polleres for the community analysis
Interviews
• 23 interviews:
  • Domains: Consulting, Engineering, Environment, Finance and Insurance, Government, Healthcare, ICT, IT, Media, Pharmaceutical, Professional Services, Real Estate, Research, Startup, Tourism, Transport & Logistics
  • Roles: Business Intelligence, CEO, Chief Engineer, Data and Systems Architect, Data Scientist, Director Information Management, Enterprise Architect, Founder, General Secretary, Governance, Risk & Compliance Manager, Head of Communications and Media, Head of Development, Head of HR, Head of R&D, Innovation Manager, Information Architect, IT Project Manager, Management, Managing Director, Marketing Analyst, Principal System Analyst, Project Coordinator, Researcher, Technical Specialist
Technologies in need…
• Analytics
• Computational linguistics & NLP
• Concept tagging & annotation
• Data integration
• Data management
• Dynamic data / streaming
• Extraction, data mining, text mining, entity extraction
• Logic, formal languages & reasoning
• Human-Computer Interaction & visualization
• Knowledge representation
• Machine learning
• Ontology/thesaurus/taxonomy management
• Quality & Provenance
• Recommendation
• Robustness, scalability, optimization and performance
• Searching, browsing & exploration
• Security and privacy
• System engineering
We ended up with most areas of the SW
Standards
Standards Toolbox (incl. W3C member submissions)
What can we offer?
Community Analysis
• Monitoring the SW community's major venues (2006-2015):
  • ISWC (since 2006), ESWC (since 2006), SEMANTiCS (since 2007), JWS (since 2006), SWJ (since 2010)
• 3 seminal papers, placed on a 2001-2016 timeline
Topic Categorisation
Interestingly, the same “empty” topics appear in the standards
Semantic Web/Linked Data over time…
Subtopics:
Expressing Meaning
Knowledge Representation
Ontologies
Agents
Knowledge Representation & Reasoning
Semantic Web/Linked Data over time…
Early adopters:
MITRE
Chevron
British Telecom
Boeing
Ordnance Survey
Eli Lilly
Pfizer
Agfa
Food and Drug Administration
National Institutes of Health
Software adopters/products:
Oracle
Adobe
Altova
OpenLink
TopQuadrant
Software AG
Aduna Software
Protégé
SAPHIRE
LD Adopters - Companies
(Bar chart: conference sponsors that appear in papers, 2006-2015. x-axis: companies; y-axis: occurrences, 0 to 1600. Companies shown: Google, Oracle, Yahoo, SAP, IEEE Intelligent Systems, Franz, Bing, Expert System, IBM Research, PoolParty)
To whom we can sell our technology
Semantic Web/Linked Data over time…
The authors claim that “early research has transitioned into these larger, more applied systems, today’s Semantic Web research is changing: It builds on the earlier foundations but it has generated a more diverse set of pursuits”.
Big Semantic Data and applied systems
Other topics of the QuWeDa workshop
Motivation
• Publication, exchange and consumption of large RDF datasets
  • Most RDF formats (N3, XML, Turtle) are text serializations, designed for human readability (not for machines)
  • Verbose = high costs to write/exchange/parse
  • A basic offline search = (decompress) + index the file + search
• Lightweight binary RDF (HDT)
  • Highly compact serialization of RDF
  • Allows fast RDF retrieval in compressed space (without prior decompression)
  • Includes internal indexes to solve basic queries with a small (3%) memory footprint
  • Very fast on basic queries (triple patterns): 1.5x faster than Virtuoso and RDF-3X
  • Complex queries (joins) on the same scale as current solutions (Virtuoso, RDF-3X)
DBpedia (431 M triples, ~63 GB): NT + gzip 5 GB; HDT 6.6 GB; HDT + gzip 2.7 GB
rdfhdt.org
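A toy sketch of the idea behind a compact, queryable RDF representation: each distinct term is mapped once to an integer ID and triples are kept as sorted ID tuples, so a triple pattern can be resolved without re-parsing verbose text. This is an illustration only, not the HDT format itself (which uses its own compressed dictionary and triple structures); the names `encode` and `match` and the example triples are hypothetical.

```python
# Toy dictionary encoding of RDF triples (NOT the real HDT format).
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def encode(triples: List[Triple]) -> Tuple[List[str], List[IdTriple]]:
    """Map every distinct term to an integer ID and rewrite triples as ID tuples."""
    terms = sorted({term for triple in triples for term in triple})
    term_to_id: Dict[str, int] = {term: i for i, term in enumerate(terms)}
    id_triples = sorted(
        (term_to_id[s], term_to_id[p], term_to_id[o]) for s, p, o in triples
    )
    return terms, id_triples


def match(
    terms: List[str],
    id_triples: List[IdTriple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Resolve a triple pattern; None plays the role of a variable."""
    term_to_id = {term: i for i, term in enumerate(terms)}
    pattern: List[Optional[int]] = []
    for term in (s, p, o):
        if term is None:
            pattern.append(None)
        elif term in term_to_id:
            pattern.append(term_to_id[term])
        else:
            return []  # unknown term: nothing can match
    return [
        (terms[a], terms[b], terms[c])
        for a, b, c in id_triples
        if all(want is None or want == got for want, got in zip(pattern, (a, b, c)))
    ]


# Hypothetical example data.
EXAMPLE: List[Triple] = [
    ("ex:Tim", "rdfs:label", '"Tim Berners-Lee"'),
    ("ex:Tim", "rdf:type", "foaf:Person"),
    ("ex:Javier", "rdf:type", "foaf:Person"),
]
TERMS, ID_TRIPLES = encode(EXAMPLE)
```

Calling `match(TERMS, ID_TRIPLES, p="rdf:type")` returns the two typing triples; a real implementation would replace the linear scan with the internal indexes mentioned on the slide.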
The real motivation
(Image: http://www.kunsan.af.mil/News/Article/413995/serving-the-masses/)
“Oh man, I’m hungry and I don’t even know if I will like whatever you are cooking”
(Highlighted step: consume)
Applications
• Compress and share ready-to-consume RDF datasets
• Transfer large data between servers
• Embedded systems & phones
• Fast, low-cost SPARQL query engine
  • Via LDF
  • HDT-Jena
  • HDT-ClioPatria
But what about Web-scale queries?
• E.g., retrieve all entities in LOD with the label “Tim Berners-Lee”
• Options:
  • Crawl and index LOD locally (no)
  • Follow-your-nose (where should I start?)
  • Federated querying (only as good as the endpoints you query)
  • Use LOD Laundromat as a “good approximation” (still querying 650K datasets)
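PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# All entities labelled "Tim Berners-Lee"
SELECT DISTINCT ?x WHERE {
  ?x rdfs:label "Tim Berners-Lee" .
}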
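To make the "still querying 650K datasets" option concrete, here is a minimal sketch of answering the same label lookup by scanning many gzip-compressed N-Triples files one by one. It assumes a deliberately naive line parser (no escape handling, exact match on a plain literal without language tag or datatype), and the function name, dataset contents, and example.org URIs are all hypothetical.

```python
import gzip
from typing import Iterable, Set

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def subjects_with_label(datasets: Iterable[bytes], label: str) -> Set[str]:
    """Scan gzip-compressed N-Triples blobs and collect subjects carrying the label."""
    wanted = f'"{label}"'
    found: Set[str] = set()
    for blob in datasets:  # one full decompress-and-scan per dataset
        text = gzip.decompress(blob).decode("utf-8")
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Naive split: subject, predicate, rest-of-line as object.
            parts = line.rstrip(".").rstrip().split(" ", 2)
            if len(parts) != 3:
                continue
            s, p, o = parts
            if p == RDFS_LABEL and o == wanted:
                found.add(s)
    return found


# Two hypothetical datasets standing in for the 650K cleaned files.
DATASET_A = gzip.compress(
    b'<http://example.org/a/TimBL> <http://www.w3.org/2000/01/rdf-schema#label> "Tim Berners-Lee" .\n'
)
DATASET_B = gzip.compress(
    b'<http://example.org/b/person1> <http://www.w3.org/2000/01/rdf-schema#label> "Tim Berners-Lee" .\n'
    b'<http://example.org/b/person2> <http://www.w3.org/2000/01/rdf-schema#label> "Someone Else" .\n'
)

RESULT = subjects_with_label([DATASET_A, DATASET_B], "Tim Berners-Lee")
```

Every query pays the cost of decompressing and scanning each file, which is the overhead a single pre-indexed file is meant to avoid.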
LOD Laundromat
(Diagram: Linked Open Data; LOD Laundromat; Dataset 1 to Dataset 650K, each as N-Triples (zip); SPARQL endpoint (metadata))
But what about Web-scale queries?
LOD-a-lot (flashforward)
But one could be really hungry
(Image: https://hwy55burgers.wordpress.com/tag/food-challenge/)
LOD-a-lot
(Diagram: Linked Open Data; LOD Laundromat; Dataset 1 to Dataset 650K, each as N-Triples (zip); SPARQL endpoint (metadata); LOD-a-lot, 28B triples)
Kudos: Javier D. Fernández, Wouter Beek, Miguel A. Martínez-Prieto, and Mario Arias
LOD-a-lot (some numbers)
Disk size:
• HDT: 304 GB
• HDT-FoQ (additional indexes): 133 GB
Memory footprint (to query):
• 15.7 GB of RAM (3% of the size)
• 144 seconds loading time
• 8 cores (2.6 GHz), RAM 32 GB, SATA HDD on Ubuntu 14.04.5 LTS
LDF page resolution in milliseconds.
305€
(LOD-a-lot creation took 64 h & 170 GB RAM. HDT-FoQ took 8 h & 250 GB RAM)
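A quick back-of-the-envelope check of what these disk sizes mean per triple. This is only arithmetic on the figures quoted above, under two assumptions that the slides do not state: "GB" means 10^9 bytes and "28B triples" means exactly 28 × 10^9.

```python
GB = 10**9  # assumption: decimal gigabytes

TRIPLES = 28 * 10**9      # "28B triples", taken as exactly 28e9
HDT_BYTES = 304 * GB      # HDT file
INDEX_BYTES = 133 * GB    # HDT-FoQ additional indexes


def bytes_per_triple(total_bytes: int, triples: int) -> float:
    """Average on-disk bytes spent per triple."""
    return total_bytes / triples


hdt_only = bytes_per_triple(HDT_BYTES, TRIPLES)
with_index = bytes_per_triple(HDT_BYTES + INDEX_BYTES, TRIPLES)
print(f"{hdt_only:.1f} bytes/triple (HDT), {with_index:.1f} bytes/triple (HDT + indexes)")
```

Under those assumptions this comes to roughly 11 bytes per triple for the HDT file and under 16 bytes with the additional indexes.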
LOD-a-lot
https://datahub.io/dataset/lod-a-lot
LOD-a-lot (some use cases)
• Query resolution at Web scale
• Evaluation and benchmarking
  • No excuse :)
• RDF metrics and analytics
(Figure panels: subjects, predicates, objects)
LOD-a-lot (ACKs)
1) Linked Open/Closed Data
“Deep Semantic Web”
(Diagram: Linked Open Data Cloud (dbpedia) and Linked Closed Data Cloud, with graphs G1a, G1b, G1c, G2a, G2b, G2c, G3a, G3b, G4a)
• A) Exchange: Encryption + HDT (hdtcrypt)
• B) A secure LD endpoint
ESWC’17, THU 16:30-17:00: Self-Enforcing Access Control for Encrypted RDF. Javier D. Fernández, Sabrina Kirrane, Axel Polleres and Simon Steyskal
2) RDF evolution at Scale
(Figure, after Andreas Harth, “Stream Reasoning in Mixed Reality Applications”, Stream Reasoning Workshop 2015. Axes: number of sources (10^0 to 10^6) and update rate (year, month, week, day, hour, minute, second). Plotted: DBpedia, BTC, DyLDO, Internet of Things, Virtual/Augmented Reality. LOD-a-lot: versions?)
In the last few years:
• Research projects: Managing the Evolution and Preservation of the Data Web (FP7); Preserving Linked Data (FP7)
• Archives
• Tools
• Benchmarking: BEnchmark of RDF ARchives
One of the fundamental problems in the Web of Data
3) Ontology-based Data Management
Use case: DBpedia & SPARQL Update to maintain Wikipedia?
Use mappings to update infoboxes and track pages that need updating.
Our approach to OBDM over curated sources
1. Ensure consistency in all cases; automatically resolve updates on a best-effort basis.
2. Learn from existing data and from principled belief revision semantics.
   • E.g., many football players with only one foaf:name in English DBpedia have both the name and full_name infobox properties set.
3. Record, extract and apply best / typical practices.
(Mapping diagram: infobox properties name and full_name both map to foaf:name)
A minimal-change insert translation would only update one infobox property.
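The contrast between a minimal-change translation and one that follows typical practice can be sketched as below. This is a hypothetical illustration, not the authors' algorithm: the mapping table is read off the slide, while the function names, the choice of "first mapped property" as the minimal change, and the example value are assumptions.

```python
from typing import Dict, List

# Assumed mapping, read off the slide: both infobox properties map to foaf:name.
MAPPINGS: Dict[str, List[str]] = {"foaf:name": ["name", "full_name"]}


def minimal_change_insert(infobox: Dict[str, str], predicate: str, value: str) -> Dict[str, str]:
    """Translate an RDF insert by setting only one mapped infobox property."""
    updated = dict(infobox)  # leave the caller's infobox untouched
    updated[MAPPINGS[predicate][0]] = value
    return updated


def typical_practice_insert(infobox: Dict[str, str], predicate: str, value: str) -> Dict[str, str]:
    """Translate an RDF insert by setting every mapped property, as existing pages often do."""
    updated = dict(infobox)
    for prop in MAPPINGS[predicate]:
        updated[prop] = value
    return updated


# Hypothetical example: inserting a foaf:name for a page with an empty infobox.
MINIMAL = minimal_change_insert({}, "foaf:name", "Example Player")
TYPICAL = typical_practice_insert({}, "foaf:name", "Example Player")
```

Both results map back to the same single foaf:name triple, but only the second matches the pattern observed in existing football-player pages, which is the kind of practice the approach aims to learn.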
ESWC’17, TUE 12:00-12:30: Updating Wikipedia via DBpedia Mappings and SPARQL. Albin Ahmeti, Javier D. Fernández, Axel Polleres and Vadim Savenkov
Dept. of Information Systems & Operations
Institute for Information Business
Welthandelsplatz 1, 1020 Vienna, Austria
Dr. Javier D. Fernández
T +43-1-313 36-5241
F +43-1-313 36-739
jfernand@wu.ac.at
www.ai.wu.ac.at
Thanks!
• Big (Semantic) Data
• Versions
• Evolving Data
• Encryption
• Compression
rdfhdt.org



Editor's Notes

  • #7: After some years of pushing for the Web of Data, now is the moment to look at the ecosystem and think about what we have done so far, and what we haven’t.
  • #20: Outlines quite clearly what they thought, back then, the Semantic Web should be…
  • #51: LEDS: Linked Enterprise Data Services