SIB . 23.03.2011 . Page 1                         http://lod2.eu




WP2
Storing and Querying
Very Large Knowledge Bases
                             Vienna Update
                             March 2012 – M18

                             Peter Boncz






 Table of Contents

 • WP2 Refresher
 • LOD Cloud Hosted on the Knowledge Store Cluster
    * 50B mark reached, column-store Virtuoso deployed
 • State of the Art LOD Laboratory (“Benchmarking”)
    * LDBC – RDF Store Industry council
    * BSBM at large scale
    * RDF-H + Social Intelligence Benchmark (SIB)
 • Technical work
    * column-store Virtuoso → cluster version
    * recycling query results
 • Next up
    * LOD cloud @250B triples
    * Virtuoso: adaptive query optimizer (and more)
    * first MonetDB/SPARQL version (RDF clustering, graph indexing)




 WP2 Organization

 CWI (MonetDB):
 • Peter Boncz (also in VUA group of Frank van Harmelen)
 • Duc Pham Minh (PhD student)
 • Irini Fundulaki (1-year sabbatical from FORTH)

 OpenLink (Virtuoso):
 • Orri Erling
 • Hugh Williams
 • Ivan Mikhailov

 + FU Berlin (BSBM)
 + DERI (BSBM text + LOD cloud + text retrieval/sindice)
 + ULEI (DBpedia benchmark)


      WP2
      Storing and Querying Very Large Knowledge Bases

Goal: enabling large-scale, feature-rich & enterprise-ready Linked
  Data management solutions

Database Partners in LOD2:
CWI: Leading open source analytics RDBMS
OpenLink: Leading Linked Data deployment platform

Technological Excellence:
Creating and publishing metrics for choosing RDF solutions
Bringing Column Store Technology for Business Intelligence on RDF
Ground-breaking database innovations for RDF stores
   (Dynamic Query optimization, Adaptive Caching of Joins,
   Optimized Graph Processing, Cluster/Cloud scalability)




 Task 2.1: State of the Art, Evaluation & Benchmarking

 LOD cloud cache scalability
 • M0: 20B triples
 • M12: 50B triples
 • M24: 250B triples
 • M36: 1T triples

 D2.4 completed: 50B triples in LOD cache @ DERI
 First deployment of Virtuoso 7 Cluster
 • Currently hosting about 55 billion triples
 • 8-node Virtuoso v7 (column store) cluster
 • 384GB RAM
 • 2TB disk storage
 • 14 bytes per quad, excluding literals

 Next up:
 • hardware provisioning for 250B and 1T triples
  (need 512GB and 2TB RAM respectively, somewhere)
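
 A back-of-the-envelope check of the sizing figures above, as a minimal sketch. It assumes the per-quad figure on this slide means 14 bytes per quad (literals excluded) and that storage grows linearly with the triple count; both are assumptions, not statements from the slide. The function name `quad_storage_gb` is made up for illustration.

```python
# Rough sizing arithmetic for the LOD cache milestones.
# Assumptions (not from the slide): 14 bytes per quad, linear growth,
# decimal gigabytes (1 GB = 10**9 bytes).
BYTES_PER_QUAD = 14
GB = 10**9

MILESTONES = {
    "current (55B)": 55 * 10**9,
    "M24 (250B)": 250 * 10**9,
    "M36 (1T)": 10**12,
}


def quad_storage_gb(triples: int, bytes_per_quad: int = BYTES_PER_QUAD) -> float:
    """Estimated quad storage in GB, literals not included."""
    return triples * bytes_per_quad / GB


def report() -> dict:
    """Map each milestone name to its estimated quad storage in GB."""
    return {name: quad_storage_gb(n) for name, n in MILESTONES.items()}


if __name__ == "__main__":
    for name, gb in report().items():
        print(f"{name}: ~{gb:,.0f} GB of quad storage")
```

 Under these assumptions the current 55 billion triples take roughly 770 GB for quads, which fits within the 2TB of disk listed above and is about twice the 384GB of RAM.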




 Task 2.1: State of the Art, Evaluation & Benchmarking

 Benchmarking

 • creating new benchmarks
      • BSBM-BI (FU Berlin)
      • DBpedia Benchmark (ULEI) – best paper award
      • RDF-H (OGL, CWI)
      • Social Intelligence Benchmark (OGL, CWI)
 • running benchmark evaluations
      • BSBM on a large cluster (Lisa @ SARA)
      • BSBM on large single-server (40 cores, 1TB RAM)
 • creating industry consensus
      • Benchmark Auditing Service
      • LOD Benchmark Council




 BSBM Large Scale Experiments (still ongoing)

 New Aspects:
 • The Business Intelligence Use Case (BI)
 • Benchmark Rules
 • BSBM V3 Results
 • trying cluster versions

 SARA LISA cluster
 • experiments with up to 64 nodes

 VectorWise high-end server
 • 40-core machine with 1TB RAM

 Benchmarked at SARA and VectorWise
 System             Vendor        URL
 4store 1.1.2       Garlik        http://4store.org/
 BigData r4169      SYSTAP LLC    http://www.systap.com/bigdata.htm
 BigOwlim 3.4.3129  OntoText      http://www.ontotext.com/owlim/
 Jena TDB 0.8.9     openjena.org  http://www.openjena.org/TDB/
 Fuseki 0.1.0       openjena.org  http://openjena.org/wiki/Fuseki
 Virtuoso 7.0       OpenLink      http://virtuoso.openlinksw.com/




 Social Intelligence Benchmark

 • Facebook schema style
 • 14 dictionaries of real data
 • Realistic scenario simulation
 • Synthetic generated data, linked to Linked Open Data




 Technical Work: Recycling (D2.4)

 Dynamic caching of intermediate query results
 • SPARQL problem: hard to index workload / expensive backward chaining
 Idea: compute once, re-use many times
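
 The "compute once, re-use many times" idea can be illustrated with a toy cache of intermediate results. This is only a sketch under stated assumptions: the `Recycler` class, the plan keys, and the sample triples are hypothetical and are not the MonetDB recycler. A real system would also need invalidation on updates and an eviction policy, which are left out here.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Row = Tuple[str, ...]

# Tiny made-up triple table used only for this illustration.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "livesIn", "Vienna"),
    ("carol", "livesIn", "Amsterdam"),
]


class Recycler:
    """Toy cache of intermediate results, keyed by a canonical plan fragment."""

    def __init__(self) -> None:
        self._cache: Dict[Hashable, List[Row]] = {}
        self.hits = 0
        self.misses = 0

    def evaluate(self, key: Hashable, compute: Callable[[], List[Row]]) -> List[Row]:
        # Re-use the stored intermediate if this fragment was computed before.
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = compute()
        self._cache[key] = result
        return result


def scan(predicate: str) -> List[Row]:
    """Return (subject, object) pairs for one predicate."""
    return [(s, o) for s, p, o in TRIPLES if p == predicate]


def friends_of_residents(recycler: Recycler, city: str) -> List[str]:
    """People who know someone living in `city`; both scans go through the cache."""
    knows = recycler.evaluate(("scan", "knows"), lambda: scan("knows"))
    lives = recycler.evaluate(("scan", "livesIn"), lambda: scan("livesIn"))
    residents = {s for s, o in lives if o == city}
    return sorted(s for s, o in knows if o in residents)


if __name__ == "__main__":
    r = Recycler()
    print(friends_of_residents(r, "Amsterdam"))
    print(friends_of_residents(r, "Vienna"))
    print(f"hits={r.hits} misses={r.misses}")
```

 The second query differs only in its final filter, so both scans are served from the cache instead of being recomputed.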




 Technical Work: Virtuoso 7

 Major upcoming release V7, due in 2012

 • column store technology:
       • aggressive compression → more data fits in RAM
       • vectored execution → things run faster
 • elastic cluster implementation
       • partitions can migrate across nodes
 • bringing computation to the data
       • arbitrary recursive functions in the cluster
 • geospatial support
       • full OpenGIS support, R-tree backed, EWKT format
 • future enhancements
       • adaptive query optimization (CWI ROX)
       • re-use of intermediates (CWI recycling)
       • using SSDs as cache
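
 Why column storage compresses well can be shown with run-length encoding of a sorted column. This is a generic illustration, not Virtuoso's actual encoding; the function names and the sample predicate IDs are assumptions made for this sketch.

```python
from itertools import groupby
from typing import List, Tuple

Run = Tuple[int, int]  # (value, run length)


def rle_encode(column: List[int]) -> List[Run]:
    """Collapse consecutive equal values into (value, count) runs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand runs back into the original column."""
    out: List[int] = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def count_equal(runs: List[Run], target: int) -> int:
    """Count matching rows directly on the compressed runs, without decoding."""
    return sum(count for value, count in runs if value == target)


if __name__ == "__main__":
    # A sorted column of predicate IDs, as it might appear in one index order.
    predicate_ids = [1] * 5 + [2] * 3 + [7] * 4
    runs = rle_encode(predicate_ids)
    print(f"{len(predicate_ids)} values stored as {len(runs)} runs: {runs}")
    print("rows with predicate 2:", count_equal(runs, 2))
```

 Sorted columns contain long runs of equal values, so a dozen values shrink to three runs here, and a simple filter can be answered on the compressed form.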




 Next 6 months


 Virtuoso: sampled query optimizer
 • query optimization in SPARQL is difficult (no stats)
 • use adaptive, run-time, query optimization with sampling
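
 A minimal sketch of sampling-based estimation, assuming no precomputed statistics: draw a sample of triples, scale the per-predicate counts up, and evaluate the most selective pattern first. The function names and the toy data are hypothetical. A genuinely adaptive optimizer such as ROX interleaves sampling with execution; this sketch samples only once, before ordering.

```python
import random
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def make_triples() -> List[Triple]:
    """Made-up data with three predicates of very different frequency."""
    data: List[Triple] = [(f"s{i}", "rdf:type", "Person") for i in range(1000)]
    data += [(f"s{i}", "knows", f"s{i + 1}") for i in range(100)]
    data += [(f"s{i}", "mayorOf", f"city{i}") for i in range(10)]
    return data


def estimate_cardinalities(
    triples: Sequence[Triple],
    predicates: List[str],
    sample_size: int,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate how many triples match each predicate from a random sample."""
    counts = {p: 0 for p in predicates}
    if not triples or sample_size <= 0:
        return {p: 0.0 for p in predicates}
    k = min(sample_size, len(triples))
    sample = random.Random(seed).sample(triples, k)
    for _, p, _ in sample:
        if p in counts:
            counts[p] += 1
    scale = len(triples) / k  # scale sample counts up to the full data size
    return {p: c * scale for p, c in counts.items()}


def order_patterns(estimates: Dict[str, float]) -> List[str]:
    """Put the pattern with the smallest estimated result first."""
    return sorted(estimates, key=estimates.get)


if __name__ == "__main__":
    data = make_triples()
    est = estimate_cardinalities(data, ["rdf:type", "knows", "mayorOf"], sample_size=200)
    print("estimates:", est)
    print("join order:", order_patterns(est))
```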

 MonetDB and SPARQL
 • First version in sight (cooperation with FORTH)
 • research tracks
       • RDF clustering on Characteristic Sets
       • correlated join path indexing
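
 As a rough illustration of clustering on characteristic sets: the characteristic set of a subject is taken here to be the set of predicates that occur with it, and subjects sharing the same set are grouped together, giving table-like clusters. This is a sketch of the general idea, not the MonetDB implementation; the function name and sample triples are assumptions.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def characteristic_sets(triples: Iterable[Triple]) -> Dict[FrozenSet[str], List[str]]:
    """Group subjects by the set of predicates they occur with."""
    preds_by_subject: Dict[str, Set[str]] = defaultdict(set)
    for s, p, _ in triples:
        preds_by_subject[s].add(p)

    clusters: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for s, preds in preds_by_subject.items():
        clusters[frozenset(preds)].append(s)
    return {cs: sorted(subjects) for cs, subjects in clusters.items()}


if __name__ == "__main__":
    sample = [
        ("alice", "name", "Alice"),
        ("alice", "livesIn", "Vienna"),
        ("bob", "name", "Bob"),
        ("bob", "livesIn", "Amsterdam"),
        ("photo1", "takenBy", "alice"),
        ("photo1", "hasTag", "holiday"),
    ]
    for cs, subjects in characteristic_sets(sample).items():
        print(sorted(cs), "->", subjects)
```

 Each cluster could then be stored like a relational table with one column per predicate, which is the motivation for clustering RDF data this way.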

 LOD cache at 250B triples
 • what triples to use?
 • what hardware to use? (need 512GB RAM)




      Contact

      Address

      Centrum Wiskunde & Informatica (CWI)
      Science Park 123
      1098 XG Amsterdam
      The Netherlands

      monetdb.cwi.nl




Thanks for your attention!




 LOD2 Benchmark Auditing Service

 Benchmarking needs of SPARQL engine vendors:
 • vendors want to publish on their own timescale
 • using new or upcoming releases (not yet public)
 • using settings and hardware properly tuned to their solution
 • yet need credibility (is it fair?)

 Tournaments organized by one institution have
 • bad timing, wrong version, one more bug to fix, etc.
 • not the right hardware or settings
 • may become a legal liability once matters become more serious

 LOD2 should reach out to the SPARQL technical community and
 provide independent benchmark auditing services
 • start with BSBM → working on Auditing Rules Document
 • maybe other benchmarks later


LOD2 Plenary Vienna 2012: WP2 - Storing and Querying Very Large Knowledge Bases


Editor's Notes

  • #9: For the reasons given earlier, we propose an RDF and graph database benchmark, the Social Intelligence Benchmark (SIB), that exploits the strengths of RDF for representing graphs. We aim to test graph database performance on a highly connected graph. Because social networks are a high-profile case for graph data management, we design the benchmark around social network scenarios. We try to generate data that is as realistic as possible, including correlations, and we offer challenging queries over those correlations. In addition, since a very large amount of useful information is available in linked open datasets, we exploit these resources by linking to them.
  • #10: Now I will describe the data specification of SIB. As Facebook is the most popular social network, with more than 800 million active users, we take the Facebook schema style as the baseline for designing SIB. To generate realistic data, we use 14 dictionaries built from real data. These dictionaries cover various domains, for example geographical information and personal names. SIB data is designed to simulate a realistic scenario, including the real behavior of users and the characteristics of data distributions in social networks. As mentioned before, our synthetic data is linked with well-known linked open data: here, SIB is linked with DBpedia, one of the largest linked open datasets.
  • #11: I think most of us know Facebook, and many of us have an account. The logical schema of our benchmark simulates the Facebook schema, in which a user can have many friends, with friendship relations between them. A user can provide profile information such as his name, where he studies, and where he lives. He can also specify his current status, for example being in a relationship with another user. A user can upload photos, start a discussion by writing posts, and receive comments from his friends.
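
The dictionary-driven, correlated generation described in these notes can be sketched as follows. This is not the SIB generator: the two miniature dictionaries, the names in them, the `sib:` prefix, and the function name are all made up for illustration, whereas SIB uses 14 dictionaries built from real data. The sketch only shows one correlation (a user's name is drawn from the dictionary of the country he lives in) and a link into DBpedia-style identifiers.

```python
import random
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical miniature dictionaries, keyed by country.
NAMES_BY_COUNTRY: Dict[str, List[str]] = {
    "Netherlands": ["Daan", "Sanne", "Bram"],
    "Austria": ["Lukas", "Anna", "Tobias"],
}


def generate_users(n: int, seed: int = 42) -> List[Triple]:
    """Generate n users with a country, a country-correlated name, and one friend."""
    rng = random.Random(seed)
    countries = sorted(NAMES_BY_COUNTRY)
    triples: List[Triple] = []
    for i in range(n):
        user = f"sib:user{i}"
        country = rng.choice(countries)
        # Correlation: the name is drawn from the dictionary of that country.
        name = rng.choice(NAMES_BY_COUNTRY[country])
        # Link the synthetic user to a DBpedia-style resource identifier.
        triples.append((user, "sib:livesIn", f"dbpedia:{country}"))
        triples.append((user, "foaf:name", name))
        if i > 0:
            # Each new user befriends one earlier user, keeping the graph connected.
            triples.append((user, "foaf:knows", f"sib:user{rng.randrange(i)}"))
    return triples


if __name__ == "__main__":
    for triple in generate_users(3):
        print(triple)
```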