DISIT Lab, Distributed Data Intelligence and Technologies
Distributed Systems and Internet Technologies
Department of Information Engineering (DINFO)
http://www.disit.dinfo.unifi.it
Graph Databases Lifecycle Methodology and Tool to Support Index/Store Versioning
Pierfrancesco Bellini, Ivan Bruno, Paolo Nesi, Nadia Rauch
DISIT Lab
Dipartimento di Ingegneria dell’Informazione, DINFO
Università degli Studi di Firenze
Via S. Marta 3, 50139, Firenze, Italy
Tel: +39-055-2758511, fax: +39-055-2758570
http://www.disit.dinfo.unifi.it alias http://www.disit.org
Paolo.nesi@unifi.it
DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 1
Context and Motivations
• Graph databases are gaining ground in systems that exploit a knowledge base (KB)
– A KB includes a set of ontologies and data instances: static data, reconciliation data, real-time data, historical data, geolocated data, etc.
– Application domains: smart city, smart cloud, smart learning, etc.
• KBs need complex and not yet consolidated methodologies for their implementation
– Many issues may invalidate a build in favor of a new version, e.g., the need for more inference, corrections to the ontology model, changes in historical data, etc.
– Methodologies involve iterative processes that lead to restructuring/rebuilding the knowledge base
Context and Motivations (2)
• RDF stores (and their endpoints) are one way to implement a knowledge base
• RDF stores see ontologies and instances as triples (or quadruples): subject-predicate-object (plus context)
• RDF stores present several problems in managing:
– Versioning, and thus deleting triples, including the inferred triples… with real-time performance?
– Performance in db store building (inference)
– Performance in db store querying (inference)
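As a minimal illustration of the s-p-o (context) view above, the sketch below models statements as plain Python tuples. The `Quad` type, the `ex:` identifiers and the `delete_context` helper are hypothetical names introduced here for illustration; they are not part of any RDF store API.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One RDF statement: subject, predicate, object, plus an optional context."""
    s: str
    p: str
    o: str
    context: Optional[str] = None  # named graph; None stands for the default graph


# Hypothetical identifiers, for illustration only.
store = {
    Quad("ex:parking1", "rdf:type", "ex:Parking", "ex:static"),
    Quad("ex:parking1", "ex:freeSlots", "42", "ex:realtime"),
    Quad("ex:Parking", "rdfs:subClassOf", "ex:Service", "ex:ontology"),
}


def delete_context(quads, context):
    """Return the explicit statements that remain after dropping one context."""
    return {q for q in quads if q.context != context}


remaining = delete_context(store, "ex:realtime")
```

Dropping explicit statements by context is simple in this toy model. Triples that a reasoner has inferred from them are not represented here and would have to be retracted by the store itself, which is where the versioning and performance problems listed above arise.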
The paper contributions
• A methodology for knowledge base life cycle building and improvement
– addressing more details, kinds of data and cases than state-of-the-art methodologies
• A versioning system for RDF store building, supporting the methodology
– A completely new tool that can save up to 90% of the time in RDF store rebuilding
RDF store service: building process
• Built incrementally via progressive refinements, mediating among:
– reusing ontological models,
– increasing the capability of making deductions and reasoning on the knowledge base concepts,
– maintaining acceptable query and rendering performance,
– simplifying the design of the front-end services,
– remaining flexible to the arrival of additional data, models and/or corrections.
RDF KB life cycle methodology
Ontology Construction
• Domain Analysis: 
– concepts, abstractions, aggregation, 
classes vs attributes, etc. 
• Available Ontologies
– Diffusion, Licensing, quality, inference, 
etc.
• Ontology Integration
– Glued concepts, more inference 
• Ontology review
– Conceptual assessment
– Formal verification with metrics and 
tools
• The points from which one may have to restart must be saved
• The integrated version of the ontology goes IN USE
Km4city example
• Km4City reuses:
– dcterms: a set of properties and classes for modeling metadata;
– foaf: relations among people or groups;
– schema.org: description of people and organizations;
– wgs84_pos: latitude and longitude;
– GoodRelations: description of business entities and their locations;
– OWL-Time: temporal modeling;
– OTN: transport aspects;
– GIS Dictionary: the spatial component of geographic features
Data ingestion and enrichment
• Static data can be:
– Open data,
– Historical real-time data (from – to)
– Enrichment data (to be identified)
– Reconciliation data (to be identified)
• Problems may derive from:
– Inconsistencies at the instance level?
– Incompleteness: missing ontology concepts, missing links, …
Real Time Data
• Must already be verified and reconciled
– No changes in their structure
– No variability, or only known variability, in instances
• May produce a large volume of accumulated data
– can become a substantial part of the KB
– cannot be deleted
– cannot be easily extracted from the RDF store, so historical data may be saved for rebuilding
RDF Indexing Flow and Requirements
• There are several reasons to revise/change the RDF indexing, and thus the RDF store itself. Changes may affect:
– the ontology, and thus the:
• data mapping and triplification, invalidating the indexing and the materialization of triples
– the data triples coming from ingestion, historical data and reconciliation, due to:
• a new data mapping or a quality improvement,
• changes in how the reconciliation, enrichment and triplification processes are performed.
RDF Index and RDF Index Descriptor
• The RDF Index is essentially the RDF store containing the triples (ontology, static data, etc.)
• The RDF Index Descriptor is:
– the recipe to create the RDF Index and
– the set of triples adopted to build it
• Since versioning the RDF store itself is not viable (without redesigning the RDF store), we version the RDF Index Descriptor instead
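One way to picture a versionable descriptor is sketched below, assuming it is just a target store kind plus an ordered list of triple files. The class names, fields and the hash-based `version_id` are hypothetical choices made for this illustration; the slides do not specify how the RIM tool stores or identifies descriptors.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class TripleFile:
    """One file of triples referenced by a descriptor (hypothetical structure)."""
    path: str
    sha256: str  # content hash, so a changed file yields a different version


@dataclass
class IndexDescriptor:
    """Recipe for building an RDF index: target store plus ordered triple files."""
    store_kind: str  # e.g. "OWLIM" or "Virtuoso"
    files: List[TripleFile] = field(default_factory=list)

    def version_id(self) -> str:
        """Fingerprint of the recipe; equal descriptors give equal ids."""
        h = hashlib.sha256(self.store_kind.encode("utf-8"))
        for f in self.files:
            # A NUL separator keeps adjacent fields from running together.
            h.update(b"\0" + f.path.encode("utf-8"))
            h.update(b"\0" + f.sha256.encode("utf-8"))
        return h.hexdigest()[:12]


def file_hash(content: bytes) -> str:
    """Hash the content of a triple file."""
    return hashlib.sha256(content).hexdigest()


v1 = IndexDescriptor("OWLIM", [TripleFile("ontology.n3", file_hash(b"onto v1"))])
v2 = IndexDescriptor("OWLIM", v1.files + [TripleFile("streets.n3", file_hash(b"streets"))])
```

Because the descriptor is small and plain data, it can be stored, compared and versioned cheaply, while the much larger index is rebuilt from it only when needed.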
Versioning: File and RDF Index
• Figure annotation: the highlighted file contains the reconciliation triples connecting parking locations with respect to civic numbers; it depends on the ontology and on the parking area data sets.
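The dependency in the annotation above can be sketched as a small graph walk that answers "what has to be regenerated if this input changes?". The file names and the `DEPENDS_ON` map are hypothetical, chosen to mirror the parking example; this is not how the RIM tool is documented to track dependencies.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency map: each artifact lists what it is derived from.
DEPENDS_ON: Dict[str, List[str]] = {
    "parking_areas.n3": ["ontology"],
    "civic_numbers.n3": ["ontology"],
    "parking_reconciliation.n3": ["ontology", "parking_areas.n3"],
}


def invalidated_by(changed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every artifact that directly or transitively depends on `changed`."""
    # Invert the map: source -> artifacts derived from it.
    dependents: Dict[str, List[str]] = {}
    for target, sources in depends_on.items():
        for source in sources:
            dependents.setdefault(source, []).append(target)

    stale: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for target in dependents.get(current, []):
            if target not in stale:
                stale.add(target)
                queue.append(target)
    return stale
```

In this toy map, changing `parking_areas.n3` invalidates only the reconciliation file, while changing the ontology invalidates all three files.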
Verification and Validation
• In real production of a big data RDF store:
– hundreds of files containing triples are produced
– restarting from scratch is time consuming, may be error prone, and may lose the versioning/evolution value
• Not all changes produce a consistent RDF KB store.
• Saving intermediate consistent versions saves time, since a saved version can be used as the restarting point.
Example: 4 versions on the same core
RDF Index Manager tool
• Keeping track of RDF KB Store Versions (RKBSVs) in terms of files of triples, index descriptor, and RDF Index;
• Maintaining a repository of RKBSVs where they can be stored and retrieved;
• Selecting an RKBSV from the repository for modification, to examine changes and the version history, or to use as the base for building a new version;
• Managing the index descriptor as a list of files containing triples;
• Automatically generating an RDF KB index on the basis of an RKBSV, independently of the kind of RDF store, and in particular for SESAME OWLIM and Virtuoso;
• Monitoring the RDF KB index generation and the feeding state;
• Suggesting the closest RKBSV with respect to the requested new index, in terms of files of triples;
• Avoiding manual management of the indexing script file, which is a time-consuming and error-prone process.
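The "closest version" suggestion can be sketched as a comparison of file sets. The sketch below assumes that a stored version is reusable only if all of its files are also wanted in the new index (that is, triples are only added, never removed); this rule, the `Repository` type and the function name are assumptions made for illustration, not the documented behavior of the RIM tool.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical repository: version name -> set of triple files it was built from.
Repository = Dict[str, FrozenSet[str]]


def suggest_closest(
    repo: Repository, wanted: FrozenSet[str]
) -> Tuple[Optional[str], FrozenSet[str]]:
    """Pick the stored version that leaves the fewest files still to be loaded.

    Only versions whose files are all part of `wanted` qualify, because this
    sketch assumes triples cannot be removed from an index that is already built.
    Returns (version name or None, files that still have to be indexed).
    """
    best_name: Optional[str] = None
    best_missing = wanted  # starting from scratch means loading everything
    for name, files in sorted(repo.items()):
        if files <= wanted:
            missing = wanted - files
            if len(missing) < len(best_missing):
                best_name, best_missing = name, missing
    return best_name, best_missing


repo: Repository = {
    "v1": frozenset({"ontology.n3"}),
    "v2": frozenset({"ontology.n3", "streets.n3"}),
    "v3": frozenset({"ontology.n3", "streets.n3", "old_services.n3"}),
}
wanted = frozenset({"ontology.n3", "streets.n3", "services.n3"})
name, todo = suggest_closest(repo, wanted)
```

Here `v2` is suggested and only `services.n3` remains to be indexed; `v3` is skipped because it contains a file that the new index does not want.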
RIM tool: building monitor during production
RIM tool: during RDF assessment
Experimental Results
| Indexing process | Ontologies | + street graphs | + smart city services | + enrichment & reconciliations | + historical data (1 month) |
|---|---|---|---|---|---|
| Final number of triples | 15,809 | 33,547,501 | 34,462,930 | 34,557,142 | 44,218,719 |
| Final number of files | 12 | 137 | 178 | 185 | 27,794 |
| Added triples with respect to previous version | 15,809 | 33,531,692 | 915,429 | 94,212 | 9,661,577 |
| Added files with respect to previous version | 12 | 125 | 41 | 7 | 27,609 |
| OWLIM SE 4.3 | | | | | |
| Indexing time without RIM (s) | 18 | 6536 | 6198 | 7516 | 12093 |
| Indexing time with RIM (s) | 11 | 6029 | 514 | 343 | 5745 |
| % of saved time, RIM versioning | 38.9 | 7.8 | 91.7 | 95.4 | 52.5 |
| Final number of triples (including geo + inferred) | 16,062 | 57,486,956 | 59,395,432 | 59,486,748 | 73,441,126 |
| Disk space (MB) | 310 | 8669 | 8936 | 9039 | 13110 |
| VIRTUOSO 7.2 | | | | | |
| Indexing time without RIM (s) | 146 | 806 | 964 | 1000 | 2487 |
| Indexing time with RIM (s) | 156 | 833 | 421 | 296 | 1932 |
| % of saved time, RIM versioning | -6.8 | -3.3 | 56.3 | 70.4 | 22.3 |
| Final number of triples (including geo, no inferred) | 21,628 | 35,452,613 | 36,301,322 | 36,420,445 | 46,232,510 |
| Disk space (MB) | 68 | 1450 | 1632 | 1631 | 2294 |
| GraphDB 6.1 | | | | | |
| Indexing time without RIM (s) | 9 | 7818 | 7929 | 7671 | 12915 |
| Indexing time with RIM (s) | 2 | 6791 | 454 | 214 | 4849 |
| % of saved time, RIM versioning | 77.8 | 13.1 | 94.3 | 97.2 | 62.45 |
| Final number of triples (including geo + inferred) | 15,809 | 57,486,415 | 59,394,891 | 59,487,551 | 73,441,929 |
| Disk space (MB) | 96 | 4276 | 4466 | 4643 | 5714 |
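The "% of saved time" rows appear consistent with the relative difference between the two indexing times, (t_without - t_with) / t_without × 100. That formula is inferred from the reported values rather than stated in the slides; the short check below recomputes the OWLIM SE 4.3 row under that assumption, with a hypothetical helper name.

```python
def saved_time_percent(without_rim_s: float, with_rim_s: float) -> float:
    """Percentage of indexing time saved; negative when RIM adds overhead."""
    return round(100.0 * (without_rim_s - with_rim_s) / without_rim_s, 1)


# Indexing times in seconds, taken from the OWLIM SE 4.3 rows of the table.
owlim_without = [18, 6536, 6198, 7516, 12093]
owlim_with = [11, 6029, 514, 343, 5745]
owlim_saved = [saved_time_percent(a, b) for a, b in zip(owlim_without, owlim_with)]
```

The recomputed values match the reported 38.9, 7.8, 91.7, 95.4 and 52.5. The same function gives a negative value for the first Virtuoso column (146 s without RIM, 156 s with RIM), where the versioning overhead outweighs the saving.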
New Version
• Support for ontology licensing
– To be taken into account in ontology building and in KB RDF store usage and querying
• Support for data licensing
– To help in selecting ontologies
– To be taken into account in KB RDF store usage and querying
• Support for the data ingestion process with:
– an integrated Data Ingestion Manager
– keeping under control the data sets to be included, their licenses, triplification, etc.
Conclusions
• The RIM model and tool allow:
– Keeping the RDF life cycle under control and traceable, from the construction of the ontology to indexing and validation.
– Reducing the indexing time (RDF store construction) by up to 97% in some cases.
• The benefits have been demonstrated for the most widely used RDF stores: OWLIM, GraphDB and Virtuoso.

Graph Databases Lifecycle Methodology and Tool to Support Index/Store Versioning

  • 1. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it Graph Databases Lifecycle Methodology  and Tool to Support Index/Store  Versioning Pierfrancesco Bellini, Ivan Bruno, Paolo Nesi, Nadia Rauch DISIT Lab Dipartimento di Ingegneria dell’Informazione, DINFO Università degli Studi di Firenze Via S. Marta 3, 50139, Firenze, Italy Tel: +39-055-2758511, fax: +39-055-2758570 http://www.disit.dinfo.unifi.it alias http://www.disit.org Paolo.nesi@unifi.it DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 1
  • 2. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it Context and Motivations • Graph database are taking place for systems exploiting  knowledge base, KB – Include a set of ontologies and data instances: static data,  reconciliation data, real time data, historical data, gelocated data, etc.  – Smart city, smart cloud, smart learning, etc. • KB need complex and non consolidated methodologies  for their implementation – Many issues may lead to invalidate a building in favor of new  version, e.g., looking for more inference, corrections in model  ontology, changes in historical data, etc. – Methodologies present iterative processes that lead to  restructure/rebuild the knowledge base  DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 2
  • 3. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it Context and Motivations (2) • RDF stores (end they end points) are a way for  Knowledge Base implementation  • RDF stores see Ontologies and Instances as triple  (quadruple) s‐p‐o (context) • RDF stores presents several problems in managing: – Versioning and thus deleting triples including the  inferred triples… at real time performance ? – Performance in db store building (inference) – Performance in db store querying (inference) DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 3
  • 4. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it The paper contributions • A methodology for knowledge base life cycle  building and improvement – addressing more details and data kind and cases of  the state of the art methodologies  • A versioning system for RDF store building  supporting the methodology – Completely new tool that can save up to 90% of time  in RDF rebuilding DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 4
  • 5. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it RDF store service: building process • built incrementally via progressive refinements  mediating among – reusing ontological models, – increasing the capability of making deductions and  reasoning on the knowledge base concepts, – maintaining acceptable: query and rendering  performances,  – simplifying the design of the front‐end services,  – flexibility to the arrival of additional data and models  and/or corrections,  DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 5
  • 6. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it RDF KB life cycle methodology DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 6
  • 7. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it Ontology Construction • Domain Analysis:  – concepts, abstractions, aggregation,  classes vs attributes, etc.  • Available Ontologies – Diffusion, Licensing, quality, inference,  etc. • Ontology Integration – Glued concepts, more inference  • Ontology review – Conceptual assessment – Formal verification with metrics and  tools • Points from which one has to restart,  have to be saved • Integrated version of the ontology goes  IN USE DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 7 In use
  • 8. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it Km4city example • reuses:  – dcterms to set of properties and classes for modeling  metadata;  – foaf dedicated to relations among people or groups;  – schema.org for a description of people and organizations;  – wgs84_pos representing latitude and longitude;  – GoodRelations for a description of business entities and their  locations;  – OWL‐Time for temporal modeling;  – OTN for transport aspects;  – GIS Dictionary, to represent the spatial component of  geographic features DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 8
  • 9. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it Data ingestion and enrichment • Static Data can be:  – Open Data,  – Historical real time data (from  – to) – Enrichment data (to be  identified) – Reconciliation Data (to be  identified) • Problems may derive from:  – Inconsistencies instance level ? – Incompleteness: missing onto  concepts, missing links, … DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 9 In use
  • 10. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it Real Time Data • Need to be already verified and  reconciled  – No changes in their structure – No or known variability on  instances • May produce large volume of  Cumulated data  – can become the substantial part of  the KB – cannot be deleted – Cannot be easily extracted from  the RDF store, thus historical data  may be saved for rebuilding DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 10 In use Histori cal triples
  • 11. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it RDF Indexing Flow and Requirements • There are several reasons to revise/change the  RDF Indexing and thus the RDF Store itself  –the ontology and thus the: • data mapping and triplification invalidating the  indexing and the materialization of triples –the data triples coming from ingestion,  historical, reconciliation, as:  • a new data mapping, quality improvement,  • changes in performing reconciliation, enrichment  and triplification processes.  DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 11
  • 12. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it RDF Index and RDF Index Descriptor • The RDF Index is substantially the RDF Store  containing the triple (ontology, static data, etc…) • The RDF Index Descriptor is: – the recipe to create the RDF Index and  – the set of triples adopted to build it • Since the version of the RDF Store is not an a  viable task (without redesigning the RDF store)  we performed the versioning of the RDF Index  Descriptor DISIT Lab (DINFO UNIFI), DMS 2015, Vancouver, Canada 12
  • 13. Versioning: File and RDF Index
      (Figure callout: the file that contains the reconciliation triples connecting parking locations to civic numbers depends on the ontology and on the parking area data sets.)
  • 14. Verification and Validation
      • In real production of a big data RDF store:
          – hundreds of files containing triples are produced
          – restarting from scratch is time consuming, may be error prone, and may lose the versioning/evolution value
      • Not all changes produce a consistent RDF KB Store.
      • Saving intermediate consistent versions can save time, since a saved version can be used as a restarting point.
  • 15. Example: 4 versions on the same core
  • 16. RDF Index Manager (RIM) tool
      • Keeping track of RDF KB Store Versions (RKBSV) in terms of files of triples, index descriptor, and RDF Index;
      • Maintaining a repository of RKBSVs where they can be stored and retrieved;
      • Selecting an RKBSV from the repository for modification, to examine its changes and version history, or to use it as the base for building a new version;
      • Managing the index descriptor as a list of files containing triples;
      • Generating an RDF KB index automatically from an RKBSV, independently of the kind of RDF store, and in particular for SESAME OWLIM and Virtuoso;
      • Monitoring the RDF KB index generation and the feeding state;
      • Suggesting the RKBSV closest to the requested new index in terms of files of triples;
      • Avoiding manual management of the indexing script file, which is a time consuming and error prone process.
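The "closest version" suggestion can be illustrated with a small sketch. It is not the RIM tool's algorithm: it assumes, as a simplification, that each stored version is described by a set of (file, version) pairs, that a stored version is a usable starting point only if it contains no file absent from the requested index, and that "closest" means the fewest files left to load. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# A version is described by the set of (file path, file version) pairs it was built from.
FileSet = FrozenSet[Tuple[str, int]]


def closest_version(repository: Dict[str, FileSet], target: FileSet) -> Optional[str]:
    """Return the stored version that needs the fewest files added to reach `target`.

    Assumption of this sketch: a stored version qualifies only if every file it
    contains also belongs to the target, so nothing has to be removed from it.
    Returns None when no stored version qualifies (rebuild from scratch).
    """
    best_name: Optional[str] = None
    best_missing: Optional[int] = None
    for name, files in sorted(repository.items()):
        if not files <= target:
            continue  # contains a file (or file version) the target does not want
        missing = len(target - files)
        if best_missing is None or missing < best_missing:
            best_name, best_missing = name, missing
    return best_name


# Hypothetical repository of stored versions.
repo = {
    "v1": frozenset({("ontology.n3", 1)}),
    "v2": frozenset({("ontology.n3", 1), ("streets.n3", 1)}),
    "v3": frozenset({("ontology.n3", 2)}),
}
wanted = frozenset({("ontology.n3", 1), ("streets.n3", 1), ("services.n3", 1)})
print(closest_version(repo, wanted))  # v2: only services.n3 remains to be loaded
```

Starting from the suggested version, only the files in `target - files` would have to be indexed, which is where the time saving of reusing a saved version would come from.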
  • 17. RIM tool: building monitor during production
  • 18. RIM tool: during RDF assessment
  • 19. Experimental Results
      Columns (each version adds data to the previous one):
        (1) Ontologies
        (2) + street graphs
        (3) + smart city services
        (4) + enrichment and reconciliations
        (5) + historical data (1 month)
      Thousands separator: "."; decimal separator: ",".

                                                                   (1)         (2)         (3)         (4)         (5)
      Indexing process
        Final number of triples                                 15.809  33.547.501  34.462.930  34.557.142  44.218.719
        Final number of files                                       12         137         178         185       27794
        Added triples w.r.t. previous version                   15.809  33.531.692     915.429      94.212   9.661.577
        Added files w.r.t. previous version                         12         125          41           7       27609
      OWLIM SE 4.3
        Indexing time without RIM (s)                               18        6536        6198        7516       12093
        Indexing time with RIM (s)                                  11        6029         514         343        5745
        Saved time with RIM versioning (%)                        38,9         7,8        91,7        95,4        52,5
        Final number of triples (incl. geo + inferred)          16.062  57.486.956  59.395.432  59.486.748  73.441.126
        Disk space (MB)                                            310        8669        8936        9039       13110
      VIRTUOSO 7.2
        Indexing time without RIM (s)                              146         806         964        1000        2487
        Indexing time with RIM (s)                                 156         833         421         296        1932
        Saved time with RIM versioning (%)                        -6,8        -3,3        56,3        70,4        22,3
        Final number of triples (incl. geo, no inferred)        21.628  35.452.613  36.301.322  36.420.445  46.232.510
        Disk space (MB)                                             68        1450        1632        1631        2294
      GraphDB 6.1
        Indexing time without RIM (s)                                9        7818        7929        7671       12915
        Indexing time with RIM (s)                                   2        6791         454         214        4849
        Saved time with RIM versioning (%)                        77,8        13,1        94,3        97,2       62,45
        Final number of triples (incl. geo + inferred)          15.809  57.486.415  59.394.891  59.487.551  73.441.929
        Disk space (MB)                                             96        4276        4466        4643        5714
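The "saved time" percentages in the results appear consistent with the relative reduction (t_without - t_with) / t_without. The slides do not state this formula; it is inferred from the reported figures. A short sketch that recomputes the OWLIM SE 4.3 row from the two timing rows:

```python
def saved_time_percent(without_rim_s: float, with_rim_s: float) -> float:
    """Relative indexing time saved, in percent (negative means RIM was slower)."""
    return 100.0 * (without_rim_s - with_rim_s) / without_rim_s


# Indexing times in seconds for OWLIM SE 4.3, copied from the results slide.
owlim_without = [18, 6536, 6198, 7516, 12093]
owlim_with = [11, 6029, 514, 343, 5745]

owlim_saved = [round(saved_time_percent(a, b), 1)
               for a, b in zip(owlim_without, owlim_with)]
print(owlim_saved)  # [38.9, 7.8, 91.7, 95.4, 52.5]
```

The negative Virtuoso values for the first two versions (for example 146 s without RIM against 156 s with RIM) follow from the same formula: when nearly everything has to be indexed anyway, the versioned build is slightly slower than a direct one.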
  • 20. New Version
      • Support for ontology licensing
          – To be taken into account in ontology building and in KB RDF store usage and querying
      • Support for data licensing
          – To help in selecting ontologies
          – To be taken into account in KB RDF store usage and querying
      • Support for the data ingestion process, with
          – an integrated Data Ingestion Manager
          – control over the data sets to be included, their licenses, triplification, etc.
  • 21. Conclusions
      • The RIM model and tool allow:
          – Keeping the RDF life cycle under control and traceable, from the construction of the ontology to indexing and validation.
          – Reducing the time for indexing (RDF store construction) by up to 97% in some cases.
      • The benefits have been demonstrated for the most widely used RDF stores: OWLIM, GraphDB and Virtuoso.