A Library Data Management Platform 
Based on Linked Open Data 
SLUB Dresden slub-dresden.de 
CC BY-SA 4.0 
Avantgarde Labs 
Robert Glaß 
25 November, 2014 
Jens Mittelbach | Robert Glaß
A Library Data Management Platform Based on Linked Open Data
▪ Back in Those Days
▪ The Age of Discovery
▪ Library Data Management
▪ Qualify, Link and Free Your Data: D:SWARM
▪ Live Demo
Data Heterogeneity
▪ Multiple individual data silos
• ILS, document repositories, databases, …
▪ Data saved in heterogeneous formats
• MAB, MARC21, …
▪ Each data silo gets processed individually
• Multiple admin interfaces
• Multiple search interfaces
• Data unrelated to one another
▪ Comprehensive view of resources almost impossible (for users and librarians)
Back in Those Days … 
Data Normalization
▪ More comprehensive view of resources for users, but no real discovery/exploration
▪ Data gets normalized into one storage but not integrated
▪ Data available in record-oriented structures
• External data (e.g. GND) has to be squeezed into the record
• Metadata records are independent of each other
• No explicit semantic quality of data
The Age of “Discovery” 
Library Data Management
What Libraries Actually Need
▪ Get rid of data silos
• Open formats for exchange
▪ Lossless data integration instead of reductive normalization
▪ Data integration with entity-level granularity
• Get rid of pre-compiled data records
▪ Focus on linking entities/objects
• Graph structures creating the knowledge graph
▪ Stick to the quality policy of libraries
• Versioning and provenance of data
Library Data Management 
What Should Library Data Actually Look Like? 
Library Data Management
Whose Job Is Library Data Integration?
▪ Data integration should be done by domain experts
• Librarians, not IT staff (IT is always understaffed)
• Programming skills should not be a requirement
• Good user experience is a prerequisite for adoption
▪ Example-driven modelling approach
▪ Value created in the community should be reusable
Library Data Management 
What Tools Do We Need? 
Our Approach: An Open Source Data Management Platform 
Library Data Management 
How Can Data Integration Be Done? 
Qualify, Link and Free Your Data: D:SWARM
Who's Behind This Project?
▪ Collaborative development team of SLUB Dresden and Avantgarde Labs GmbH
▪ Started work in June 2013
▪ Funded by the European Regional Development Fund (ERDF)
Qualify, Link and Free Your Data: D:SWARM 
Our Challenge: Existing Data Formats: MAB, MARC
• "Selection of keywords"
• Relevant MAB fields are 902x, 907x, 912x, 917x, 922x.
• These fields have subfields a, b, c, … coded with further information (type of keyword, person, time, place, concept, ...)
• For each field from 902x to 922x we have to check:
• Does subfield "a" contain one of these strings (800|801|820|830|845|850|860|870|880)?
• If so, does subfield "b" contain one of these strings (c|g|k|p|s|t|z)?
• If so, the value in subfield "c" qualifies as a keyword
• The keyword needs to be trimmed (which is the easiest part)
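The keyword rule on this slide can be sketched in a few lines of Python. This is only an illustration of the checks listed above, not D:SWARM code: the record layout (a list of dicts with `tag` and `subfields` keys) and the function name `extract_keywords` are assumptions made for the example, and the codes are compared by exact match, which is one possible reading of "one of these strings".

```python
# Field numbers named on the slide; the trailing "x" (indicator) is ignored here.
KEYWORD_FIELDS = {"902", "907", "912", "917", "922"}
# Allowed values for subfield "a" and subfield "b", as listed on the slide.
SUBFIELD_A_CODES = {"800", "801", "820", "830", "845", "850", "860", "870", "880"}
SUBFIELD_B_CODES = {"c", "g", "k", "p", "s", "t", "z"}


def extract_keywords(fields):
    """Return trimmed keywords from an iterable of field dicts.

    Hypothetical layout of one field (an assumption for this sketch):
    {"tag": "902", "subfields": {"a": "800", "b": "s", "c": " Dresden "}}
    """
    keywords = []
    for field in fields:
        if field.get("tag") not in KEYWORD_FIELDS:
            continue  # not one of the keyword fields
        sub = field.get("subfields", {})
        if sub.get("a", "") not in SUBFIELD_A_CODES:
            continue  # subfield "a" does not carry an accepted code
        if sub.get("b", "") not in SUBFIELD_B_CODES:
            continue  # subfield "b" does not carry an accepted type
        value = sub.get("c", "").strip()  # trimming, "the easiest part"
        if value:
            keywords.append(value)
    return keywords
```

Even in this compact form the rule needs three nested conditions per field, which illustrates why such mappings are hard to express for domain experts without programming skills.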
Qualify, Link and Free Your Data: D:SWARM 
Our Challenge: Existing Tools: Talend 
Qualify, Link and Free Your Data: D:SWARM 
Our Challenge: Existing Tools: Open Refine 
Qualify, Link and Free Your Data: D:SWARM 
What Is D:SWARM?
▪ Graphical web-based ETL modelling tool that serves to:
• import data from heterogeneous sources with different formats
• map input to output schemata and design transformation workflows
• load transformed data into a property graph database
▪ With additional functionalities:
• Exporting data models as RDF
• Sharing mappings and transformation workflows
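The import, map and load steps listed above follow the usual extract-transform-load pattern. The following sketch shows that pattern in plain Python under stated assumptions: the column names, the `MAPPING` table, the `example.org` subject URIs and the in-memory list standing in for a graph store are all hypothetical and are not taken from D:SWARM.

```python
import csv
import io

# Hypothetical mapping from input columns to output schema properties.
MAPPING = {
    "titel": "http://purl.org/dc/terms/title",
    "autor": "http://purl.org/dc/terms/creator",
}


def extract(csv_text):
    """Import step: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows, mapping, id_column="id"):
    """Map step: turn each mapped, non-empty cell into a statement."""
    statements = []
    for row in rows:
        subject = f"http://example.org/resource/{row[id_column]}"
        for column, prop in mapping.items():
            value = (row.get(column) or "").strip()
            if value:
                statements.append((subject, prop, value))
    return statements


def load(statements, graph):
    """Load step: append statements to an in-memory stand-in for a graph store."""
    graph.extend(statements)
    return graph
```

A call such as `load(transform(extract(text), MAPPING), [])` chains the three steps. In a graphical tool the `MAPPING` table would be drawn in the GUI rather than written by hand.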
Qualify, Link and Free Your Data: D:SWARM 
How Does D:SWARM Work?
▪ Modelling GUI and job repository
▪ Execution environment
• Operational data from heterogeneous data sources (ILS, OAI-PMH, CSV, …) is processed according to the transformation logic defined in the modelling GUI
▪ Admin centre
• Scheduling & execution planning
• Monitoring of the system (data ingest, processing, errors)
Qualify, Link and Free Your Data: D:SWARM 
Why a Property Graph?
▪ Node (S) – Edge (P) – Node (O)
▪ Extension of the RDF data model: each element can be endowed with additional information (key : value)
• Version number
• Provenance information
• Type information
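To make the difference from a plain RDF triple concrete, here is a minimal Python sketch of the property graph idea described above. The `Node` and `Edge` classes and the property keys are illustrative assumptions, not the data model of D:SWARM or of any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node (subject or object) with optional key:value properties."""
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed edge (predicate) that can carry its own key:value properties."""
    source: Node
    predicate: str
    target: Node
    properties: dict = field(default_factory=dict)

    def as_triple(self):
        """Project the edge onto a plain (S, P, O) triple, dropping the properties."""
        return (self.source.id, self.predicate, self.target.id)


# Hypothetical example: the statement itself carries version and provenance.
book = Node("ex:book1", {"type": "resource"})
author = Node("ex:person1", {"type": "resource"})
statement = Edge(
    book,
    "dcterms:creator",
    author,
    {"version": 2, "provenance": "ILS import"},
)
```

The projection in `as_triple` shows what would be lost in a pure triple model: the version and provenance attached to the statement have no place in (S, P, O) alone.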
Qualify, Link and Free Your Data: D:SWARM 
Intermediate Results as of November 2014
▪ Modelling GUI in 2nd version
• Available file importers: XML, CSV, MABXML
• Simple schema editor & graphical schema mapper
• Transformation workflow designer & filter (Metafacture)
▪ Execution of mappings and transformations in the modelling GUI
▪ Persistence in a graph database (Neo4j)
▪ Exporters: Turtle, N-Quads, N3, …
▪ Publication under an open source licence (Apache 2): https://github.com/dswarm
Qualify, Link and Free Your Data: D:SWARM 
Live Demo 
http://demo.dswarm.org
Qualify, Link and Free Your Data: D:SWARM 
Our Next Steps
▪ Provision of URI templates for resource matching and linking
▪ Scalable execution engine for production mode
▪ Extension of the transformation function set
▪ Extension of importers
▪ Implementation of an administration centre
▪ Deduplication and FRBRization
▪ Integration of the SLUBsemantics Enrichment Service
▪ Implementation of sharing features
Qualify, Link and Free Your Data: D:SWARM 
Your Next Steps
▪ Follow us on twitter.com/dswarm, www.dswarm.org or github.com/dswarm
▪ Try it out and get in contact with us
• http://demo.dswarm.org
• https://github.com/dswarm/dswarm-documentation/wiki
• team@dswarm.org
▪ Help us prioritize our backlog
• https://jira.slub-dresden.de/
▪ Fork us on github.com/dswarm

d:swarm - A Library Data Management Platform Based on a Linked Open Data Approach

Editor's Notes

  • #3: Where we come from, what we have, what we want to have. How we can achieve it with D:SWARM. Live presentation of D:SWARM.
  • #5: Data normalization poses quite a number of problems to librarians, admins and users. Normalization comes at the cost of information richness. Deduplication is questionable. Reliable enrichment is possible only for parts of the data. Linkage (especially with external resources) is technically not possible (at this stage). Metadata records are independent of each other (they sit side by side in XML silos).
  • #6: Data integration instead of mere normalization: harvesting (external) data from a variety of sources and integrating all available information into a knowledge structure. Versioning and provenance: this is related to what Markus Krötzsch said in his talk about statements according to the Wikidata data model.
  • #8: Example-driven modelling approach: users can directly observe the concrete results of their abstract modelling work.
  • #12: Robert
  • #18: http://hub.culturegraph.org/resources/static/mab2.pdf