LINKED DATA EXPERIENCE AT MACMILLAN 
Building discovery services for scientific and 
scholarly content on top of a semantic data model 
22 October 2014 
Tony Hammond 
Michele Pasin
Linked Data at Macmillan | 22 October 2014 
1 
Background 
About Macmillan and what we are doing
Macmillan Science and Education 
Group brands and businesses 
MS&E Current trends 
Developing a richer graph of objects 
Change Drivers 
● Digital first workflow 
– print becomes secondary 
– support for multiple workflows 
● User-centric design 
– things, not data 
– focus on user experience 
● Deeply integrated datasets 
– standard naming convention 
– common metadata model 
– flexible schema management 
– rich dataset descriptions 
NPG Linked Data Platform (2012) 
data.nature.com 
Deliverables (2012–2014) 
● Prototype for external use 
● Two RDF dataset releases in 2012 
– April 2012 (22m triples) 
– July 2012 (270m triples) 
● Live updates to query endpoint 
● SPARQL query service (decommissioned) 
Current Work (2014–) 
● Focus on internal use-cases 
● Publish ontology pages 
● Periodic data snapshots 
NPG Core Ontology (2014) 
Things: assets, documents, events, types 
Features 
● Classes: ~65 
● Properties: ~200 
● Named graphs (per class) 
Namespaces 
● npg: => http://ns.nature.com/terms/ 
● npgg: => http://ns.nature.com/graphs/ 
Approach 
● Incremental formalization (RDF, RDFS, OWL-DL) 
● Shared metamodel vs. automatic inference 
● Minimal commitment to external vocabs 
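The two namespace prefixes above can be illustrated with a small prefix-expansion helper. This is only a sketch: the prefix table comes from the slide, while the local names `Article` and `articles` are hypothetical and are not taken from the NPG ontology.

```python
# Prefix table as given on the slide.
PREFIXES = {
    "npg": "http://ns.nature.com/terms/",
    "npgg": "http://ns.nature.com/graphs/",
}


def expand(curie: str) -> str:
    """Expand a compact 'prefix:local' name into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local


# Hypothetical local names, for illustration only.
class_iri = expand("npg:Article")
graph_iri = expand("npgg:articles")
```

Keeping class terms and named graphs in separate namespaces means a graph IRI can never be mistaken for an ontology term, which fits the "named graphs (per class)" feature listed above.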
NPG Subject Pages (2014) 
Topical access to content 
Features 
● Based on SKOS taxonomy 
– >2500 scientific terms 
– content inherited via SKOS tree 
● Dynamically generated 
– one webpage per subject term 
– secondary pages for article types 
● Various formats, e.g. e-alerts, feeds 
– allows people to ‘follow’ a subject 
● Customized related content 
– ads, jobs, events, etc. 
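One way to read "content inherited via SKOS tree" is that a subject page lists content tagged with that term or with any narrower term beneath it. The sketch below assumes that reading; the taxonomy terms, article identifiers, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical taxonomy: each term maps to its skos:broader parent.
BROADER = {
    "genetics": "biological-sciences",
    "epigenetics": "genetics",
    "ecology": "biological-sciences",
}

# Hypothetical article-to-subject tagging.
TAGGED = {
    "article-1": ["epigenetics"],
    "article-2": ["genetics"],
    "article-3": ["ecology"],
}


def narrower_index(broader):
    """Invert the broader links into a parent -> children index."""
    children = defaultdict(set)
    for child, parent in broader.items():
        children[parent].add(child)
    return children


def descendants(term, children):
    """Return the term itself plus all transitively narrower terms."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def content_for(term, broader=BROADER, tagged=TAGGED):
    """List articles tagged with the term or any narrower term."""
    scope = descendants(term, narrower_index(broader))
    return sorted(a for a, subjects in tagged.items() if scope.intersection(subjects))
```

Under these assumptions, the page for `genetics` picks up the article tagged only with `epigenetics`, and the top-level page picks up everything below it.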
2 
Data Storage and Query 
Achieving speed by means of a hybrid architecture
Content Hub 
Managed content warehouse for data discovery 
Capabilities 
● Discovery – Graph 
● Storage – Content Repos 
Features 
● Hybrid RDF + XML architecture 
– MarkLogic for XML, RDF/XML 
– Triplestore (TDB) for RDF validation 
● Repos for binary assets 
Datasets 
● Documents (large; >1m) 
● Ontologies (small; <10k) 
System Architecture 
Hub content 
Content Discovery – Principles 
Readying the API for applications 
Generations 
● 1st – Generic linked data API (RDF/*) 
● 2nd – Specific page model API (JSON) 
Concerns 
● Speed (20ms single object; 200ms filtered object) 
● Simplicity (data construction) 
● Stability (backup, clustering, security, transactions) 
Principles 
● Chunky not chatty, all data in a single response 
● Data as consumed, rather than as stored 
● Support common use cases in simple, obvious ways 
● Ensure a guaranteed, consistent speed of response for more complex queries 
● Build on foundation of standard, pragmatic REST (collections, items) 
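The "chunky not chatty" and "data as consumed" principles can be sketched as a page-model builder that resolves every reference server-side and returns one JSON document per page. The record layout, identifiers, and field names below are hypothetical and do not describe the actual Macmillan API.

```python
import json

# Hypothetical stored records, keyed by identifier (shape as stored).
STORE = {
    "subject/genetics": {"label": "Genetics", "articles": ["a1", "a2"]},
    "a1": {"title": "First article", "published": "2014-09-01"},
    "a2": {"title": "Second article", "published": "2014-10-01"},
}


def subject_page(subject_id, store=STORE):
    """Assemble everything a subject page needs into a single response."""
    subject = store[subject_id]
    # Resolve article references here so the client makes no follow-up calls.
    articles = [dict(id=a, **store[a]) for a in subject["articles"]]
    # Deliver the data as consumed: newest first, with the count precomputed.
    articles.sort(key=lambda a: a["published"], reverse=True)
    return {
        "id": subject_id,
        "label": subject["label"],
        "count": len(articles),
        "articles": articles,
    }


body = json.dumps(subject_page("subject/genetics"))
```

The client receives the label, the count, and the sorted article list in one round trip, rather than fetching the subject and then each article separately.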
Content Discovery – Optimization 
Tuning the API for performance 
Approaches 
● TDB + Fuseki – SPARQL 
● MarkLogic Semantics – SPARQL 
● MarkLogic – XQuery 
● MarkLogic (Optimized) – XQuery 
Techniques 
● Partitioning – RDF/XML objects 
● Streaming – serialization 
● Hashing – dictionary lookup 
● Caching – Varnish 
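The slide does not spell out how "Partitioning" and "Hashing – dictionary lookup" were implemented. One possible reading is that each object is stored as its own serialized fragment and fetched by a hash of its identifier instead of by a graph query. The sketch below assumes that reading; the class, the SHA-1 choice, and the example IRI are hypothetical.

```python
import hashlib


def key_for(iri: str) -> str:
    """Derive a fixed-length lookup key from an object's IRI."""
    return hashlib.sha1(iri.encode("utf-8")).hexdigest()


class ObjectStore:
    """Hold one serialized fragment per object, addressed by hashed IRI."""

    def __init__(self):
        self._by_key = {}

    def put(self, iri: str, serialized: str) -> None:
        self._by_key[key_for(iri)] = serialized

    def get(self, iri: str):
        # A single dictionary lookup; returns None if the object is absent.
        return self._by_key.get(key_for(iri))


store = ObjectStore()
store.put("http://example.org/articles/a1", "<rdf:Description>...</rdf:Description>")
fragment = store.get("http://example.org/articles/a1")
```

A lookup of this kind costs the same regardless of how many objects are stored, which is one plausible route to the single-object response target quoted earlier in the deck.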
Content Storage – Layout and Indexing 
Readying the data for page delivery 
Challenges 
● Sort orders 
● RDF Lists 
● Faceting, counting 
Layout 
● Semantic RDF/XML includes in XML 
● RDF objects serialized in list order 
● Application XML for subject hierarchy 
Indexes 
● Indexes over all elements 
● Range indexes for datatypes (e.g. datetimes) 
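To show why a range index helps with the sorting and counting challenges listed above, here is a minimal in-memory analogue built on a sorted list and binary search. It illustrates the general technique only and is not MarkLogic's implementation; the class name and sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class RangeIndex:
    """Sorted (datetime, doc_id) pairs supporting range scans and counts."""

    def __init__(self, entries):
        self._entries = sorted(entries)
        self._keys = [key for key, _ in self._entries]

    def between(self, start, end):
        """Return doc ids with start <= value <= end, in date order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, end)
        return [doc for _, doc in self._entries[lo:hi]]

    def count(self, start, end):
        """Count matches without materializing the documents."""
        return bisect_right(self._keys, end) - bisect_left(self._keys, start)


index = RangeIndex([
    (datetime(2014, 10, 1), "a2"),
    (datetime(2014, 9, 1), "a1"),
    (datetime(2013, 5, 20), "a0"),
])
recent = index.between(datetime(2014, 1, 1), datetime(2014, 12, 31))
```

Because the values are kept in sorted order, a date-range query returns results already ordered, and a facet count needs only two binary searches.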
In Conclusion 
A few lessons learned 
Summary 
● An RDF metamodel allows for scalable enterprise-level data organization 
● It is crucial to adequately distinguish between external and internal use cases 
● A hybrid architecture proved to be an efficient internal solution for content delivery 
Future Work 
● Grow the ontology so that it matches product requirements more closely 
● Support automated reasoning and richer query options – both RDF and XML based 
● Maintain and expand the vision of a shared semantic model as a core enterprise asset 
For more information 
please contact 
TONY HAMMOND 
Data Architect, Content Data Services 
tony.hammond@macmillan.com 
MICHELE PASIN 
Information Architect, Product Office 
michele.pasin@macmillan.com 
Thank you

