The nature.com
ontologies portal
nature.com/ontologies
Tony Hammond, Michele Pasin
Macmillan Science and Education
Who we are
We are both part of Macmillan Science and Education*
-  Macmillan S&E is a global STM publisher
-  Tony Hammond is Data Architect, Technology
@tonyhammond
-  Michele Pasin is Information Architect, Product Office
@lambdaman
* We merged earlier this year (May 2015) with Springer Science+Business Media
to become Springer Nature. We are currently actively engaged in integrating our
businesses.
Macmillan: science and education brands
May 2015
We publish a lot of science! (1845-2015)
http://www.nature.com/developers/hacks/articles/by-year
1.2 million articles in total
Why we’re here today: to ask some questions
We have been making semantic data available in RDF models for a number of
years through our data.nature.com portal (2012–2015)
Big questions:
-  Is this data of any use to the Linked Science community?
-  Should Springer Nature continue to invest in LOD sharing?
More specifically:
-  Does the data contain enough items of interest? [Content]
-  Are the vocabularies understandable and useful? [Structure]
-  Are the data easy to get and to reuse? [Accessibility]
-  Is dereference / download / query the preferred option?
Our work so far
-  Step 1: Linked Data Platform (2012–2014)
-  datasets
-  downloads + SPARQL endpoint
-  linked data dereference
-  Step 2: Ontologies Portal (2015–)
-  datasets + models (core, domain)
-  downloads
-  extensive documentation
The Ontologies Portal
www.nature.com/ontologies
Our goals and rationale
-  Semantic technologies are an effective way to do enterprise metadata
management at web scale
-  Initially used primarily for data publishing / sharing (data.nature.com, 2011)
-  Since 2013, a core component of our digital publishing workflow (see ISWC14 paper)
-  Contributing to an emerging web of linked science data
-  As a major publisher since 1845, ideally positioned to bootstrap a science ‘publications hub’
-  Building on the fundamental ties that exist between the actual research works and the
publications that tell the story about them
The vision of a science graph
What’s available
The core ontology
-  Language: OWL 2, Profile: ALCHI(D)
-  Entities: ~50 classes, ~140 properties
-  Principles: Incremental Formalization / Enterprise Integration / Model Coherence
http://www.nature.com/ontologies/core/
The core ontology: mappings
[Diagram: core classes and their mappings to external vocabularies]
Core classes: :Asset, :Thing, :Publication, :Concept, :Event, :Subject, :Type, :Agent, :ArticleType, :PublishingEvent, :AggregationEvent, :Component, :Document, :Serial
Mapped external classes: cidoc-crm:Information_Carrier, cidoc-crm:Conceptual_Object, dbpedia:Agent, dc:Agent, dcterms:Agent, cidoc-crm:Agent, vcard:Agent, foaf:Agent, event:Event, bibo:Event, schema:Event, cidoc-crm:TemporalEntity, cidoc-crm:Type, vcard:Type, fabio:SubjectTerm, bibo:Document, cidoc-crm:Document, foaf:Document, bibo:Periodical, fabio:Periodical, schema:Periodical, bibo:DocumentPart, fabio:Expression, cidoc-crm:InformationObject
Legend: = owl:equivalentClass
http://www.nature.com/ontologies/linksets/core/
Domain models: subjects ontology
-  Structure: SKOS, multi-hierarchical tree, 6 branches, 7 levels of depth
-  Entities: ~2500 concepts
-  Mappings: 100% of terms, using skos:broadMatch or skos:closeMatch (DBpedia and MeSH)
www.nature.com/ontologies/models/subjects/
http://www.nature.com/developers/hacks/#1
Subjects visualizations
Datasets
-  Articles: 25m records (for 1.2m articles) with metadata such as title and publication, but excluding authors
-  Contributors: 11m records (for 2.7m contributors), i.e. the articles' authors, structured and ordered
but not disambiguated
-  Citations: 218m records (for 9.3m citations), from an earlier release
Datasets: articles-wikipedia links
How: data extracted using the Wikipedia search API; 51,309 links over 145 years
Quality: only ~900 links pointed to nature.com without a DOI; all the rest use DOIs correctly
Encoding: cito:isCitedBy => Wikipedia URL, foaf:topic => DBpedia URI
http://www.nature.com/developers/hacks/wikilinks
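The encoding of a single article-Wikipedia link might be sketched in Python as below. This is an illustration only, not the pipeline used to build the dataset: the article URI in the usage note is a made-up placeholder, the sketch assumes both predicates are attached to the article, and the Wikipedia-to-DBpedia rewrite assumes the common DBpedia convention of reusing the English Wikipedia page title.

```python
from typing import List

CITO_IS_CITED_BY = "http://purl.org/spar/cito/isCitedBy"
FOAF_TOPIC = "http://xmlns.com/foaf/0.1/topic"
WIKI_PREFIX = "http://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def dbpedia_uri(wiki_url: str) -> str:
    """Map an English Wikipedia page URL to a DBpedia resource URI (assumed convention)."""
    if not wiki_url.startswith(WIKI_PREFIX):
        raise ValueError(f"not an English Wikipedia URL: {wiki_url}")
    return DBPEDIA_PREFIX + wiki_url[len(WIKI_PREFIX):]


def link_triples(article_uri: str, wiki_url: str) -> List[str]:
    """Return two N-Triples lines describing one article-Wikipedia link."""
    return [
        f"<{article_uri}> <{CITO_IS_CITED_BY}> <{wiki_url}> .",
        f"<{article_uri}> <{FOAF_TOPIC}> <{dbpedia_uri(wiki_url)}> .",
    ]
```

For example, link_triples("http://example.org/articles/a1", "http://en.wikipedia.org/wiki/DNA") yields one cito:isCitedBy statement pointing at the Wikipedia page and one foaf:topic statement pointing at the DBpedia resource.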
Data publishing: sources
Sources:
Ontologies (small scale; RDF native)
-  mastered as RDF data (Turtle)
-  managed in GitHub
-  in-memory RDF models built using Apache Jena
-  models augmented at build time using SPIN rules
-  deployed to MarkLogic as RDF/XML for query
-  exported as RDF dataset (Turtle) and as CSV
Documents (large scale; XML native)
-  mastered as XML data
-  managed in MarkLogic XML database
-  data mined from XML documents (1.2m articles) using Scala
-  in-memory RDF models built using Apache Jena
-  injected as RDF/XML sections into XML documents for query
-  exported as RDF dataset (N-Quads)
Organization:
Named graphs – one graph per class
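A rough Python sketch of the "one graph per class" organization, applied to an N-Quads export. The graph names follow the pattern visible in the rules on later slides (npgg:journals, npgg:article-types); the namespace URI for npgg: and the naive pluralization are assumptions made for the illustration, and the production workflow uses Apache Jena rather than code like this.

```python
import re

# Assumed expansion of the npgg: prefix; not confirmed by the slides.
GRAPH_NS = "http://ns.nature.com/graphs/"


def graph_name(class_local_name: str) -> str:
    """Turn a class name such as 'ArticleType' into a graph name such as 'article-types'."""
    kebab = re.sub(r"(?<!^)(?=[A-Z])", "-", class_local_name).lower()
    return kebab + "s"  # naive plural, sufficient for this illustration


def to_nquad(subject: str, predicate: str, obj: str, class_local_name: str) -> str:
    """Serialize one statement as an N-Quads line in the graph for its class.

    `obj` must already be a serialized term, e.g. '<http://...>' or '"Nature"'.
    """
    graph = GRAPH_NS + graph_name(class_local_name)
    return f"<{subject}> <{predicate}> {obj} <{graph}> ."
```

Keeping the graph name a pure function of the class means any consumer can tell which graph to load for a given kind of entity without consulting a separate index.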
Data publishing: workflows
Data publishing: rules (enrichment)
construct {
  ?s npg:publicationStartYear ?xds1 .
  ?s npg:publicationStartYearMonth ?xds2 .
  ?s npg:publicationStartDate ?xds3 .
  ?s npg:publicationEndYear ?xde1 .
  ?s npg:publicationEndYearMonth ?xde2 .
  ?s npg:publicationEndDate ?xde3 .
}
where {
  ?s a npg:Journal .
  optional { ?s npg:dateStart ?dateStart }
  optional { ?s npg:dateEnd ?dateEnd }
  {
    # year: first 4 characters, typed as xsd:gYear
    bind (if(regex(?dateStart, "^\\d{4}"), substr(?dateStart,1,4), "") as ?ds1)
    bind (xsd:gYear(?ds1) as ?xds1)
  } union {
    # year and month: first 7 characters, typed as xsd:gYearMonth
    bind (if(regex(?dateStart, "^\\d{4}-\\d{2}"), substr(?dateStart,1,7), "") as ?ds2)
    bind (xsd:gYearMonth(?ds2) as ?xds2)
  } union {
    # full date: all 10 characters, typed as xsd:date
    bind (if(regex(?dateStart, "^\\d{4}-\\d{2}-\\d{2}$"), substr(?dateStart,1,10), "") as ?ds3)
    bind (xsd:date(?ds3) as ?xds3)
  } union {
    …
  }
  filter (?xds1 != "" || ?xds2 != "" || ?xds3 != "" || ?xde1 != "" || ?xde2 != "" || ?xde3 != "")
}
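The three start-date branches of the enrichment rule can be mirrored in a few lines of Python. This is a sketch of the same prefix-matching idea, not a translation of the SPIN rule: the function name and the dictionary keys are invented for the illustration, and the XSD typing step is left out. Note that SPARQL's substr is 1-based while Python slices are 0-based.

```python
import re

# (derived field, pattern the raw value must match, number of leading characters kept)
PATTERNS = [
    ("year", re.compile(r"^\d{4}"), 4),
    ("year_month", re.compile(r"^\d{4}-\d{2}"), 7),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$"), 10),
]


def split_date(value: str) -> dict:
    """Return every date component that can be derived from a raw date string."""
    derived = {}
    for name, pattern, length in PATTERNS:
        if pattern.match(value):
            derived[name] = value[:length]
    return derived
```

A value such as "1869-11" produces a year and a year-month but no full date, which is the behaviour the rule relies on when journals only record partial dates.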
Data publishing: rules (validation)
construct {
  npgg:journals npg:hasConstraintViolation [
    a spin:ConstraintViolation ;
    npg:severityLevel "Warning" ;
    rdfs:label ?message ;
    spin:rule [ a sp:Construct ; sp:text ?query ; ] ;
  ] .
}
where {
  # count the journals that lack a short title
  { select (count(?s) as ?count)
    where {
      ?s a npg:Journal .
      filter ( not exists { ?s bibo:shortTitle ?h . } ) }
  }
  bind (concat("! Found ", str(?count), " journals with no short title") as ?message)
  # follow-up query that lists the offending journals individually
  bind ("""
    construct {
      npgg:journals npg:hasConstraintViolation [
        a spin:ConstraintViolation ;
        spin:violationRoot ?s ; … ] .
    } where { … }
  """ as ?query)
}
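The logic of this validation rule, reduced to plain Python, might look like the sketch below. The input shape (a dictionary of journal identifiers to their properties) and the result keys are assumptions for the illustration; the real rule runs as SPARQL over the RDF model. Like the rule as shown, the sketch produces a report even when the count is zero.

```python
def check_short_titles(journals: dict) -> dict:
    """Report journals that have no bibo:shortTitle.

    `journals` maps a journal identifier to a dict of predicate -> value.
    """
    missing = [uri for uri, props in journals.items()
               if "bibo:shortTitle" not in props]
    return {
        "severityLevel": "Warning",
        "label": f"! Found {len(missing)} journals with no short title",
        # Counterpart of spin:violationRoot in the follow-up query.
        "violationRoots": sorted(missing),
    }
```

Reporting a summary count first and the individual roots second follows the two-level shape of the rule: one aggregate warning, plus a stored query that can be run to find the offending journals.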
Data publishing: rules (contracts)
knowledge-bases:public
  ...
  npg:hasContract [
    rdfs:comment "Contract for ArticleTypes Ontology" ;
    npg:graph npgg:article-types ;
    npg:hasBinding [
      npg:onOntology article-types: ;
      npg:allowsPredicate
        dc:creator , dc:date , dc:publisher , dc:rights , dcterms:license ,
        npg:webpage , owl:imports , owl:versionInfo , rdf:type , rdfs:comment ,
        skos:definition , skos:prefLabel , skos:note ,
        vann:preferredNamespacePrefix , vann:preferredNamespaceUri
      ;
    ] , [
      npg:onInstanceOf npg:ArticleType ;
      npg:allowsPredicate
        npg:hasRoot , npg:isPrimaryArticleType ,
        npg:id , npg:isLeaf , npg:isRoot , npg:treeDepth ,
        rdf:type , rdfs:isDefinedBy , rdfs:seeAlso ,
        skos:broadMatch , skos:broader , skos:closeMatch ,
        skos:definition , skos:exactMatch , skos:inScheme , skos:narrower ,
        skos:prefLabel , skos:relatedMatch , skos:topConceptOf
      ;
    ] ;
  ] ;
  ...
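The slides do not show how a contract is enforced. One plausible reading is that, for each binding, any predicate outside the npg:allowsPredicate list is flagged. The Python sketch below illustrates that reading for the npg:onInstanceOf case only; the function name and the tuple-based triple representation are assumptions, not part of the published tooling.

```python
def contract_violations(triples, allowed_by_class):
    """Return triples whose predicate is not allowed for the subject's class.

    `triples` is a list of (subject, predicate, object) tuples;
    `allowed_by_class` maps a class name to its set of allowed predicates.
    """
    # First pass: record the classes of each subject from its rdf:type triples.
    classes = {}
    for s, p, o in triples:
        if p == "rdf:type":
            classes.setdefault(s, set()).add(o)

    # Second pass: flag each triple once if any governing class disallows its predicate.
    violations = []
    for s, p, o in triples:
        for cls in classes.get(s, ()):
            allowed = allowed_by_class.get(cls)
            if allowed is not None and p not in allowed:
                violations.append((s, p, o))
                break
    return violations
```

Subjects whose class has no binding are left alone in this sketch, so a contract only constrains what it explicitly names.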
Next steps
More features:
-  Linked data dereference
-  Richer dataset descriptions (VoID, PROV, HCLS Profile, etc.)
-  SPARQL endpoint?
-  JSON-LD API?
More data:
-  Adding extra data points (funding info, affiliations, …)
-  Revamp citations dataset
-  Longer term: extending archive to include Springer content
More feedback:
-  User testing around data accessibility
-  Surveying communities/users for this data
Looking ahead: how can a publisher make linked
science happen?
From a business perspective:
-  Finding adequate licensing solutions
-  Justifying the effort to publishers
-  What’s the ROI?
From a communities perspective:
-  Do we actually know who the users are?
-  How do we get more feedback/uptake?
-  Should we work more with non-linked-data communities?
Questions?

The Nature.com ontologies portal - Linked Science 2015
