David Kuilman, Gina Donato, Dr. Rinke Hoekstra
A content standard for data-platform use cases: Content Profiles & linked documents
NISO Diversity of formats
February 10, 2021 11:00am
Working Group initiative to create a NISO standard for the interchange of academic, research, and professional content, data, and semantics
Elsevier Data Platform vision
…entity-driven processes
(Early) access and visibility
Expedite shapes
Lineage
Provenance
Policy / license
Priority of content and authorship
Content is data
Content and data operate seamlessly
Content structure follows document entity structure
Rich HTML5 literals for UI/UX use cases
Role based processing
Content typology
Granular
Context-based using process and purpose intelligence
Content is shared
All content can be leveraged throughout the platform by all contributor/consumer roles using a common vocabulary
Zero organisational boundaries
Policies for compliance
Continuous flow and hydration
Partial and complete resources
Extensible types and enrichments
Optimisation of formats
Machine learning
Human interaction
Agile, extensible and resilient
Fast services development
Nimble models
Extensible models
Arbitrary content (types)
Service level agreement
Handle exception flows gracefully and informed
Business requirement: from a content perspective
Anatomy of content entity processes on a data platform
Source
Data
Harvesting Normalisation Extraction matching Linking Curation Publishing
… entity driven workflow
Classic document driven workflow…
manuscript Internal format copyedit Mastercopy Product
mappings mappings
The Content Profiles & Linked Document standard (CP/LD) is the result of adopting content platform principles to provide the flexibility, extensibility and connectivity required on a data platform for academic, research and professional content
Let's consider a few critical design considerations first…
Pipeline to cyclic
Human-in-the-loop
Merging data entities and content entities on demand
Sourcing
Harvesting
Normalizing
Extraction
Matching
linking
Publishing
Key concept: think cyclic, not linear…
[Diagram: the same seven-stage cycle, Sourcing to Publishing, repeated three more times]
Sourcing Harvesting Normalisation Extraction matching Linking Curation Publishing
… in parallel workflows
… author
… review
… approve
… connect
… edit
… recommend
… annotate
…
Human-in-the-loop
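The cyclic idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the stage names come from the slide, while `run_cyclic`, the `version` counter, the `contributions` list and the rule that each human contribution triggers one further full cycle are all assumptions made for the example.

```python
from collections import deque

# Stage names as listed on the slide.
STAGES = [
    "sourcing", "harvesting", "normalizing", "extraction",
    "matching", "linking", "publishing",
]


def run_cyclic(entity, human_events, max_cycles=5):
    """Run an entity through STAGES repeatedly instead of once.

    Assumption: every pending human contribution (review, annotate, ...)
    sends the entity through one more full cycle after publishing.
    """
    history = []
    events = deque(human_events)
    for cycle in range(max_cycles):
        for stage in STAGES:
            history.append((cycle, stage))
        entity["version"] += 1  # each completed cycle yields a new version
        if not events:
            break  # nothing left to feed back into the loop
        entity.setdefault("contributions", []).append(events.popleft())
    return history


entity = {"id": "manuscript-1", "version": 0}
history = run_cyclic(entity, ["review", "annotate"])
```

A linear pipeline would stop after the first pass through `STAGES`; here the two human events keep the same entity moving through the stages until no further contribution is pending.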
Key concept: think human-in-the-loop and machine learning
Sourcing
Harvesting
Normalizing
Extraction
Matching
linking
Publishing
Gold set
Test sets
Human curation within content centric workflows
Human curation within Machine Learning
Contributor
Consumer
Continuous improvement
Content operations
Platform operations
Continuous deployment
Model operations
Content artefacts
Enhanced Content artefacts
Human supervised
Content usage metrics
The CP/LD standard uses established standards to create the format framework that supports data platform content operations without compromise
Linked data and HTML5 unite syntax, structure and semantics needed on the platform
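One common way to carry linked data and HTML5 in a single file is a JSON-LD block inside a `script` element of type `application/ld+json`. The sketch below assumes that arrangement; the slides do not say how CP/LD itself embeds the data. The sample document, its `example.org` identifier and the use of schema.org terms are illustrative only. It uses Python's standard `html.parser` and `json` modules to pull the semantic layer out of the narrative.

```python
import json
from html.parser import HTMLParser

# Hypothetical linked document: HTML5 narrative plus an embedded JSON-LD block.
LINKED_DOCUMENT = """<!DOCTYPE html>
<html>
  <head>
    <title>Example article</title>
    <script type="application/ld+json">
    {
      "@context": {"schema": "https://schema.org/"},
      "@id": "https://example.org/article/1",
      "@type": "schema:ScholarlyArticle",
      "schema:name": "Example article"
    }
    </script>
  </head>
  <body>
    <section id="abstract"><p>An <i>example</i> abstract.</p></section>
  </body>
</html>"""


class JsonLdExtractor(HTMLParser):
    """Collects the parsed content of every application/ld+json script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html_text):
    """Return the list of JSON-LD objects embedded in an HTML5 document."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.blocks


blocks = extract_jsonld(LINKED_DOCUMENT)
```

The HTML5 body stays readable for people and UI components, while `blocks` holds the same article as data that a linked-data processor can consume.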
HTML5
JSON-LD +
Structured narrative
Semantic data layer
XHTML dialect
Linked Data
Usage standard and guidelines
Independent of any particular use case
Content Profile standard & Linked Document
XML Schema
RDF Schema
SHACL
XML Schema
RDF: Discovery
XML: consistency
JSON: messaging
JSON-LD: knowledge infusion
HTML5: representation
Business roles
This is a part of text that has a specific style (italic)
This is a paragraph
This paragraph is the abstract of the paper
This paragraph is the title of the paper
This is author Alba Grifoni
This is a citation of another paper
This is a result reported on in this paper
This is a mention of the “COVID-19” concept
This is a mention of the “SARS-CoV2” concept
This states that “SARS-CoV2” reactive “CD4+ T-cells” exist in ~40%-60% of unexposed individuals, suggesting cross-reactive T-cell recognition with “common cold”
doi:10.1126/sciimunol.aan5393
“55425663600”
hgraph:id-88f9e4ca-c776-3380-933b-f1218c4ef1fd (COVID-19)
hgraph:id-2ab6cd87-e543-3229-85ff-c862a90f415c (SARS-CoV2)
hgraph:id-88f9e4ca-c776-3380-933b-f1218c4ef1fd (T-CD4+)
hgraph:id-2ab6cd87-e543-3229-85ff-c862a90f415c (SARS-CoV2)
hgraph:id-a28e7725-1919-34f0-a648-45721d8bd6a2 (common cold)
reactive to
reactive to
The anatomy of a Linked Document
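The two "reactive to" relations on this slide could be written down as JSON-LD along the following lines. This is one reading of the diagram, not the standard's actual vocabulary: the concept identifiers are copied as printed on the slide, but the expansion of the `hgraph:` prefix, the `ex:reactiveTo` predicate and the direction of the relations (T-CD4+ reactive to SARS-CoV2 and to common cold) are assumptions.

```python
import json

# Assumption: the slide shows only the "hgraph:" prefix; both IRIs are placeholders.
CONTEXT = {
    "hgraph": "https://example.org/hgraph/",
    "ex": "https://example.org/vocab/",
}

# Concept identifiers copied as printed on the slide.
T_CD4 = "hgraph:id-88f9e4ca-c776-3380-933b-f1218c4ef1fd"
SARS_COV2 = "hgraph:id-2ab6cd87-e543-3229-85ff-c862a90f415c"
COMMON_COLD = "hgraph:id-a28e7725-1919-34f0-a648-45721d8bd6a2"


def reactive_to(subject_id, object_id):
    """Return one assertion node; ex:reactiveTo is a hypothetical predicate."""
    return {"@id": subject_id, "ex:reactiveTo": {"@id": object_id}}


def build_semantic_layer():
    """Assemble the assertions into a single JSON-LD document."""
    return {
        "@context": CONTEXT,
        "@graph": [
            reactive_to(T_CD4, SARS_COV2),
            reactive_to(T_CD4, COMMON_COLD),
        ],
    }


serialized = json.dumps(build_semantic_layer(), indent=2)
```

Each assertion points at concept identifiers rather than at strings in the text, which is what lets the mention layer (style, paragraph, abstract) and the semantic layer evolve separately.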
service (×10)
assertions
documents
resources
Aggregations
products
Content Topics blueprint for data platform
Bespoke normalizers Linked Data processors Query
Harvesting
Harvested manuscript
Normalized document
Enriched article
A finished article
Article
Author
Document
Document
Document
Author
Document
Article
Document
Author
attributes
Manuscript
Conclusion
Abstract
Author
String
Author
String
Activating the platform: listen and merge application
An author manuscript
Author mention
Author as Person Entity
Author as Entity and representation
Conclusion
Abstract
service
service
service
merge
Activating the platform: merge topics and create a product view
After merging the topics, the finished view offers:
• A manuscript becomes a Document
• The position of an abstract and a conclusion
• A person has been identified as author
• The author string has been identified within the document
• The author has entity attributes
• The document assembly is a scientific article of type ‘Finished’ because it satisfies the above criteria
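The listen-and-merge step and the ‘Finished’ criteria listed above can be sketched as a small fold over topic messages. Everything structural here is assumed for illustration: the topic names, the message shape, the `ArticleView` class and the attribute values are hypothetical, and only the criteria themselves follow the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArticleView:
    """Product view assembled by merging messages from several topics."""
    document_id: str
    is_document: bool = False
    sections: dict = field(default_factory=dict)  # section name -> position
    author_string: Optional[str] = None
    author_entity: Optional[dict] = None

    def is_finished(self) -> bool:
        # Mirrors the slide: a Document, with abstract and conclusion positions,
        # an identified author string, and an author entity with attributes.
        return (
            self.is_document
            and {"abstract", "conclusion"}.issubset(self.sections)
            and self.author_string is not None
            and bool(self.author_entity)
        )


def merge(view, message):
    """Fold one topic message into the view (topic names are hypothetical)."""
    topic, payload = message["topic"], message["payload"]
    if topic == "document":
        view.is_document = True
    elif topic == "section":
        view.sections[payload["name"]] = payload["position"]
    elif topic == "author-string":
        view.author_string = payload["text"]
    elif topic == "author-entity":
        view.author_entity = payload["attributes"]
    return view


messages = [
    {"topic": "document", "payload": {}},
    {"topic": "section", "payload": {"name": "abstract", "position": 1}},
    {"topic": "section", "payload": {"name": "conclusion", "position": 9}},
    {"topic": "author-string", "payload": {"text": "Alba Grifoni"}},
    {"topic": "author-entity", "payload": {"attributes": {"name": "Alba Grifoni"}}},
]

view = ArticleView(document_id="manuscript-1")
states = []
for message in messages:
    merge(view, message)
    states.append(view.is_finished())
```

Because the view is rebuilt from independent topics, a partial resource is still usable at every step; it only qualifies as ‘Finished’ once the last missing topic has been merged.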
merge
Article Author
Author
attributes
Abstract
Author
String
Conclusion
Outside document
Inside document
HTML5 vocabulary
JSON-LD predicates
Relationships legend
A finished article
Key takeaways
• Content is data; treat it as data, not as documents
• Normalization is the great divider from files to entities, items and assertions
• Entity-designed data and author-designed data become blended
• Machine learner and researcher forge an alliance
On standards & formats…
• RDF and XML schema technology remain the backbone for information modelling
• JSON, JSON-LD and HTML5 serialisations are dominant for content standards
Working Group initiative to create a NISO standard for the interchange
of academic, research, and professional content, data, and semantics
Further information:
Kuliman "Content Profiles & linked documents"


Editor's Notes

  • #11: XML DTD 5.6 (OPS), XOCS… Common Index Profile (CIP) -> structure & metadata NLP: CM2, FPE, Leadmine, MedScan, Termite (SciBite) … Linking: Parity, FPE, …