Linked Data and
the Future of Scientific Publishing
Bradley P. Allen, Elsevier Labs
Presentation to NFAIS Webinar – “Linked Data: What It Is, What It
Does and The Future of Information Discovery”
2012-10-25
Scientific knowledge in a post-print world


 “Our new knowledge does not consist of a
   careful set of works that have passed through
   a series of gates. … Our new knowledge is not
   even a set of works. It is an infrastructure of
   connection.”
 David Weinberger. 2011. Too Big to Know: Rethinking Knowledge Now That the Facts Aren't
 the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room, Basic
 Books, New York, NY




“Infrastructure of connection” = linked data
 Type of data: what the literature is about

   Content Inputs          Linked Data Outputs               Benefits
   • XML                   • Asset metadata                  • Better discoverability
   • Long-form free text   • Citations                       • Better visualization and understandability
   • Short-form free text  • Classifications                 • Better integration for use in information solutions
   • Tables                • Clusters
   • Images                • Entities
   • Video                 • Relations
   • Audio                 • Language models
                           • Probabilistic graphical models

 Type of data: how the literature is being used

   Content Inputs          Linked Data Outputs               Benefits
   • Article views         • Article-level metrics           • Provides the researcher insight about her career
   • Search queries        • Sentiment analysis              • Provides institutions data about their performance and impact
   • User behavior         • Ranking and impact metrics      • Provides publishers data for optimizing our business
   • Social media streams  • User interest profiles



Linked data as standards and best practices
 “Linked data is just a term for how to publish data on the web while working
  with the web. And the web is the best architecture we know for publishing
  information in a hugely diverse and distributed environment, in a gradual
  and sustainable way.”
 Jeni Tennison. 2010. Why Linked Data for data.gov.uk?
 http://www.jenitennison.com/blog/node/140

 1. Use URIs as names for things
 2. Use HTTP URIs so that people can look up those names
 3. When someone looks up a URI, provide useful information, using the standards
 4. Include links to other URIs, so that they can discover more things
 Tim Berners-Lee. 2006. Linked Data
 http://www.w3.org/DesignIssues/LinkedData.html
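
As a rough illustration of the four principles, the sketch below models a tiny "web" in memory. It is not taken from the talk: the example.org URIs, the `WEB` table, and the `look_up` and `discover` helpers are all hypothetical, and a dictionary lookup stands in for a real HTTP request. Treating any value that starts with "http://" as a link is a simplification.

```python
# Hypothetical in-memory "web": each HTTP URI (principles 1 and 2) maps to
# the statements served when it is looked up. All example.org URIs are made up.
WEB = {
    "http://example.org/article/1": [
        ("http://example.org/article/1",
         "http://purl.org/dc/terms/title",
         "A study of hypertension"),
        ("http://example.org/article/1",
         "http://purl.org/dc/terms/subject",
         "http://example.org/concept/hypertension"),
    ],
    "http://example.org/concept/hypertension": [
        ("http://example.org/concept/hypertension",
         "http://www.w3.org/2004/02/skos/core#prefLabel",
         "Hypertension"),
    ],
}


def look_up(uri):
    """Stand-in for an HTTP GET: return useful statements (principle 3)."""
    return WEB.get(uri, [])


def discover(start_uri):
    """Follow links to other URIs to find more things (principle 4)."""
    seen, queue = [], [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        for _subject, _predicate, obj in look_up(uri):
            # Simplification: any object that looks like an HTTP URI is a link.
            if obj.startswith("http://") and obj not in seen:
                queue.append(obj)
    return seen
```

Starting from the article URI, `discover` reaches the concept URI only because the article's statements link to it, which is the point of the fourth principle.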
Scientific publication as linked data
 [Figure: linked data connecting a Document, an Entity record and a Media object through asset metadata, relational metadata and provenance metadata, across the stages Acquire; Transform, Enhance, Index, Analyze, Compose; Deliver.]
Linked data is increasingly important in science




The challenge for publishers
 • Create greater online engagement with our content
   and platform
 • Semantically enrich our content and enhance value of
   discovery services compared to the same and similar
   content at other platforms
 • Drive additional usage (in journals and books, in
   downloads and interactivity)
 • Improve our ability to be a partner in research, and as
   a publisher that adds value
 • Improve our connection with the scientific community
   through productive collaborations that improve
   search and discovery for all researchers

Elsevier’s approach to linked data
 • Expose existing asset and subject metadata as linked
   data in Web pages to aid discovery
 • Embrace linked data principles while leveraging our
   existing content production workflow and
   infrastructure
 • Leverage partners for content enhancement and
   knowledge organization
 • Reuse Web-standard vocabularies, taxonomies,
   ontologies and entity resources where possible
 • Collaborate in building needed authoritative resources
   for identity resolution and metrics
 • Deliver benefits across the complementary use cases
   of researcher and practitioner

Creating smart content by extracting & linking

 [Figure: the kinds of linked data that make up smart content: Asset Metadata, Entities, Relations, Citations, Usage.]
Methods for extracting and linking content & data




 • Very mature, but hard to scale
 • Crowdsourcing is a possible solution, but quality control is a challenge

 • Variable degrees of maturity, but huge strides through machine learning research
   and practical application on the consumer Internet
 • Data-driven, so the more data the better
 • Models can be used to build applications, can be a new type of publication

 • Language-driven, so challenging to generalize and scale
 • Crucial to realize promise of ease of integration
Packaging linked data for content production
 [Figure: a satellite document (tag:satelliteWrapper + XML Schema) wrapping rdf:RDF + namespaces and a sat:Satellite. The satellite holds concept schemes, tags (Statement 1: Diabetes; Statement 2: Hypertension; ...) and region tags (Para1-Statement-1: Diabetes; Para2-Statement-2: Hypertension; ...). Other components shown: SKOS Generator, RDF Generator, LDR.]

 Example RDF Statements
  • Tags from a taxonomy for a given document
  • Document sections relevant to a given concept
  • Document sections providing answers to a given question
  • Learning objects compliant with a given state educational standard
  • Genes mentioned in a given document
  • Documents supporting or disputing conclusions of a given document
  • Concepts that are in the areas of expertise for a given author
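
Two of the example statement types above (tags for a document, and document sections relevant to a concept) can be sketched with plain subject-predicate-object tuples. This is only an illustration, not the satellite format itself: the example.org namespace, the document and paragraph identifiers, and the choice of the Dublin Core terms `subject` and `isPartOf` as predicates are assumptions made for the sketch.

```python
EX = "http://example.org/"  # hypothetical namespace, not from the slides
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
DCT_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

# Document-level tags and region (paragraph-level) tags as triples.
TRIPLES = [
    (EX + "doc1", DCT_SUBJECT, EX + "concept/Diabetes"),
    (EX + "doc1", DCT_SUBJECT, EX + "concept/Hypertension"),
    (EX + "doc1#para1", DCT_IS_PART_OF, EX + "doc1"),
    (EX + "doc1#para2", DCT_IS_PART_OF, EX + "doc1"),
    (EX + "doc1#para1", DCT_SUBJECT, EX + "concept/Diabetes"),
    (EX + "doc1#para2", DCT_SUBJECT, EX + "concept/Hypertension"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def tags_for_document(doc):
    """Tags from a taxonomy for a given document."""
    return [o for _s, _p, o in match(TRIPLES, s=doc, p=DCT_SUBJECT)]


def sections_about(doc, concept):
    """Document sections relevant to a given concept."""
    parts = {s for s, _p, _o in match(TRIPLES, p=DCT_IS_PART_OF, o=doc)}
    tagged = match(TRIPLES, p=DCT_SUBJECT, o=concept)
    return sorted(s for s, _p, _o in tagged if s in parts)
```

In a production setting the same questions would more likely be posed as SPARQL queries against a triple store; the wildcard `match` function here only mimics a single triple pattern.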
Infrastructure for storing and publishing linked data
 [Figure: architecture diagram with the following components]
  • Loader (REST)
  • Data Spaces
     – Annotation Satellites
     – Asset Satellites
     – Vocab Satellites
     – 3rd Party Data
  • Pipeline Coordination
  • Pipeline Services (Hadoop EMR)
     – JSON Transform
     – N-Quads Extract
     – Reasoning
     – Interlinking
     – RDF Validation
     – Ontology Svcs
  • Discovery Services
     – Amazon S3
     – MongoDB
     – SIREN/SOLR
     – Virtuoso Triplestore
     – A&E
     – Discovery Service API (REST)
     – Atom Feed
     – Admin & Monitoring
     – Analytics
     – Ontology Service
     – SPARQL Endpoint
  • Load Balance & Failover (Akamai GTM & Amazon ELB)
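
The diagram names JSON and N-Quads among the pipeline stages but does not show how records move between them. The sketch below is one possible reading, not the actual pipeline: the JSON record shape (`id` plus `properties`), the `to_nquads` helper, and the graph URI are hypothetical. The output follows the W3C N-Quads line format, where the fourth term names the graph a statement belongs to, which is one way to record which satellite a statement came from.

```python
import json


def escape_literal(text):
    """Escape the characters that N-Quads requires escaping in literals."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_nquads(record_json, graph_uri):
    """Turn a hypothetical JSON record into N-Quads lines.

    Assumed record shape: {"id": <uri>, "properties": {<predicate uri>: [values]}}
    where a value is either a plain string (literal) or {"uri": <uri>}.
    """
    record = json.loads(record_json)
    subject = record["id"]
    lines = []
    for predicate, values in sorted(record["properties"].items()):
        for value in values:
            if isinstance(value, dict):
                obj = "<%s>" % value["uri"]
            else:
                obj = '"%s"' % escape_literal(str(value))
            lines.append(
                "<%s> <%s> %s <%s> ." % (subject, predicate, obj, graph_uri)
            )
    return lines
```

Sorting the predicates keeps the output stable between runs, which makes generated files easier to diff and validate.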
Integrating content & data services with linked data




Delivering linked data through multiple online services
Organization                                  Main driver                                     Example                     Benefits                   Linked data
S&T / Journals                                Making the article more engaging and            Article of the Future       Understanding, Discovery   Entities, Citations, Relations
                                              informative through visualization and linking
S&T / Books                                   Making the book more engaging and               Brain Navigator             Understanding, Discovery   Entities
                                              informative through visualization and linking
S&T / A&G Research                            Making the discovery of relevant content        Lipids SciVerse App         Discovery, Integration     Entities, Asset Metadata
                                              easier and more engaging
S&T / A&G Institutional                       Making data about the production and use        SciVal Spotlight            Understanding              Entities, Citations, Usage
                                              of scientific content easier to understand
S&T / Corporate / Alternative Fuels           Making the exploration of design                Elsevier Biofuels           Discovery                  Entities, Citations
                                              alternatives easier
S&T / Corporate / Bibliographical Databases   Automating the indexing of content for          Embase                      Discovery                  Asset Metadata, Entities
                                              traditional discovery channels
S&T / Corporate / Engineering & Technology    Making the discovery of technology trends       Illumin8                    Discovery                  Entities, Citations, Relations
                                              and sources easier
S&T / Corporate / Pharma Biotech              Rich integration of content and data in         Target Insights             Discovery, Understanding   Entities, Citations
                                              support of research and design workflows
HS / CDS                                      Delivering actionable information in the        Order Sets                  Integration                Entities, Relations
                                              context of medical decision making
HS / GCR                                      Making the discovery of relevant medical        Clinical Key                Discovery                  Entities, Asset Metadata
                                              content easier and contextual
HS / NHP                                      Making the delivery and organization of         General Education Platform  Discovery, Integration     Entities, Asset Metadata, Relations
                                              medical content easier to integrate with
                                              educational workflows
Challenges in implementing linked data
 • Access to content and data
    – Usage data not integrated or leveraged
    – Hard to stage content for modeling and analytics
 • Integration
    – Adoption of standards across silos and legacy systems
    – Globalization/localization of knowledge organization systems
    – Named entity registries for identity resolution for accreditation,
      provenance and trust
 • Human resources
    – Scarcity of data scientists, language engineers
 • Production
    – Manually intensive knowledge engineering
    – Balancing production validation and rapid iterative development
    – Relation extraction needed but capabilities are minimal at best
    – Tools for syntactic rather than semantic validation
 • Sharing
    – Culture and legacy
    – Business model disincentives
    – Identifier, URI and namespace governance
 • Quality control
    – Lack of clean external data
    – Gaps in linked data resources
    – Bugs in knowledge organization systems
Trends within Elsevier today
 • Increasing acquisition of data and text analytics
   capabilities
 • Shifting dependence from partners to in-house
   resources for content enhancement and
   knowledge organization
 • Innovation in new knowledge organization
   systems (some through integration of existing
   ones)
    – Two main design emphases: taxonomy for discovery,
      ontology for understanding and integration
 • Emergence of shared smart content
   infrastructure based on linked data principles

Smart content is a bridge to the future of publishing
 • Smart content allows publishers to create new
   products and services through structuring
   content for better discovery, insight and utility
    – The value is in the structure, not the content
    – Creating that structure is hard work
    – The kind of hard work that publishers have
      traditionally focused on
 • Consumer Internet businesses are using text and
   data mining to add structure to content today…
   quickly and on the cheap
 • Publishers, societies and libraries both large and
   small can use the same techniques to follow suit

Thank you

Bradley P. Allen
b.allen@elsevier.com
bradleypallen on twitter, github


Linked data and the future of scientific publishing

  • 1. Linked Data and the Future of Scientific Publishing Bradley P. Allen, Elsevier Labs Presentation to NFAIS Webinar – “Linked Data: What It Is, What It Does and The Future of Information Discovery” 2012-10-25
  • 2. Scientific knowledge in a post-print world “Our new knowledge does not consist of a careful set of works that have passed through a series of gates. … Our new knowledge is not even a set of works. It is an infrastructure of connection.” David Weinberger. 2011. Too Big to Know: Rethinking Knowledge Now That the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room, Basic Books, New York, NY 2 2
  • 3. “Infrastructure of connection” = linked data Type of data Content Inputs Linked Data Outputs Benefits • XML • Asset metadata • Better discoverability • Long-form free text • Citations • Better visualization and understandability • Short-form free text • Classifications What the • Tables • Clusters • Better integration for use in information solutions literature is • Images • Entities about • Video • Relations • Audio • Language models • Probabilistic graphical models • Article views • Article-level metrics • Provides the researcher • Search queries • Sentiment analysis insight about her career How the • User behavior • Ranking and impact • Provides institutions data about their performance literature is • Social media streams metrics and impact • User interest profiles being used • Provides publishers data for optimizing our business 3 3
  • 4. Linked data as standards and best practices “Linked data is just a term 1. Use URIs as names for for how to publish data on things the web while working 2. Use HTTP URIs so that with the web. And the web people can look up those is the best architecture we names know for publishing information in a hugely 3. When someone looks up diverse and distributed a URI, provide useful environment, in a gradual information, using the and sustainable way.” standards 4. Include links to Jeni Tennison. 2010. Why Linked Data for data.gov.uk? other URIs, so that they http://www.jenitennison.com/blog/node/ 140 can discover more things Tim Berners-Lee. 2006. Linked Data http://www.w3.org/DesignIssues/LinkedData.html
  • 5. Scientific publication as linked data Linked data Provenance metadata Entity record Relational Metadata Document Asset metadata Acquire Relational Relational Deliver Metadata metadata Media object Asset Asset metadata Metadata Transform, Enhance, Index, Analyze, Compose 5
  • 6. Linked data is increasingly important in science 6
  • 7. The challenge for publishers • Create greater online engagement with our content and platform • Semantically enrich our content and enhance value of discovery services compared to the same and similar content at other platforms • Drive additional usage (in journals and books, in downloads and interactivity) • Improve our ability to be a partner in research, and as a publisher that adds value • Improve our connection with the scientific community through productive collaborations that improve search and discovery for all researchers 7
  • 8. Elsevier’s approach to linked data
    • Expose existing asset and subject metadata as linked data in Web pages to aid discovery
    • Embrace linked data principles while leveraging our existing content production workflow and infrastructure
    • Leverage partners for content enhancement and knowledge organization
    • Reuse Web-standard vocabularies, taxonomies, ontologies and entity resources where possible
    • Collaborate in building needed authoritative resources for identity resolution and metrics
    • Deliver benefits across the complementary use cases of researcher and practitioner
  • 9. Creating smart content by extracting & linking
    [Diagram. Labels: Asset Metadata; Usage; Entities; Citations; Relations.]
  • 10. Methods for extracting and linking content & data
    • Very mature, but hard to scale
      – Crowdsourcing is a possible solution, but quality control is a challenge
    • Variable degrees of maturity, but huge strides through machine learning research and practical application on the consumer Internet
      – Data-driven, so the more data the better
      – Models can be used to build applications, can be a new type of publication
    • Language-driven, so challenging to generalize and scale
      – Crucial to realize promise of ease of integration
  • 11. Packaging linked data for content production
    [Diagram. Labels: tag:satelliteWrapper + XML Schema; rdf:RDF + namespaces; sat:Satellite; concept schemes (SKOS); Statement 1, Statement 2, ...; Generator; Tags (Diabetes, Hypertension); LDR; RDF Generator; Region Tags (Para1-Statement-1: Diabetes; Para2-Statement-2: Hypertension).]
    Example RDF statements
    • Tags from a taxonomy for a given document
    • Document sections relevant to a given concept
    • Document sections providing answers to a given question
    • Learning objects compliant with a given state educational standard
    • Genes mentioned in a given document
    • Documents supporting or disputing conclusions of a given document
    • Concepts that are in the areas of expertise for a given author
    • ...
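The idea of document-level and region-level tag statements can be sketched as follows. This is a minimal illustration under stated assumptions, not the satellite format named on the slide: it emits N-Triples rather than RDF/XML inside a wrapper, and the example.org URIs, the `Statement` class, the `tag_statements` helper, and the choice of `dcterms:subject` as the tagging property are all hypothetical. Only the Diabetes and Hypertension tags and the paragraph-level regions come from the slide.

```python
from dataclasses import dataclass

# Assumption: tags are expressed with the Dublin Core "subject" property.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


@dataclass(frozen=True)
class Statement:
    """One RDF statement whose subject, predicate and object are all IRIs."""
    subject: str
    predicate: str
    obj: str

    def to_ntriples(self):
        return f"<{self.subject}> <{self.predicate}> <{self.obj}> ."


def tag_statements(doc_uri, region_tags, concept_base):
    """Build document-level statements followed by region-level statements.

    `region_tags` maps a region id (e.g. "para1") to taxonomy labels.
    Concept URIs are formed by appending the lower-cased label to
    `concept_base`, which is a simplification made for this sketch.
    """
    region_level = []
    doc_concepts = []
    for region, labels in region_tags.items():
        for label in labels:
            concept = concept_base + label.lower()
            region_level.append(
                Statement(f"{doc_uri}#{region}", DCTERMS_SUBJECT, concept)
            )
            if concept not in doc_concepts:
                doc_concepts.append(concept)
    doc_level = [Statement(doc_uri, DCTERMS_SUBJECT, c) for c in doc_concepts]
    return doc_level + region_level


statements = tag_statements(
    "http://example.org/doc/42",
    {"para1": ["Diabetes"], "para2": ["Hypertension"]},
    "http://example.org/concept/",
)
for statement in statements:
    print(statement.to_ntriples())
```

The document-level statements answer "tags from a taxonomy for a given document", while the region-level ones answer "document sections relevant to a given concept"; both questions are served by the same small set of statements.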
  • 12. Infrastructure for storing and publishing linked data
    [Architecture diagram; some label pairings are not recoverable from the extraction.]
    • Loader (REST)
    • Data Spaces: Annotation Satellites; Asset Satellites; Vocab Satellites; 3rd Party Data
    • Pipeline Coordination
    • Pipeline Services (Hadoop EMR): N-Quads; RDF; Ontology; JSON; Transform; Extract; Reasoning; Interlinking; Validation; Svcs
    • Discovery Services: Amazon S3; MongoDB; SIREN/SOLR; Virtuoso Triplestore
    • Services: Discovery; Atom Feed; Admin & Monitoring; Ontology Service; SPARQL Endpoint; A&E; Service API; Analytics; (REST)
    • Load Balance & Failover (Akamai GTM & Amazon ELB)
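One way statements from separate data spaces could be kept apart in a single store is to serialize them as N-Quads, with one named graph per data space. The sketch below assumes that mapping; the slide lists N-Quads and data spaces but does not say how they relate. The example.org URIs and the helper names `iri`, `literal` and `to_nquads` are hypothetical.

```python
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def iri(value):
    """Format an IRI term."""
    return f"<{value}>"


def literal(text):
    """Format a plain string literal, escaping backslash, quote and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_nquads(statements_by_space, space_base):
    """Serialize statements as N-Quads, using one graph IRI per data space.

    `statements_by_space` maps a data space name to (subject IRI,
    predicate IRI, already formatted object term) tuples.
    """
    lines = []
    for space, statements in statements_by_space.items():
        graph = iri(space_base + space)
        for subject, predicate, obj_term in statements:
            lines.append(f"{iri(subject)} {iri(predicate)} {obj_term} {graph} .")
    return lines


doc = "http://example.org/doc/42"
quads = to_nquads(
    {
        "asset": [(doc, DCTERMS_TITLE, literal('A "smart" article'))],
        "annotation": [
            (doc, DCTERMS_SUBJECT, iri("http://example.org/concept/diabetes"))
        ],
    },
    "http://example.org/space/",
)
for line in quads:
    print(line)
```

Keeping the graph IRI on every line means asset metadata and annotations about the same document can be loaded, replaced or queried independently, for example by restricting a SPARQL query to one named graph.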
  • 13. Integrating content & data services with linked data
  • 14. Delivering linked data through multiple online services
    Organization | Main driver | Example | Benefits | Linked data
    S&T Journals | Making the article more engaging and informative through visualization and linking | Article of the Future | Understanding, Discovery | Entities, Citations, Relations
    Books | Making the book more engaging and informative through visualization and linking | Brain Navigator | Understanding, Discovery | Entities
    A&G Research | Making the discovery of relevant content easier and more engaging | Lipids SciVerse App | Discovery, Integration | Entities, Asset Metadata
    A&G Institutional | Making data about the production and use of scientific content easier to understand | SciVal Spotlight | Understanding | Entities, Citations, Usage
    Corporate Alternative Fuels | Making the exploration of design alternatives easier | Elsevier Biofuels | Discovery | Entities, Citations
    Bibliographical Databases | Automating the indexing of content for traditional discovery channels | Embase | Discovery | Asset Metadata, Entities
    Engineering & Technology | Making the discovery of technology trends and sources easier | Illumin8 | Discovery | Entities, Citations, Relations
    Pharma Biotech | Rich integration of content and data in support of research and design workflows | Target Insights | Discovery, Understanding | Entities, Citations
    HS CDS | Delivering actionable information in the context of medical decision making | Order Sets | Integration | Entities, Relations
    GCR | Making the discovery of relevant medical content easier and contextual | Clinical Key | Discovery | Entities, Asset Metadata
    NHP | Making the delivery and organization of medical content easier to integrate with educational workflows | General Education Platform | Discovery, Integration | Entities, Asset Metadata, Relations
  • 15. Challenges in implementing linked data
    • Access to content and data
      – Usage data not integrated or leveraged
      – Hard to stage content for modeling and analytics
    • Integration
      – Adoption of standards across silos and legacy systems
      – Globalization/localization of knowledge organization systems
      – Named entity registries for identity resolution for accreditation, provenance and trust
    • Quality control
      – Lack of clean external data
      – Gaps in linked data resources
      – Bugs in knowledge organization systems
    • Production
      – Manually intensive knowledge engineering
      – Balancing production validation and rapid iterative development
      – Relation extraction needed but capabilities are minimal at best
      – Tools for syntactic rather than semantic validation
    • Sharing
      – Culture and legacy
      – Business model disincentives
      – Identifier, URI and namespace governance
    • Human resources
      – Scarcity of data scientists, language engineers
  • 16. Trends within Elsevier today
    • Increasing acquisition of data and text analytics capabilities
    • Shifting dependence from partners to in-house resources for content enhancement and knowledge organization
    • Innovation in new knowledge organization systems (some through integration of existing ones)
      – Two main design emphases: taxonomy for discovery, ontology for understanding and integration
    • Emergence of shared smart content infrastructure based on linked data principles
  • 17. Smart content is a bridge to the future of publishing
    • Smart content allows publishers to create new products and services through structuring content for better discovery, insight and utility
      – The value is in the structure, not the content
      – Creating that structure is hard work
      – The kind of hard work that publishers have traditionally focused on
    • Consumer Internet businesses are using text and data mining to add structure to content today… quickly and on the cheap
    • Publishers, societies and libraries both large and small can use the same techniques to follow suit
  • 18. Thank you
    Bradley P. Allen
    b.allen@elsevier.com
    bradleypallen on Twitter, GitHub