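Connecting Publications & Data: Raising visibility of local data collections through linking with international publication databases
Michael Habib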
Abstract: Connecting locally hosted data repositories to internationally hosted related articles has never been easier. With APIs and other web services becoming standardized at the same time that new linking standards, such as DataCite DOIs, are being adopted, new ways to distribute and mash up content are now possible. This presentation will explore emerging trends in linking scholarly literature to data, covering both entity linking and data linking. Examples will demonstrate how these technologies are being employed by publishers and A&I (abstracting and indexing) vendors in cooperation with local data repositories.
__________________________________________
Before I get started, I would like to take a minute to set some expectations for this talk. The examples will come primarily from the hard sciences; my challenge to you is to figure out how to apply these technologies and methods to the digital humanities.

This is a theoretical framework for looking at the different ways that publications can be connected to data. It is also the agenda for the talk: I will start with the top-left quadrant and work my way to the bottom right, moving from the approaches that are easiest to apply to the humanities to the hardest.

This quadrant is primarily about linking publications to supplemental data.

Supplemental data submitted as a file alongside an article is the traditional approach. It has its place, but it is not what I am talking about today.

Instead, new tools now make it possible to display and directly manipulate data in new and interesting ways. This example is an application that displays KML files on a Google Map:
http://www.applications.sciverse.com/action/appDetail/298231?zone=main&pageOrigin=appGallery&activity=display
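
Since a KML file is XML, a viewer like this one first has to pull the coordinates out of each placemark before plotting them. A minimal sketch of that extraction step using only Python's standard library follows; the function name `kml_points` and the sample placemark are illustrative assumptions, not part of the SciVerse application.

```python
import xml.etree.ElementTree as ET

# KML 2.2 elements live in this XML namespace.
KML = "http://www.opengis.net/kml/2.2"
NS = {"kml": KML}

def kml_points(kml_text):
    """Return (name, longitude, latitude) for every point placemark in a KML string."""
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iter(f"{{{KML}}}Placemark"):
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=NS)
        if coords is None:
            continue  # this sketch ignores lines and polygons
        name = placemark.findtext("kml:name", default="", namespaces=NS)
        # KML orders each tuple as longitude,latitude[,altitude].
        lon, lat = (float(v) for v in coords.strip().split(",")[:2])
        points.append((name, lon, lat))
    return points

# Illustrative sample file with a single sampling site.
sample = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Sampling site A</name>
      <Point><coordinates>20.46,44.82,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""

print(kml_points(sample))  # [('Sampling site A', 20.46, 44.82)]
```

A map widget would then place a marker at each point. Note that KML stores longitude before latitude, the reverse of the usual "lat, lon" convention.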

Next on the agenda is automating the connection between publications and whole supplementary
or related datasets.

One example of this is the PANGAEA app, which searches the PANGAEA APIs by article DOI, retrieves the coordinates where the supplementary data were collected, and then plots them on a Google Map displayed directly on the ScienceDirect article page.
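
The app's workflow (look up datasets by the article's DOI, then collect their sampling coordinates for the map) can be sketched without any network access. In a minimal version, assuming the repository search has already returned a list of records, the matching step looks like this. `DatasetRecord`, its fields, the `10.5555` DOIs, and the coordinates are all hypothetical; PANGAEA's real API and response format are not shown.

```python
from dataclasses import dataclass

@dataclass
class DatasetRecord:
    """Hypothetical shape of a dataset record returned by a repository search."""
    doi: str          # DOI of the dataset itself
    article_doi: str  # DOI of the article the dataset supplements
    lat: float        # where the data were collected
    lon: float

def map_points_for_article(article_doi, records):
    """Return (dataset DOI, lat, lon) for each dataset linked to the given article DOI."""
    wanted = article_doi.strip().lower()  # DOI names are case-insensitive
    return [
        (r.doi, r.lat, r.lon)
        for r in records
        if r.article_doi.strip().lower() == wanted
    ]

# Invented records: two supplement the same article (stored with different case).
records = [
    DatasetRecord("10.5555/dataset.1", "10.5555/article.7", 54.2, 7.9),
    DatasetRecord("10.5555/dataset.2", "10.5555/ARTICLE.7", -62.1, -58.4),
    DatasetRecord("10.5555/dataset.3", "10.5555/article.8", 10.0, 10.0),
]
print(map_points_for_article("10.5555/article.7", records))
# [('10.5555/dataset.1', 54.2, 7.9), ('10.5555/dataset.2', -62.1, -58.4)]
```

Matching on a normalized DOI matters because the same DOI may be stored with different capitalization in different systems.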

This also works on Scopus record pages, so it covers articles from many publishers and journals. Once the decision was made to add it to Scopus as well, the PANGAEA developer implemented it in less than 24 hours. This was made possible by the SciVerse Applications platform.

Users can link through to the main record for the dataset on PANGAEA. I would also like to point out that the dataset has its own DOI, which was registered through DataCite.
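
Because a DOI is a persistent identifier, a link to a dataset can be built from the DOI alone by prefixing it with the doi.org resolver. A minimal sketch, assuming only the basic `10.<registrant>/<suffix>` shape; the helper names and the example DOI are hypothetical, and the pattern is deliberately simpler than the full DOI syntax.

```python
import re
from urllib.parse import quote

# Accept bare DOIs as well as the common "doi:" and resolver-URL forms.
_PREFIXES = re.compile(r"^(?:doi:|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)
# Simplified check: "10." + numeric registrant code, a slash, then a non-blank suffix.
_DOI = re.compile(r"^10\.\d{4,9}(?:\.\d+)*/\S+$")

def normalize_doi(raw):
    """Strip common prefixes, check the basic DOI shape, and lower-case it."""
    doi = _PREFIXES.sub("", raw.strip())
    if not _DOI.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi.lower()  # DOI names are case-insensitive

def resolver_url(raw):
    """Build an https://doi.org/ link that redirects to the registered landing page."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")

print(resolver_url("doi:10.5555/Example.Dataset.1"))
# https://doi.org/10.5555/example.dataset.1
```

Linking through the resolver rather than to a repository URL means the link keeps working if the dataset's landing page moves, provided the registrant updates the DOI record.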

So what is DataCite, and why is it important? Among other things, it is key to creating persistent links to data held in repositories.
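
Part of what makes DataCite useful for linking is that a dataset's DOI metadata can record how the dataset relates to other works: the DataCite Metadata Schema's `relatedIdentifiers` property pairs an identifier with a `relationType` such as `IsSupplementTo` or `IsCitedBy`. The sketch below pulls article DOIs out of such a record. It assumes a JSON layout modelled loosely on DataCite's REST API (`data.attributes.relatedIdentifiers`), which should be checked against the current API documentation; the sample record is invented.

```python
import json

# Relation types from the DataCite Metadata Schema that tie a dataset to a publication.
ARTICLE_RELATIONS = {"IsSupplementTo", "IsCitedBy", "IsReferencedBy", "IsDocumentedBy"}

def linked_article_dois(record_json):
    """Return DOIs of works this dataset supplements or is cited, referenced, or documented by.

    Assumed record layout: {"data": {"attributes": {"relatedIdentifiers": [...]}}}.
    """
    attributes = json.loads(record_json)["data"]["attributes"]
    return [
        rel["relatedIdentifier"].lower()
        for rel in attributes.get("relatedIdentifiers", [])
        if rel.get("relatedIdentifierType") == "DOI"
        and rel.get("relationType") in ARTICLE_RELATIONS
    ]

# Invented record: one article link, one non-DOI link, one version link.
sample = json.dumps({"data": {"id": "10.5555/dataset.1", "attributes": {"relatedIdentifiers": [
    {"relatedIdentifier": "10.5555/Article.42", "relatedIdentifierType": "DOI",
     "relationType": "IsSupplementTo"},
    {"relatedIdentifier": "https://example.org/readme", "relatedIdentifierType": "URL",
     "relationType": "IsDocumentedBy"},
    {"relatedIdentifier": "10.5555/dataset.0", "relatedIdentifierType": "DOI",
     "relationType": "IsNewVersionOf"},
]}}})

print(linked_article_dois(sample))  # ['10.5555/article.42']
```

Reading the relationship from the dataset's own metadata is what lets a publisher or A&I platform discover the link automatically, without the author having to attach the data to the article.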

Takeaway points: the International DOI Foundation enables CrossRef to assign DOIs, and DataCite plays a roughly equivalent role for research data. You can learn more on the DataCite website. A central institution in Serbia might want to become a Member Institute.

So those were examples of linking to whole datasets and displaying them in new and interesting ways. Next, I will discuss linking to entities.

Traditional linking involves an author marking up an entity, such as a protein, so that it can easily be linked to additional information about that entity in a different database. While this is useful, it is not what I want to share with you today. Why make a user follow a link when…

You can now embed an interactive 3D model of the protein in context, directly in the article. In this example, the PDB Protein Viewer is embedded in the article itself.

In this example, an author adds key structures to the article, and they are then embedded using Reaxys information and software.
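
In both the traditional and the embedded approaches, an author still tags each entity by hand. Turning that markup into a database link can be sketched as follows, assuming a hypothetical `[database:ID]` tag syntax; the tag format and the `LINK_TEMPLATES` table are illustrative, and only the RCSB PDB structure-page pattern is filled in.

```python
import html
import re

# Hypothetical link templates keyed by the database name used in the author's tag.
LINK_TEMPLATES = {
    "pdb": "https://www.rcsb.org/structure/{id}",
}
# Hypothetical author markup such as [pdb:4HHB].
MARKUP = re.compile(r"\[(\w+):([A-Za-z0-9]+)\]")

def link_marked_entities(text):
    """Replace each [db:ID] tag with an HTML link to that entity's database record."""
    def to_link(match):
        db, entity_id = match.group(1).lower(), match.group(2)
        template = LINK_TEMPLATES.get(db)
        if template is None:
            return match.group(0)  # unknown database: leave the markup untouched
        url = template.format(id=entity_id)
        return f'<a href="{html.escape(url)}">{html.escape(entity_id)}</a>'
    return MARKUP.sub(to_link, text)

print(link_marked_entities("Hemoglobin [pdb:4HHB] binds oxygen."))
# Hemoglobin <a href="https://www.rcsb.org/structure/4HHB">4HHB</a> binds oxygen.
```

Embedding goes one step further: instead of emitting a link, the same lookup would hand the ID to a viewer component placed in the page.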

The previous examples still required an author to mark up entities manually. Through text analysis and mining, this is no longer always necessary.
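
The simplest form of automated recognition is dictionary (gazetteer) matching: scan the text for known names and record where each occurs. How NextBio recognizes entities is not described here, so the sketch below is a generic illustration; the three-entry dictionary and its IDs are invented.

```python
import re

# Tiny illustrative dictionary: surface form -> (entity type, hypothetical canonical ID).
DICTIONARY = {
    "BRCA1": ("gene", "GENE:BRCA1"),
    "p53": ("protein", "PROT:P53"),
    "glucose": ("molecule", "MOL:GLUCOSE"),
}

# Longest names first so a longer name wins over a shorter prefix; \b limits hits to whole words.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(n) for n in sorted(DICTIONARY, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)
_LOOKUP = {name.lower(): info for name, info in DICTIONARY.items()}

def recognize(text):
    """Return (start, end, surface text, type, ID) for each dictionary hit in the text."""
    hits = []
    for m in _PATTERN.finditer(text):
        entity_type, entity_id = _LOOKUP[m.group(0).lower()]
        hits.append((m.start(), m.end(), m.group(0), entity_type, entity_id))
    return hits

print(recognize("Loss of BRCA1 alters p53 signalling and glucose uptake."))
```

Because the dictionary can simply be extended and rerun over any text, this style of recognition extends to new entity types and to older content; the price is that ambiguous names and unlisted variants produce the recall and precision errors noted next.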

In this example, our partner NextBio automatically recognizes entities in the text of the article. This approach:
* is easily extended to new or other types of entities
* works retrospectively on older content
* does introduce recall and precision errors
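
Recall measures how many of the true entity mentions the recognizer finds; precision measures how many of its hits are correct. A minimal sketch of measuring both against a hand-annotated sample, assuming exact-match (start, end, type) spans; the spans below are invented.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall for collections of (start, end, type) spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Invented annotations: one entity missed (60-65) and one false hit (70-74).
gold = {(8, 13, "gene"), (21, 24, "protein"), (40, 47, "molecule"), (60, 65, "gene")}
predicted = {(8, 13, "gene"), (21, 24, "protein"), (40, 47, "molecule"), (70, 74, "gene")}

print(precision_recall(predicted, gold))  # (0.75, 0.75)
```

The missed entity lowers recall and the false hit lowers precision; tuning a recognizer usually trades one against the other.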

Not only can it display them in the sidebar, but the application framework also enables links to be added to the entities in the text on the fly.
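
Adding links on the fly amounts to rewriting the text around the character offsets the recognizer returns. A minimal sketch, assuming non-overlapping (start, end, ID) spans and a hypothetical `example_url` function standing in for real database links; it escapes the surrounding text so the result is safe to insert as HTML. How the SciVerse framework actually injects links into the page is not shown.

```python
import html

def add_links(text, spans, url_for):
    """Wrap each (start, end, entity_id) span of text in an HTML link.

    Spans are character offsets into the original text and must not overlap.
    """
    parts, cursor = [], 0
    for start, end, entity_id in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans")
        parts.append(html.escape(text[cursor:start]))  # plain text before the entity
        url = html.escape(url_for(entity_id))
        parts.append(f'<a href="{url}">{html.escape(text[start:end])}</a>')
        cursor = end
    parts.append(html.escape(text[cursor:]))  # plain text after the last entity
    return "".join(parts)

# Hypothetical link target; a real application would point at the relevant database.
def example_url(entity_id):
    return f"https://example.org/entity/{entity_id}"

print(add_links("BRCA1 & p53", [(8, 11, "PROT:P53"), (0, 5, "GENE:BRCA1")], example_url))
```

Walking the spans in offset order with a cursor keeps every offset valid, because the original text is never modified in place.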

A reader can then click those links for additional information from multiple databases.

1.   Colours and tags gene, protein, and molecule names
2.   Clicking a term shows a summary of its features (e.g., sequence or 2D structure)
3.   The user can click links in the pop-up that lead out to more information
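
The three steps above (tag the term, show a summary, offer outbound links) can be modelled as a small data structure. The sketch below assembles the pop-up for one clicked term; the `Popup` class, the placeholder `example.org` link templates, and the summary text are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical outbound link templates per entity type (placeholder domains).
LINKS_BY_TYPE = {
    "protein": [("Database A", "https://a.example.org/protein/{id}"),
                ("Database B", "https://b.example.org/entry/{id}")],
    "molecule": [("Database C", "https://c.example.org/compound/{id}")],
}

@dataclass
class Popup:
    term: str                                  # the tagged word the reader clicked
    summary: str                               # e.g. a sequence or 2D-structure summary
    links: list = field(default_factory=list)  # (label, URL) pairs leading out

def build_popup(term, entity_type, entity_id, summaries):
    """Assemble the pop-up for a clicked term: a short summary plus outbound links."""
    summary = summaries.get(entity_id, "No summary available.")
    links = [(label, template.format(id=entity_id))
             for label, template in LINKS_BY_TYPE.get(entity_type, [])]
    return Popup(term, summary, links)

popup = build_popup("p53", "protein", "PROT-1",
                    {"PROT-1": "Example summary: sequence length, domains"})
print(popup.links)
# [('Database A', 'https://a.example.org/protein/PROT-1'), ('Database B', 'https://b.example.org/entry/PROT-1')]
```

Keying the outbound links by entity type is what lets one recognized term fan out to several databases at once.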

* To summarize, we started with very traditional linking of datasets, where an author submits the dataset with the article. One example of how this can be improved was the interactive map viewer that displays supplementary KML files rather than simply attaching the files to the article.
* Next, we discussed automated linking to datasets. This included the example of searching the PANGAEA APIs for related datasets and then displaying the locations where the data were collected. This will be driven by new standards such as DataCite.
* Third, we looked at authors manually marking up entities that can be linked to records in other databases; it is now possible to embed content from those databases using APIs.
* Last was fully automated entity recognition using text analysis and mining. Again, information from third-party databases can be embedded directly in the article itself.
* While I haven’t spoken much about the technologies enabling these new ways of linking articles to data, one example is the SciVerse Application Framework, which now enables all of the examples discussed today.
http://www.applications.sciverse.com/action/userhome

I would like to close with the same questions I opened with. Thank you.