Text mining PubAg
Jake Lever & Ben Busby
National Center for Biotechnology Information
National Library of Medicine
jake.lever@nih.gov, ben.busby@nih.gov
Why text mine?
Reading papers takes time and effort
https://commons.wikimedia.org/wiki/File:FileStack_retouched.jpg
https://commons.wikimedia.org/wiki/File:Cup-o-coffee-simple.svg
https://commons.wikimedia.org/wiki/File:Modern_clock_chris_kemps_01.svg
https://commons.wikimedia.org/wiki/File:Mad_scientist_transparent_background.svg
What can text mining do?
Help build navigable biological databases
https://www.civicdb.org
https://www.pharmgkb.org/
https://string-db.org/
https://commons.wikimedia.org/wiki/File:Mad_scientist_transparent_background.svg
PubRunner
Keeping text mining up-to-date
Workflow diagram: PubRunner, running on a private server, downloads PubMed abstracts as XML files, runs the text mining tool, gets the results, and uploads them to a public FTP or Zenodo. It then updates the www.pubrunner.org website with the status of the tool and the location of the latest results.
http://www.pubrunner.org
https://github.com/NCBI-Hackathons/PubRunner
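The download, run, and publish loop described on this slide can be sketched roughly as follows. This is a minimal illustration, not PubRunner's actual code: the function names (`read_abstracts`, `toy_tool`, `run_once`), the keyword-counting "tool", and the sample records are all hypothetical, and the download and upload steps are replaced by an in-memory string and a JSON string so the sketch runs without network access. It assumes the usual PubMed XML layout (`PubmedArticle`, `MedlineCitation/PMID`, `AbstractText`).

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter

# Stand-in for one downloaded PubMed XML file (invented records, for illustration only).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <Abstract><AbstractText>Broccoli intake and diabetes risk.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article><ArticleTitle>A record without an abstract</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>3</PMID>
      <Article>
        <Abstract><AbstractText>Garlic and broccoli extracts.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def read_abstracts(xml_text):
    """Yield (pmid, abstract text) for every record that has an abstract."""
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        # An abstract may be split into several AbstractText sections.
        parts = ["".join(el.itertext()) for el in article.iter("AbstractText")]
        if parts:
            yield pmid, " ".join(parts)


def toy_tool(abstracts, keywords):
    """Hypothetical text mining tool: count abstracts mentioning each keyword."""
    counts = Counter()
    for _pmid, text in abstracts:
        lowered = text.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[keyword] += 1
    return dict(counts)


def run_once(xml_text, keywords):
    """One update cycle: parse the download, run the tool, serialize the results."""
    results = toy_tool(read_abstracts(xml_text), keywords)
    # A real deployment would upload this output and record its location.
    return json.dumps(results, sort_keys=True)


print(run_once(SAMPLE_XML, ["broccoli", "garlic", "diabetes"]))
```

Keeping the tool behind a plain function boundary means the same cycle could be rerun whenever new abstracts arrive, which is the point of keeping text mining results up to date.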
NutriChem
“There is rising evidence of an inverse association between chronic diseases and
diets characterized by rich fruit and vegetable consumption.”
http://sbb.hku.hk/services/NutriChem-2.0
Jensen, Kasper, Gianni Panagiotou, and Irene Kouskoumvekaki. "NutriChem: a systems chemical biology resource
to explore the medicinal value of plant-based foods." Nucleic acids research 43.D1 (2014): D940-D945.
Disclaimer: We are not involved in
the development of this tool
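To give a feel for how food and disease links might be pulled out of abstracts, here is a simple sentence-level co-occurrence sketch. It is not NutriChem's method, which this deck does not describe. The term lists, the function names, and the example sentences are invented for illustration and are not findings.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical term lists; a real system would use curated vocabularies.
FOODS = {"broccoli", "garlic"}
DISEASES = {"diabetes", "hypertension"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrences(abstracts):
    """Count (food, disease) pairs that appear together in the same sentence."""
    counts = Counter()
    for text in abstracts:
        for sentence in split_sentences(text):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            foods = sorted(FOODS & words)
            diseases = sorted(DISEASES & words)
            for food, disease in product(foods, diseases):
                counts[(food, disease)] += 1
    return counts


# Invented example sentences, not real abstracts.
examples = [
    "Garlic may lower hypertension. Broccoli was also studied.",
    "Broccoli and garlic intake was associated with diabetes.",
]
for pair, n in sorted(cooccurrences(examples).items()):
    print(pair, n)
```

Co-occurrence only flags candidate pairs. It says nothing about the direction or strength of an association, so results like these would still need curation.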
What we’d like to do
Show the power of text mining on PubAg
Diagram: PubMed and PubAg abstracts feed into PubRunner, which runs NutriChem.
jake.lever@nih.gov
ben.busby@nih.gov
What we need: Access to all PubAg
abstracts, preferably through FTP


Proposal for Text Mining PubAg
