Virtual Open Publishing Fest, 2020-05-28
openVirus
Knowledge for citizens in the time of COVID-19
Peter Murray-Rust
TheContentMine and collaborators
citizens
knowledgebase
Images from ContentMine CC BY and Wikimedia CC BY-SA
pm286@cam.ac.uk
peter@contentmine.org
https://www.slideshare.net/petermurrayrust/openvirus-tools-for-discovering-literature-on-viruses
ContentMine is OpenLocked Non-Profit http://contentmine.org
The Right to Read is the Right to Mine
openVirus collaborators
Remko Popma,
Lezan Hawizy, Tim Voronov,
Andy Jackson,
Clyde Davies,
Thomas Shafee,
Priya JK, Kareena Singh,
Simon Worthington
ContentMine Workshops on Mining
Chris Kittel, CM, at MozFest 2015
Stefan Kasberger, CM
The world’s existential problems need
knowledge
2019* “Open Climate Knowledge” (OCK)
to mine scientific articles about climate change.
50-90% of all published science is PAYWALLED.
The rest is very hard to find…
*Simon Worthington and PMR
BUT COVID-19 hit …
… knowledge can prevent viral epidemics …
The Ebola outbreak in Liberia was predicted in 1984 …
… and forgotten. https://www.nytimes.com/2015/04/08/opinion/yes-we-were-warned-about-ebola.html
11,310 dead in
West Africa 2014
OpenVirus at OpenPublishingFest
All: 426,613
Open: 21,919
5% is Open to citizens
Is this article relevant to policy makers?
openVirus will give YOU tools to test
how much vital info is PAYWALLED
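The 5% figure can be checked directly from the two counts on the slide. The snippet below is only a worked calculation; the variable names are illustrative, and the slide does not say which database the counts come from.

```python
# Counts copied from the slide; their source database is not stated there.
all_articles = 426_613
open_articles = 21_919

open_share = open_articles / all_articles
not_open = all_articles - open_articles

print(f"Open share: {open_share:.1%}")
print(f"Not open:   {not_open:,} articles")
```

The share works out slightly above the rounded 5% shown on the slide.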
PREPRINTS!!
Crossref
EuropePMC
Wikidata
getpapers
AMI
We can change all that!
We can do everything ourselves!
Delhi, IN
Priya and Kareena are 3rd-year interns on openVirus
PMR
Gitanjali Yadav
Mining!
• Build scrapers for openly readable sources.
• Users write queries for scraping.
• Download raw content.
• Clean and semantify.
• Annotate with dictionaries.
• Analyze and display.
Scrape -> Clean -> Annotate -> Display
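The four stages can be sketched as plain functions chained together. This is a toy illustration, not the openVirus code: the in-memory documents, the tiny disease dictionary and every function name are hypothetical stand-ins, and no network access is involved.

```python
import re
from collections import Counter
from html import unescape

# Hypothetical stand-in for downloaded raw content (no network access here).
RAW_DOCS = {
    "doc1": "<p>Ebola virus outbreak in <b>Liberia</b> &amp; Guinea</p>",
    "doc2": "<p>Influenza surveillance in Mexico</p>",
}

# Hypothetical miniature dictionary; a real one would be far larger.
DISEASE_TERMS = {"ebola", "influenza"}


def scrape(query: str) -> dict[str, str]:
    """Return raw documents whose text mentions the query (case-insensitive)."""
    return {k: v for k, v in RAW_DOCS.items() if query.lower() in v.lower()}


def clean(raw: str) -> str:
    """Strip tags, decode entities and normalise whitespace."""
    text = unescape(re.sub(r"<[^>]+>", " ", raw))
    return " ".join(text.split())


def annotate(text: str, terms: set[str]) -> Counter:
    """Count dictionary terms that occur as whole words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w in terms)


def display(annotations: dict[str, Counter]) -> None:
    """Print one line of term counts per document."""
    for doc_id, counts in sorted(annotations.items()):
        print(doc_id, dict(counts))


def run_pipeline(query: str) -> dict[str, Counter]:
    """Scrape -> Clean -> Annotate; the caller decides how to display."""
    raw = scrape(query)
    cleaned = {k: clean(v) for k, v in raw.items()}
    return {k: annotate(v, DISEASE_TERMS) for k, v in cleaned.items()}


if __name__ == "__main__":
    display(run_pipeline("virus"))
```

Keeping each stage a separate function means a different scraper or cleaner can be swapped in without touching the others.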
Open sources publish
Sources
https://ethos.bl.uk/Home.do
https://www.redalyc.org/
100,000 Theses
4,700,000 abstracts
50,000 preprints
https://doaj.org https://biorxiv.org
https://medrxiv.org
Mexico, Latin America
https://europepmc.org
And your archive?
“(virus OR viral) AND
epidemic”
45 hits
DOAJ
Directory of Open Access Journals
100,000 abstracts
Only 4.6 million
more to go 0.05%
20 GB total
Clyde Davies
Complete repo would yield
> 2000 articles
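The numbers on this slide appear to fit together as a simple extrapolation. The sketch below assumes (the slide layout does not say so explicitly) that the 45 hits came from the 100,000 abstracts searched, that 0.05% is that hit rate rounded, and that the full repository holds the 4,700,000 abstracts listed under Sources. The `matches` function is a hypothetical whole-word reading of the query, not the search engine actually used.

```python
import re


def matches(text: str) -> bool:
    """True when text satisfies (virus OR viral) AND epidemic, as whole words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & {"virus", "viral"}) and "epidemic" in words


# Figures taken from the slides; how they relate is an assumption (see above).
hits = 45
abstracts_searched = 100_000
abstracts_total = 4_700_000

hit_rate = hits / abstracts_searched
projected = hit_rate * abstracts_total

print(f"hit rate: {hit_rate:.3%}")
print(f"projected hits in the full repository: {projected:.0f}")
```

Under those assumptions the projection lands a little above 2,000, consistent with the slide's "> 2000 articles". Note that whole-word matching would miss plurals such as "viruses"; a real search index may stem terms.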
The power is with the READER
UK Theses (EThOS)
A full-text search API to find relevant
theses.
data from the EThOS service and the tools
of the UK Web Archive -> full-text search
API to find relevant theses.
1: Searching eTheses for the openVirus
project
2: Bringing Metadata & Full-text Together
This notebook illustrates how to use the
API
Andy Jackson
framework: ami + CProject data
scrapers: getpapers, Ferret, curl, Scrapy
cleaners: PDFBox, Tidy/Jsoup, Grobid, etc.
transformers: xml2html, ami ocr, KNIME
dictionaries: ami dictionary
indexing and annotation: Solr, ami
analysis and display: R, KNIME
openVirus Tools
scrape clean annotate display
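As an illustration of what the query step of a scraper involves, the sketch below builds a search URL for the Europe PMC REST service, one of the sources named in the slides. It does not use getpapers or ami and is not a description of how either is implemented; the function names are hypothetical. Only the URL builder is exercised here; `hit_count` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, open_only: bool = False, page_size: int = 25) -> str:
    """Build a Europe PMC search URL, optionally restricted to Open Access."""
    if open_only:
        query = f"({query}) AND OPEN_ACCESS:y"
    params = {
        "query": query,
        "format": "json",
        "resultType": "lite",
        "pageSize": page_size,
    }
    return f"{EPMC_SEARCH}?{urlencode(params)}"


def hit_count(url: str) -> int:
    """Fetch the URL and return the reported hit count (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return int(json.load(response)["hitCount"])


if __name__ == "__main__":
    print(build_search_url("(virus OR viral) AND epidemic", open_only=True))
```

Calling `hit_count` on the URL with and without `open_only=True` is one way to compare total and Open counts for a query of your own.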
Dictionaries
disease.xml
country.xml
Generous support from
Annotation
Dictionary -> ARTICLE
Cooccurrence
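Co-occurrence between two dictionaries, such as disease.xml and country.xml, can be sketched as counting how many articles mention a term from each. The XML layout below is a hypothetical minimal format chosen for illustration and may differ from the real ami dictionary files; the sample articles and function names are also invented.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import product

# Hypothetical dictionary layout; the real files may be structured differently.
DISEASE_XML = """<dictionary title="disease">
  <entry term="ebola"/>
  <entry term="influenza"/>
</dictionary>"""
COUNTRY_XML = """<dictionary title="country">
  <entry term="liberia"/>
  <entry term="mexico"/>
</dictionary>"""

# Invented one-line "articles" standing in for cleaned full text.
ARTICLES = [
    "Ebola transmission chains in Liberia",
    "Influenza and Ebola preparedness in Liberia",
    "Seasonal influenza in Mexico",
]


def load_terms(xml_text: str) -> set[str]:
    """Read the term attribute of every entry element, lower-cased."""
    root = ET.fromstring(xml_text)
    return {entry.get("term").lower() for entry in root.iter("entry")}


def cooccurrence(articles, terms_a, terms_b) -> Counter:
    """Count articles in which a term from each dictionary appears together."""
    counts = Counter()
    for text in articles:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for pair in product(sorted(words & terms_a), sorted(words & terms_b)):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    table = cooccurrence(ARTICLES, load_terms(DISEASE_XML), load_terms(COUNTRY_XML))
    for (disease, country), n in table.most_common():
        print(f"{disease:10s} {country:10s} {n}")
```

This version handles single-word terms only; multi-word entries would need phrase matching rather than a word set.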
bioRxiv in
Citizen Health Search (CHS)
A proposal to Wellcome Trust (Open Research in Health call) with
ContentMine, Cochrane and UCL-EPPI (CCU).
CHS puts semantic search on the desktop
of the searcher. We index all the visible
medical literature: it is normalized, sectioned
and indexed against a bank of user-chosen
dictionaries.
CHS takes input from EPMC, bioRxiv and
emerging community sources such as
Crossref and Unpaywall, and outputs to Zenodo,
Wikidata and CM-Science Source.
Citizen Dashboard
5 million Open scientific articles (0.5 TB),
indexed by ContentMine.
Disk: 30 GBP. Raspberry Pi 3: 50 GBP.
CC BY, PeterMR
Disk
Raspberry Pi
Power
CONTAINERISATION!
Contentmine.org. Join us! We need…
TESTERS!!
GRAPHICS
DOCUMENTING
QUERIES
SCRAPERS
SOFTWARE
http://github.com/petermr/openVirus
http://www.budapestopenaccessinitiative.org/read
… an unprecedented public good. …
… completely free and unrestricted access to [peer-reviewed literature] by all scientists, scholars, teachers, students, and other curious minds. …
… Removing access barriers to this literature will accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge.
(Budapest Open Access Initiative, 2002)

