Building a massive biomedical knowledge graph with citizen science
Benjamin Good
The Scripps Research Institute
@bgood
Not paying attention? Be a citizen scientist at http://mark2cure.org
High-level goal: improve access to published knowledge
[Figure: articles added to PubMed per year]
1 every 30 seconds: more than a million a year
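A quick check that the two rates on this slide agree, assuming a 365-day year:

```python
# One new PubMed article every 30 seconds, expressed per year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s (leap years ignored)
SECONDS_PER_ARTICLE = 30

articles_per_year = SECONDS_PER_YEAR // SECONDS_PER_ARTICLE
print(f"{articles_per_year:,} articles per year")  # 1,051,200 articles per year
```

Just over a million, consistent with the slide.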
knowledge graph
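[Figure: knowledge graph built automatically ("Auto!") from ~10,000 articles, with node types Chemicals & drugs, Genes, Organisms, Area of study, and Biological Process. A path runs from the Ngly1 gene through "?" to a "New drug candidate?"]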
Knowledge graph problems
• Assigning meaning to relations
• Incorrect relations
• Missing relations
• …
Facts of life in computer processing of human language
• False positives and false negatives are always present
• Human annotators remain the gold standard
• There are not nearly enough professional human annotators to process every document published
Observations
• There are about 2.92 billion Internet users
• Lots of them can read English
Source: http://www.statista.com/statistics/273018/number-of-internet-users-worldwide/
Hypothesis
• We can generate the equivalent of massive numbers of professional annotators by aggregating the labor of large numbers of non-professional CITIZEN SCIENTISTS!!!
Building a Knowledge Graph
1. Find mentions of concepts in text
2. Identify relationships between concepts
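A deliberately naive sketch of these two steps, for illustration only. The lexicon, the single-token matching rule, and the co-occurrence heuristic are hypothetical assumptions, not the Mark2Cure pipeline. The co-occurrence edges it produces carry no meaning, which is exactly the "assigning meaning to relations" problem listed earlier.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon: lowercase surface string -> concept type.
LEXICON = {
    "ngly1": "Gene",
    "cdk9": "Gene",
    "leukemia": "Disease",
}

def find_mentions(text):
    """Step 1: find concept mentions by exact, single-token dictionary lookup."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    return [(tok, LEXICON[tok]) for tok in tokens if tok in LEXICON]

def candidate_relations(mentions):
    """Step 2: propose an untyped relation for each pair of co-occurring concepts."""
    concepts = sorted({name for name, _type in mentions})
    return list(combinations(concepts, 2))

def build_graph(texts):
    """Merge candidate relations from many texts into an undirected graph."""
    graph = defaultdict(set)
    for text in texts:
        for a, b in candidate_relations(find_mentions(text)):
            graph[a].add(b)
            graph[b].add(a)
    return graph

graph = build_graph([
    "Inhibition of CDK9 may help in leukemia.",
    "NGLY1 and leukemia",
])
print(dict(graph))
```

Dictionary lookup misses synonyms and multi-word names (false negatives), and co-occurrence links unrelated concepts (false positives). Those are the errors human annotators are meant to catch.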
Before we try citizen scientists...
• Can non-scientists collectively identify concepts in biomedical texts with high quality?
• We used the Amazon Mechanical Turk crowdsourcing platform to answer this question
Highlight the “disease”.
Answer was yes
• By combining the responses of multiple non-professional members of ‘the crowd’, we achieved quality equivalent to that of professional annotators
Good et al. “Microtask crowdsourcing for disease mention annotation in PubMed abstracts.” Pacific Symposium on Biocomputing 2015. http://psb.stanford.edu/psb-online/proceedings/psb15/good.pdf
Mark2Cure.org
Same task, different context
Experiment 1 in progress
Evaluating quality and quantity of volunteer annotators
Goal is to complete about 600 abstracts, with 15 volunteers per abstract
Almost there!
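[Figure: mark2cure experiment 1, showing tasks completed (Tasks/10) and new users over time, annotated with the launch, a tweet, a blog post, and a San Diego Union-Tribune article]
As of 11:00am Feb. 9: 5,423 tasks complete; 230 signups, 130 of whom have completed a task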
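The experiment's target can be worked out from the goal above. The sketch assumes that one "task" is one volunteer annotating one abstract, which the slides do not state. Under that assumption, the Feb. 9 count is roughly 60% of the target.

```python
ABSTRACTS_GOAL = 600          # "about 600 abstracts"
VOLUNTEERS_PER_ABSTRACT = 15
tasks_complete = 5423         # as of 11:00am Feb. 9
signups = 230
active_users = 130            # signups who have completed at least one task

# Assumption (not stated on the slides): one task = one volunteer x one abstract.
annotations_needed = ABSTRACTS_GOAL * VOLUNTEERS_PER_ABSTRACT
progress = tasks_complete / annotations_needed
activation_rate = active_users / signups

print(f"{annotations_needed} annotations needed, {progress:.0%} done")
print(f"{activation_rate:.0%} of signups completed a task")
```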
Next steps
• Implement and test a relation extraction workflow
• Start disease-focused knowledge capture missions
• First disease: NGLY1 deficiency
• http://ngly1.org
Thanks to the mark2cure team!
Max Nanis, Andrew Su, Ginger Tsueng, Chunlei Wu
@bgood
bgood@scripps.edu
Thank you to the citizen scientists making this possible!
Why do I Mark2Cure?
• In memory of my daughter who had cystic fibrosis
• Studied biology in college and I really miss it!
• My 4-year-old daughter Phoebe is living with and battling rare disease.
• I have Ehlers-Danlos syndrome. I hope to help people learn about this painful and debilitating disorder, so that others like me can receive more effective medical care.
• I am retired, have a doctorate in medical humanities, and have two children with Gaucher disease. I am just looking for some way to put my education to use.
• To give back
• I Mark2Cure in memory of my son Mike who had type 1 diabetes.
• Take part in something that helps humanity.
Increase precision with voting
[Figure: the same example abstract shown four times, highlighting the text that received 1 or more votes (K=1), then at least K=2, K=3, and K=4 votes. An aggregation function combines the individual workers' votes into one set of annotations.]
Example abstract: "This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies."
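Here is a minimal sketch of a vote-threshold aggregation function like the one illustrated above. It assumes each worker's annotations are exact (start, end) character spans, and it keeps a span when at least K workers highlighted it. The span representation and the function name are hypothetical, and the published method may match spans differently.

```python
from collections import Counter

def aggregate_by_vote(annotations, k):
    """Keep spans highlighted by at least k workers.

    annotations: one set of (start, end) spans per worker, so each worker
    contributes at most one vote per span.
    """
    votes = Counter(span for worker_spans in annotations for span in worker_spans)
    return {span for span, n in votes.items() if n >= k}

# Toy data: three workers highlighting character spans in one abstract.
workers = [
    {(0, 8), (20, 28)},
    {(0, 8)},
    {(0, 8), (40, 45)},
]
for k in range(1, 5):
    print(k, sorted(aggregate_by_vote(workers, k)))
```

Raising k keeps fewer spans, so precision tends to rise while recall falls.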
AMT results: 589 abstracts compared to the gold standard
F = 0.87 at k = 6
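One way to choose the vote threshold is to compute F (the harmonic mean of precision and recall) against the gold standard for each k and keep the best one. The sketch below does this on toy data with exact span matching. Both the data and the matching rule are illustrative assumptions, and the sketch does not reproduce the AMT result above.

```python
from collections import Counter

def f_score(predicted, gold):
    """F1 of predicted spans against gold spans (exact matching)."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def best_threshold(annotations, gold, max_k):
    """Return (k, F) for the vote threshold with the highest F score."""
    votes = Counter(span for worker_spans in annotations for span in worker_spans)
    scores = {}
    for k in range(1, max_k + 1):
        predicted = {span for span, n in votes.items() if n >= k}
        scores[k] = f_score(predicted, gold)
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]

gold = {(0, 8), (20, 28)}
workers = [
    {(0, 8), (20, 28), (40, 45)},  # one false positive at (40, 45)
    {(0, 8), (20, 28)},
    {(0, 8)},
]
print(best_threshold(workers, gold, max_k=3))  # (2, 1.0)
```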
