Understanding Biological Function in Times of High
Throughput and Low Output
Iddo Friedberg
Iowa State University
http://iddo-friedberg.net
@iddux
Big Data in my lab
● Gene block evolution
● Images and Genomes
● Host/Microbiome
● Database error and bias
● Critical Assessment of protein Function Annotations
Understanding methods
Understanding the data
Understanding Methods: The Critical Assessment
of protein Function Annotations
Pedja
Wyatt
Sean
Tal
Alex
Large Data Biology has a Bad Rap?
"So we now have a culture which is based on
everything must be high-throughput. I like to
call it low-input, high-throughput, no-output
biology"
– Sydney Brenner
Motivation: The Knowledge Gap
● The gap between data and information
[Figure: Data versus Information]
Temperton & Giovannoni Curr. Opin. Microbiology (2012)
Errors Accumulate in Databases
Schnoes A et al (2009) PLoS Computational Biology, 5 (12)
Assigning Function to Proteins
Low-ish throughput
High throughput
Machine learning
Most Proteins are Annotated Electronically
[Figure: share of annotations (axis 0 to 100) by evidence type (Experimental, Computational, Electronic, Other) for Arabidopsis, mouse, cow, zebrafish, chicken, and human]
Compiled from the GOA project, EBI, 6/2011
Problems
● Most genes are annotated electronically
● Databases have a high error rate which is growing
● Homology transfer is less effective
Solutions?
● Assess accuracy of annotation software
● Write better software
Challenges in Picking Targets
● Can't use databases: circularity problem
● Experimental groups have a small “sharing timeframe”
● Function description too vague for precise GO
annotation
● There are “unknown unknowns”
Choose an
annotated protein
Prediction method uses said
annotation to predict function
Circular logic...
… is circular
Choosing Assessment Benchmarks
[Timeline figure]
Time: Challenge opens → Submission deadline → Assessment time
Function unknown → Function still unknown → Function still unknown
Function unknown → Function still unknown → Function known (Benchmark?)
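The time-delayed benchmark idea on this slide can be sketched in a few lines. This is only an illustration, not the CAFA pipeline: the function name `select_benchmark`, the dictionary layout, and the protein and term labels are all hypothetical. A target counts as a benchmark if it had no experimental annotation at the submission deadline but gained at least one by assessment time.

```python
def select_benchmark(targets, annotations_at_deadline, annotations_at_assessment):
    """Return targets whose function was unknown at the deadline but known at assessment.

    Both annotation arguments map a protein ID to a set of experimentally
    supported terms (hypothetical layout, assumed for this sketch).
    """
    benchmark = []
    for protein in targets:
        known_before = annotations_at_deadline.get(protein, set())
        known_after = annotations_at_assessment.get(protein, set())
        # Keep only proteins that gained their first experimental annotation.
        if not known_before and known_after:
            benchmark.append(protein)
    return sorted(benchmark)


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    targets = ["P1", "P2", "P3"]
    at_deadline = {"P1": set(), "P2": set(), "P3": {"term_x"}}
    at_assessment = {"P1": {"term_a"}, "P2": set(), "P3": {"term_x", "term_y"}}
    print(select_benchmark(targets, at_deadline, at_assessment))
```

Because the benchmark annotations did not exist when predictions were submitted, no method could have used them as input, which sidesteps the circularity problem.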
[Figure: Molecular Function precision/recall, with BLAST and Naive baselines]
[Figure: Biological Process precision/recall, with BLAST and Naive baselines]
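Precision and recall, the two quantities plotted on these slides, can be computed for a single protein as below. This is a minimal set-based sketch; the function name, the score threshold, and the term labels are assumptions made for illustration, and the assessment itself may weight or average terms differently.

```python
def precision_recall(predicted_scores, true_terms, threshold):
    """Compute precision and recall for one protein at a given score threshold.

    predicted_scores maps a term to a confidence score in [0, 1];
    true_terms is the set of experimentally confirmed terms.
    Returns None for a quantity that is undefined (empty denominator).
    """
    predicted = {term for term, score in predicted_scores.items() if score >= threshold}
    true_positives = len(predicted & true_terms)
    precision = true_positives / len(predicted) if predicted else None
    recall = true_positives / len(true_terms) if true_terms else None
    return precision, recall


if __name__ == "__main__":
    # Hypothetical predictions for one protein.
    scores = {"term_a": 0.9, "term_b": 0.6, "term_c": 0.2}
    truth = {"term_a", "term_c"}
    print(precision_recall(scores, truth, threshold=0.5))
```

Sweeping the threshold from 0 to 1 and recording each (recall, precision) pair traces out a precision/recall curve for a method.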
Case Study: hPNPase
Gadi Schuster (Technion)
Successful Methods? log(obs/exp)
[Figure: log(obs/exp) by method category, shown separately for Biological Process and Molecular Function. Categories: profile-profile alignments, literature, ortholog, sequence properties, protein interactions, gene expression, phylogeny, sequence alignments, other functional information, machine learning based method, sequence-profile alignments]
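A log(obs/exp) score of the kind named on this slide can be sketched as follows. The slide does not define the observed and expected quantities or the base of the logarithm, so everything here is an assumption: "observed" is taken as the fraction of top-ranked methods that use a category, "expected" as the fraction of all methods that use it, and the logarithm is base 10. The function name and counts are hypothetical.

```python
import math


def log_obs_exp(category_in_top, top_total, category_in_all, all_total):
    """Return log10(observed / expected) for one method category.

    observed = share of top-ranked methods using the category (assumed definition)
    expected = share of all submitted methods using the category (assumed definition)
    """
    if min(category_in_top, top_total, category_in_all, all_total) <= 0:
        raise ValueError("all counts must be positive for the ratio to be defined")
    observed = category_in_top / top_total
    expected = category_in_all / all_total
    return math.log10(observed / expected)


if __name__ == "__main__":
    # Made-up counts: 4 of 10 top methods vs. 20 of 100 methods overall.
    print(round(log_obs_exp(4, 10, 20, 100), 3))
```

Under these assumptions a positive value means a category is over-represented among the top methods, and a negative value means it is under-represented.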
CAFA2 vs. CAFA1
CAFA2 was held in 2014-2015
More targets (100,000 vs. 50,000)
More groups (56 vs. 29)
CAFA2 vs. CAFA1
CAFA2 was held in 2014-2015
100,000 targets
147 participants
Methods have improved
CAFA Conclusions & What's Next
● Homology transfer still rules.
● Combined methods work best
● Molecular Function is easier to predict than Biological
Process
● Generally, the field can use improvement
● Comparison of metrics is very much needed
– Why do methods perform differently under different metrics?
– Is there a “best” metric? What is “best”?
● Databases are biased
[Figure: Leaf terms, Molecular Function. Terms shown: protein binding; protein homodimerization activity; zinc ion binding; transcription activator activity; chromatin binding; transcription repressor activity; transcription factor activity; two-component sensor activity; specific transcriptional repressor activity; DNA binding; calcium ion binding; identical protein binding; manganese ion binding; ATP binding; beta-galactoside alpha-2,3-sialyltransferase activity; magnesium ion binding; enzyme binding; electron carrier activity; structural constituent of ribosome; metal ion binding]
David Ream(MU)
Alexander Thorman (MU)
Alexandra Schnoes (UCSF)
Protein Binding
Activity
Annotations per article
Schnoes et al. PLoS Comp Biol (2013)
Information content is inversely related to the number of proteins annotated
[Figure: information content by number of proteins annotated (1, <10, <100, ≥100), for Molecular Function, Biological Process, and Cellular Component]
Schnoes et al (2013)
Single throughput (1 protein/study): High information (12 bits)
High throughput (≥100 proteins/study): Low information (3 bits)
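The bit values above can be put in context with one common definition of information content: the negative base-2 logarithm of how often a term is used. The sketch below assumes that definition, which may differ in detail from the one used in the cited study; the function name and the counts are hypothetical.

```python
import math


def information_content(term_count, total_annotations):
    """Return -log2 of a term's relative frequency, in bits.

    Rare, specific terms score high; common, general terms score low.
    """
    if term_count <= 0 or term_count > total_annotations:
        raise ValueError("term_count must be in the range 1..total_annotations")
    return -math.log2(term_count / total_annotations)


if __name__ == "__main__":
    # Made-up counts chosen to reproduce the two values on the slide.
    print(information_content(1, 4096))    # a term used once in 4096 annotations
    print(information_content(512, 4096))  # a term used in one of every eight annotations
```

Under this definition, a 12-bit term is one used about once in every 4096 annotations, while a 3-bit term is used about once in every 8, so it says far less about what a protein does.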
High Throughput Experiments
● Bias our knowledge
● Bias priors for function
prediction programs
● Are less informative
than low-throughput
experiments
● Exclusively annotate
genes otherwise
unknown
● Fewer $$$
● Fast results
● Consistency
The Good | The Bad
Data that Will Drive Computation
● Whole chromosome
sequencing
● Epigenomics
● Integrating images: phenomics → genomics relationships
● Fragment-based
sequencing
● Proteomics
● Metabolomics
● Documents
● Network data
● Images
Current Future
Thank you
http://iddo-friedberg.net
