THERE AND BACK AGAIN
4th Annual BCB Symposium, March 30, 2018
Adina Howe
Assistant Professor
Department of Agricultural and
Biosystems Engineering
Iowa State University
www.germslab.org
NGS SEQUENCING
HOW DID WE GET HERE?
Understanding community dynamics
• Who is there?
• What are they doing?
• How are they doing it?
Kim Lewis, 2010
Gene / Genome Sequencing
• Collect samples
• Extract DNA
• Sequence DNA
• “Analyze” DNA to identify its content and origin
Taxonomy (e.g., pathogenic E. coli)
Function (e.g., degrades cellulose)
Cost of Sequencing
Stein, Genome Biology, 2010
E. coli genome: 4,500,000 bp ($4.5M, 1992)
[Figure: DNA sequencing, Mbp per $ (log scale), by year, 1990-2012]
Rapidly decreasing costs with NGS
Stein, Genome Biology, 2010
Next Generation Sequencing
4,500,000 bp (E. coli, $100, presently)
[Figure: DNA sequencing, Mbp per $ (log scale), by year, 1990-2012]
Effects of low-cost sequencing…
First free-living bacterium: sequenced for billions of dollars and years of analysis
Personal genome: can be mapped in a few days for hundreds of dollars
The experimental continuum
Single Isolate
Pure Culture
Enrichment
Mixed Cultures
Natural systems
The era of big data in biology
Stein, Genome Biology, 2010
Computational hardware (doubling time 14 months)
Sanger sequencing (doubling time 19 months)
NGS (shotgun) sequencing (doubling time 5 months)
[Figure: disk storage (Mb/$) and DNA sequencing (Mbp per $) by year, 1990-2012, log scale]
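The doubling times above can be turned into yearly growth factors with ordinary exponential-growth arithmetic, fold increase = 2^(months / doubling time). This small sketch uses only the three doubling times quoted on the slide; the five-year horizon is an arbitrary illustrative choice.

```python
# Doubling times (months) as quoted on the slide (Stein, Genome Biology, 2010).
DOUBLING_MONTHS = {
    "Computational hardware": 14,
    "Sanger sequencing": 19,
    "NGS (shotgun) sequencing": 5,
}


def fold_increase(doubling_months: float, months: float) -> float:
    """Growth factor after `months` for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


def gap_after(years: float, fast: float, slow: float) -> float:
    """How many times further the faster-doubling curve has grown after `years`."""
    months = 12 * years
    return fold_increase(fast, months) / fold_increase(slow, months)


if __name__ == "__main__":
    for name, t in DOUBLING_MONTHS.items():
        print(f"{name}: x{fold_increase(t, 12):.2f} per year")
    # Sequencing output vs. hardware capacity after a hypothetical five years
    print(f"NGS outpaces hardware by x{gap_after(5, 5, 14):.0f} after 5 years")
```

Under these doubling times, sequencing throughput grows roughly fivefold per year while hardware grows less than twofold, so after five years the gap is about 210-fold.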
Postdoc experience with data
2003-2008 Cumulative sequencing in PhD = 2000 bp
2008-2009 Postdoc Year 1 = 50 Gbp
2009-2010 Postdoc Year 2 = 450 Gbp
2014 = 50 Tbp
today = 0.5 to 1 Tbp budgeted per project (PhD student)
Tackling Soil Biodiversity
Source: Chuck Haney
C. Titus Brown, James Tiedje, Qingpeng Zhang, Jason Pell (MSU)
Janet Jansson, Susannah Tringe (JGI)
Iowa State Bioinformatics BCB Symposium 2018 - There and Back Again
THE DIRT ON SOIL
Biodiversity in the dark, Wall et al., Nature Geoscience, 2010 Jeremy Burgress
MAGNIFICENT BIODIVERSITY
THE DIRT ON SOIL
SPATIAL HETEROGENEITY
http://www.fao.org/ www.cnr.uidaho.edu
THE DIRT ON SOIL
DYNAMIC
THE DIRT ON SOIL
INTERACTIONS: BIOTIC, ABIOTIC, ABOVE, BELOW, SCALES
Philippot, 2013, Nature Reviews Microbiology
Data compression
http://siliconangle.com/files/2010/09/image_thumb69.png
de novo assembly
• Compresses dataset size significantly
• Improves data quality (longer sequences, gene order)
• Reference not necessary (novelty)
Raw sequencing data (“reads”) → Computational algorithms → Informative genes / genomes
Metagenome assembly… a scaling problem.
Shotgun sequencing and de novo assembly
It was the Gest of times, it was the wor
, it was the worst of timZs, it was the
isdom, it was the age of foolisXness
, it was the worVt of times, it was the
mes, it was Ahe age of wisdom, it was th
It was the best of times, it Gas the wor
mes, it was the age of witdom, it was th
isdom, it was tIe age of foolishness
It was the best of times, it was the worst of times, it was the age of
wisdom, it was the age of foolishness
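A toy version of the analogy above: cut error-free, overlapping “reads” from the sentence, shuffle them, then repeatedly merge the pair of fragments with the longest suffix/prefix overlap. This is a sketch of greedy overlap assembly, not the de Bruijn graph methods that metagenome assemblers typically use. The read length (40), step (10) and minimum overlap (30) are arbitrary choices, picked so that true overlaps are longer than any repeated phrase in the sentence; unlike the slide's reads, these contain no errors.

```python
import random

TEXT = ("It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness")


def shotgun(text: str, read_len: int, step: int) -> list[str]:
    """Cut error-free, overlapping 'reads' from text (a toy stand-in for sequencing)."""
    starts = list(range(0, len(text) - read_len + 1, step))
    if starts[-1] + read_len < len(text):
        starts.append(len(text) - read_len)  # make sure the end is covered
    return [text[s:s + read_len] for s in starts]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: remaining contigs stay separate
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]
    return contigs


if __name__ == "__main__":
    reads = shotgun(TEXT, read_len=40, step=10)
    random.Random(0).shuffle(reads)  # sequencing returns reads in random order
    print(greedy_assemble(reads, min_len=30))
```

If the minimum overlap were shorter than the longest repeated phrase, the greedy merge could join fragments in the wrong order, which is the “repeats” problem listed among the challenges below.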
Practical Challenges – Intensive computing
Howe et al., 2014, PNAS
Months of “computer crunching” on a supercomputer
Assembly of 300 Gbp (70,000 genomes' worth) can be done with any assembly program in less than 14 GB RAM and less than 24 hours.
50 Gbp = 10,000 genomes
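The “genomes' worth” figures are simple division. This sketch assumes an average genome the size of the ~4.5 Mbp E. coli genome quoted earlier in the talk; that genome size is an assumption for illustration, not a value stated on this slide.

```python
# Assumption: an "average" genome is the ~4.5 Mbp E. coli genome quoted earlier.
GENOME_BP = 4_500_000


def genome_equivalents(total_bp: float, genome_bp: float = GENOME_BP) -> float:
    """How many genomes of size genome_bp a given number of sequenced bases amounts to."""
    return total_bp / genome_bp


if __name__ == "__main__":
    for gbp in (50, 300):
        print(f"{gbp} Gbp ~ {genome_equivalents(gbp * 1e9):,.0f} genomes")
```

This gives about 11,000 and 67,000 genome equivalents, in line with the slide's rounded 10,000 and 70,000.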
Four main challenges for de novo assembly of metagenomes
• Repeats
• Low coverage
• Errors: these introduce breaks in the construction of contigs
• Variation in coverage: transcriptomes and metagenomes (PCR amplification biases)
Challenge: the assembler must distinguish between erroneous connections and real connections.
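One common way to separate erroneous connections from real ones is to count every k-mer (substring of length k) across the reads. A k-mer created by a sequencing error is rare, while a true k-mer recurs in every read that covers it. In this sketch the genome, the reads, k = 5 and the count cutoff of 2 are all made-up toy values; practical tools use larger k and memory-efficient counting.

```python
from collections import Counter


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer (length-k substring) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads: list[str], k: int) -> Counter:
    """Total number of occurrences of each k-mer across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def suspicious_kmers(reads: list[str], k: int, min_count: int) -> set[str]:
    """k-mers seen fewer than min_count times: likely to contain a sequencing error."""
    counts = count_kmers(reads, k)
    return {km for km, c in counts.items() if c < min_count}


if __name__ == "__main__":
    genome = "ATGGCGTACGTTAGCCTAGGATCCA"  # made-up 25 bp "genome"
    reads = [genome[0:15], genome[5:20], genome[10:25]] * 3  # each region read 3 times
    reads.append("GTACGTTCGCCTAGG")  # genome[5:20] with one A->C substitution error
    print(sorted(suspicious_kmers(reads, k=5, min_count=2)))
```

The five flagged k-mers are exactly those that overlap the substituted base. Discarding them, or the graph connections they would create, is one way to keep a single error from adding a false connection.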
Natural community characteristics
• Diverse
  - Many organisms (genomes)
• Variable abundance
  - The most abundant organisms are sampled more often
  - Assembly requires a minimum amount of sampling
  - More sequencing, more errors
[Figure panels: Sample 1x, Sample 10x, Overkill]
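The sampling trade-off can be made concrete with a standard coverage estimate. An organism's mean coverage is the number of sequenced bases landing on it divided by its genome length. If reads fall uniformly at random, the Lander-Waterman (Poisson) estimate of the fraction of its genome hit by at least one read is 1 − e^(−coverage). The three-member community, its abundances, the 5 Mbp genome size and the sequencing budgets below are all hypothetical.

```python
import math

# Hypothetical three-member community; abundances and genome size are made up.
COMMUNITY = {"abundant": 0.90, "moderate": 0.09, "rare": 0.01}
GENOME_BP = 5_000_000


def coverage(total_bp: float, rel_abundance: float, genome_bp: float) -> float:
    """Mean coverage of one genome: the bases that land on it divided by its length."""
    return total_bp * rel_abundance / genome_bp


def fraction_covered(c: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases hit by at least one read."""
    return 1.0 - math.exp(-c)


if __name__ == "__main__":
    for total_bp in (1e7, 1e8, 1e9):  # three hypothetical sequencing budgets
        print(f"{total_bp / 1e6:,.0f} Mbp sequenced:")
        for name, abundance in COMMUNITY.items():
            c = coverage(total_bp, abundance, GENOME_BP)
            print(f"  {name:>8}: {c:7.2f}x coverage, {fraction_covered(c):.1%} of genome sampled")
```

In this toy community, sequencing enough to sample the rare member at 2x (1 Gbp) leaves the abundant member at 180x. That is the “overkill” regime, where most of the additional reads are redundant.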
Digital normalization
[Figure: randomly sequenced reads aligned to the unknown true sequence; X marks sequencing errors]
• If the next read is from a high-coverage region, discard it.
• Redundant reads are not needed for assembly.
• Scales down datasets for assembly by up to 95%, with the same assembly outputs.
• Genomes, mRNA-seq, metagenomes (soils, gut, water)
Brown et al., 2012, arXiv; Pell et al., 2012, PNAS; Howe et al., 2014, PNAS; Zhang et al., 2014, PLOS One
Titus Brown, Jason Pell, Qingpeng Zhang
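The keep-or-discard rule can be sketched as a single pass over the reads. This assumes the median k-mer abundance rule described by Brown et al. (2012): a read is kept only if the median count of its k-mers, among the reads kept so far, is below a cutoff C. Here k-mers are counted exactly with a Python `Counter`, whereas the khmer software accompanying these papers uses a probabilistic Count-Min Sketch table so that counts fit in memory. The reads, k and C below are toy values.

```python
from collections import Counter
from statistics import median


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k: int, cutoff: int) -> list[str]:
    """Single-pass filter: keep a read only while its estimated coverage is below cutoff.

    Coverage is estimated as the median count of the read's k-mers among the
    reads kept so far (exact counts here; khmer uses a probabilistic table).
    """
    counts: Counter = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to estimate from
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to the counts
        # otherwise the read comes from an already high-coverage region: discard
    return kept


if __name__ == "__main__":
    # Hypothetical reads: one region sequenced 50 times, another sequenced once.
    reads = ["ACGTTGCA"] * 50 + ["GGGCCCAAATTT"]
    kept = digital_normalization(reads, k=4, cutoff=5)
    print(f"kept {len(kept)} of {len(reads)} reads")
```

Only 6 of the 51 reads survive. The heavily sequenced region is capped at C reads and the singly sequenced region is kept, which is the redundancy removal the slide illustrates. Because the rule looks only at counts from reads already kept, it needs just one pass over the data.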
Tackling Soil Biodiversity
Source: Chuck Haney
C. Titus Brown, James Tiedje, Qingpeng Zhang, Jason Pell (MSU)
Janet Jansson, Susannah Tringe (JGI)
The reality?
More like…
Source: Chuck Haney; Howe et al., 2014, PNAS
The Future
• More data, more samples, better references
• Expense will be in sampling – not sequencing or even data analysis
• All biologists will need to know how to use a pipette and write computer programs
• Large-scale, collaborative projects rather than single-PI efforts
Soils @ ISU
• CABBI DOE Bioenergy Research Center / ISU: Can we predict where to grow the best bioenergy crops? Why are some bioenergy crops so good at finding nitrogen while others are not?
• USDA ARS / ISU: What happens during composting? Does the compost source matter? What is the timing of nutrients? Can we manage it?
• USDA ARS / ISU: Soil as an environmental pathway: What happens in the soil in traditionally managed agriculture? In manure-amended soils?
PART II.
Two blog posts on this:
adina.github.io
“Job Hunt Wrap Up”
“My job application to ISU”
Background
Purdue University, BSME, Mechanical Engineering
Purdue University, MS, Environmental Engineering (Sustainability)
University of Iowa, PhD, Environmental Engineering (Microbiology/Bioremediation)
Michigan State University
NSF Postdoc Math and Biology Fellow (cross-training)
Microbial Ecology (Jim Tiedje)
Bioinformatics (Titus Brown)
Computational Biologist
Microbiology / Microbial Ecology
Business vs. government vs. academia
What I turned to…
Three main questions
• When do I start looking for jobs?
• Where do I look for jobs?
• What should I say about myself?
Job Hunt Statistics
• 2 postdocs
• 2 first-author papers
• 3 non-first-author papers
• $3 million grant funded (co-PI)
• $325K grants historically (PI)
• Backup plan – full staff scientist at Argonne National Laboratory
Job hunt statistics (cont.)
• Environmental engineering, agricultural engineering, biology, microbiology, ecology and evolution
• 30 applications
  - Prioritized < 10 applications, 3 versions
  - Research statement, teaching statement, CV, references
• 5 on-site interviews (see blog for questions I was asked)
• 1 phone interview
• 8 rejection communications (dream offer tangent…)
• 2 offers
Evaluate yourself.
Pros
• Skill set unique (for now)
• Diverse systems
• Engineering / Biology / Ecology
• Comfortable
• Great reference writers
  - Note to talk about reference writer packet
Cons
• Defining “who” I am
• Publication productivity
• Husband required
Want it.
• Personal impact
  - Research
  - Mentorship
• Enjoy what you’re selling
  - Grant writing
  - Publishing
  - Presenting
• No man is an island
Believe it.
• I am very good at what I do.
• My work deserves to be funded.
• No imposter syndrom-ing during application writing
Prioritize the effort.
• Job hunting requires:
  - Mood
  - Time
  - Emotions
  - Focus
• You have another job
• Wrote applications on weekends (half day)
Only apply if you can imagine yourself accepting the offer.
Stay organized
• Addressees
• Specifics
• Deadlines
• References
• Save postings & application
Be positive.
• It is about fit.
• Many factors you cannot control.
• This is not an evaluation of your worth.
• Resilience is key.
Let it be “Out the door”
Manage your own expectations.
• Know myself.
• Let myself win & lose.
• Get jazzed during the interview.
• Find nice, hearty-laughing people.
• Find really smart people who would help mentor me.
• Work hard but find balance.
• Dream it.
The ISU job posting
Thank you!
BCB:
Shane Dooley
Schuyler Smith
Paul Villenueva
www.germslab.org
