Discussion on Metagenomic Data
for NEON Workshop
Adina Howe
July 15, 2014
Boulder, CO
What I do with my sequencing data
• Compare it to known references (BLAST, alignment
mapping, HMMs)
– Database gaps: 50%+ of sequences in soil metagenomes are unknown
– Annotation errors
• Assembly (site specific reference)
– Dependent on sequencing depth
– Largest soil datasets: 1 billion reads, only 10% assembled
• Co-occurrence / binning
– Abundance or nucleotide based
– Sequencing coverage dependent, need “unique” bins
– Particularly problematic for short reads
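
To illustrate why nucleotide-based binning is hard for short reads, here is a minimal sketch of a tetranucleotide frequency profile, one common composition signal for binning. The function name `kmer_profile` and the example read are hypothetical and are not from the talk; real binning tools do considerably more.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return relative frequencies of all 4**k DNA k-mers in seq.

    Windows containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Hypothetical 100 bp read: it contains only 97 tetramer windows,
# so at most 97 of the 256 profile entries can be non-zero.
read = "ACGT" * 25
profile = kmer_profile(read)
nonzero = sum(1 for p in profile if p > 0)
```

A profile estimated from so few windows is sparse and noisy, which is one way to see why composition-based bins are more reliable on long assembled contigs than on individual short reads.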
Sentinels in metagenomic data?
• Presence/Absence? Quantification?
• Phyla? Strain? Functional level?
• Strong signals of similarity or difference in
both “annotatable” and unknown sequences
• Information about targets for which we need
more references (by other methods)
• Indexes (alpha diversity,
copiotroph/oligotroph, % polymorphism,
recA/16S gene count)
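
As a rough sketch of how two of these indexes could be computed from count data, the snippet below uses the Shannon index as one example of alpha diversity and a simple hit-count ratio as one reading of the recA/16S entry. Both choices, the function names, and the counts are assumptions for illustration; the slide does not say which diversity measure or ratio definition is intended.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def hit_ratio(numerator_hits, denominator_hits):
    """Ratio of two gene hit counts; None if the denominator is zero."""
    if denominator_hits == 0:
        return None
    return numerator_hits / denominator_hits


# Hypothetical per-taxon read counts for one sample.
taxon_counts = [10, 10, 10, 10]
h = shannon_index(taxon_counts)

# Hypothetical numbers of reads matching recA and 16S rRNA genes.
rec_a_to_16s = hit_ratio(30, 120)
```

For four equally abundant taxa the Shannon index equals ln(4), the maximum for four categories, which makes it a convenient sanity check.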
What I do with my sequencing data
• Compare it to known references (BLAST,
alignment mapping, HMMs)
– Database gaps: 50%+ of sequences in soil metagenomes are unknown
– Annotation errors
• Assembly (site specific reference)
– Well-developed pipelines and tutorials
– Almost too many choices
• Co-occurrence / binning
– Some tools available, but largely still in development
– Eventually need to link to knowns for validation (or
experimental validation)
What has been done – MG-RAST
• Free
• Easy
• Growing user base
• Not used by the “leading wedge” (at least not as
more than a BLAST engine)
• Free data / annotation storage and maintenance
• NEON – do you want to build your own? Does
MG-RAST suffice?
MG-RAST
Fine tuning is not possible
• What happens when your sequence matches two known genes
equally?
• What happens if there are multiple taxa related to one function?
• What if you want more than ordination (e.g., primer evaluation)?
• What if you want to change your distance matrix for ordination?
• How do you (or do you not) normalize / rarefy your data?
• What if you want more sensitivity than a BLAT analysis (e.g.,
amplicons)?
• Computationally optimized, not biologically optimized
– Denitrification, Nitrate reductase, nitrate ammonification all distinct
categories in Level 3
– If you want to merge, not trivial – especially in dealing with abundance
estimations
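
The kind of fine tuning these questions point to can be sketched outside MG-RAST on an exported count table. The example below is only an illustration under stated assumptions: the counts, the merged label, and the function names are hypothetical, Bray-Curtis is just one possible distance, and simple summing is valid only if each read was assigned to exactly one category (otherwise merged abundances are double counted, which is the non-trivial part the slide mentions).

```python
import random


def merge_categories(counts, groups):
    """Sum counts of categories that map to the same merged label."""
    merged = {}
    for category, n in counts.items():
        label = groups.get(category, category)
        merged[label] = merged.get(label, 0) + n
    return merged


def rarefy(counts, depth, seed=0):
    """Subsample reads without replacement down to a fixed depth."""
    pool = [cat for cat, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    sample = random.Random(seed).sample(pool, depth)
    out = {}
    for cat in sample:
        out[cat] = out.get(cat, 0) + 1
    return out


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count dictionaries."""
    keys = set(a) | set(b)
    num = sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)
    den = sum(a.get(k, 0) + b.get(k, 0) for k in keys)
    return num / den if den else 0.0


# Hypothetical Level 3 counts for two sites with different sequencing depth.
site_a = {"Denitrification": 40, "Nitrate reductase": 25,
          "Nitrate ammonification": 10, "Other": 25}
site_b = {"Denitrification": 10, "Nitrate reductase": 60,
          "Nitrate ammonification": 30, "Other": 100}

# Hypothetical grouping of the three nitrate-related categories.
groups = {name: "Nitrate reduction (merged)"
          for name in ("Denitrification", "Nitrate reductase",
                       "Nitrate ammonification")}

merged_a = merge_categories(site_a, groups)
merged_b = merge_categories(site_b, groups)

# Rarefy both sites to the same depth before comparing them.
depth = min(sum(merged_a.values()), sum(merged_b.values()))
distance = bray_curtis(rarefy(merged_a, depth), rarefy(merged_b, depth))
```

Swapping `bray_curtis` for another distance, or skipping `rarefy` in favor of relative abundances, changes the downstream ordination, which is why being able to control these steps matters.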
Looking forward
• Developing a single computational, statistical, and
visualization tool that pleases everyone will be impossible
• The NEON community will most likely demand tools that
are easy to use and that reduce complex data (perhaps
oversimplifying it) into comparable
metrics (TBD)
• NEON data is actually not a big dataset, and
current tools should (imho) suffice
• Data management and associated “metadata”
may be the larger challenge (in the face of rapidly
changing datasets)
http://tamingdata.com/wp-content/uploads/2010/07/tree-swing-project-management-large.png


Editor's Notes

  • #3: Who is important and how can I reduce the noise?
  • #5: Who is important and how can I reduce the noise?