X-team #2
High Dimensional
Biological Butterflies
Data Science Workshop 2015
What do we have in common?
High-dimensional biological data
● High-throughput genotyping and phenotyping
● Finding biological meaning in big data with high N and/or P
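One reason high P is hard can be sketched with simulated data. The example below is a minimal illustration, not an analysis from the workshop: it assumes N is the number of samples and P the number of measured features, and the sample size, feature count, seed, and function names are all invented. Every feature is pure noise, yet the strongest correlation with the phenotype can only grow as more features are screened.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def noise_correlations(n_samples=20, n_features=2000, seed=0):
    """Return |r| between a random phenotype and each pure-noise feature."""
    rng = random.Random(seed)
    phenotype = [rng.gauss(0, 1) for _ in range(n_samples)]
    correlations = []
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        correlations.append(abs(pearson(feature, phenotype)))
    return correlations


correlations = noise_correlations()
best_of_10 = max(correlations[:10])   # screening only 10 features
best_of_all = max(correlations)       # screening all 2000 features
print(f"strongest |r| among the first 10 noise features: {best_of_10:.2f}")
print(f"strongest |r| among all {len(correlations)} noise features: {best_of_all:.2f}")
```

Because none of the features carries signal, any strong correlation here is spurious. This is why screening many features on few samples generally calls for multiple-testing correction or validation on held-out data.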
The ability to harvest the wealth of information contained in
biomedical Big Data will advance our understanding of
human health and disease; however, lack of appropriate
tools, poor data accessibility, and insufficient training, are
major impediments to rapid translational impact. -NIH BD2K
Problem: Data fragmentation
○ individual vs population
○ multiple -omics
○ multiple sources
Solution: Data integration

Problem: Discovery and prediction
○ genome and functional annotation
Solution: Statistical learning methods

Problem: Data quality
○ hidden sources of variability
○ limitations of short read sequencing
Solution: Data annotation; genome assembly/error correction
Success Stories
Domain Science: Data Science Methods
Metabolic pathway
- Ingenuity Pathway Analysis (http://www.ingenuity.com/products/ipa)
Genomic data
- Quality control
  - FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
  - EasyQC for genome-wide association meta-analyses (http://www.nature.com/nprot/journal/v9/n5/full/nprot.2014.071.html)
- Batch effect
  - PEER (http://www.ncbi.nlm.nih.gov/pubmed/22343431)
  - SVA (http://www.ncbi.nlm.nih.gov/pubmed/22257669)
  - scLVM (Buettner et al., 2015)
- Data storage and sharing
  - NCBI (http://www.ncbi.nlm.nih.gov)
  - GitHub (https://github.com)
  - UCSC genome browser (http://genome.ucsc.edu/)
- Gene annotation
  - Gene Ontology (http://geneontology.org/page/documentation)
Proteomics
- Protein Data Bank (PDB) (http://www.rcsb.org/pdb/home/home.do)
Disease survivability
- WEKA (Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, Volume 11, Issue 1.)
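For the quality control entry, per-base sequence quality is one of the summaries a tool such as FastQC reports. The sketch below shows only the underlying arithmetic and is not FastQC's code: it assumes the common Phred+33 encoding of FASTQ quality strings, and the function names are hypothetical.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Convert a Phred score Q to a base-call error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_quality(quality_line, offset=33):
    """Average Phred score across one read."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores)


# 'I' encodes Q40, '!' encodes Q0, '5' encodes Q20 under Phred+33.
example = "I!5"
print(phred_scores(example))
print(mean_quality(example))
print(error_probability(20))
```

A score of Q20 corresponds to a 1 in 100 chance that the base call is wrong, and Q40 to 1 in 10,000, which is why reads or read ends with low scores are often trimmed before downstream analysis.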
Same data, different interpretation
Gilad & Mizrahi-Man 2015
F1000Research, 4:121
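How one data set can support two readings can be sketched with a toy example in which batch is perfectly confounded with species. All numbers and names below are invented, and per-batch mean centering is a deliberately simple stand-in; it is not the procedure of the cited paper, nor of PEER, SVA, or scLVM.

```python
import math

# Toy expression values for four genes; every number here is invented.
tissue_signal = {"heart": [2.0, 0.0, 1.0, 0.0], "liver": [0.0, 2.0, 0.0, 1.0]}
batch_shift = {"A": 0.0, "B": 5.0}

# Species is perfectly confounded with batch: human in batch A, mouse in batch B.
design = {
    "human_heart": ("heart", "A"),
    "human_liver": ("liver", "A"),
    "mouse_heart": ("heart", "B"),
    "mouse_liver": ("liver", "B"),
}

expression = {
    name: [value + batch_shift[batch] for value in tissue_signal[tissue]]
    for name, (tissue, batch) in design.items()
}


def nearest(name, data):
    """Name of the sample closest to `name` by Euclidean distance."""
    others = (other for other in data if other != name)
    return min(others, key=lambda other: math.dist(data[name], data[other]))


def center_by_batch(data, design):
    """Subtract each batch's per-gene mean from the samples in that batch."""
    centered = {}
    for batch in {b for _, b in design.values()}:
        members = [n for n, (_, b) in design.items() if b == batch]
        n_genes = len(data[members[0]])
        means = [sum(data[m][g] for m in members) / len(members)
                 for g in range(n_genes)]
        for m in members:
            centered[m] = [v - mu for v, mu in zip(data[m], means)]
    return centered


corrected = center_by_batch(expression, design)
print("before:", nearest("human_heart", expression))
print("after: ", nearest("human_heart", corrected))
```

Before centering, the human heart sample sits closest to the other human sample; after centering, it sits closest to the mouse heart sample. Because species and batch coincide in this design, the correction also removes any real species difference, so neither reading can be confirmed from these data alone.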
Interdisciplinary
Research
Interdisciplinary data science essentials
Going Forward
● Create and maintain a HowTo website for Data Science computational tools and methods.
http://data-science-for-biologists.wikia.com/wiki/Data_Science_for_Biologists_Wikia
● Collaborate via GitHub
Thanks!

Editor's Notes

  • #3: Half are domain scientists and half are more computationally inclined. Made this word cloud from our notes. Data. Comp bio. Disease. Genetics. Integrative analyses. Disease spread. Social environment and epigenetics. Data privacy, data sharing, and computational genetics. Genetics, proteomics, and statistical tools to understand disease and cancer or individual phenotypic variation. Tool development. RNAseq technology and applications. Tools for data reduction and variable selection.
  • #4: S
  • #7: Predicting disease survivability for breast cancer patients. Famous example: "Potential flaws in genomics paper scrutinized on Twitter": http://www.nature.com/news/potential-flaws-in-genomics-paper-scrutinized-on-twitter-1.17591