Statistics on big biomedical data
Methods and pitfalls when analyzing high-throughput screens
Lars Juhl Jensen
t-test
ANOVA
normal distribution
useful tests
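A minimal sketch of the two-sample case, assuming Python and Welch's variant of the t-test, which does not assume equal variances (this variant is our choice, not the slide's). It computes only the t statistic and the Welch–Satterthwaite degrees of freedom. Turning t into a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. The function name `welch_t` and the measurements are hypothetical.

```python
import math
from statistics import mean, variance


# Hypothetical helper for illustration.
def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    Does not assume equal variances. Converting t to a p-value needs the
    t distribution CDF, which the standard library does not provide.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical measurements for two conditions.
control = [5.1, 4.9, 5.3, 5.0, 5.2]
treated = [5.8, 6.1, 5.9, 6.3, 6.0]
t, df = welch_t(control, treated)
print(f"t = {t:.2f}, df = {df:.1f}")
```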
counts
contingency table
Jensen et al., Nature Reviews Genetics, 2012
Fisher’s exact test
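For counts, a sketch of Fisher's exact test on a 2x2 contingency table, assuming the layout [[a, b], [c, d]] (for example, proteins with and without some annotation, inside versus outside a hit list). With all margins fixed, the top-left cell follows a hypergeometric distribution under the null hypothesis, so exact probabilities come from binomial coefficients. The function name and counts are hypothetical; `scipy.stats.fisher_exact` is a maintained implementation.

```python
from math import comb


# Hypothetical helper for illustration.
def fisher_exact(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (p_greater, p_two_sided). With all margins fixed, the top-left
    cell follows a hypergeometric distribution under the null hypothesis.
    """
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    lo, hi = max(0, c1 - r2), min(r1, c1)  # feasible values of the top-left cell

    def pmf(k):
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

    p_obs = pmf(a)
    # One-sided: tables at least as enriched in the top-left cell as observed.
    p_greater = sum(pmf(k) for k in range(a, hi + 1))
    # Two-sided: sum over all tables no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p_two = sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(p_greater, 1.0), min(p_two, 1.0)


# Hypothetical counts: 8 of 10 hits annotated versus 1 of 6 non-hits.
print(fisher_exact(8, 2, 1, 5))
```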
real numbers
no theoretical distribution
non-parametric statistics
do the values tend to differ? (often read as: do the medians differ?)
Mann–Whitney U test
medians can mislead you
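A sketch of the Mann–Whitney U statistic, assuming Python. U counts how often a value from one group exceeds a value from the other, so it compares ranks rather than medians directly, which is one reason the median reading can mislead. The p-value here uses the large-sample normal approximation without tie or continuity corrections; `scipy.stats.mannwhitneyu` offers exact and corrected versions. The function name is hypothetical.

```python
import math


# Hypothetical helper for illustration.
def mann_whitney_u(x, y):
    """U statistic for sample x versus y, plus a normal-approximation p-value.

    U counts the pairs (xi, yj) with xi > yj, ties counting one half, so
    U / (len(x) * len(y)) estimates P(X > Y) + 0.5 * P(X = Y).
    The p-value is two-sided and omits tie and continuity corrections.
    """
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2  # expected U under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p
```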
do the distributions differ?
Kolmogorov–Smirnov test
does not tell how they differ
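A sketch of the two-sample Kolmogorov–Smirnov statistic D, the largest vertical gap between the two empirical cumulative distribution functions, assuming Python. A single number like this cannot say how the distributions differ: a shift and a change in spread can both produce the same D. Only the statistic is computed; `scipy.stats.ks_2samp` returns D together with a p-value. The function name is hypothetical.

```python
from bisect import bisect_right


# Hypothetical helper for illustration.
def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(t) - F_y(t)|.

    F_x and F_y are the empirical CDFs (fraction of values <= t). Both are
    step functions that only change at observed values, so checking those
    points is enough to find the maximum.
    """
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d
```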
resampling
Monte Carlo testing
always applicable
compute intensive
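A sketch of a Monte Carlo permutation test, assuming Python and a difference in means as the test statistic; any statistic could be swapped in, which is what makes resampling broadly applicable. The cost sits in the loop: with n_perm shuffles the smallest attainable p-value is 1/(n_perm + 1), so very small p-values, as needed after multiple-testing correction, require many permutations. The function name and defaults are hypothetical.

```python
import random
from statistics import mean


# Hypothetical helper for illustration.
def permutation_test(x, y, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Group labels are shuffled n_perm times; the p-value is the fraction of
    shuffles whose |mean difference| is at least the observed one, using the
    (count + 1) / (n_perm + 1) convention so that p is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            count += 1
    return (count + 1) / (n_perm + 1)
```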
multiple testing
xkcd.com
compare multiple conditions
Gene Ontology enrichment
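A worked equation for why testing many conditions or GO terms at once is a problem, assuming m independent tests, each run at level α, with every null hypothesis true. The counts of 20 tests and 10,000 GO terms are hypothetical examples.

```latex
\Pr(\text{at least one false positive}) = 1 - (1 - \alpha)^{m},
\qquad \alpha = 0.05,\ m = 20:\ 1 - 0.95^{20} \approx 0.64

\mathbb{E}[\text{number of false positives}] = m\,\alpha,
\qquad \alpha = 0.05,\ m = 10\,000:\ 500
```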
Bonferroni
avoid making even one false positive (family-wise error rate)
too conservative
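A sketch of the Bonferroni adjustment, assuming Python: each p-value is multiplied by the number of tests m, which is equivalent to running each test at α/m. It bounds the chance of any false positive regardless of dependence between tests, which is also why it is conservative: with a hypothetical 20,000 genes, α = 0.05 becomes a per-test threshold of 2.5 × 10⁻⁶. The function name is hypothetical.

```python
# Hypothetical helper for illustration.
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: each p times the number of tests, capped at 1.

    Rejecting where the adjusted p <= alpha keeps the family-wise error rate
    (probability of one or more false positives) at or below alpha, for any
    dependence between the tests.
    """
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```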
Benjamini–Hochberg
control false discovery rate
assumes independent (or positively dependent) tests
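A sketch of the Benjamini–Hochberg step-up procedure, assuming Python and returning adjusted p-values: the i-th smallest of m p-values is scaled by m/i, then a running minimum from the largest p-value downwards keeps the adjusted values monotone. Calling discoveries where the adjusted value is at most q controls the expected proportion of false discoveries at q. The function name is hypothetical; `statsmodels.stats.multitest.multipletests` with `method='fdr_bh'` is a maintained alternative.

```python
# Hypothetical helper for illustration.
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values.

    Sort p ascending, scale the i-th smallest by m / i, then enforce
    monotonicity from the largest p downwards. Rejecting where the adjusted
    value <= q controls the false discovery rate at q under independence
    (or positive dependence) of the tests.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the p-values [0.01, 0.04, 0.03, 0.005] this gives [0.02, 0.04, 0.04, 0.02], whereas Bonferroni would give [0.04, 0.16, 0.12, 0.02].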
resampling
negative set
systematic biases
Huang et al., Journal of Proteome Research, 2014
studiedness bias
disease proteins are studied more
so more of their PTMs are known
abundance bias
highly expressed proteins
are easier to detect in assays
and thus better characterized
matched background
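One way to build a matched background, sketched under assumptions not in the slides: proteins are binned by abundance quantile, and for each foreground protein a random non-foreground protein is drawn from the same bin, so that an abundance bias affects both sets alike. The same idea could be applied to studiedness, for example by binning on publication counts. The function name, inputs, and binning scheme are hypothetical.

```python
import random
from collections import defaultdict


# Hypothetical helper for illustration; the binning scheme is an assumption.
def matched_background(foreground, candidates, abundance, n_bins=10, seed=0):
    """Draw one background protein per foreground protein from the same abundance bin.

    foreground, candidates: lists of protein identifiers (hypothetical inputs).
    abundance: dict mapping every identifier to an abundance value.
    Bins are equal-sized quantile bins over all proteins; sampling is without
    replacement within each bin.
    """
    rng = random.Random(seed)
    # Sort by abundance, breaking ties by identifier so binning is deterministic.
    all_ids = sorted(set(foreground) | set(candidates), key=lambda p: (abundance[p], p))
    bin_of = {p: i * n_bins // len(all_ids) for i, p in enumerate(all_ids)}
    fg = set(foreground)
    pool = defaultdict(list)  # bin index -> unused non-foreground proteins
    for p in candidates:
        if p not in fg:
            pool[bin_of[p]].append(p)
    background = []
    for p in foreground:
        choices = pool[bin_of[p]]
        if not choices:
            raise ValueError(f"no unused background protein left in bin {bin_of[p]}")
        background.append(choices.pop(rng.randrange(len(choices))))
    return background
```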
the big data effect
if you have enough data
any difference is significant
but maybe not relevant
biases become significant
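A worked equation for the big data effect, assuming a two-sample z-test with a known common standard deviation σ and n observations per group. The standard error shrinks as 1/√n, so a fixed, tiny difference eventually becomes "significant". The effect of 0.01 σ is a hypothetical example, and the p-values are two-sided.

```latex
z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{2/n}}

\frac{\bar{x}_1 - \bar{x}_2}{\sigma} = 0.01:\quad
n = 100 \Rightarrow z \approx 0.071\ (p \approx 0.94),\qquad
n = 10^{6} \Rightarrow z \approx 7.07\ (p \approx 1.5 \times 10^{-12})
```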
“significant”
statistical significance
p-value
biological relevance
effect size
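A sketch of two common effect-size measures, assuming Python: Cohen's d (the mean difference in units of the pooled standard deviation) and the log2 fold change of mean intensities. Unlike a p-value, neither grows with sample size, which is why they speak to relevance rather than significance. The function names are hypothetical, and the log2 fold change assumes positive, intensity-like values.

```python
import math
from statistics import mean, variance


# Hypothetical helper for illustration.
def cohens_d(x, y):
    """Cohen's d: mean difference in units of the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Hypothetical helper for illustration.
def log2_fold_change(x, y):
    """log2 of the ratio of mean intensities (x over y); requires positive means."""
    return math.log2(mean(x) / mean(y))
```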
significant and relevant
volcano plots
Lundby et al., Science Signaling, 2013
rather ad hoc
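A sketch of the two-threshold call that a volcano plot encodes, assuming Python, with log2 fold change on the x-axis and -log10(p) on the y-axis. The cutoffs (2-fold change, p < 0.05) are hypothetical defaults; choosing them is exactly the ad hoc part, and in practice the p-values would usually be adjusted for multiple testing first. The function name and labels are hypothetical.

```python
import math


# Hypothetical helper for illustration; cutoffs are assumptions, not a standard.
def classify_volcano(log2fc, pvalues, fc_cutoff=1.0, p_cutoff=0.05):
    """Return (x, y, label) per feature, with x = log2 fold change and y = -log10(p).

    A feature is 'significant and relevant' only if it passes both cutoffs.
    """
    points = []
    for fc, p in zip(log2fc, pvalues):
        significant = p < p_cutoff
        relevant = abs(fc) >= fc_cutoff
        if significant and relevant:
            label = "significant and relevant"
        elif significant:
            label = "significant only"
        elif relevant:
            label = "relevant only"
        else:
            label = "neither"
        points.append((fc, -math.log10(p), label))
    return points
```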
questions?
