Statistics on big biomedical data
Methods and pitfalls when analyzing high-throughput
screens
Lars Juhl Jensen
t-test
ANOVA
normal distribution
useful tests
counts
contingency table
Jensen et al., Nature Reviews Genetics, 2012
Fisher’s exact test
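For counts in a 2x2 contingency table, the test can be sketched directly from the hypergeometric distribution. This is a minimal one-sided (enrichment) version using only the standard library; the function name and the counts are hypothetical, chosen for illustration.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of seeing a or more in the top-left cell
    when all row and column totals are held fixed.
    """
    row1 = a + b          # e.g. number of hits in the screen
    col1 = a + c          # e.g. number of annotated proteins
    n = a + b + c + d     # all proteins considered
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)

# Hypothetical counts: 8 of 10 hits are annotated, 2 of 10 non-hits are.
p_value = fisher_exact_greater(8, 2, 2, 8)
print(f"one-sided p = {p_value:.4f}")
```

In practice a library routine such as `scipy.stats.fisher_exact` would normally be used instead of hand-rolled code.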
real numbers
no theoretical distribution
non-parametric statistics
do the medians differ?
Mann–Whitney U test
medians can mislead you
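The rank-based idea can be sketched in a few lines. This version counts, over all pairs, how often a value from one sample beats a value from the other, and uses the normal approximation for the p-value. It assumes no tie correction and reasonably sized samples; the data are made up.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation, no tie correction."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which x beats y; a tie counts as half a win.
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, erfc(abs(z) / sqrt(2))

# Hypothetical measurements from two conditions.
treated = [5.1, 6.3, 7.2, 8.0, 9.4]
control = [1.2, 2.5, 3.1, 4.4, 4.9]
u_stat, p_value = mann_whitney_u(treated, control)
print(u_stat, p_value)
```

For samples this small an exact p-value would be preferable; the approximation is only there to keep the sketch short.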
do the distributions differ?
Kolmogorov–Smirnov test
does not tell how they differ
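A sketch of the two-sample test, under the assumption that the asymptotic Kolmogorov distribution is good enough for the p-value (it is rough for small samples). The two made-up samples share a median of 0 but have very different spread, so a median comparison sees nothing while the largest gap D between the empirical distribution functions is large. The output is only D and p, which is why the test cannot say how the distributions differ.

```python
from bisect import bisect_right
from math import exp, sqrt

def ks_two_sample(x, y):
    """Return (D, p): largest ECDF gap and its asymptotic two-sided p-value."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    # The largest gap between the two step functions occurs at an observed value.
    d = max(abs(bisect_right(xs, v) / n1 - bisect_right(ys, v) / n2) for v in xs + ys)
    lam = sqrt(n1 * n2 / (n1 + n2)) * d
    if lam < 0.2:
        # The series below converges poorly here; the true p is essentially 1.
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

narrow = [i / 10 for i in range(-10, 11)]  # 21 values from -1 to 1, median 0
wide = list(range(-10, 11))                # 21 values from -10 to 10, median 0
d_stat, p_value = ks_two_sample(narrow, wide)
print(d_stat, p_value)
```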
resampling
Monte Carlo testing
always applicable
compute intensive
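One common form of resampling is a permutation test: shuffle the group labels many times and ask how often the shuffled data look at least as extreme as the real data. The sketch below uses the difference in means as the statistic, but any statistic could be plugged in, which is what makes the approach so general. The loop over resamples is where the compute cost comes from. Function name, data, and the number of resamples are illustrative choices.

```python
import random

def permutation_test(x, y, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random reassignment of the group labels
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            hits += 1
    # Counting the observed labelling itself keeps the estimate above zero.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical integer read-outs from two conditions.
treated = [51, 63, 72, 80, 94]
control = [12, 25, 31, 44, 49]
p_value = permutation_test(treated, control)
print(p_value)
```

The smallest p-value this can report is 1 / (n_resamples + 1), so very small p-values require correspondingly many resamples.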
multiple testing
xkcd.com
compare multiple conditions
Gene Ontology enrichment
Bonferroni
avoid making any errors
too conservative
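As a small illustration, Bonferroni divides the significance level by the number of tests, so with many tests the per-test threshold becomes very strict. The p-values below are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests that pass the Bonferroni-corrected threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# Hypothetical p-values from ten tests.
p_values = [0.001, 0.008, 0.02, 0.03, 0.04, 0.2, 0.5, 0.9, 0.6, 0.7]
significant = bonferroni(p_values)
print(sum(significant), "of", len(p_values), "tests pass")
```

Five of these p-values are below 0.05 individually, but only one survives the corrected threshold of 0.005.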
Benjamini–Hochberg
control false discovery rate
assumes independence
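The step-up procedure can be sketched as follows: sort the p-values, find the largest rank k whose p-value is at most k/m times the target false discovery rate, and reject everything up to that rank. The p-values are invented and the function name is a placeholder.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which tests are rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff = rank  # keep the largest rank that passes
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected

# Hypothetical p-values from ten tests.
p_values = [0.001, 0.008, 0.02, 0.03, 0.04, 0.2, 0.5, 0.9, 0.6, 0.7]
rejected = benjamini_hochberg(p_values)
print(sum(rejected), "of", len(p_values), "tests rejected")
```

On these numbers two tests are rejected, compared with one under a Bonferroni threshold of 0.05 / 10.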
resampling
negative set
systematic biases
Huang et al., Journal of Proteome Research, 2014
studiedness bias
we study disease proteins
thus we know many PTMs
abundance bias
more highly expressed
easier to detect in assays
better characterized
matched background
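One way to build a matched background, sketched under the assumption that abundance is the bias to control for: bin all proteins by abundance and, for each foreground protein, draw a background protein from the same bin. The protein names, abundance values, and bin width are all hypothetical, and sampling is with replacement.

```python
import math
import random
from collections import defaultdict

def matched_background(foreground, pool, abundance, bin_width=1.0, seed=0):
    """Draw one pool protein per foreground protein from the same abundance bin."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for protein in pool:
        bins[math.floor(abundance[protein] / bin_width)].append(protein)
    matched = []
    for protein in foreground:
        candidates = bins.get(math.floor(abundance[protein] / bin_width), [])
        if candidates:  # foreground proteins without a comparable partner are skipped
            matched.append(rng.choice(candidates))
    return matched

# Hypothetical log10 abundances.
abundance = {"A": 5.2, "B": 5.7, "C": 2.1, "D": 2.9, "E": 5.5, "F": 2.4}
background = matched_background(["A", "C"], ["B", "D", "E", "F"], abundance)
print(background)
```

The same pattern would apply to studiedness, for example by binning on the number of publications per protein instead of abundance.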
the big data effect
if you have enough data
any difference is significant
but maybe not relevant
biases become significant
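A worked example of the effect, assuming a simple two-sample z-test with known standard deviation: the difference between the groups is fixed at a tiny 0.02 standard deviations, and only the sample size changes.

```python
from math import erfc, sqrt

def two_sample_z_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means with known, equal SD."""
    z = mean_diff / (sd * sqrt(2 / n_per_group))
    return erfc(abs(z) / sqrt(2))

# Same tiny effect, increasing sample size per group.
p_by_n = {n: two_sample_z_p(0.02, 1.0, n) for n in (100, 10_000, 1_000_000)}
for n, p in p_by_n.items():
    print(f"n = {n:>9,}: p = {p:.3g}")
```

The effect is identical in all three cases; it only becomes "significant" because the standard error shrinks with the square root of n. A small systematic bias behaves exactly like such an effect.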
“significant”
statistical significance
p-value
biological relevance
fold change
relative risk
significant and relevant
volcano plots
Lundby et al., Science Signaling, 2013
rather ad hoc
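The logic behind a volcano plot can be sketched without any plotting: each measurement gets an x-coordinate of log2 fold change and a y-coordinate of -log10 p-value, and hits are the points beyond a cutoff on both axes. The cutoffs below (two-fold change, p at most 0.05) and the measurements are arbitrary illustrative choices, which is exactly the ad hoc part.

```python
from math import log2, log10

def volcano_coordinates(fold_change, p_value):
    """Map a measurement to (log2 fold change, -log10 p-value)."""
    return log2(fold_change), -log10(p_value)

def is_hit(fold_change, p_value, min_abs_log2_fc=1.0, max_p=0.05):
    """A hit must be both relevant (large change) and significant (small p)."""
    x, y = volcano_coordinates(fold_change, p_value)
    return abs(x) >= min_abs_log2_fc and y >= -log10(max_p)

# Hypothetical (fold change, p-value) pairs.
measurements = {
    "P1": (4.0, 0.001),   # large change, significant
    "P2": (1.05, 1e-8),   # highly significant but a negligible change
    "P3": (3.0, 0.3),     # large change but not significant
    "P4": (0.25, 0.01),   # strong decrease, significant
}
hits = [name for name, (fc, p) in measurements.items() if is_hit(fc, p)]
print(hits)
```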
questions?
