Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

Lars Juhl Jensen

This document discusses statistical methods for analyzing high-throughput biomedical screens and their common pitfalls. It introduces several useful statistical tests, such as the t-test, ANOVA, Fisher's exact test, the Mann–Whitney U test and the Kolmogorov–Smirnov test, along with multiple testing corrections (Bonferroni and Benjamini–Hochberg) and resampling methods. It also discusses biases that can arise in big data analyses, such as studiedness bias and abundance bias, and how to determine whether findings are not only statistically significant but also biologically relevant.

Statistics on big biomedical data
Methods and pitfalls when analyzing high-throughput screens
Lars Juhl Jensen

t-test

ANOVA

normal distribution
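Both the t-test and ANOVA assume roughly normally distributed values. The sketch below computes only the test statistics with the Python standard library (Welch's unequal-variance t-statistic with its Welch–Satterthwaite degrees of freedom, and the one-way ANOVA F-statistic); turning them into p-values requires the t and F distributions, which in practice come from a library such as `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.f_oneway`. The function names are illustrative, not taken from the talk.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 sample estimator


def welch_t(x, y):
    """Welch's two-sample t-statistic and Welch–Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def one_way_anova_f(*groups):
    """One-way ANOVA F-statistic: between-group over within-group mean squares."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For example, `welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])` gives t = -2.0 with 8.0 degrees of freedom, and `one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])` gives F = 27.0.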

useful tests

counts

contingency table

Jensen et al., Nature Reviews Genetics, 2012

Fisher’s exact test
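For counts arranged in a 2×2 contingency table (for instance, hypothetically, hit versus non-hit proteins against annotated versus not annotated with some term), Fisher's exact test conditions on the row and column totals, so the top-left count follows a hypergeometric distribution. A minimal standard-library sketch of the two-sided version, which sums the probabilities of all tables no more likely than the observed one (`scipy.stats.fisher_exact` is the usual library route):

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)  # feasible top-left counts
    # Small relative tolerance so tables tied with the observed one are not lost to rounding
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
```

For the table [[3, 1], [1, 3]] the possible top-left counts 0 to 4 have probabilities 1, 16, 36, 16 and 1 out of 70, so the two-sided p-value is 34/70 ≈ 0.486.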

real numbers

no theoretical distribution

non-parametric statistics

do the medians differ?

Mann–Whitney U test

medians can mislead you
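Strictly speaking, the Mann–Whitney U test asks whether values from one group tend to be larger than values from the other; this coincides with a difference in medians only under extra assumptions, such as both distributions having the same shape. The statistic itself is a pair count, sketched below in a deliberately simple O(n·m) form (a rank-sum formula is faster; `scipy.stats.mannwhitneyu` also supplies p-values):

```python
def mann_whitney_u(x, y):
    """Mann–Whitney U statistic for sample x versus sample y.

    Counts pairs (xi, yj) with xi > yj, counting ties as one half, so that
    U / (len(x) * len(y)) estimates P(X > Y) + 0.5 * P(X = Y).
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u
```

The two directions always add up: `mann_whitney_u(x, y) + mann_whitney_u(y, x) == len(x) * len(y)`.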

do the distributions differ?

Kolmogorov–Smirnov test

does not tell how they differ
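The two-sample Kolmogorov–Smirnov statistic D is the largest vertical gap between the two empirical cumulative distribution functions, which is why it flags that the distributions differ without saying whether the difference lies in location, spread or shape. Because both ECDFs are step functions that only change at observed values, checking every observed value is enough. A minimal sketch (p-values are available from `scipy.stats.ks_2samp`):

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov–Smirnov statistic D (maximum ECDF difference)."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        ecdf_x = bisect_right(xs, v) / len(xs)  # fraction of x values <= v
        ecdf_y = bisect_right(ys, v) / len(ys)  # fraction of y values <= v
        d = max(d, abs(ecdf_x - ecdf_y))
    return d
```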

resampling

Monte Carlo testing

always applicable

compute intensive
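Monte Carlo resampling works for almost any statistic because it builds the null distribution empirically, at the cost of computing the statistic thousands of times. A minimal sketch, assuming a difference in means as the statistic and label shuffling (a permutation test) as the resampling scheme; the +1 in numerator and denominator keeps the estimated p-value from ever being exactly 0:

```python
import random


def permutation_test(x, y, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    n_x = len(x)
    observed = abs(sum(x) / n_x - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis
        diff = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / (len(pooled) - n_x))
        if diff >= observed - 1e-12:  # tolerance so exact ties are not lost to rounding
            hits += 1
    return (hits + 1) / (n_resamples + 1)
```

With `n_resamples` resamples, the smallest reachable p-value is 1 / (n_resamples + 1), so detecting very small p-values (as needed after multiple testing correction) is what makes the approach compute intensive.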

multiple testing

xkcd.com

compare multiple conditions

Gene Ontology enrichment

Bonferroni

avoid making even one false positive

too conservative
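To see why a correction is needed: if, say, 10,000 Gene Ontology terms were tested at a threshold of 0.05 and none were truly enriched, about 10,000 × 0.05 = 500 would still come out "significant". The Bonferroni correction multiplies each p-value by the number of tests, which controls the family-wise error rate but quickly becomes very conservative as the number of tests grows. A minimal sketch:

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```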

Benjamini–Hochberg

control false discovery rate

assumes independence (or positive dependence) between tests
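The Benjamini–Hochberg procedure instead controls the expected fraction of false discoveries among the results called significant. A minimal sketch of the adjusted p-values: sort the p-values, scale the i-th smallest of m by m / i, then take a running minimum from the largest down so the adjusted values stay monotone. Library equivalents include `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")`.

```python
def benjamini_hochberg(p_values):
    """Benjamini–Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value (rank m) down to rank 1
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For p-values [0.01, 0.04, 0.03, 0.2] this gives roughly [0.04, 0.053, 0.053, 0.2], so at a false discovery rate of 5% only the first test is called.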

resampling

negative set
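When tests are not independent, the false discovery rate can instead be estimated empirically. The slides do not spell out a procedure, so the following is only a sketch under an assumption: a negative set (for example, built by shuffling or resampling the data) is taken to contain no true positives, so the fraction of negatives passing a score threshold estimates how many real scores pass it by chance. Treating every real item as potentially null makes the estimate conservative.

```python
def empirical_fdr(real_scores, negative_scores, threshold):
    """Estimate the FDR of calling every real score >= threshold a hit.

    Assumes negative_scores contain no true positives (hypothetical setup).
    """
    n_real_hits = sum(s >= threshold for s in real_scores)
    if n_real_hits == 0:
        return 0.0
    false_rate = sum(s >= threshold for s in negative_scores) / len(negative_scores)
    expected_false = false_rate * len(real_scores)  # chance hits expected among real scores
    return min(1.0, expected_false / n_real_hits)
```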

systematic biases

Huang et al., Journal of Proteome Research, 2014

studiedness bias

we study disease proteins

thus we know many PTMs

abundance bias

more highly expressed

easier to detect in assays

better characterized

matched background
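One way to build a matched background, sketched here under assumptions (the helper name and the binning are hypothetical): assign every protein to a stratum, for example an abundance decile, and for each hit draw one non-hit protein from the same stratum. Comparisons against this background then no longer reward hits simply for being abundant. The same idea works for studiedness, for example by binning on the number of publications per protein.

```python
import random
from collections import Counter, defaultdict


def matched_background(hits, candidates, stratum, seed=0):
    """Sample non-hit candidates so each stratum has as many items as among the hits.

    stratum(item) maps an item to a bin, e.g. a (hypothetical) abundance decile.
    """
    rng = random.Random(seed)
    hit_set = set(hits)
    pools = defaultdict(list)
    for item in candidates:
        if item not in hit_set:
            pools[stratum(item)].append(item)
    background = []
    for bin_id, count in Counter(stratum(h) for h in hits).items():
        if len(pools[bin_id]) < count:
            raise ValueError(f"not enough candidates in stratum {bin_id!r}")
        background.extend(rng.sample(pools[bin_id], count))  # without replacement
    return background
```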

the big data effect

if you have enough data

any difference is significant

but maybe not relevant

biases become significant
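The big data effect can be seen with a simple two-proportion z-test (normal approximation). The same small difference, 50.5% versus 50.0%, is far from significant with 1,000 observations per group but overwhelmingly significant with 1,000,000 per group, even though its practical relevance has not changed. The numbers are illustrative, not from the talk.

```python
import math


def two_proportion_z_test(successes1, n1, successes2, n2):
    """Two-sided z-test for a difference between two proportions."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

`two_proportion_z_test(505, 1000, 500, 1000)` gives a p-value of roughly 0.8, while `two_proportion_z_test(505_000, 1_000_000, 500_000, 1_000_000)` gives z ≈ 7.1 and a p-value below 1e-11. The same logic applies to a small systematic bias: with enough data it too becomes "significant".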

“significant”

statistical significance

p-value

biological relevance

fold change

relative risk
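Fold change and relative risk measure how large an effect is rather than how surprising it is. A minimal sketch of both; the pseudocount in the fold change is an assumption commonly used to avoid taking the log of zero, and its value affects results for low intensities.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two (e.g. mean) intensities; the pseudocount avoids log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of the outcome among the exposed divided by the risk among the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)
```

For example, `log2_fold_change(7, 3)` is log2(8 / 4) = 1.0 (a doubling), and 30 cases out of 100 exposed versus 10 out of 100 unexposed gives a relative risk of 3.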

significant and relevant

volcano plots

Lundby et al., Science Signaling, 2013

rather ad hoc
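A volcano plot shows log2 fold change on the x-axis and -log10(p-value) on the y-axis, and "significant and relevant" hits are the points in the two upper corners. Selecting them amounts to two cutoffs, sketched below; the default values are hypothetical and are exactly the ad hoc part, since nothing in the statistics dictates them.

```python
def volcano_hits(results, fc_cutoff=1.0, p_cutoff=0.01):
    """Names passing both an effect-size and a significance cutoff.

    results: iterable of (name, log2_fold_change, p_value) tuples.
    fc_cutoff and p_cutoff are arbitrary, illustrative defaults.
    """
    return [name for name, lfc, p in results
            if abs(lfc) >= fc_cutoff and p <= p_cutoff]
```

A protein with a tiny fold change but p = 1e-10 (significant, maybe not relevant) and one with a large fold change but p = 0.2 (possibly relevant, not significant) are both excluded.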

questions?
