"‘Repligate’: reproducibility in statistical
studies. What does it mean and in what
sense does it matter?"
Stephen Senn
Acknowledgements
Thanks to Rogier Kievit and Richard Morey for the invitation and to APS for support
This work is partly supported by the European Union’s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552 (“IDEAL”).
Outline
• The crisis of replication?
• A brief history of P-values
• What are we looking for in replication?
• Empirical evidence
• Conclusions
The Crisis of Replication
In countless tweets… the “replication police” were described as “shameless little bullies,” “self-righteous, self-appointed sheriffs” engaged in a process “clearly not designed to find truth,” “second stringers” who were incapable of making novel contributions of their own to the literature, and—most succinctly—“assholes.”

Why Psychologists’ Food Fight Matters: “Important findings” haven’t been replicated, and science may have to change its ways.
By Michelle N. Meyer and Christopher Chabris, Slate
Ioannidis (2005)
• Claimed that most published research findings are wrong
– By “finding” he means a ‘positive’ result
• 2,764 citations by 18 May 2015, according to Google Scholar
Colquhoun’s Criticisms
“If you want to avoid making a fool of yourself very often, do not regard anything greater than p < 0.001 as a demonstration that you have discovered something. Or, slightly less stringently, use a three-sigma rule.”
Colquhoun, Royal Society Open Science, 2014
A Common Story
• Scientists were treading the path of Bayesian
reason
• Along came RA Fisher and persuaded them into a
path of P-value madness
• This is responsible for a lot of unrepeatable
nonsense
• We need to return them to the path of Bayesian
virtue
• In fact the history is not like this, and understanding why is a key to understanding the problem
Fisher, Statistical Methods for Research Workers, 1925
The real history
• Scientists before Fisher were using tail area
probabilities to calculate posterior probabilities
• Fisher pointed out that this interpretation was
unsafe and offered a more conservative one
• Jeffreys, influenced by CD Broad’s criticism, was dissatisfied with the Laplacian framework and placed a lump of prior probability on a point hypothesis being true
• It is Bayesian Jeffreys versus Bayesian Laplace
that makes the dramatic difference, not
frequentist Fisher versus Bayesian Laplace
Why the difference?
• Imagine a point estimate of two standard errors
• Now consider the likelihood ratio for a given value of the parameter, θ, under the alternative to one under the null (see the sketch below)
– Dividing hypothesis (smooth prior): for any given value θ = θ′, compare to θ = −θ′
– Plausible hypothesis (lump prior): for any given value θ = θ′, compare to θ = 0
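A minimal numerical sketch of the contrast (the values x = 2 and θ′ = 2 are mine, chosen purely for illustration):

```python
# Sketch: likelihood ratios for an estimate at 2 standard errors under the
# two prior structures. Illustrative values, not from the slides.
from scipy.stats import norm

x = 2.0        # point estimate, in standard-error units
theta = 2.0    # an illustrative value theta' under the alternative

# Dividing hypothesis (smooth prior): compare theta' with -theta'
lr_dividing = norm.pdf(x, loc=theta) / norm.pdf(x, loc=-theta)  # exp(8) ~ 2981

# Plausible hypothesis (lump prior): compare theta' with the point null 0
lr_lump = norm.pdf(x, loc=theta) / norm.pdf(x, loc=0)           # exp(2) ~ 7.4

print(lr_dividing, lr_lump)
```

The same two-standard-error result is overwhelming evidence against −θ′ but only modest evidence against the point null, which is why the lump-prior analysis is so much harder to impress.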
A speculation of mine
• Scientists had noticed that for dividing hypotheses they could get ‘significance’ rather easily
– The result is the 1/20 rule
• However, when deciding in terms of probability whether or not to adopt a new parameter, it is 50%, not 5%, that is relevant
• This explains the baffling finding that significance tests are actually more conservative than AIC (and sometimes than BIC); the sketch below makes the implied thresholds explicit
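A small sketch of those implicit z-thresholds (these are standard properties of AIC and BIC; the numbers are not from the slides):

```python
# Sketch: implicit |z| thresholds for admitting one extra parameter.
import numpy as np
from scipy.stats import norm

z_sig = norm.ppf(0.975)            # 5% two-sided significance test: |z| > 1.96
z_aic = np.sqrt(2.0)               # AIC: deviance must fall by > 2, i.e. |z| > 1.41
p_aic = 2 * (1 - norm.cdf(z_aic))  # AIC's equivalent two-sided p ~ 0.157

n = 30                             # BIC: |z| > sqrt(log n); for n below about 47
z_bic = np.sqrt(np.log(n))         # this too is laxer than the 5% test

print(z_sig, z_aic, p_aic, z_bic)
```

Judged by its implied threshold, the 5% significance test is the strictest of the three criteria in small samples.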
The situations compared
Goodman’s Criticism
• What is the probability of repeating a result that is just significant at the 5% level (p = 0.05)?
• Answer: 50%
– If the true difference equals the observed difference
– Or if an uninformative prior is assumed for the true treatment effect
• Therefore, the argument runs, P-values are unreliable as inferential aids (see the sketch below)
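The 50% figure is a one-line normal calculation; a minimal sketch under Goodman’s two assumptions (the z-scale framing is mine):

```python
# Sketch: probability that an identically sized future trial again reaches
# two-sided p < 0.05, given a current trial that is just significant.
from scipy.stats import norm

z_obs = norm.ppf(0.975)  # just significant, two-sided p = 0.05: z ~ 1.96

# (a) True effect taken equal to the observed effect:
#     future z ~ N(z_obs, 1), so P(future z > z_obs) = 1 - Phi(0) = 0.5
p_same_effect = 1 - norm.cdf(z_obs - z_obs)

# (b) Uninformative (flat) prior: predictively, future z ~ N(z_obs, 2),
#     and again P(future z > z_obs) = 1 - Phi(0) = 0.5
p_flat_prior = 1 - norm.cdf(0 / 2**0.5)

print(p_same_effect, p_flat_prior)  # both 0.5
```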
Sauce for the Goose and Sauce for the Gander
• This property is shared by Bayesian statements
– It follows from the martingale property of Bayesian forecasts (see the sketch below)
• Hence, either
– The property is undesirable, and hence is a criticism of Bayesian methods also
– Or it is desirable, and is a point in favour of frequentist methods
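The martingale property can be checked directly by simulation: today’s posterior probability is the predictive average of tomorrow’s. A small sketch with an arbitrary Beta-Bernoulli example (the Beta(3, 2) posterior is my illustrative choice):

```python
# Sketch: Bayesian forecasts form a martingale. Today's P(theta > 0.5)
# equals the predictive average of tomorrow's P(theta > 0.5).
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
a, b = 3, 2                                  # illustrative current posterior
today = 1 - beta.cdf(0.5, a, b)

theta = rng.beta(a, b, size=500_000)         # draw from the current posterior...
y = rng.random(500_000) < theta              # ...then the next observation

tomorrow = np.where(y, 1 - beta.cdf(0.5, a + 1, b),   # posterior after a success
                       1 - beta.cdf(0.5, a, b + 1))   # posterior after a failure

print(today, tomorrow.mean())                # agree, up to simulation error
```

A Bayesian who reports 0.95 today must expect future posterior probabilities to average 0.95, so the ‘poor replication’ property attributed to P-values is shared by coherent Bayesian forecasts.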
Three Possible Questions
• Q1 What is the probability that, in a future experiment, taking that experiment's results alone, the estimate for B would after all be worse than that for A?
• Q2 What is the probability that, having conducted this further experiment and pooled its results with the current one, we would show that the estimate for B was, after all, worse than that for A?
• Q3 What is the probability that, having conducted a future experiment and then calculated a Bayesian posterior using a uniform prior and the results of this second experiment alone, the probability that B would be worse than A would be less than or equal to 0.05? (A Monte Carlo sketch of one reading follows.)
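Under Goodman-style assumptions (true effect equal to the observed, just-significant effect; equal-sized future trial) the three questions have quite different answers. A Monte Carlo sketch of one possible reading of them (the framing and code are mine, not from the slides):

```python
# Sketch: Monte Carlo answers to Q1-Q3 when the current trial is just
# significant (z1 = 1.96) and the true effect is taken equal to the
# observed one. One possible reading of the three questions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
z1 = norm.ppf(0.975)                    # current trial: two-sided p = 0.05
z2 = rng.normal(z1, 1, 1_000_000)       # z-statistic of the future trial

q1 = np.mean(z2 < 0)                    # future estimate reverses: ~ 0.025
q2 = np.mean(z1 + z2 < 0)               # pooled estimate reverses: ~ 0.00004
q3 = np.mean(1 - norm.cdf(z2) <= 0.05)  # flat-prior posterior P(reversal) <= 0.05: ~ 0.62

print(q1, q2, q3)
```

None of these equals Goodman’s 50%: what a ‘replication probability’ means depends entirely on which question is asked.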
Why Goodman’s Criticism is Irrelevant
“It would be absurd if our inferences about the world, having just completed a
clinical trial, were necessarily dependent on assuming the following. 1. We are
now going to repeat this experiment. 2. We are going to repeat it only once. 3.
It must be exactly the same size as the experiment we have just run. 4. The
inferential meaning of the experiment we have just run is the extent to which
it predicts this second experiment.”
Senn, 2002
A Paradox of Bayesian Significance Tests
Two scientists start with the same probability, 0.5, that a drug is effective.
Given that it is effective, they have the same prior for how effective it is.
If it is not effective, A believes that it will be useless but B believes that it may be harmful.
Having seen the same data, B now believes that it is useful with probability 0.95 and A believes that it is useless with probability 0.95. (A toy numerical version follows.)
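A toy numerical version with point priors (the values θ ∈ {−3, 0, 3} and an observation at 0.5 standard errors are mine, chosen so that both posteriors land at roughly 0.95):

```python
# Sketch: the two-priors paradox with illustrative point priors.
# Both scientists give prior probability 0.5 to 'effective' (theta = 3);
# A's alternative is 'useless' (theta = 0), B's is 'harmful' (theta = -3).
from scipy.stats import norm

x = 0.5                       # observed estimate, in standard-error units
L_eff = norm.pdf(x - 3)       # likelihood if the drug is effective
L_null = norm.pdf(x - 0)      # likelihood if useless (A's alternative)
L_harm = norm.pdf(x + 3)      # likelihood if harmful (B's alternative)

# With 50:50 priors, the posterior odds equal the likelihood ratio:
pA_useless = L_null / (L_null + L_eff)    # ~ 0.95
pB_effective = L_eff / (L_eff + L_harm)   # ~ 0.95

print(pA_useless, pB_effective)
```

The same data drive A and B to opposite, equally confident conclusions because they disagree only about what ‘not effective’ would look like.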
A Tale of Two priors
In Other Words
The probability is 0.95 (according to B)
And the probability is also 0.05 (according to A)
Both of these probabilities can be simultaneously true.
NB This is not illogical, but it is illogical to regard this sort of thing as proving that P-values are illogical.
‘…would require that a procedure is dismissed because, when combined with information which it doesn’t require and which may not exist, it disagrees with a procedure that disagrees with itself’
Senn, 2000
Are most research findings false?
• A dram of data is worth a pint of pontification
• Two interesting studies recently
– The Many Labs replication project
• This raised the Twitter storm alluded to earlier
– Jager & Leek, Biostatistics 2014
Many Labs Replication Project
Klein et al., Social Psychology, 2014
Jager & Leek, 2014
• Text-mining of 77,410
abstracts yielded 5,322
P-values
• Considered a mixture
model truncated at 0.05
• Estimated that amongst
‘discoveries’ 14% are
false
𝑓 𝑝 𝑎, 𝑏, 𝜋0
= 𝜋0 𝑢𝑛𝑖𝑓𝑜𝑟𝑚 0,0.05
+ 1 − 𝜋0 𝑡𝐵𝑒𝑡𝑎 𝑎, 𝑏, 0.05
Estimation using the EM algorithm
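A hedged sketch of the kind of fit involved (a reconstruction from the displayed formula, not Jager and Leek’s own code):

```python
# Sketch: EM fit of a Jager & Leek-style mixture for P-values below 0.05:
#   f(p) = pi0 * Unif(0, 0.05) + (1 - pi0) * Beta(a, b) truncated to (0, 0.05).
import numpy as np
from scipy import stats, optimize

def em_mixture(p, n_iter=100):
    """p: array of 'discovery' P-values, all in (0, 0.05)."""
    pi0, a, b = 0.5, 0.5, 20.0                     # arbitrary starting values
    for _ in range(n_iter):
        # E-step: responsibility of the uniform (false-discovery) component
        f0 = np.full_like(p, 1 / 0.05)             # Unif(0, 0.05) density
        f1 = stats.beta.pdf(p, a, b) / stats.beta.cdf(0.05, a, b)
        w = pi0 * f0 / (pi0 * f0 + (1 - pi0) * f1)
        # M-step: update the mixing weight and the truncated-Beta parameters
        pi0 = w.mean()
        def nll(log_ab):
            a_, b_ = np.exp(log_ab)                # keeps a, b positive
            dens = stats.beta.pdf(p, a_, b_) / stats.beta.cdf(0.05, a_, b_)
            return -np.sum((1 - w) * np.log(dens + 1e-300))
        a, b = np.exp(optimize.minimize(nll, np.log([a, b]),
                                        method="Nelder-Mead").x)
    return pi0, a, b  # pi0 plays the role of the 14% false-discovery estimate
```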
But one must be careful
• These studies suggest that the common 5% threshold is associated with a manageable false-positive rate
• This does not mean that the threshold is right
– It might reflect (say) that most P-values are either
>0.05 or <<0.05
– The situation might be capable of improvement
using a different threshold
• Also, are false negatives without cost?
My Conclusion
• P-values per se are not the problem
• There may be a harmful culture of ‘significance’, however this is defined
• P-values have a limited use as rough-and-ready tools requiring little structure
• Where you have more structure you can often do better
– Likelihood, Bayes etc.
– Point estimates and standard errors are extremely useful to future research synthesizers and should be provided routinely