Demystifying p<.05: A Balanced Approach to Significance Testing (or Avoiding it Altogether) 📈🧐

Feeling shamed for not adhering to a p<.05 statistical significance rule in your UX research? Don’t.

The p<.05 standard is a benchmark popularized by Ronald Fisher in the 1920s, long before modern computers were available. Roughly speaking, the p-value is the probability of getting results at least as extreme as yours purely by random chance, that is, if there were no real difference. For example, if five users prefer Design A and four prefer Design B, would you be confident that the larger population prefers Design A? Of course not. Why? Because if you reran the test with new users, you could easily get four who prefer A and five who prefer B. There's no evidence that your designs differ in preference, because the likelihood of getting results like these by chance is high. In Fisher's world, the odds that the results occurred by random chance are "greater than 5%."
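For the curious, here's what that calculation looks like today: a minimal sketch using SciPy's exact binomial test on the 5-vs-4 split above (the specific code is my illustration, not from the original article):

```python
# Exact two-sided binomial test for the 5-vs-4 preference split:
# "If preference were really 50/50, how likely is a split at least this lopsided?"
from scipy.stats import binomtest

result = binomtest(k=5, n=9, p=0.5)  # 5 of 9 users preferred Design A
print(result.pvalue)                 # far above .05 -> no evidence of a real preference
```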

So, how did we end up with the .05 standard? Early statisticians thought it was reasonable, and scientific journals picked it up and made it gospel. Besides, calculating the exact probability of your results occurring by chance by hand could have taken months! So statisticians published tables of "critical values" that you could compare your test statistic against to see whether it fell over or under the value corresponding to 5%. For its time, it was a useful concept.

Old traditions die hard. Even though we can now calculate exact p-values in a flash, many people (and journals) still cling to the old p<.05 cutoff religiously.

But think about it. What if there’s a 6% chance that your results occurred by chance? Fisher would say your results are not statistically significant. But, if you’re in business, is there an appreciable difference in your decision-making when the chance of a false positive is 6% versus 5%? What about 9%?

The answer, of course, is "It depends." What are the costs of a false positive? If they are relatively small, p<.20 might be a reasonable standard. If they are life and death, p<.05 seems woefully inadequate. Would you take an experimental treatment if there were "only" a 5% chance it would kill you?

Statistical significance testing also often ignores the importance of "effect size." Let's say you have a very large sample, and your new design is preferred over the old one with a statistical significance level of p<.01. Great, right? Fisher would be proud. Now let's say the mean preference on a 1-10 scale is 7.6 for the new design and 7.5 for the old one. It's a reliable, statistically significant difference that would almost certainly replicate time and time again. But is it worth implementing given the associated costs? No, because the effect, although statistically significant, is too small to matter in practice.
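To make that concrete, here's a minimal simulation sketch (the sample size, spread, and library calls are illustrative assumptions, not figures from the article) showing how a 7.6-vs-7.5 difference can be highly "significant" while the effect size stays negligible:

```python
# Simulated 1-10 preference ratings: a trivial 0.1-point difference
# becomes "statistically significant" once the sample is large enough.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100_000                                    # very large sample per design
old = rng.normal(loc=7.5, scale=1.5, size=n)   # old design ratings
new = rng.normal(loc=7.6, scale=1.5, size=n)   # new design ratings

t_stat, p_value = stats.ttest_ind(new, old)

# Effect size (Cohen's d): mean difference relative to the pooled spread
pooled_sd = np.sqrt((old.var(ddof=1) + new.var(ddof=1)) / 2)
cohens_d = (new.mean() - old.mean()) / pooled_sd

print(f"p-value:   {p_value:.2e}")    # tiny -> "significant" by any conventional cutoff
print(f"Cohen's d: {cohens_d:.3f}")   # ~0.07 -> a negligible effect in practice
```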

Is there a better way? Enter Bayesian analysis. Bayesian methods shift the focus from rigid, binary "significant or not" decisions to probabilistic reasoning. Think of it as a nuanced conversation with your data. Instead of asking, "Is this result statistically significant at the p<.05 level?" Bayesian analysis prompts a more relevant question: "Given the data and our prior knowledge, what is the probability that one design is genuinely better than the other?" This approach is particularly advantageous in the complex or uncertain scenarios common in UX research. It allows you to incorporate prior knowledge and expertise into the analysis, yielding insights that are contextually richer and often more directly applicable to business decisions.
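As a rough illustration of that question, here's a minimal sketch that revisits the earlier 5-vs-4 preference split, assuming a flat Beta(1,1) prior (the simplest possible choice; a real analysis could encode genuine prior knowledge instead):

```python
# Minimal Bayesian sketch: given 5 of 9 users preferring Design A,
# how plausible is it that the wider population truly prefers A?
import numpy as np

rng = np.random.default_rng(0)

prefers_a, n = 5, 9   # the 5-vs-4 split from the example above

# Posterior for the preference rate under a flat Beta(1, 1) prior:
# Beta(1 + successes, 1 + failures)
samples = rng.beta(1 + prefers_a, 1 + (n - prefers_a), size=200_000)

prob_a_preferred = (samples > 0.5).mean()
print(f"P(majority truly prefers A) ≈ {prob_a_preferred:.2f}")  # ≈ 0.62, hardly decisive
```

The output reads directly as a business statement, "there's roughly a 60% chance the population prefers A," rather than a pass/fail verdict against an arbitrary cutoff.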

Let's be clear: advocating for a more nuanced approach than the p<.05 standard is not a call to abandon hypothesis evaluation; far from it. Statistical analysis remains a cornerstone of robust UX research. But it's time to rethink our adherence to the p<.05 dogma in UX research and embrace a more flexible, nuanced approach.

It's crucial to consider the real-world implications of our findings, the magnitude of effect sizes, and the consequences and practicality of decision-making thresholds. With their probabilistic and contextual richness, Bayesian methods offer a compelling alternative. So, let's break free from the shackles of p<.05 and step into a more informed and adaptable era of data analysis, where the true goal is insightful, actionable conclusions, not just statistical victories.

#UXResearchInsights #BeyondP05 #StatisticalSignificance #BayesianAnalysisUX #DataDrivenDesign

#RethinkStatistics

Paula Bach

Principal Director, Product Research, Microsoft Azure Data and Fabric

Joshua Noble - Bayesian!

Aaron Mooney

EHS Professional with experience in Statistics, Analytics, and System Design

Krystal Cooper

AI Researcher | Creative Engineer | Content Creator | #GHC25 #specsquad | Startup Advisor

What will it take to change the tide? p<.05 feels like the UX version of developers ending up debating whether something is deterministic or probabilistic for every complex code challenge. So many other things can affect variance and the variables involved, and they are worthy of discussion.

Karla H.R

Chevening scholar at LSE's MSc Management of Information Systems and Digital Innovation

Thanks for the article! John Neuhoff, back in the day practical significance was a real eye-opener for me. It's crucial to communicate this understanding when presenting outcomes to stakeholders, because they are likely to think in terms of p<.05.
