A/B testing and problems with statistics
Web Analytics Wednesday Singapore
Nikolay Novozhilov, Wego.com
www.novozhilov.co
Is there a problem with A/B testing?
Imaginary uplifts
100 tests done, 10 successful, 10% uplift each…
…expect 159% growth!
Expectation vs. reality
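The 159% figure is what you get by compounding the ten reported uplifts at face value. A minimal sketch of that arithmetic, using only the numbers on the slide:

```python
# Compound the uplifts of the "successful" tests, taking each at face value.
wins = 10        # tests declared successful (from the slide)
uplift = 0.10    # reported uplift per winning test (from the slide)

expected_growth = (1 + uplift) ** wins - 1
print(f"Expected cumulative growth: {expected_growth:.0%}")  # prints 159%
```

The calculation assumes every reported uplift is real and that the uplifts multiply, which is exactly the expectation the rest of the talk calls into question.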
Why?
… and what to do about it
Lies, damned lies, and statistics
All different! All based on assumptions!!!
Tool               Test used
Optimizely         Two-tailed sequential likelihood ratio test with false discovery rate controls
Google Analytics   Bayes estimate with uniform beta prior
VWO                Intersection of confidence intervals for binomial distribution
Leanplum           Confidence intervals at p=5%, unknown statistic
Usereffect         Chi-square statistics
Commerce Sciences  Welch's t-test
What is a p-value, and why is it 5%?
All tests are based on assumptions!
Assumption #1: You don’t look at the data upfront
What happens if you look?
I played Monte Carlo in Excel
And here is the result:
• 5% p-value threshold
• 1000 “users” in each sample
• CR of 2%
• A wins over A 29% of the time!
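The same kind of experiment can be sketched in Python instead of Excel. This is not the author's simulation: the slide does not say how often the data was checked or which test was used, so the peeking schedule (every 100 users) and the pooled two-proportion z-test below are assumptions, and the result will not necessarily land on 29%. The function names are hypothetical.

```python
import math
import random


def z_test_significant(conv_a, conv_b, n, z_crit=1.96):
    """Two-sided pooled z-test for two proportions, n users in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:  # no conversions (or only conversions) in both arms
        return False
    return abs(conv_a - conv_b) / n / se > z_crit


def simulate_aa(n_users=1000, cr=0.02, peek_every=100, n_sims=500, seed=1):
    """A/A test: return (false-positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    any_peek_significant = 0
    final_significant = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        flagged = False
        for i in range(1, n_users + 1):
            conv_a += rng.random() < cr  # both arms share the same true CR
            conv_b += rng.random() < cr
            # Assumed schedule: check significance every `peek_every` users.
            if i % peek_every == 0 and z_test_significant(conv_a, conv_b, i):
                flagged = True
        any_peek_significant += flagged
        final_significant += z_test_significant(conv_a, conv_b, n_users)
    return any_peek_significant / n_sims, final_significant / n_sims


if __name__ == "__main__":
    peek_rate, final_rate = simulate_aa()
    print(f"'Winner' declared at some peek: {peek_rate:.0%}")
    print(f"'Winner' declared at the end only: {final_rate:.0%}")
```

Because the last peek coincides with the final look, the peeking rate can never be lower than the single-look rate; the more often you peek, the further the two drift apart.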
What do you do about it?
Don’t look! (just kidding)
Google “O'Brien & Fleming interim analysis” (no, still kidding)
Keep calm, more stuff coming!
“My test on Buy button showed interesting results…”
[12 “Buy Now!” button variants in a 3×4 grid, with the observed uplift of each:]
-3%   -23%   +6%    -9%
-2%   +22%   -11%   -14%
-1%   +9%    -12%   -1%
10,000 users in each variant, base CR = 1%
But in reality all colors were the same…
[The same 12 “Buy Now!” buttons and uplifts:]
-3%   -23%   +6%    -9%
-2%   +22%   -11%   -14%
-1%   +9%    -12%   -1%
10,000 users in each variant, base CR = 1%
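A spread like this is easy to reproduce with pure noise. The sketch below gives 12 identical variants and a control the same true CR of 1% with 10,000 users each, then prints each variant's observed uplift. Measuring uplift against a single control arm is an assumption (the slide does not say what the baseline was), and the function name is hypothetical.

```python
import random


def simulate_identical_variants(n_variants=12, n_users=10_000, cr=0.01, seed=42):
    """Observed relative uplift vs. control when every arm has the same true CR."""
    rng = random.Random(seed)

    def conversions():
        # Count users who convert; each converts independently with probability cr.
        return sum(rng.random() < cr for _ in range(n_users))

    control = conversions()  # assumed baseline arm, not stated on the slide
    return [(conversions() - control) / control for _ in range(n_variants)]


if __name__ == "__main__":
    for uplift in simulate_identical_variants():
        print(f"{uplift:+.0%}")
```

With about 100 expected conversions per arm, the conversion count alone fluctuates by roughly ±10, so double-digit "uplifts" between identical buttons are unsurprising.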
The real problem!
Multivariate testing
Multiple comparisons
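To put a number on the multiple-comparisons problem: the sketch below computes the chance of at least one false positive across 12 comparisons at a 5% threshold, assuming the comparisons are independent (comparisons against a shared control are not strictly independent, so treat this as an approximation). The Bonferroni correction shown is one simple, conservative remedy; it is not taken from the slides.

```python
alpha = 0.05       # per-comparison significance threshold
comparisons = 12   # number of variants, as in the button example

# Chance that at least one comparison is a false positive when no variant
# has a real effect (assumes independent comparisons).
family_wise_error = 1 - (1 - alpha) ** comparisons

# Bonferroni correction: divide the threshold by the number of comparisons.
bonferroni_alpha = alpha / comparisons

print(f"P(at least one false positive): {family_wise_error:.0%}")
print(f"Bonferroni per-comparison threshold: {bonferroni_alpha:.4f}")
```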
Be smart or be Google
Sample size, significance, effect size, power
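These four quantities are tied together: fix any three and the fourth follows. A minimal sketch, assuming a two-sided two-proportion z-test and the usual normal approximation, that solves for sample size. The 80% power default is an assumption, not a value from the slides, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(base_cr, relative_uplift, alpha=0.05, power=0.80):
    """Approximate users needed per variant to detect the given relative uplift."""
    p1 = base_cr
    p2 = base_cr * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance (two-sided)
    z_power = NormalDist().inv_cdf(power)          # power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Base CR of 1% and a 10% relative uplift, the figures used earlier in the deck.
    print(sample_size_per_variant(base_cr=0.01, relative_uplift=0.10))
```

Under these assumptions, detecting a 10% uplift on a 1% base CR takes roughly 160,000 users per variant, far more than the 10,000 per variant in the button example.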
Start with a good hypothesis!
But people are good at finding plausible explanations for data!
Replication
Do your dirty business, register, replicate
This might work!
Stop math, I’m a web designer!
Visual way of doing it
Has some stat meaning!
Replications, variance observation



Editor's Notes

  • #3: Ask audience: Do you do A/B testing? What do you think the problem is?