February 2019

A/B Testing: Common Pitfalls and How to Avoid Them
Igor Karpov

Participation in this meetup is purely on a personal basis and does not represent any firm in any form or matter. The talk is based on learnings from work across industries and firms.
Agenda
o About A/B testing
o Pitfalls and Solutions
o Testing Ideas
o Experiment Template
What is an A/B test?
An A/B test is an experiment where two or more variants are shown to users at random, and statistical analysis is used to determine which variation performs better. It allows us to understand the causal impact of a change.

[Diagram: users are split at random into a control group (test metric p1) and a treatment group (test metric p2). Is (p1 − p2) statistically significant?]
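In practice, "shown to users at random" is usually implemented with deterministic hashing, so the same user always lands in the same variant across sessions. A minimal sketch (the experiment name and user IDs are illustrative):

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")):
    """Deterministically bucket a user into a variant.

    Hashing user_id together with the experiment name keeps the
    assignment stable across sessions and independent between experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Same user always lands in the same bucket:
assert assign_variant("user-42", "signup-copy") == assign_variant("user-42", "signup-copy")
```

Because the hash is uniform, large populations split roughly 50/50 without any server-side state.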
What is an A/B test? An example
The Clinton Bush Haiti Fund tested its donation page: the treatment variant produced an 11% increase in dollars per pageview over the control (Source: Siroker and Koomen 2013).
When to do A/B testing

Method              Description                       Inference                            Stage
Prototyping         Developing prototypes             Guide a direction of the product     Ideation
User Testing        In-depth interviews               Understanding of why                 Ideation – Development
Surveys / Feedback  Surveys / pop-up questionnaires   Large numbers, a thorough analysis   Development – Post-launch
A/B Testing         Measuring experiments             Statistical methods                  Pre-launch
Why A/B testing?
[Chart: # of signups per month, May–November. A feature is deployed at point A, but the raw time series alone cannot tell us whether the feature caused any change in signups.]
Why A/B testing?
[Chart: the same signup series after deployment, split into B (with feature) and A (without feature). Running a concurrent control group reveals the counterfactual: what signups would have been without the feature.]
Chocolate consumption and Nobel laureates per capita (Source: Messerli 2012)
Pitfall #1: Ignoring statistical significance
• Correlation is not causation.
• Do your improvements actually affect user behavior, or are the changes due to chance?
• At the end of the experiment, do we just pick the variation that has better metrics?
Pitfall #1: Ignoring statistical significance
o Statistical significance is the probability that an observed change is not due to chance alone (Source: Ward and Murray-Ward 1999).
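The question "is (p1 − p2) statistically significant?" is commonly answered with a two-proportion z-test. A sketch with invented conversion counts, using only the standard library (the normal CDF comes from math.erf):

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0: p_a == p_b
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: control converts 500/5000, treatment 600/5000
z, p = two_proportion_z_test(500, 5000, 600, 5000)  # z ≈ 3.2, p ≈ 0.001
```

With p well below the conventional 0.05 threshold, this made-up result would count as statistically significant; a p-value near 0.5 would not.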
Solution to Pitfall #1
Estimate the required sample size before the experiment. A simple formula for a difference in proportions (Source: Altman 1991):

n = 2 p̄(1 − p̄)(z_{1−β} + z_{α/2})² / (p₁ − p₂)²

where:
o n is the sample size in each group
o z_{1−β} is the desired power (e.g. 0.84 for 80%)
o z_{α/2} is the desired level of statistical significance (e.g. 1.96 for 95%)
o p̄(1 − p̄) is a measure of variability, with p̄ the mean of p₁ and p₂
o (p₁ − p₂) is the effect size (the difference in proportions)
Solution to Pitfall #1

alpha   power (1−β)   Baseline   Increase   Sample size per group
.05     .80           50%        ± 5 %      1565
.05     .80           50%        ± 10 %     389
.05     .80           50%        ± 15 %     171
.05     .80           50%        ± 20 %     95

Examples of sample sizes calculated
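Altman's formula is easy to script; this sketch reproduces the table above (rounding up to the next whole user):

```python
import math

def sample_size_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Sample size per group for detecting a difference in proportions
    (Altman 1991), at 95% significance and 80% power by default."""
    p_bar = (p1 + p2) / 2
    n = 2 * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2
    return math.ceil(n)

for uplift in (0.05, 0.10, 0.15, 0.20):
    print(f"50% ± {uplift:.0%}: {sample_size_per_group(0.50, 0.50 + uplift)}")
# 1565, 389, 171, 95 — matching the table
```

Note how the required sample size shrinks roughly with the square of the effect size: detecting a ±5% uplift takes about 16 times as many users as detecting ±20%.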
Pitfall #2: Not having a workflow for testing
o Choosing metrics unrelated to the business as proxies
o Doing too little analysis to understand current user behaviour
o Not formulating a hypothesis before testing
o Testing trivia, e.g. whether green buttons increase conversion rates
o Spending precious time and traffic on random ideas
Solution to Pitfall #2
What is
Success?
Plan
Hypothesis
Funnel
Diagnosis
Test
Measure
Results
Pitfall #3: Not prioritizing the experimentation roadmap
o Taking risks that are too small (getting stuck at a local maximum)
o Impacting too few users
o Running experiments that don’t produce strategic value
o Not estimating designers’ or developers’ workload
o Ignoring the time spent coordinating experiments
Solution to Pitfall #3
Effort High
Low
High
Impact
Low
Do it!
Forget it
Reach
Uplift
Strategic
CoordinateTechCreative
If resources
are available
Solution to Pitfall #3
Use a prioritization framework:
o Potential, Importance, Ease (PIE) framework by Chris Goward
  https://widerfunnel.com/pie-framework/
o Time, Impact, Resources framework by Bryan Eisenberg
  https://www.bryaneisenberg.com/3-steps-to-better-prioritization-and-faster-execution/
o Impact, Confidence, Ease (ICE) framework by Sean Ellis
  https://tech.trello.com/ice-scoring/
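As an illustration of the ICE framework, each idea is scored 1–10 on impact, confidence, and ease, and the backlog is sorted by the product of the three. The idea names and scores below are invented:

```python
ideas = [
    {"name": "Simplify signup form", "impact": 8, "confidence": 6, "ease": 7},
    {"name": "Green CTA button",     "impact": 2, "confidence": 3, "ease": 9},
    {"name": "Annual pricing plan",  "impact": 9, "confidence": 5, "ease": 3},
]
for idea in ideas:
    # ICE score: impact x confidence x ease
    idea["ice"] = idea["impact"] * idea["confidence"] * idea["ease"]

backlog = sorted(ideas, key=lambda i: i["ice"], reverse=True)
# Highest ICE first: the signup-form experiment (8*6*7 = 336) tops the list
```

Even a crude score like this surfaces the pattern from the impact–effort matrix: the easy-but-trivial green-button test falls to the bottom of the backlog.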
Testing ideas to start
Website content. Do users prefer to scroll down the page or click through to another page to learn more?
Headline copy. Do users prefer headlines that are straightforward, abstract, goofy, or creative?
Media. Do users prefer auto-play or click-to-play video?
Funnels. Do users prefer more information on one page or the information spread across multiple pages?
Social. What social proof do users need: brands you work with, or testimonials from other users or influencers?
Pricing. Do users prefer a free trial or a money-back guarantee?
Experiment template
o What is success? e.g. conversion rate
o Audience segment: e.g. free-trial users
o Qualitative: what user research insights support the decision to create an experiment
o Quantitative: what analytics data supports the decision to create an experiment
o Hypothesis: If ___ then ___ due to ___
o Proposed change: e.g. show an FAQ page after registering
o Sample size and duration
o Results: What happened? Another experiment? Need to clean up?
Thank You
Igor Karpov
/ mrigorkarpov