February 2019

A/B Testing: Common Pitfalls and How to Avoid Them
Igor Karpov

Participation in this meetup is purely on a personal basis and does not represent any firm in any form or matter. The talk is based on learnings from work across industries and firms.
Agenda
o About A/B testing
o Pitfalls and Solutions
o Testing Ideas
o Experiment Template
What is an A/B test?
An A/B test is an experiment where two or more variants are shown to users at random, and statistical analysis is used to determine which variation performs better. It allows us to understand the causal impact of a change.

[Diagram: users are split at random into a control group (test metric p1) and a treatment group (test metric p2). Is (p1 − p2) statistically significant?]
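In practice, "shown to users at random" is usually implemented with deterministic hashing, so the same user always lands in the same variant across sessions. A minimal sketch (the experiment name and user IDs are illustrative):

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")):
    """Deterministically bucket a user into a variant.

    Hashing user_id together with the experiment name keeps the
    assignment stable across sessions and independent between experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Same user always lands in the same bucket:
assert assign_variant("user-42", "signup-copy") == assign_variant("user-42", "signup-copy")
```

Because the hash is uniform, large populations split roughly 50/50 without any server-side state.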
What is an A/B test? An example
The Clinton Bush Haiti Fund tested its donation page: the treatment variant produced an 11% increase in dollars per pageview over the control (Source: Siroker and Koomen 2013).
When to do A/B testing

Method              Description                       Inference                            Stage
Prototyping         Developing prototypes             Guide a direction of the product     Ideation
User Testing        In-depth interviews               Understanding of why                 Ideation – Development
Surveys / Feedback  Surveys / pop-up questionnaires   Large numbers, a thorough analysis   Development – Post-launch
A/B Testing         Measuring experiments             Statistical methods                  Pre-launch
Why A/B testing?
[Chart: # of signups per month, May–November. A feature is deployed at point A, but the raw time series alone cannot tell us whether the feature caused any change in signups.]
Why A/B testing?
[Chart: the same signup series after deployment, split into B (with feature) and A (without feature). Running a concurrent control group reveals the counterfactual: what signups would have been without the feature.]
Chocolate consumption and Nobel laureates per capita (Source: Messerli 2012)
Pitfall #1: Ignoring statistical significance
• Correlation is not causation.
• Do your improvements actually affect user behavior, or are the changes due to chance?
• At the end of the experiment, do we just pick the variation that has better metrics?
Pitfall #1: Ignoring statistical significance
o Statistical significance is the probability that an observed change is not due to chance alone (Source: Ward and Murray-Ward 1999).
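The question "is (p1 − p2) statistically significant?" is commonly answered with a two-proportion z-test. A sketch with invented conversion counts, using only the standard library (the normal CDF comes from math.erf):

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0: p_a == p_b
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: control converts 500/5000, treatment 600/5000
z, p = two_proportion_z_test(500, 5000, 600, 5000)  # z ≈ 3.2, p ≈ 0.001
```

With p well below the conventional 0.05 threshold, this made-up result would count as statistically significant; a p-value near 0.5 would not.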
Solution to Pitfall #1
Estimate the required sample size before the experiment. A simple formula for a difference in proportions (Source: Altman 1991):

n = 2 p̄(1 − p̄)(z_{1−β} + z_{α/2})² / (p₁ − p₂)²

where:
o n is the sample size in each group
o z_{1−β} is the desired power (e.g. 0.84 for 80%)
o z_{α/2} is the desired level of statistical significance (e.g. 1.96 for 95%)
o p̄(1 − p̄) is a measure of variability, with p̄ the mean of p₁ and p₂
o (p₁ − p₂) is the effect size (the difference in proportions)
Solution to Pitfall #1

alpha   power (1−β)   Baseline   Increase   Sample size per group
.05     .80           50%        ± 5 %      1565
.05     .80           50%        ± 10 %     389
.05     .80           50%        ± 15 %     171
.05     .80           50%        ± 20 %     95

Examples of sample sizes calculated
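Altman's formula is easy to script; this sketch reproduces the table above (rounding up to the next whole user):

```python
import math

def sample_size_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Sample size per group for detecting a difference in proportions
    (Altman 1991), at 95% significance and 80% power by default."""
    p_bar = (p1 + p2) / 2
    n = 2 * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2
    return math.ceil(n)

for uplift in (0.05, 0.10, 0.15, 0.20):
    print(f"50% ± {uplift:.0%}: {sample_size_per_group(0.50, 0.50 + uplift)}")
# 1565, 389, 171, 95 — matching the table
```

Note how the required sample size shrinks roughly with the square of the effect size: detecting a ±5% uplift takes about 16 times as many users as detecting ±20%.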
Pitfall #2: Not having a workflow for testing
o Choosing metrics unrelated to the business as proxies
o Doing too little analysis to understand current user behaviour
o Not formulating a hypothesis before testing
o Testing trivia, e.g. whether green buttons increase conversion rates
o Spending precious time and traffic on random ideas
Solution to Pitfall #2
What is
Success?
Plan
Hypothesis
Funnel
Diagnosis
Test
Measure
Results
Pitfall #3: Not prioritizing the experimentation roadmap
o Taking risks that are too small (getting stuck at a local maximum)
o Impacting too few users
o Running experiments that don’t produce strategic value
o Not estimating designers’ or developers’ workload
o Ignoring the time spent coordinating experiments
Solution to Pitfall #3
Effort High
Low
High
Impact
Low
Do it!
Forget it
Reach
Uplift
Strategic
CoordinateTechCreative
If resources
are available
Solution to Pitfall #3
Use a prioritization framework:
o Potential, Importance, Ease (PIE) framework by Chris Goward
  https://widerfunnel.com/pie-framework/
o Time, Impact, Resources framework by Bryan Eisenberg
  https://www.bryaneisenberg.com/3-steps-to-better-prioritization-and-faster-execution/
o Impact, Confidence, Ease (ICE) framework by Sean Ellis
  https://tech.trello.com/ice-scoring/
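As an illustration of the ICE framework, each idea is scored 1–10 on impact, confidence, and ease, and the backlog is sorted by the product of the three. The idea names and scores below are invented:

```python
ideas = [
    {"name": "Simplify signup form", "impact": 8, "confidence": 6, "ease": 7},
    {"name": "Green CTA button",     "impact": 2, "confidence": 3, "ease": 9},
    {"name": "Annual pricing plan",  "impact": 9, "confidence": 5, "ease": 3},
]
for idea in ideas:
    # ICE score: impact x confidence x ease
    idea["ice"] = idea["impact"] * idea["confidence"] * idea["ease"]

backlog = sorted(ideas, key=lambda i: i["ice"], reverse=True)
# Highest ICE first: the signup-form experiment (8*6*7 = 336) tops the list
```

Even a crude score like this surfaces the pattern from the impact–effort matrix: the easy-but-trivial green-button test falls to the bottom of the backlog.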
Testing ideas to start
Website content. Do users prefer to scroll down the page or click through to another page to learn more?
Headline copy. Do users prefer headlines that are straightforward, abstract, goofy, or creative?
Media. Do users prefer auto-play or click-to-play video?
Funnels. Do users prefer more information on one page or the information spread across multiple pages?
Social. What social proof do users need: brands you work with, or testimonials from other users or influencers?
Pricing. Do users prefer a free trial or a money-back guarantee?
Experiment template
o What is success? e.g. conversion rate
o Audience segment: e.g. free-trial users
o Qualitative: what user research insights support the decision to create an experiment
o Quantitative: what analytics data supports the decision to create an experiment
o Hypothesis: If ___ then ___ due to ___
o Proposed change: e.g. show an FAQ page after registering
o Sample size and duration
o Results: What happened? Another experiment? Need to clean up?
Thank You
Igor Karpov
/ mrigorkarpov