Company
     D
 LOGO




                 A/B Testing Framework
                     Design Issues

        Patrick McKenzie 2010
          (This presentation is meant to be read. It is released
        under the Creative Commons By Attribution license –
        feel free to spread it or use it.)




                                                                         www.abingo.org
 By Patrick McKenzie 2010. Please use or send to people who'd benefit.
                                 A/B Testing Frameworks



                           •   Why You Should Care
                           •   Core Use Scenarios
                           •   A/B Test Lifecycle
                           •   Design Decisions
                           •   Technical Considerations
                           •   API Considerations




                                   Why You Should Care


                           There is a paucity of A/B testing frameworks.

                             "I can probably name a dozen different systems for
                             building high scale applications (distributed storage,
                             message queues, caching layers, search engines,
                             etc), but I can’t name a single AB testing framework
                             other than Google Website Optimizer. That seems
                             like a serious inversion of priorities for most
                             startups."




                                          http://www.tomkleinpeter.com/2009/01/21/where-are-the-ab-testing-frameworks/

                                    Why You Should Care



                           •   A/B testing helps you validate your
                               hypotheses about customers and product.
                           •   A/B testing is drop-dead easy if your tech
                               supports it.
                           •   You won't do it otherwise, because it feels
                               like boring busywork.
                                   The goal is to have split-testing be a continuous part of our
                                   development process, so much so that it is considered a
                                   completely routine part of developing a new feature. In fact,
                                   I've seen this approach work so well that it would be
                                   considered weird and kind of silly for anyone to ship a new
                                   feature without subjecting it to a split-test. That's when this
                                   approach can pay huge dividends.

                                                     Eric Ries in blog post


                                   Why You Should Care



                           •   There are only two decent A/B test
                               frameworks for Rails. Both less than 9
                               months old.
                           •   There are (to best of my knowledge) no
                               OSS frameworks for Java, Python, etc.
                           •   You should write one. V1.0 can be done
                               in 10 man hours in modern MVC
                               frameworks. Will be best ROI you ever
                               get.
                           •   This presentation hopes to save you time
                               by telling you where the hard decisions
                               are.

                                    Three Use Scenarios



                           •   Customers interacting with site.
                           •   Implementers coding A/B test.
                           •   Somebody interpreting results.




                           User View of A/B Test (What Cindy Sees)




                           User View of A/B Test (What Bob Sees)




                                    Key Points For Users



                           •   Users get consistent behavior. Cindy
                               always sees her alternative. Bob always
                               sees his.
                           •   A/B test doesn't break usage of site.
                               (Sounds obvious, can be non-trivial. Test
                               for interactions!)
                           •   Ending A/B test doesn't break site.


                                   Did you know that in Google Website Optimizer
                                   users can bookmark individual A/B alternatives
                                   because they have distinct URLs? And that after
                                   the test is over they may 404? Yeah. Don't do
                                   that.

                                    What Developers See



                           •   One line to add a test.
                           •   One line to track it.




                           •   No thought required beyond creating
                               alternatives.



                               What Internal Customers See



                           •   Simple, clear, actionable results.
                           •   Stats 101 not required.




                                     Your marketing team might know math.
                                     That doesn't mean they should have to.



                                      A/B Test Lifecycle



                           •   Come up with alternatives.
                           •   Code alternatives.
                           •   Test alternatives.
                           •   Deploy to site.
                           •   Users interact with alternatives.
                           •   Analyze results.
                           •   End test.

                                      When designing your A/B testing framework,
                                      keep in mind that you'll be doing all of the
                                      above. Eliminate as much friction from each
                                      step as possible – this decreases total time
                                      through the loop.

                                 Come up with alternatives.



                           •   Not generally a technical problem.
                           •   Inspiration can come from anywhere – a
                               blog post, a passing fancy, customer
                               comments.
                           •   Should never have to say "We can't do
                               that!"
                           •   Strong recommendation: If we pay your
                               salary, you are authorized to test.

                                   Customers do not think in terms of
                                   Model/View/Controller interfaces. They just want
                                   to know what the app can do. You should be able
                                   to A/B test from any point in the app.

                                      Code Alternatives



                           •   Programming is hard, but you have to do it
                               anyway.
                           •   Programming A/B tests is easy – one liner
                               and if statement.
                           •   Testing framework handles all
                               bookkeeping – programmers never care.
                           •   Re-use conversion code. Typical
                               businesses have lots of tests, few defined
                               conversions. No need to reinvent wheel
                               every single time.



                                       Test Alternatives



                           •   A/B tests are live code. They can have
                               bugs. You should be able to unit test like
                               normal.
                           •   Helpful for developers to have access to
                               quick "switch what test I'm seeing"
                               functionality. Simplest example: manually
                               add parameter to URL
                               (&exampleTest=altA). Turn off feature in
                               production.
                           •   Careful of test interactions. Very easy to
                               cause once you start testing behavior in
                               addition to display.
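
A minimal sketch of the "switch what test I'm seeing" idea, assuming the query-string format from the example above (&exampleTest=altA). The function name and the allow_overrides flag are hypothetical, not part of any real framework; the flag is how you would turn the feature off in production.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical flag: set to False in production so visitors cannot pick their own alternative.
ALLOW_OVERRIDES = True


def forced_alternative(url, test_name, alternatives, allow_overrides=ALLOW_OVERRIDES):
    """Return the alternative named in the query string, or None if there is no valid override."""
    if not allow_overrides:
        return None
    params = parse_qs(urlparse(url).query)
    requested = params.get(test_name, [None])[0]
    # Ignore values that are not real alternatives of this test.
    return requested if requested in alternatives else None
```

When this returns None, fall through to the normal choice for that user.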

                                          Deploy to site.



                           •   Avoid pointless work here. "Push code
                               live, test starts automatically" is the ideal.
                           •   Testing framework should handle its own
                               setup first time test is called. After that,
                               re-use.
                           •   Note this setup-or-re-use check is going to be
                               made thousands or hundreds of thousands of
                               times, possibly right after you push live:
                               consider performance implications.
                           •   Can make code default to old version,
                               control start/stop of test via dashboard.
                               Could be worth it, adds complexity.
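
One way "handle its own setup first time test is called" might look, as a sketch only. The dict stands in for whatever persistent store you use, and get_or_create_experiment is a made-up name.

```python
# In-memory stand-in for a persistent store; a real framework would use a database.
_experiments = {}


def get_or_create_experiment(name, alternatives):
    """Set the test up the first time it is called; reuse the stored record afterwards."""
    experiment = _experiments.get(name)
    if experiment is None:
        experiment = {
            "alternatives": list(alternatives),
            "participants": {alt: 0 for alt in alternatives},
            "conversions": {alt: 0 for alt in alternatives},
        }
        _experiments[name] = experiment
    return experiment
```

The lookup runs on every call, so it is the part to keep cheap.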

                               Users interact with alternatives.



                           •   Happily, this takes very little work for you...
                           •   … except when it creates Heisenbugs.
                           •   In addition to thorough testing, make sure
                               your "What The User Is Seeing" feature
                               (you have one, right?) reflects their A/B
                               tests.




                                        Analyze results.



                           •   Stats behind A/B tests may not be well
                               understood. Impress that stats are real,
                               measured, and actionable. It doesn't
                               matter if they think it is magic as long as
                               they trust the magic.
                           •   Do significance testing so it isn't magic.
                           •   Doing significance testing is grunt work: let
                               the computer do it.
                           •   Spend the extra time to make internal
                               dashboard pretty. People trust pretty
                               things.
                           •   A/B tests not a good place to dig for data.
                               One glance tells you all you need.

                                                End test


                           •   Simple solution: rip code out, test stops.
                           •   Simple solution requires redeploy. In event of bug
                               or strong test result ("Oh my God what were we
                               thinking!?!") might want immediate end button on
                               dashboard. Be able to specify alternative.
                           •   Automatic end of test? Probably a misfeature, but
                               easy to implement.
                           •   Ending test should switch all users to winner (or
                               else you get to support old tests until doomsday).
                               However, users have memories.
                           •   Negatively affected users (e.g. you end test in favor
                               of higher price, user planning on buying later saw
                               lower price) may be mad. Not big problem, but be
                               ready.
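
A sketch of the "immediate end button" with a specified alternative, under the assumption that ended tests are remembered in a small lookup (here a dict; end_test and resolve are hypothetical names). Once a winner is recorded, everyone gets it regardless of what they were assigned.

```python
# Hypothetical record of ended tests: test name -> winning alternative.
_winners = {}


def end_test(test_name, winner):
    """Mark a test as finished; every later lookup returns the winner."""
    _winners[test_name] = winner


def resolve(test_name, assigned):
    """Return the winner once the test has ended, otherwise the user's assigned alternative."""
    return _winners.get(test_name, assigned)
```

The test code can then be ripped out at the next regular deploy rather than in a hurry.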

                                   Design Considerations



                           •   Tracking and managing identity.
                           •   How to choose alternatives by identity.
                           •   Where to store test participation.
                           •   Where to store alternatives.
                           •   Stats is hard, let's go shopping.
                           •   Presenting results.




                                       Tracking Identity



                           •   Cindy is Cindy, Bob is Bob, Cindy should
                               always see Cindy's tests.
                           •   Cindy is not a cookie. Cindy is not a
                               session. Cindy is freaking Cindy. Even
                               when she is on different computer.
                           •   You already have identity via user
                               authentication. Probably want to punt
                               identity problem there. Have it inform
                               framework of current user identity.
                           •   Important edge case: new user signup
                               should persist “identity” from anonymous
                               visitor to identifiable user.

                                       Tracking Identity



                           •   Easiest identity is random number thrown
                               into cookie. Associate with user accounts.
                               Restore on login. Bam, done.
                           •   However, you will occasionally have A/B
                               test conversions outside of Cindy's HTTP
                               cycle. (e.g. Purchase notification comes
                               from Paypal, not from Cindy. Cindy calls
                               up to place order.) Think it through – not
                               terribly difficult if you plan for it.
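
The "random number in cookie, associate with account, restore on login" flow could be sketched like this. Plain dicts stand in for the cookie jar and the users table, and the cookie name ab_identity and all function names are assumptions for illustration.

```python
import secrets

# Hypothetical in-memory stand-in for a column on the users table: user id -> A/B identity.
_identity_by_user = {}


def identity_for_visitor(cookies):
    """Return the visitor's identity, creating a random one in the cookie jar if needed."""
    if "ab_identity" not in cookies:
        cookies["ab_identity"] = secrets.token_hex(16)
    return cookies["ab_identity"]


def on_signup(user_id, cookies):
    """Persist the anonymous identity so the new account keeps the same alternatives."""
    _identity_by_user[user_id] = identity_for_visitor(cookies)


def on_login(user_id, cookies):
    """Restore the stored identity into the cookie, e.g. on a different computer."""
    stored = _identity_by_user.get(user_id)
    if stored is None:
        # First login seen for this account: adopt the current cookie identity.
        stored = identity_for_visitor(cookies)
        _identity_by_user[user_id] = stored
    cookies["ab_identity"] = stored
    return stored
```

Because the identity is stored against the account, an out-of-band conversion (a payment notification, a phone order) can look it up by user id instead of by cookie.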




                                How To Choose Alternatives



                           •   If you have N alternatives, picking
                               randomly and persisting it by identity works
                               decently.
                           •   Another approach: MD5(identity) %
                               number_of_alts. Saves space
                               (marginally).
                           •   Don't need to save what test Cindy is
                               seeing as long as you can reproduce it.
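
A sketch of the MD5(identity) % number_of_alts approach. One deliberate change, flagged as an assumption: the test name is mixed into the hash, because hashing the identity alone would put the same user on the same index in every test. choose_alternative is a hypothetical name.

```python
import hashlib


def choose_alternative(test_name, identity, alternatives):
    """Pick an alternative deterministically, so nothing has to be stored to reproduce it."""
    # Assumption: the test name is part of the key so that one user does not land
    # on the same index in every test; the slide hashes the identity alone.
    key = f"{test_name}:{identity}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()
    return alternatives[int(digest, 16) % len(alternatives)]
```

Note the catch: changing the list of alternatives mid-test changes what existing users see.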




                               Where to store test participation



                           •   Cookie/session bad idea: Cindy will log in
                               at work tomorrow. She should see
                               consistent behavior.
                           •   Cache (memcached) possible, but if Cindy
                               is evicted from cache or cache resets,
                               tough for Cindy and tough for you.
                           •   Persistent data store best bet. Will talk
                               about specific data stores later in slides.




                                 Where to store alternatives



                           •   Many approaches. Whatever works for
                               you.
                           •   A/Bingo puts alternatives directly in code.
                               Easiest place, always right in front of
                               developer, no conceptual overhead.
                           •   Vanity puts alternatives in special
                               experiment files. Arguably cleaner code,
                               but have to context-switch.
                           •   Google Website Optimizer has you define
                               alternatives on a web form. Great for
                               marketing department at insurance
                               company. Don't do this. Greatly limits
                               possibilities, increases integration work,
                               blows testing to heck and back.

                                           Doing Stats



                           •   If possible, call out to dedicated stats
                               modules/libraries to do stats.
                           •   Many types of possible stats for A/B
                               testing. Pick one, stick with it. I use
                               Z-scores because a) I remember them and
                               b) implementation was drop-dead easy.
                           •   Sadly, Ruby lacks many good stats
                               libraries. Oh, to be a Perl programmer...
                           •   This subject worth its own presentation.
                               See Ben Tilly.
                               http://elem.com/~btilly/effective-ab-testing/
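
For a feel of how little code a Z-score takes, here is a sketch of a two-proportion z-test. The exact formula is an assumption (unpooled standard error, normal approximation), not necessarily what any particular framework uses, and the function names are made up.

```python
import math


def z_score(conversions_a, participants_a, conversions_b, participants_b):
    """Two-proportion z-score using the unpooled standard error.

    Assumes both groups have participants and at least one conversion rate
    is strictly between 0 and 1 (otherwise the variance is zero).
    """
    rate_a = conversions_a / participants_a
    rate_b = conversions_b / participants_b
    variance = (rate_a * (1 - rate_a) / participants_a
                + rate_b * (1 - rate_b) / participants_b)
    return (rate_b - rate_a) / math.sqrt(variance)


def two_sided_p_value(z):
    """Probability of a gap at least this large if A and B truly convert equally."""
    return math.erfc(abs(z) / math.sqrt(2))
```

With small samples the normal approximation gets shaky, which is one more reason to prefer a dedicated stats library where one exists.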


                                      Presenting Results



                           •   Text is easy! Graphs not quite.
                           •   Google's confidence bars are sexy... and
                               pretty useless.
                           •   Simple, human language to describe what
                               confidence intervals and statistical
                               significance mean.
                           •   De-emphasize null results (A > B but not
                               statistically significantly so) but don't hide
                               them. (After all, the fact that "this test was
                               too close to call" tells you something
                               useful.)


                                  Technical Considerations



                           •   Less than 1,000 visitors per hour? Skip
                               these slides.
                           •   A/B testing turns performance
                               assumptions on head: heavy writes in very
                               bursty fashion ("as soon as test goes
                               live"), very non-relational data, fairly
                               infrequent reads (~3X writes on my site),
                               extraordinarily infrequent use of summary
                               statistics.
                           •   Practically tailor-made for key/value store,
                               not so much for SQL.


                           Queries You Have To Answer FAST



                           •   Who is Cindy? (user → identity)
                           •   Is Cindy participating in Test X?
                           •   If so, what alternative has she seen?
                           •   If not, what alternative should she see?
                           •   Record fact that Cindy is participating in
                               Test X.
                           •   Has Cindy converted in Test X?
                           •   Record fact that Cindy converted for Test
                               X.



                           Queries You Can Answer Leisurely



                           •   How many people have participated in
                               Experiment X?
                           •   How many saw Alternative A?
                           •   Umm, do that stats magic for me.




                                Query You Will NEVER ASK



                           •   Who saw Alternative A in Experiment X?




                                   Possible Architectures



                           •   Summary statistics (participant counts &
                               conversion counts) in MySQL table with
                               "fairly few" rows. Simple increment
                               statements for updates.
                           •   Participation information (Cindy,
                               Experiment X, Alternative A) in key/value
                               store.
                           •   Or whole thing in key/value store.
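
A sketch of the "whole thing in key/value store" option. A plain dict stands in for the store, and the key naming scheme is invented for illustration. Every fast query becomes a single key lookup, and the leisurely ones read a handful of counters.

```python
class KeyValueBookkeeping:
    """Keeps participation flags and summary counters in one key/value mapping."""

    def __init__(self, store=None):
        # A plain dict stands in for Redis, memcached, or similar.
        self.store = {} if store is None else store

    def record_participation(self, test, identity, alternative):
        """Count a participant once, however many times they view the test."""
        key = f"participating:{test}:{identity}"
        if key in self.store:
            return
        self.store[key] = alternative
        counter = f"participants:{test}:{alternative}"
        self.store[counter] = self.store.get(counter, 0) + 1

    def record_conversion(self, test, identity):
        """Count a conversion once, and only for identities already in the test."""
        alternative = self.store.get(f"participating:{test}:{identity}")
        key = f"converted:{test}:{identity}"
        if alternative is None or key in self.store:
            return
        self.store[key] = True
        counter = f"conversions:{test}:{alternative}"
        self.store[counter] = self.store.get(counter, 0) + 1
```

The read-then-write increments here are not safe under concurrency; against a real store you would want its atomic increment operation instead.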




                           Quick Speed Improvement for SQL



                           •   Give each of your alternatives a unique
                               string ID like MD5(experiment name,
                               alternative name). Calculate that in
                               application code. Index on column.
                           •   UPDATE alternatives SET participants =
                               participants + 1 where lookup_code =
                               'CALCULATED IN APPLICATION';
                           •   This avoids having to translate human
                               name in code to ID in table. (Or having to
                               use multi-column index for lookup.)
                           •   Note: I am not a very good guy with DBs,
                               but I am informed this is fairly fast. Test
                               for yourself.
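
The lookup-code trick, sketched end to end. SQLite (from the Python standard library) stands in for MySQL here so the example runs anywhere; the table layout, the ":" separator inside the hash, and the experiment names are assumptions.

```python
import hashlib
import sqlite3


def lookup_code(experiment_name, alternative_name):
    """Compute the row's lookup key in application code."""
    raw = f"{experiment_name}:{alternative_name}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alternatives ("
    " lookup_code TEXT NOT NULL,"
    " participants INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("CREATE UNIQUE INDEX idx_alternatives_lookup ON alternatives (lookup_code)")

code = lookup_code("headline", "short_copy")
conn.execute("INSERT INTO alternatives (lookup_code) VALUES (?)", (code,))

# One indexed, single-column lookup per page view; no name-to-ID translation needed.
conn.execute(
    "UPDATE alternatives SET participants = participants + 1 WHERE lookup_code = ?",
    (code,),
)
participants = conn.execute(
    "SELECT participants FROM alternatives WHERE lookup_code = ?", (code,)
).fetchone()[0]
```

Whether this actually beats a multi-column index depends on your database, so measure it.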

                           Specific Key/Value Store Recommendations

                           •   MySQL with big string columns for key,
                               value: ewwwwww. I mean, ewwwwww.
                           •   Memcached: Acceptable (and fast) but not
                               persistent. Also tends to only go down
                               when server does. For A/B testing, might
                               just re-run all in-progress tests if it dies.
                           •   MemcacheDB: Tried it. Has unacceptable
                               performance when BerkeleyDB flushes to
                               disk. (5 seconds+!)
                           •   Redis: Tried it. Not in production yet. My
                               recommendation – very fast. Vanity also
                               uses it.

                                      API Considerations


                           Only need to expose two methods:

                              •   ab_test(name, alternatives, conversion_name)
                              •   conversion(conversion_name)


                           Note lack of identity in method calls. Let the
                           framework worry about that.

                           How you specify alternatives up to you.
                           Array of strings is easy to understand.


                                          Consuming API


                           ab_test(name, alternatives, conversion_name) returns
                           the chosen alternative, handles all bookkeeping as
                           side effect.

                           Typically:

                           if (ab_test(...) == "something") {
                              #do something
                           } else {
                             #do something else
                           }

                           Fun opportunity for blocks/binding if your language
                           supports that.
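
A toy version of the two-method API, to show identity staying out of the method calls. This is a sketch under stated assumptions, not A/Bingo's implementation: the class name, the current_identity hook, and the in-memory dicts are all hypothetical. The usage at the bottom shows the "blocks/binding" idea by mapping each alternative to a callable.

```python
import random


class ABFramework:
    """Two public methods; identity and bookkeeping stay inside the framework."""

    def __init__(self, current_identity):
        # Hypothetical hook: a callable supplied once, e.g. by the authentication layer.
        self._current_identity = current_identity
        self._participation = {}         # (test name, identity) -> alternative
        self._tests_by_conversion = {}   # conversion name -> set of test names
        self.converted = set()           # (test name, identity) pairs that converted

    def ab_test(self, name, alternatives, conversion_name):
        """Return the chosen alternative; record participation as a side effect."""
        key = (name, self._current_identity())
        if key not in self._participation:
            # Pick randomly once, then persist so the user keeps seeing the same thing.
            self._participation[key] = random.choice(alternatives)
        self._tests_by_conversion.setdefault(conversion_name, set()).add(name)
        return self._participation[key]

    def conversion(self, conversion_name):
        """Mark the current user as converted in each test tied to this conversion."""
        identity = self._current_identity()
        for name in self._tests_by_conversion.get(conversion_name, ()):
            if (name, identity) in self._participation:
                self.converted.add((name, identity))


# Usage: map each alternative to a callable instead of writing an if/else chain.
ab = ABFramework(current_identity=lambda: "cindy")
handlers = {
    "short_form": lambda: "render short form",
    "long_form": lambda: "render long form",
}
page = handlers[ab.ab_test("signup_form", list(handlers), "signup")]()
ab.conversion("signup")
```

Tying tests to a named conversion is what lets many tests reuse the same conversion call.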

                                        Got Questions?


                           Great A/B testing resources:
                           • Eric Ries (startuplessonslearned.com) – heavy on
                              motivation, less on stats/design decisions
                           • #abtests and @abtests on Twitter. Good
                              community, many ideas for inspiration.
                           • http://abtests.com – ditto
                           • http://www.bingocardcreator.com/abingo/resources
                              – links I use when I forget the math.
                           • http://www.kalzumeus.com – my blog
                           • patrick@bingocardcreator.com –
                           I'm always happy to chat about A/B testing, with
                           anybody. Potentially available for consulting.




  • 1. A/B Testing Framework Design Issues
      Patrick McKenzie, 2010
      (This presentation is meant to be read. It is released under the Creative Commons By Attribution license. Feel free to spread it or use it.)
      www.abingo.org
      By Patrick McKenzie 2010. Please use or send to people who'd benefit.
  • 2. A/B Testing Frameworks
      • Why You Should Care
      • Core Use Scenarios
      • A/B Test Lifecycle
      • Design Decisions
      • Technical Considerations
      • API Considerations
  • 3. Why You Should Care
      There is a paucity of A/B testing frameworks.
      "I can probably name a dozen different systems for building high scale applications (distributed storage, message queues, caching layers, search engines, etc), but I can’t name a single AB testing framework other than Google Website Optimizer. That seems like a serious inversion of priorities for most startups."
      http://www.tomkleinpeter.com/2009/01/21/where-are-the-ab-testing-frameworks/
  • 4. Why You Should Care
      • A/B testing helps you validate your hypotheses about customers and product.
      • A/B testing is drop-dead easy if your tech supports it.
      • You won't do it otherwise, because it feels like boring busywork.
      "The goal is to have split-testing be a continuous part of our development process, so much so that it is considered a completely routine part of developing a new feature. In fact, I've seen this approach work so well that it would be considered weird and kind of silly for anyone to ship a new feature without subjecting it to a split-test. That's when this approach can pay huge dividends." (Eric Ries, in a blog post)
  • 5. Why You Should Care
      • There are only two decent A/B test frameworks for Rails. Both are less than 9 months old.
      • There are (to the best of my knowledge) no OSS frameworks for Java, Python, etc.
      • You should write one. V1.0 can be done in 10 man-hours in modern MVC frameworks. It will be the best ROI you ever get.
      • This presentation hopes to save you time by telling you where the hard decisions are.
  • 6. Three Use Scenarios
      • Customers interacting with site.
      • Implementers coding A/B test.
      • Somebody interpreting results.
  • 7. User View of A/B Test (What Cindy Sees)
  • 8. User View of A/B Test (What Bob Sees)
  • 9. Key Points For Users
      • Users get consistent behavior. Cindy always sees her alternative. Bob always sees his.
      • A/B test doesn't break usage of site. (Sounds obvious, can be non-trivial. Test for interactions!)
      • Ending A/B test doesn't break site.
      Did you know that in Google Website Optimizer users can bookmark individual A/B alternatives because they have distinct URLs? And that after the test is over they may 404? Yeah. Don't do that.
  • 10. What Developers See
      • One line to add a test.
      • One line to track it.
      • No thought required beyond creating alternatives.
  • 11. What Internal Customers See
      • Simple, clear, actionable results.
      • Stats 101 not required.
      Your marketing team might know math. That doesn't mean they should have to.
  • 12. A/B Test Lifecycle
      • Come up with alternatives.
      • Code alternatives.
      • Test alternatives.
      • Deploy to site.
      • Users interact with alternatives.
      • Analyze results.
      • End test.
      When designing your A/B testing framework, keep in mind that you'll be doing all of the above. Eliminate as much friction from each step as possible: this decreases total time through the loop.
  • 13. Come up with alternatives.
      • Not generally a technical problem.
      • Inspiration can come from anywhere: a blog post, a passing fancy, customer comments.
      • Should never have to say "We can't do that!"
      • Strong recommendation: If we pay your salary, you are authorized to test.
      Customers do not think in terms of Model/View/Controller interfaces. They just want to know what the app can do. You should be able to A/B test from any point in the app.
  • 14. Code Alternatives
      • Programming is hard, but you have to do it anyway.
      • Programming A/B tests is easy: a one-liner and an if statement.
      • Testing framework handles all bookkeeping, so programmers never have to care.
      • Re-use conversion code. Typical businesses have lots of tests, few defined conversions. No need to reinvent the wheel every single time.
  • 15. Test Alternatives
      • A/B tests are live code. They can have bugs. You should be able to unit test like normal.
      • Helpful for developers to have access to quick "switch what test I'm seeing" functionality. Simplest example: manually add parameter to URL (&exampleTest=altA). Turn off feature in production.
      • Careful of test interactions. Very easy to do once you start testing behavior in addition to display.
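The URL-parameter switch mentioned on this slide could look roughly like the Python sketch below. The function name `forced_alternative`, the `allow_override` flag, and the idea of validating the value against the known alternatives are all assumptions made for illustration; they are not taken from A/Bingo or any other framework.

```python
from urllib.parse import parse_qs, urlparse


def forced_alternative(url, test_name, alternatives, allow_override):
    """Return the alternative a developer asked for in the URL, or None.

    allow_override is a hypothetical setting; it should be False in
    production so that visitors cannot pick their own alternative.
    """
    if not allow_override:
        return None
    query = parse_qs(urlparse(url).query)
    requested = query.get(test_name, [None])[0]
    # Ignore values that are not real alternatives of this test.
    return requested if requested in alternatives else None
```

A framework would consult this before its normal assignment logic and should skip the participant bookkeeping when an override is in effect, so developer visits do not pollute the results.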
  • 16. Deploy to site.
      • Avoid pointless work here. "Push code live, test starts automatically" is the ideal.
      • Testing framework should handle its own setup the first time a test is called. After that, re-use.
      • Note that this decision is going to be made thousands or hundreds of thousands of times, possibly right after you push live: consider performance implications.
      • Can make code default to old version and control start/stop of test via dashboard. Could be worth it, but adds complexity.
  • 17. Users interact with alternatives.
      • Happily, this takes very little work for you...
      • … except when it creates Heisenbugs.
      • In addition to thorough testing, make sure your "What The User Is Seeing" feature (you have one, right?) reflects their A/B tests.
  • 18. Analyze results.
      • Stats behind A/B tests may not be well understood. Impress that stats are real, measured, and actionable. It doesn't matter if they think it is magic as long as they trust the magic.
      • Do significance testing so it isn't magic.
      • Doing significance testing is grunt work: let the computer do it.
      • Spend the extra time to make internal dashboard pretty. People trust pretty things.
      • A/B tests are not a good place to dig for data. One glance tells you all you need.
  • 19. End test
      • Simple solution: rip code out, test stops.
      • Simple solution requires redeploy. In event of bug or strong test result ("Oh my God what were we thinking!?!") might want immediate end button on dashboard. Be able to specify alternative.
      • Automatic end of test? Probably a misfeature, but easy to implement.
      • Ending test should switch all users to winner (or else you get to support old tests until doomsday). However, users have memories.
      • Negatively affected users (e.g. you end test in favor of higher price, user planning on buying later saw lower price) may be mad. Not big problem, but be ready.
  • 20. Design Considerations
      • Tracking and managing identity.
      • How to choose alternatives by identity.
      • Where to store test participation.
      • Where to store alternatives.
      • Stats is hard, let's go shopping.
      • Presenting results.
  • 21. Tracking Identity
      • Cindy is Cindy, Bob is Bob, Cindy should always see Cindy's tests.
      • Cindy is not a cookie. Cindy is not a session. Cindy is freaking Cindy. Even when she is on different computer.
      • You already have identity via user authentication. Probably want to punt identity problem there. Have it inform framework of current user identity.
      • Important edge case: new user signup should persist “identity” from anonymous visitor to identifiable user.
  • 22. Tracking Identity
      • Easiest identity is random number thrown into cookie. Associate with user accounts. Restore on login. Bam, done.
      • However, you will occasionally have A/B test conversions outside of Cindy's HTTP cycle. (e.g. Purchase notification comes from Paypal, not from Cindy. Cindy calls up to place order.) Think it through: not terribly difficult if you plan for it.
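The "random number in a cookie, associate with accounts, restore on login" flow might be sketched as follows in Python. Everything here is hypothetical scaffolding: `cookies` is a plain dict standing in for the request/response cookies, `stored_identities` is a dict standing in for a column on the users table, and the cookie name `ab_identity` is made up.

```python
import secrets


def current_identity(cookies, logged_in_user, stored_identities):
    """Return the A/B identity for this request, creating one if needed.

    cookies: dict of cookies (mutated so the identity is sent back).
    logged_in_user: a user id, or None for an anonymous visitor.
    stored_identities: dict mapping user id -> identity (stand-in for a DB).
    """
    if logged_in_user is not None and logged_in_user in stored_identities:
        # Restore on login: the identity saved on the account wins,
        # so Cindy sees her alternatives on any computer.
        identity = stored_identities[logged_in_user]
    else:
        identity = cookies.get("ab_identity") or secrets.token_hex(16)
        if logged_in_user is not None:
            # New signup: carry the anonymous identity over to the account.
            stored_identities[logged_in_user] = identity
    cookies["ab_identity"] = identity
    return identity
```

Conversions that arrive outside the user's HTTP cycle (a payment notification, a phone order) have no cookie, so they would look the identity up through the account instead.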
  • 23. How To Choose Alternatives
      • If you have N alternatives, picking randomly and persisting it by identity works decently.
      • Another approach: MD5(identity) % number_of_alts. Saves space (marginally).
      • Don't need to save what test Cindy is seeing as long as you can reproduce it.
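The MD5(identity) % number_of_alts idea can be sketched in a few lines of Python. One assumption goes beyond the slide: the test name is mixed into the hash, so that the same visitor does not land in the same slot of every test. The function name `choose_alternative` is hypothetical.

```python
import hashlib


def choose_alternative(identity, test_name, alternatives):
    """Pick an alternative deterministically from the identity.

    Hashing test_name together with the identity is an assumption added
    here; the slide hashes the identity only.
    """
    digest = hashlib.md5(f"{test_name}:{identity}".encode("utf-8")).hexdigest()
    # Interpret the hex digest as an integer and reduce it to an index.
    return alternatives[int(digest, 16) % len(alternatives)]
```

Because the result depends only on its inputs, it can be recomputed on every request instead of being stored. The trade-off is that changing the list of alternatives mid-test reshuffles existing visitors.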
  • 25. Where to store test participation
      • Cookie/session is a bad idea: Cindy will log in at work tomorrow. She should see consistent behavior.
      • Cache (memcached) possible, but if Cindy is evicted from cache or cache resets, tough for Cindy and tough for you.
      • Persistent data store is the best bet. Will talk about specific data stores later in slides.
  • 26. Where to store alternatives
      • Many approaches. Whatever works for you.
      • A/Bingo puts alternatives directly in code. Easiest place, always right in front of developer, no conceptual overhead.
      • Vanity puts alternatives in special experiment files. Arguably cleaner code, but you have to context-switch.
      • Google Website Optimizer has you define alternatives on a web form. Great for marketing department at insurance company. Don't do this. Greatly limits possibilities, increases integration work, blows testing to heck and back.
  • 27. Doing Stats
      • If possible, call out to dedicated stats modules/libraries to do stats.
      • Many types of possible stats for A/B testing. Pick one, stick with it. I use Z-scores because a) I remember them and b) implementation was drop-dead easy.
      • Sadly, Ruby lacks many good stats libraries. Oh, to be a Perl programmer...
      • This subject is worth its own presentation. See Ben Tilly. http://elem.com/~btilly/effective-ab-testing/
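As an illustration of how little code a Z-score takes, here is a sketch of the textbook two-proportion z-test with a pooled standard error, in Python. It is not claimed to be the exact formula A/Bingo uses; the function names are hypothetical, and the normal approximation is only reasonable with a decent number of participants per alternative.

```python
import math


def z_score(conversions_a, participants_a, conversions_b, participants_b):
    """Two-proportion z statistic for alternative A versus alternative B."""
    rate_a = conversions_a / participants_a
    rate_b = conversions_b / participants_b
    # Pooled conversion rate under the hypothesis that A and B are equal.
    pooled = (conversions_a + conversions_b) / (participants_a + participants_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / participants_a + 1 / participants_b))
    if std_err == 0:
        raise ValueError("no variation in the data; cannot compute a z-score")
    return (rate_a - rate_b) / std_err


def two_sided_p_value(z):
    """Probability of a |z| at least this large if A and B really are equal."""
    return math.erfc(abs(z) / math.sqrt(2))
```

A dashboard would call `z_score` with the four summary counts for a pair of alternatives and translate the p-value into plain language.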
  • 28. Presenting Results
      • Text is easy! Graphs not quite.
      • Google's confidence bars are sexy... and pretty useless.
      • Simple, human language to describe what confidence intervals and statistical significance mean.
      • De-emphasize null results (A > B but not statistically significantly so) but don't hide them. (After all, the fact that "this test was too close to call" tells you something useful.)
  • 29. Technical Considerations
      • Less than 1,000 visitors per hour? Skip these slides.
      • A/B testing turns performance assumptions on head: heavy writes in very bursty fashion ("as soon as test goes live"), very non-relational data, fairly infrequent reads (~3X writes on my site), extraordinarily infrequent use of summary statistics.
      • Practically tailor-made for key/value store, not so much for SQL.
  • 30. Queries You Have To Answer FAST
      • Who is Cindy? (user → identity)
      • Is Cindy participating in Test X?
      • If so, what alternative has she seen?
      • If not, what alternative should she see?
      • Record fact that Cindy is participating in Test X.
      • Has Cindy converted in Test X?
      • Record fact that Cindy converted for Test X.
  • 31. Queries You Can Answer Leisurely
      • How many people have participated in Experiment X?
      • How many saw Alternative A?
      • Umm, do that stats magic for me.
  • 32. Query You Will NEVER ASK
      • Who saw Alternative A in Experiment X?
  • 33. Possible Architectures
      • Summary statistics (participant counts & conversion counts) in MySQL table with "fairly few" rows. Simple increment statements for updates.
      • Participation information (Cindy, Experiment X, Alternative A) in key/value store.
      • Or whole thing in key/value store.
  • 34. Quick Speed Improvement for SQL
      • Give each of your alternatives a unique string ID like MD5(experiment name, alternative name). Calculate that in application code. Index on column.
      • UPDATE alternatives SET participants = participants + 1 WHERE lookup_code = 'CALCULATED IN APPLICATION';
      • This avoids having to translate human name in code to ID in table. (Or having to use multi-column index for lookup.)
      • Note: I am not a very good guy with DBs, but I am informed this is fairly fast. Test for yourself.
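The lookup-code trick can be tried out with a small Python sketch. SQLite (via the standard `sqlite3` module) stands in for MySQL here purely so the example is self-contained; the `conversions` column, the `/` separator inside the hash, and the helper names are assumptions, while the `alternatives` table, `participants` column and `lookup_code` column follow the slide.

```python
import hashlib
import sqlite3


def lookup_code(experiment, alternative):
    """Compute the row key in application code, as the slide suggests."""
    return hashlib.md5(f"{experiment}/{alternative}".encode("utf-8")).hexdigest()


def create_schema(conn):
    # PRIMARY KEY gives the single-column index used by the UPDATE below.
    conn.execute(
        "CREATE TABLE alternatives ("
        " lookup_code TEXT PRIMARY KEY,"
        " participants INTEGER NOT NULL DEFAULT 0,"
        " conversions INTEGER NOT NULL DEFAULT 0)"
    )


def register_alternative(conn, experiment, alternative):
    conn.execute(
        "INSERT OR IGNORE INTO alternatives (lookup_code) VALUES (?)",
        (lookup_code(experiment, alternative),),
    )


def score_participation(conn, experiment, alternative):
    # One indexed increment; no name-to-ID translation query needed.
    conn.execute(
        "UPDATE alternatives SET participants = participants + 1"
        " WHERE lookup_code = ?",
        (lookup_code(experiment, alternative),),
    )
```

Usage would be `conn = sqlite3.connect(":memory:")`, then `create_schema(conn)`, `register_alternative(conn, "signup_button", "green")` once, and `score_participation(conn, "signup_button", "green")` per new participant.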
  • 35. Specific Key/Value Store Recommendations
      • MySQL with big string columns for key, value: ewwwwww. I mean, ewwwwww.
      • Memcached: Acceptable (and fast) but not persistent. Also tends to only go down when server does. For A/B testing, might just re-run all in-progress tests if it dies.
      • MemcacheDB: Tried it. Has unacceptable performance when BerkeleyDB flushes to disk. (5 seconds+!)
      • Redis: Tried it. Not in production yet. My recommendation: very fast. Vanity also uses it.
  • 36. API Considerations
      Only need to expose two methods:
      • ab_test(name, alternatives, conversion_name)
      • conversion(conversion_name)
      Note lack of identity in method calls. Let the framework worry about that. How you specify alternatives is up to you. Array of strings is easy to understand.
  • 37. Consuming API
      ab_test(name, alternatives, conversion_name) returns the chosen alternative, handles all bookkeeping as side effect. Typically:
      if (ab_test(...) == "something") {
        # do something
      } else {
        # do something else
      }
      Fun opportunity for blocks/binding if your language supports that.
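To make the two-method API concrete, here is a toy, in-memory Python sketch. Only the method names and parameters `ab_test(name, alternatives, conversion_name)` and `conversion(conversion_name)` come from the slides. The class `ABFramework`, its attributes, the hash-based assignment, and the default of using the test name as the conversion name are assumptions for illustration; a real framework would back this with a persistent store.

```python
import hashlib


class ABFramework:
    """Toy framework: dicts stand in for the data store."""

    def __init__(self):
        self.identity = None           # set by the application per request
        self.participation = {}        # (identity, test) -> alternative
        self.converted = set()         # (identity, test) pairs already scored
        self.tests_by_conversion = {}  # conversion name -> set of test names
        self.counts = {}               # (test, alternative) -> [participants, conversions]

    def ab_test(self, name, alternatives, conversion_name=None):
        """Return this visitor's alternative; bookkeeping is a side effect."""
        conversion_name = conversion_name or name
        self.tests_by_conversion.setdefault(conversion_name, set()).add(name)
        key = (self.identity, name)
        if key not in self.participation:
            digest = hashlib.md5(f"{name}:{self.identity}".encode("utf-8")).hexdigest()
            chosen = alternatives[int(digest, 16) % len(alternatives)]
            self.participation[key] = chosen
            self.counts.setdefault((name, chosen), [0, 0])[0] += 1
        return self.participation[key]

    def conversion(self, conversion_name):
        """Score every test tied to this conversion, at most once per visitor."""
        for name in self.tests_by_conversion.get(conversion_name, ()):
            key = (self.identity, name)
            if key in self.participation and key not in self.converted:
                self.converted.add(key)
                self.counts[(name, self.participation[key])][1] += 1
```

Calling code then reduces to something like `if ab.ab_test("button_color", ["red", "green"], "signup") == "green": ...` in the view and `ab.conversion("signup")` in the signup handler. Several tests can share one conversion name, which is the "re-use conversion code" point from earlier in the deck.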
  • 38. Got Questions?
      Great A/B testing resources:
      • Eric Ries (startuplessonslearned.com): heavy on motivation, less on stats/design decisions
      • #abtests and @abtests on Twitter. Good community, many ideas for inspiration.
      • http://abtests.com: ditto
      • http://www.bingocardcreator.com/abingo/resources: links I use when I forget the math.
      • http://www.kalzumeus.com: my blog
      • patrick@bingocardcreator.com: I'm always happy to chat about A/B testing, with anybody. Potentially available for consulting.