Bandit Basics – A Different
take on Online Optimization




            Conductrics twitter: @mgershoff
Who is this guy?

Matt Gershoff
CEO: Conductrics
Many Years in Database Marketing (New York
and Paris)
and a bit of Web Analytics

www.conductrics.com
twitter:@mgershoff
Email:matt@conductrics.com
Speak Up




What Are We Going to Hear?
• Optimization Basics
• Multi-Armed Bandit
  • It's a Problem, Not a Method
  • Some Methods
    • AB Testing
    • Epsilon Greedy
    • Upper Confidence Bound (UCB)
• Some Results
Choices · Targeting · Learning · Optimization
OPTIMIZATION


If THIS Then THAT
ITTT brings together:
1. Decision Rules
2. Predictive Analytics
3. Choice Optimization
OPTIMIZATION
Find and Apply the Rule with the most Value

(a grid of many candidate 'If THIS Then THAT' rules)
OPTIMIZATION
Variables whose Values are given to you (THIS) – the Inputs:
e.g. Facebook, High Spend, Urban GEO, …, Home Page, App Use

Variables whose Values you control (THAT) – the Outputs:
Offer A, Offer B, Offer C, …, Offer Y, Offer Z

A Predictive Model (features F1, F2, …, Fm) scores each
If THIS Then THAT pairing to estimate its Value.
But …
1. We Don’t Have Data on ‘THAT’
2. Need to Collect it – Sample across THAT
   (Offer A? Offer B? Offer C? … Offer Y? Offer Z?)
3. How to Sample Efficiently?
Where
Marketing Applications:
• Websites
• Mobile
• Social Media Campaigns
• Banner Ads
Pharma: Clinical Trials
What is a Multi-Armed Bandit?
One-Armed Bandit → Slot Machine
The problem:
How do you pick between slot machines A and B so that
you walk out of the Casino with the most $$$ at the
end of the Night?
Objective

Pick so as to get as much
return/profit as you can
over time.
Technical term: Minimize Regret
Sequential Selection
… but how to Pick between A and B?

Need to Sample, but do it efficiently
Explore – Collect Data


• Data Collection is costly – an Investment
• Be Efficient – Balance the potential value of
  collecting new data with exploiting what
  you currently know.

Multi-Armed Bandits

“Bandit problems embody in essential
form a conflict evident in all human
action: choosing actions which yield
immediate reward vs. choosing actions
… whose benefit will come only later.”*
- Peter Whittle

 *Source: Qing Zhao, UC Davis. Plenary talk at SPAWC, June, 2010.

Exploration Exploitation


1) Explore/Learn – Try out different actions
to learn how they perform over time – This is
a data collection task.

2) Exploit/Earn – Take advantage of what
you have learned to get highest payoff –
Your current best guess
Not A New Problem
1933 – First work on competing options
1940 – The Allies attempt to tackle it as a WWII problem
1953 – Bellman formulates it as a Dynamic Programming problem
Source: http://www.lancs.ac.uk/~winterh/GRhist.html

Testing

• Explore First
  – All actions have an equal chance of
    selection (uniform random).
  – Use hypothesis testing to select a
    ‘Winner’.
• Then Exploit - Keep only ‘Winner’
  for selection

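The explore-first pattern above can be sketched in a few lines. A minimal, hypothetical Python example: the action names, conversion rates, and sample size are made up, and picking the higher observed rate stands in for a formal hypothesis test.

```python
import random

random.seed(1)  # for reproducibility

# Hypothetical true conversion rates, unknown to the experimenter.
true_rate = {"A": 0.10, "B": 0.05}

# Explore first: every action has an equal chance of selection.
counts = {a: 0 for a in true_rate}
successes = {a: 0 for a in true_rate}
for _ in range(5_000):
    a = random.choice(list(true_rate))          # uniform random
    counts[a] += 1
    successes[a] += random.random() < true_rate[a]

# Then exploit: keep only the observed 'Winner' from here on.
winner = max(true_rate, key=lambda a: successes[a] / counts[a])
```

All the learning happens in the first phase; after the winner is chosen, no more data is collected on the losers.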
Learn First
Data Collection/Sample, then Apply Learning:
Explore/Learn first, then Exploit/Earn, as Time passes.
P-Values: A Digression

P-Values:
• NOT the probability that the Null is
  True: P( Null = True | DATA )
• But rather P( DATA (or more extreme) | Null = True )
• Not a great tool for deciding when to
  stop sampling
See:
http://andrewgelman.com/2010/09/noooooooooooooo_1/
http://www.stat.duke.edu/~berger/papers/02-01.pdf
A Couple Other Methods

1. Epsilon Greedy
  Nice and Simple

2. Upper Confidence Bound (UCB)
  Adapts to Uncertainty


1) Epsilon-Greedy




Greedy

What do you mean by ‘Greedy’?

Make whatever choice seems
best at the moment.
Epsilon Greedy

What do you mean by ‘Epsilon
Greedy’?
• Explore – randomly select an action
  ε percent of the time (say 20%)

• Exploit – Play greedy (pick the
  current best) 1 − ε of the time (say 80%)
Epsilon Greedy
For each User:
• Explore/Learn (20%): Select Randomly, like AB Testing
• Exploit/Earn (80%): Select the Current Best (Be Greedy)
Epsilon Greedy
20% Random, 80% Select Best

Action   Value
A        $5.00
B        $4.00
C        $3.00
D        $2.00
E        $1.00
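Given value estimates like the table above, ε-greedy is one line each way. A minimal sketch in Python, using the illustrative 20/80 split and action values from the slide:

```python
import random

random.seed(0)  # for reproducibility

# Current value estimates per action (the illustrative table above).
estimated_value = {"A": 5.00, "B": 4.00, "C": 3.00, "D": 2.00, "E": 1.00}

def epsilon_greedy(values, epsilon=0.20):
    """With probability epsilon, explore a random action;
    otherwise exploit the current best estimate."""
    if random.random() < epsilon:
        return random.choice(list(values))    # Explore/Learn (20%)
    return max(values, key=values.get)        # Exploit/Earn (80%)

picks = [epsilon_greedy(estimated_value) for _ in range(10_000)]
# Roughly 80% + 20%/5 = 84% of picks go to the current best, A.
```

In practice the value estimates would be updated from observed rewards after each play; here they are held fixed to show the selection rule itself.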
Continuous Sampling

Explore/Learn and Exploit/Earn continue side by side over Time.
Epsilon Greedy

– Super Simple / low cost to implement
– Tends to be surprisingly effective
– Less affected by ‘Seasonality’

– Not optimal (hard to pick the best ε)
– Doesn’t use a measure of variance
– Should Exploration decrease over time? If so, how?
Upper Confidence Bound

Basic Idea:
1) Calculate both mean and a measure
   of uncertainty (variance) for each
   action.
2) Make Greedy selections based on
   mean + uncertainty bonus


Confidence Interval Review

Confidence Interval = Mean ± z*Std
(e.g. from Mean − 2*Std to Mean + 2*Std)
Upper Confidence

Score each option using the upper
portion of the interval as a Bonus:
Score = Mean + Bonus
Upper Confidence Bound
1) Use upper portion of CI as ‘Bonus’

2) Make Greedy Selections

A’s upper bound is the highest on the $0–$10 Estimated
Reward scale → Select A
Upper Confidence Bound
1) Selecting Action ‘A’ reduces uncertainty
bonus (because more data)

2) Action ‘C’ now has highest score


A’s interval has narrowed; C’s upper bound is now the
highest → Select C
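The two UCB steps above (score each action by mean plus a shrinking uncertainty bonus, then pick greedily) can be sketched as follows. A hypothetical Python example with made-up running statistics, using a UCB1-style bonus c·sqrt(ln t / n); the specific numbers are chosen only to reproduce the A-then-C behavior of the diagrams:

```python
import math

# Hypothetical running statistics: action -> [total reward, plays].
stats = {"A": [50.0, 10], "B": [30.0, 10], "C": [22.5, 5]}

def ucb_score(total_reward, plays, t, c=2.0):
    """Mean estimate plus an uncertainty bonus that shrinks
    as an action accumulates more data (UCB1-style)."""
    return total_reward / plays + c * math.sqrt(math.log(t) / plays)

def select_action(stats):
    t = sum(plays for _, plays in stats.values())   # total plays so far
    return max(stats, key=lambda a: ucb_score(*stats[a], t))

first = select_action(stats)     # A: highest mean + bonus
stats["A"] = [100.0, 20]         # play A more: same mean, smaller bonus
second = select_action(stats)    # now C has the highest score
```

No hypothesis test is needed: the bonus term automatically shifts selections toward under-sampled actions and away from well-measured ones.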
Upper Confidence Bound


• Like A/B Test – uses variance
  measure
• Unlike A/B Test – no hypothesis
  test
• Automatically Balances
  Exploration with Exploitation
Case Study:
Treatment   Conversion Rate   Served
V2V3        9.9%              14,893
V2V2        9.7%               9,720
V2V1        8.0%               2,441
V1V3        3.3%               2,090
V2V3        2.6%               1,849
V2V2        2.0%               1,817
V1V1        1.8%               1,926
V3V1        1.8%               1,821
V1V2        1.5%               1,873
Case Study

Test Method     Conversion Rate
Adaptive        7%
Non-Adaptive    4.5%
AB Testing V Bandit

(chart: traffic allocation to Options A, B, and C over time
under each approach)
Why Should I Care?

• More Efficient Learning

• Automation

• Changing World


Questions?




Thank You!



Matt Gershoff
      p) 646-384-5151
      e) matt@conductrics.com
      t) @mgershoff

