Running Stan in
Production: Bayesian
Revenue Estimation
2018-08-29 StanCon
Markus Ojala
PhD, Chief Data Scientist
Smartly.io
@MarkusOjala
Implemented in early 2017:
https://www.smartly.io/blog/tutorial-how-we-productized-bayesian-revenue-estimation-with-stan
Referred to by Andrew Gelman in early 2018:
http://andrewgelman.com/2018/01/21/smartly-io-productized-bayesian-revenue-estimation-stan/
“It’s BDA come to life!”
“This sort of thing is exactly what we were hoping to see.”
Blog posts
1 billion spend
Managed yearly
We Make Online Advertising
Easy, Effective, And
Enjoyable
Facebook &
Instagram Partner
Use Case: Campaign Budget Allocation
Multi-Armed
Bandit
The number of pulls for a given lever should match
its actual probability of being the optimal lever
Sample from the posterior for the mean of each lever
Bayesian Bandits / Thompson sampling
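The idea above can be sketched in a few lines. This is a minimal illustration, assuming each lever's posterior for its mean is summarized by a normal distribution; the function name, the normal approximation, and the example numbers are hypothetical and not taken from the production system.

```python
import random

def thompson_allocation(posteriors, n_draws=10000, seed=0):
    """Estimate, per lever, the probability that it is the optimal one.

    posteriors: list of (mean, sd) pairs, an assumed normal summary of
    each lever's posterior for its mean reward.
    """
    rng = random.Random(seed)
    wins = [0] * len(posteriors)
    for _ in range(n_draws):
        # Draw one plausible mean per lever from its posterior.
        draws = [rng.gauss(mu, sd) for mu, sd in posteriors]
        # The lever with the highest draw wins this round.
        wins[draws.index(max(draws))] += 1
    # The share of wins approximates P(lever is optimal).
    return [w / n_draws for w in wins]

# Hypothetical posteriors for three levers (for example, three ad sets).
shares = thompson_allocation([(1.0, 0.3), (1.2, 0.3), (0.8, 0.1)])
```

Using the resulting shares as budget fractions makes the number of pulls per lever track its probability of being the best one.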
Modeling results per spend
Separate revenue model into two parts
ROAS = revenue / cost
= revenue / conversions * conversions / cost
= revenue / conversions * 1 / CPA
ROAS = return on ad spend
CPA = cost per action
Existing model
● Lots of data
● Varies fast
● Big differences
New model
● Little data
● Varies slowly
● Small real differences
● Lots of random variation
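A small worked example of the ROAS decomposition, assuming hypothetical totals for revenue, conversions, and cost. It only checks the arithmetic identity: the two factors (revenue per conversion and 1 / CPA) can be estimated by separate models and multiplied back together.

```python
def roas_from_parts(revenue, conversions, cost):
    """Compute ROAS as (revenue per conversion) * (1 / CPA)."""
    revenue_per_conversion = revenue / conversions
    cpa = cost / conversions  # CPA = cost per action
    return revenue_per_conversion * (1.0 / cpa)

# Hypothetical numbers: 500 in revenue from 20 conversions at a cost of 100.
roas = roas_from_parts(revenue=500.0, conversions=20, cost=100.0)
direct = 500.0 / 100.0  # ROAS computed directly as revenue / cost
```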
● Use hierarchical model: Account -> Campaign -> Ad set
● Revenue usually follows a long-tailed distribution: use log-normal
● We observe only hourly aggregates: approximate them with the
Fenton-Wilkinson log-normal approximation
Modeling the revenue per purchase
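To make the Account -> Campaign -> Ad set structure concrete, here is a forward simulation sketch. It is not the Stan model from the talk and does no inference; the normal deviations between levels and all numeric values are assumptions chosen for illustration.

```python
import random

def simulate_hierarchy(n_campaigns=3, n_ad_sets=4, seed=1):
    """Draw ad set log-revenue means nested under campaigns and one account."""
    rng = random.Random(seed)
    account_mu = 3.0                    # assumed account-level mean of log revenue
    campaign_sd, ad_set_sd = 0.5, 0.2   # assumed spread between levels
    sigma = 1.0                         # assumed log-scale sd of one purchase
    result = {}
    for c in range(n_campaigns):
        # Each campaign mean is drawn around the account mean.
        campaign_mu = rng.gauss(account_mu, campaign_sd)
        for a in range(n_ad_sets):
            # Each ad set mean is drawn around its campaign mean.
            ad_set_mu = rng.gauss(campaign_mu, ad_set_sd)
            # One simulated purchase value from this ad set's log-normal.
            purchase = rng.lognormvariate(ad_set_mu, sigma)
            result[(c, a)] = (ad_set_mu, purchase)
    return result

sim = simulate_hierarchy()
```

Because ad sets share a campaign-level mean, ad sets with little data are pulled toward their campaign, which is the partial pooling a hierarchical model provides.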
Stan!
Start with
simple model
Posterior predictive checks with 100+ real data sets
Fix step-by-step, allow no exceptions
Simplified end-model
● Easy way to write Bayesian models and do inference
● But hard to get stable in production: limit the scope of the model
● Reparametrize, use reasonable informative priors and custom initialization
○ Solves most of the convergence issues
● Sampling does not scale: we use ADVI / variational inference approximation
○ Validate that you get the same results with sampling and ADVI
○ Random initialization issues were solved by fixing ADVI parameters
● About 1000 daily runs and $1M scheduled with Celery, PyStan, monitoring
Learnings in productizing Stan
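One way to validate that sampling and ADVI agree is to compare summary statistics of the two sets of posterior draws. The helper below is a hypothetical sketch: the function name and the tolerance are assumptions, and the draws are synthetic stand-ins rather than Stan output.

```python
import random
import statistics

def summaries_agree(draws_a, draws_b, rel_tol=0.1):
    """Check that two sets of posterior draws have similar mean and sd.

    rel_tol is an assumed tolerance, in units of the larger of the two sds.
    """
    mean_a, mean_b = statistics.fmean(draws_a), statistics.fmean(draws_b)
    sd_a, sd_b = statistics.stdev(draws_a), statistics.stdev(draws_b)
    scale = max(sd_a, sd_b)
    means_close = abs(mean_a - mean_b) <= rel_tol * scale
    sds_close = abs(sd_a - sd_b) <= rel_tol * scale
    return means_close and sds_close

# Synthetic stand-ins for draws of one parameter from each method.
rng = random.Random(0)
mcmc_like = [rng.gauss(2.0, 0.5) for _ in range(5000)]
advi_like = [rng.gauss(2.0, 0.5) for _ in range(5000)]
shifted = [rng.gauss(3.0, 0.5) for _ in range(5000)]

ok = summaries_agree(mcmc_like, advi_like)
bad = summaries_agree(mcmc_like, shifted)
```

Running such a check per parameter on a sample of real data sets gives an early warning when the variational approximation drifts from the sampling result.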
Results 1
Common case:
ad sets don’t
really differ
Results 2
Campaigns
differ, ad sets
not
Results 3
Lots of data
Ad sets differ
Thank you.
Markus.Ojala@smartly.io
Are you an expert in Bayesian
modelling? Join us!
https://www.smartly.io/careers
We observe only aggregates
Goal: estimate log-normal parameters for ad sets
Challenge: observation i is an aggregate of n_i events
Solution: Estimate by another log-normal
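The Fenton-Wilkinson approximation replaces the sum of log-normal variables with a single log-normal whose mean and variance match those of the sum. A minimal sketch, assuming the n events in one aggregate are independent and share the same parameters; the function names and example values are illustrative.

```python
import math

def fenton_wilkinson(mu, sigma, n):
    """Log-normal (mu, sigma) approximating a sum of n i.i.d. LogNormal(mu, sigma^2)."""
    # Match the squared coefficient of variation of the sum.
    sigma_sum_sq = math.log((math.exp(sigma ** 2) - 1.0) / n + 1.0)
    # Match the mean of the sum.
    mu_sum = math.log(n) + mu + 0.5 * (sigma ** 2 - sigma_sum_sq)
    return mu_sum, math.sqrt(sigma_sum_sq)

def lognormal_mean_var(mu, sigma):
    """Mean and variance of LogNormal(mu, sigma^2)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    var = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)
    return mean, var

# Hypothetical ad set parameters and an hourly aggregate of 5 purchases.
mu_s, sigma_s = fenton_wilkinson(mu=3.0, sigma=1.0, n=5)
```

With this, the likelihood of each hourly aggregate can be written as a log-normal density with parameters that depend on the ad set's own parameters and the event count n_i.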
Multilevel model: Account -> Campaign -> Ad set
