Ad science bid simulator (public ver)
Outline
● Tradeoff between Online/Offline evaluation
● Simulator Architecture
○ Environment (traffic & auction)
○ Controller (algorithms)
○ Algorithms
● Data
○ How we simulate the auction
○ Example results
● Results
○ How do we evaluate
○ The insights it brings us
● Appendix (details for adScience folks)
Ad science bid simulator (public ver)
Scope difference: Indeed adSystem vs. DSP-only
● Note that our scope is quite different from that of the DSP companies out there.
When you own the whole system (bid + auction)
Your bid algorithm serves every ad
=> the goal is to smoothly use 100% of the client budget.
=> low bid deviation and linear pacing are virtues here.
=> rapidly changing algorithms may make the system unpredictable.
When you are a DSP company
You only care about the ads you control
=> you can never “change the rules of the whole world”.
=> the goal is to beat competitors’ algorithms.
=> agile algorithms may gain an advantage.
=> CPC/CPA is the most important thing.
(Diagram: “your bidding” at 5% and “other bidders” at 95%, all inside your own auction system at 100%.)
● Test groups are NOT independent; they join the same auctions.
○ It’s an artificial world: the “environment” changes as you apply the new algorithm.
● Ideally, to make results comparable, we would need to run both the new and old algorithms at 100% on the same time and the same ads => impossible in the real world.
● Simple example: an algorithm that always bids max_bid in the last 3 days might work at 5%, but not at 100%.
When rolled out to 100% in production, the “world” has changed
You want an algorithm that can compete with itself, not one that merely takes
advantage of the old one.
During an A/B test
the new algorithm is competing with the old one; you’re testing whether it
beats the “old world” formed by the old algorithm.
Concerns about online A/B testing
(Diagram: during the A/B test, New Alg runs on 5% and Old Alg on 95%; after rollout, New Alg runs on 100%.)
● Market trends are volatile, and even worse -- the test period is long.
● In the job market, traffic and user behavior may differ around long holidays, the new year, and graduation season.
=> a test in January vs. June could lead to opposite conclusions. (ex: past tests on traffic prediction / initial bid)
=> it’s not feasible to run every A/B test for a whole year.
● We’re not the only player.
● Our downstream product (auction/SERP) is evolving as well while we run the test (e.g., Deboost).
● Closely affected by search results.
Concerns about online A/B testing
● Lack of auction details for replaying the bidding history
○ Currently we only have the winning-auction data (adid, revenue, position, time)
○ What we do not have:
a. There are multiple SJ slots in an auction per SERP; we don’t know the other winning prices
(2nd/3rd/4th…) in the same auction.
b. The real “auction utility” score components, and the user interests.
● The compromises we need to make:
○ We only take data from the “top” auction position, to keep CTR/price comparable.
○ Ignore eCTR/eApplyRate; bid by price only.
● Fortunately, we still have the most important thing: bid and revenue (2nd price).
Concerns about offline simulation
Ad science bid simulator (public ver)
● Objective: minimize customized/repeated logic, minimize the interface.
○ Encapsulate each bidding algorithm as a “controller”.
○ Campaign management / traffic pattern / auction are shared as “environment” logic.
○ Simulate different traffic patterns / market competition by changing only the environment.
○ Simulate different algorithms against various environments.
○ Idea: we cannot guarantee simulation = reality, but we can choose an algorithm that
survives all kinds of extreme simulated environments.
Simulator Architecture - concept
● The real implementation:
Simulator Architecture - modules
(Diagram: stateless Controllers -- Baseline, HighFreq, PID, RBO (A/B), DDPG / PPO -- read the campaign state and send a bid to the Environment; Campaign Management (which manages all states) forwards the bid and spend_cap to the stateless Auction and receives the won auction, click, and revenue back; logs flow to the Evaluator, which produces the metrics.)
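As a rough illustration of these module boundaries, here is a minimal Python sketch (hypothetical names and signatures, not the production interface): controllers and the auction are stateless, all mutable state lives in campaign management, and the evaluator only reads the logs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class CampaignState:
    budget_remaining: float
    inventory_remaining: float
    budget_used: float = 0.0
    inventory_used: float = 0.0
    last_bid: float = 1.0

# A controller is a pure function: campaign snapshot -> new bid.
Controller = Callable[[CampaignState], float]

def second_price_auction(bid: float, competing_prices: List[float]) -> float:
    """Stateless auction step: return the price charged (0.0 if the bid loses)."""
    if bid <= max(competing_prices):
        return 0.0
    return max(competing_prices)  # the winner pays the second price

@dataclass
class CampaignManagement:
    """Holds all mutable state: applies wins, enforces spend caps, keeps logs."""
    state: CampaignState
    logs: List[Dict] = field(default_factory=list)

    def apply_win(self, bid: float, price: float) -> None:
        spend = min(price, self.state.budget_remaining)  # direct overspend cutoff
        self.state.budget_used += spend
        self.state.budget_remaining -= spend
        self.state.last_bid = bid
        self.logs.append({"bid": bid, "spend": spend})
```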
Algorithms
● Baseline
○ The current (as of Jan 2020) production algorithm.
○ The only algorithm among those we benchmarked that updates bids just 3 times a day.
● HighFreq
○ Main idea: raise the bid update frequency from 3 times a day to once per hour.
● PID
○ A classic control-system template.
○ Decides the next action from Proportional / Integral / Derivative components.
● RBO
○ An original design from the adScience bidAlgo team.
○ Learns from the ad’s own history.
● Baseline
1. Main idea: compare the “last period bid result” with the “target spend”
2. tsr (target spend rate) = budget_left / inventory_left
3. asr (actual spend rate) = budget_used / inventory_used
4. new_bid = last_bid * (tsr / asr) (simplified version, for easy understanding)
Baseline Algorithm
(Diagram: (A) ideal target: budget remaining / inventory remaining => target spend_rate; (B) last observation: budget used / inventory used => actual spend_rate; (C) calibration: last bid * (tsr / asr) => new bid.)
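A minimal runnable sketch of the simplified baseline update above (illustrative only, not the production code):

```python
def baseline_new_bid(last_bid: float,
                     budget_left: float, inventory_left: float,
                     budget_used: float, inventory_used: float) -> float:
    """Simplified baseline: scale the last bid by target / actual spend rate."""
    tsr = budget_left / inventory_left   # (A) ideal target: target spend rate
    asr = budget_used / inventory_used   # (B) last observation: actual spend rate
    return last_bid * (tsr / asr)        # (C) calibration

# Spending slower than target -> raise the bid:
# tsr = 50/100 = 0.5, asr = 20/50 = 0.4  =>  new_bid = 1.0 * 1.25 = 1.25
print(baseline_new_bid(last_bid=1.0, budget_left=50, inventory_left=100,
                       budget_used=20, inventory_used=50))
```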
● PID
1. Main idea: decide the next change ratio from 3 components, P + I + D
2. P (proportional) = p_ratio * (1 - actual_spend / target_spend)
3. I (integral) = i_ratio * SUM(errors over the past k periods)
4. D (derivative) = d_ratio * (last_error - last_last_error)
5. new_bid = old_bid * (1 + P + I + D)
PID controller
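A short sketch of the P + I + D update above (the gains p_ratio / i_ratio / d_ratio and the window k are illustrative placeholders, not tuned values):

```python
def pid_new_bid(old_bid: float, actual_spend: float, target_spend: float,
                error_history: list,
                p_ratio: float = 0.5, i_ratio: float = 0.1,
                d_ratio: float = 0.1, k: int = 5) -> float:
    """PID-style bid update following the P / I / D terms above."""
    error = 1.0 - actual_spend / target_spend             # current spend error
    p = p_ratio * error                                   # proportional term
    i = i_ratio * sum(error_history[-k:])                 # integral over past k errors
    d = d_ratio * (error_history[-1] - error_history[-2]) if len(error_history) >= 2 else 0.0
    return old_bid * (1.0 + p + i + d)

# Example: under-spending (actual 0.8 vs. target 1.0) nudges the bid upward.
print(pid_new_bid(old_bid=1.0, actual_spend=0.8, target_spend=1.0,
                  error_history=[0.3, 0.25, 0.2]))
```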
● RBO
1. Main idea: spend_rate = A * bid²
2. Learn A from history!
RBO (responsive bid optimizer)
Fit “A”
(Diagram: a time series of pairs (bid_t1, spend_rate_t1) … (bid_t7, spend_rate_t7), from which “A” is fitted.)
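A sketch of the idea (illustrative only): fit A in spend_rate ≈ A * bid² from the ad’s own history, then invert it, e.g. to hit a target spend rate. The fitting step follows the slide; the inversion to a target spend rate is our assumption about how the fitted A would be used.

```python
def fit_A(bids, spend_rates):
    """Least-squares fit through the origin for spend_rate = A * bid**2."""
    num = sum(s * b * b for b, s in zip(bids, spend_rates))
    den = sum(b ** 4 for b in bids)
    return num / den

def rbo_new_bid(target_spend_rate, bids, spend_rates):
    """Invert spend_rate = A * bid**2 to get the bid for a target spend rate."""
    A = fit_A(bids, spend_rates)
    return (target_spend_rate / A) ** 0.5

# History roughly consistent with A ≈ 2.0:
bids = [0.5, 1.0, 1.5]
spend_rates = [0.5, 2.0, 4.5]
print(rbo_new_bid(target_spend_rate=8.0, bids=bids, spend_rates=spend_rates))  # ≈ 2.0
```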
● Other than the main algorithms, some hidden gems actually improved things a lot.
1. Budget smoothing / stochastic bid updates.
2. Separating the daily budget into a 3-segment guardrail helped a lot with pacing.
3. Overspend is directly cut off by campaign management.
4. Danger-zone logic burns the remaining bit of budget in the last days.
5. RBOB introduced variable-length bid periods to deal with low-traffic ads.
The secret sauce ...
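Two of these are simple enough to sketch; the code below is our own illustrative reading of “stochastic bid update” and the “3-segment guardrail”, not the production logic.

```python
import random

def stochastic_bid(bid: float, jitter: float = 0.05) -> float:
    """Randomly perturb the bid so competing ads don't move in lock-step
    (illustrative reading of 'stochastic bid update')."""
    return bid * (1.0 + random.uniform(-jitter, jitter))

def segment_spend_cap(daily_budget: float, hour: int) -> float:
    """Cap cumulative spend at 1/3, 2/3, 3/3 of the daily budget per 8-hour
    segment (illustrative reading of the '3-segment guardrail')."""
    segment = min(hour // 8, 2)   # hours 0-7 -> 0, 8-15 -> 1, 16-23 -> 2
    return daily_budget * (segment + 1) / 3.0
```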
Ad science bid simulator (public ver)
● Ex: for a historical adid=xxxxx, get impression / click data from day_start ~
day_end from IQL.
● Augment the click count by assuming some non-clicked impressions are
clicked (since user interest is out of scope for bid optimization).
Data: how do we use Ad History
(Diagram: historical data with a few clicks among the impressions over day_start ~ day_end, vs. the same window augmented to ~10x clicks (revenue).)
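A sketch of the augmentation step (hypothetical field names; the real pipeline reads from IQL): promote a random subset of non-clicked impressions to clicks until the click count is roughly 10x the historical one.

```python
import random

def augment_clicks(impressions, factor=10, revenue_per_click=1.0):
    """Mark extra impressions as clicked so the ad has ~factor x its historical clicks."""
    clicked = [imp for imp in impressions if imp["clicked"]]
    candidates = [imp for imp in impressions if not imp["clicked"]]
    need = (factor - 1) * len(clicked)
    for imp in random.sample(candidates, min(need, len(candidates))):
        imp["clicked"] = True
        imp["revenue"] = revenue_per_click   # assumed: reuse a typical click revenue
    return impressions
```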
Simulator Environment Components
● Bidding history
○ Use historical ad data, so naturally we have all kinds of weird traffic patterns.
○ There is a synthesized-traffic option, but we don’t use it since it’s too ideal.
● Budget
○ Simulate on ads with different levels of clicks.
○ Found that low-budget/traffic ads are the most challenging type.
● Market Competition
○ Monopoly / Oligopoly / Fair market
● Traffic
○ Use ideal / predicted / uniform traffic to dispatch budget, to investigate the impact of traffic
prediction.
● Initial bid
○ Simulate the impact of the initial bid, to see how important it is and decide whether we need to
improve it.
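Putting these knobs together, each simulation run can be thought of as one combination of environment settings; a hypothetical config sketch (names are ours, not the production schema):

```python
from dataclasses import dataclass

@dataclass
class EnvironmentConfig:
    traffic_source: str = "historical"   # "historical" or "synthetic" bidding history
    budget_level: str = "low"            # ads bucketed by click volume
    competition: str = "fair"            # "monopoly" / "oligopoly" / "fair"
    traffic_model: str = "predicted"     # "ideal" / "predicted" / "uniform" budget dispatch
    initial_bid_ratio: float = 1.25      # initial bid relative to the market price
```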
● Real traffic and Predicted traffic
○ Idea: invest (spend budget) according to inventory (traffic), to optimize CPC.
○ Ex: invest $120 over a week, assuming daily traffic of 2 on weekdays and 1 on weekends.
Data: how do we use traffic data
traffic: 2 2 2 2 2 1 1
budget: $20 $20 $20 $20 $20 $10 $10
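The $120 / week example above, as a tiny sketch of traffic-proportional budget dispatch:

```python
def dispatch_budget(total_budget, traffic):
    """Split the budget across days in proportion to (predicted) traffic."""
    total_traffic = sum(traffic)
    return [total_budget * t / total_traffic for t in traffic]

print(dispatch_budget(120.0, [2, 2, 2, 2, 2, 1, 1]))
# -> [20.0, 20.0, 20.0, 20.0, 20.0, 10.0, 10.0]
```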
How we simulate bid-price competition
● For example, in the “fair competition” setup, 10 simulated ads start with
■ budget = (total revenue of augmented clicks) / 10
■ initial bid = 1.25 * first click revenue (historically, 1st bid ~= 1.25 x 2nd bid)
● The 10 ads produce 10 bid prices, plus 1 extra historical revenue as the “bottom-line market price”.
■ To win the auction, an ad needs to beat the prices of the other 9 simulated ads and the “market price”.
■ If all ads bid lower than the market price, nobody wins.
■ If multiple ads tie for the top price, one is chosen at random.
■ The winner is charged the 2nd price.
(Diagram: the 10 simulated ads deliver 10 bids from their algorithms, e.g. [$0.5, $1.5, $7, $15 …], against a historical revenue of $1.0 as the market bottom line.)
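One simulated auction round, as a minimal sketch of the rules above (the bid list mirrors the example in the diagram; in the real setup there are 10 bids):

```python
import random

def run_auction(bids, market_price):
    """Return (winner_index, price_charged), or (None, 0.0) if nobody beats the market price."""
    top = max(bids)
    if top < market_price:
        return None, 0.0                        # all bids below market price: nobody wins
    winners = [i for i, b in enumerate(bids) if b == top]
    winner = random.choice(winners)             # break ties randomly
    second = max([b for i, b in enumerate(bids) if i != winner] + [market_price])
    return winner, second                       # the winner is charged the 2nd price

print(run_auction([0.5, 1.5, 7.0, 15.0], market_price=1.0))  # -> (3, 7.0)
```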
● A simulation of the competition looks like this:
Example simulation result in macro view
(Charts: bid history of the 10 simulated competing ads; remaining budget of each ad, which ideally should all go to zero; the daily spend limit & max-spend guardrail trend; and the daily budget depletion of each ad.)
● To find the root cause of some phenomenon, dive deep into the inner control signals of each algorithm.
● Ex: Baseline (the current product) has control signals like tsr, asr, ncns_rounds, danger_zone, over_spend …
● Modify what we think is fishy => simulate again => confirm the root cause.
Example simulation result in micro view
(Charts: per-ad control signals over time -- the ncns (no-click-no-spend) signal, tsr (target spend rate), asr (actual spend rate), the mode / over_spend / danger_zone flags, and used budget.)
● An example from when we were investigating various budget-dispatching modes
○ Simulated 50 ads for each combination of criteria (algorithm / traffic / market competition / budget_depletion_mode / initial_bid_rto ...)
○ Use the median rather than the mean (which is sensitive to outliers) as the aggregated metrics result.
○ Most of the time we focus on budget depletion rate & deviation, plus CPC, and also monitor whether other metrics show interesting trends.
Example of aggregated simulation metrics
How do we evaluate
● Input: bidding history
○ Use historical ad data, so naturally we have all kinds of weird traffic patterns.
● Simulation variants: most of the time, we benchmark along 3 main factors
1. Algorithms
○ Baseline / HighFreq / PID / RBO / RBOB
2. Update Frequency
○ Each algorithm has variants updating bids hourly / every 2 hours / every 4 / 6 / 8 hours.
3. Traffic Level
○ Find low-traffic (10-30 clicks) / medium-traffic (30-200 clicks) / high-traffic (200-10000 clicks)
historical ads, and randomly sample 500 ads from each bucket.
○ Simulate based on their 2019-08-01 to 2019-08-15 historical data.
○ Augment clicks 10x, and dispatch the budget to 10 ads running the same algorithm and competing with each other.
● Output: metrics
○ Mainly budget depletion & CPC
○ Also look at mean/median bid price, CBP, uptime, bid volatility, average daily bid depletion, and average
daily spend depletion.
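A sketch of the aggregation step (hypothetical column names and illustrative numbers, not real results): median metrics per (algorithm, update frequency, traffic level) cell over the sampled ads.

```python
import pandas as pd

# Illustrative rows only; in reality there is one row per simulated ad
# (e.g. 500 sampled ads per traffic bucket).
results = pd.DataFrame([
    {"algorithm": "Baseline", "update_freq_h": 8, "traffic": "low",
     "budget_depletion": 0.91, "cpc": 0.42},
    {"algorithm": "RBO", "update_freq_h": 1, "traffic": "low",
     "budget_depletion": 0.98, "cpc": 0.39},
])

summary = (results
           .groupby(["algorithm", "update_freq_h", "traffic"])
           [["budget_depletion", "cpc"]]
           .median())   # median rather than mean, to resist outliers
print(summary)
```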
Ad science bid simulator (public ver)
● Baseline
○ Weak, especially for low-traffic ads.
○ Relatively more sensitive to traffic prediction and initial bid error.
● HighFreqBid
○ A/B tested online twice (adScience and the SMB team each ran one test); both concluded no move-forward.
○ In simulation, we found it boosts the price too fast (hourly) for low-traffic ads, causing high CPC.
● RBO
○ Used to be even worse than Baseline for low-traffic ads (same reason as HighFreq); we
developed 2 solutions (RBO/RBOB) and fixed it through simulation.
○ Both versions have similar performance: one tends to spend early with a slightly higher CPC, the other
tends to wait longer and spend at the end of the campaign lifetime.
○ Robust against traffic prediction and initial bid error.
● PID
○ Parameters tuned to beat RBO in Aug 2019 ended up losing in Nov 2019.
○ Parameter tuning is important.
The insights it brings us - bid-opt algorithms
● Traffic Prediction
○ For the US market, even if we deprecate traffic prediction and use uniform traffic instead, for Baseline
it’s a ~5% difference in budget depletion; for PID and RBO it’s ~1%.
● Initial Bid
○ ~3% difference in budget depletion for Baseline on ads with a 1-month lifetime.
○ Almost no impact for high-frequency-update algorithms like PID & RBO.
● Tilting Budget Pacing
○ Try different patterns of tilted budget pacing to improve budget depletion.
The insights it brings us - upstream components