Machine learning for profit: Computational advertising landscape

Machine learning for profit
Computational advertising landscape
Sharat Chikkerur
sharat@alum.mit.edu
Principal Data Scientist
Nanigans Inc.

About me
• UB Alum, 2005 (CUBS)
• MIT Alum, 2010 (CBCL)
• Google, Senior Software Engineer
  • AdWords pCTR modeling (McMahan et al. [2013])
  • UI and position effect modeling
• Microsoft, Senior Software Engineer (AzureML [2016])
  • Azure machine learning
• Nanigans, Principal data scientist
  • Value modeling for Facebook/Twitter/Instagram
  • Modeling stack for real-time display (RTB)

What is computational advertising
Computational advertising is a new discipline that spans areas of
computing, economics and machine learning
Feature                Discipline
Web-scale audience     Distributed computing
Value estimation       Machine learning
Targeted delivery      Information retrieval
Personalized content   Machine learning, economics (explore/exploit), recommendation systems
Dynamic pricing        Game theory

Economic scale: worldwide
• 11% annual growth rate
• 260B by 2020

Economic scale: US
• 17% annual growth rate, 100% annual growth for mobile

Entities
• Advertiser
• Publisher
• Demand side platform
• Supply side platform
• Ad networks
• Exchanges
• Data aggregators

[Figure: DISPLAY LUMAscape, © LUMA Partners LLC 2016. A map of the display advertising ecosystem from marketer to publisher. Categories shown: agencies, agency trading desks, DSPs, exchanges, ad networks (horizontal, vertical/custom, mobile, performance, video/rich media, targeted networks/AMPs), SSPs, ad servers, publisher tools, DMPs and data aggregators, data suppliers, retargeting, creative optimization, measurement and analytics, media planning and attribution, verification/privacy, tag management, media management systems and operations, sharing data/social tools. The legend marks acquired and shuttered companies.]

Search ads
• Intent is well known
• Search engine acts as a publisher, exchange and data
aggregator
• Targeting includes keywords, demographics, geo and user
history

Display ads: Retargeting
• Targets users who have almost converted elsewhere
• Users are tracked through cookie syncing between advertiser
and exchange
• Higher intentionality, time sensitive and very profitable

Display ads: Prospecting and brand
• Typically used for brand advertising and optimized for reach
• Primarily targeted using demographics
• Low intentionality

Native ads
• Sponsored content is mixed with native content
• Very popular within social networks and portal sites
• Requires a good recommendation scheme

Multiformat
• Sponsored content is mixed with organic content
• Ads from other networks also displayed

Marketplace
• Guaranteed delivery (futures contract)
  • Direct advertiser-publisher contract
  • Publisher guarantees delivery of impressions to a given audience
  at a fixed price
  • Publisher is on the hook for any shortfall
  • Forecasting is the key to making a profit
• Header bidding/First look (options contract)
  • Advertiser pays a premium for the first look and the option to buy at a
  lower price than at auction
  • Publishers get a better margin than on the open exchange

Marketplace (cont.)
• Non-guaranteed delivery (auctions)
  • Each impression goes through an auction in one or more
  exchanges
  • Publisher chooses the best bid
  • Less expensive

The challenge of advertising
• Find the best match between a given user in a given context
and a suitable advertisement (when the advertisement is
relevant in the context) ¹
Context
Channel   Context
Search    Keywords, query, ads, geolocation, user behavior
Display   Page content, user behavior, geolocation, demographics
Native    Page content, user behavior, social connections
¹ Introduction to computational advertising, https://web.stanford.edu/class/msande239/

Problem domains
Advertising can be thought of as a sequential decision process
• Targeting
  • Who do we show the advertisement to, and when?
• Ranking/Matching/Recommendation
  • Which set of ads do we show?
• Bidding
  • How much is this opportunity worth to the advertiser?
• Optimization
  • What’s the objective of this ad campaign?
• Pricing
  • How much is this opportunity worth to the publisher?
• Budgeting & Pacing
  • When should we spend, and how much, to maximize yield?

Challenge: Targeting and ranking

Targeting: audience selection
• Targeting is usually specified manually based on campaign
goals
  • Demographic (age, gender, income bracket)
    • User supplied
    • Credit card data
    • Shipping address etc.
    • Inferred demographics
• Audience targeting can be framed as a search/ranking task
given a (context, ad) pair
  • Contrast with ad selection = search(context, user)

Ranking: ad selection
• Ad selection can be framed as a ranking problem given (ad,
display context, user) features
• Common metric: E[revenue] = p(Click) ∗ CPC
Position dependence
• Click through rates depend on display position.
• Factored click through rate model (Richardson et al. [2007])
p(Click|ad, context) = p(view|ad, context)∗p(click|view, ad, context)
• Factorization machine (Rendle [2012])
f(ad, context) = b + \sum_i w^a_i a_i + \sum_j w^c_j c_j + \sum_i \sum_j \langle v_i, v_j \rangle a_i c_j
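The factored click model and the factorization-machine score can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the models from the cited papers: the function names, the dense feature vectors, the split of latent vectors into `v_a` (ad side) and `v_c` (context side), and the toy numbers are all hypothetical.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def factored_ctr(p_view: float, p_click_given_view: float) -> float:
    """p(Click|ad, context) = p(view|ad, context) * p(click|view, ad, context)."""
    return p_view * p_click_given_view


def expected_revenue(p_click: float, cpc: float) -> float:
    """Ranking metric: E[revenue] = p(Click) * CPC."""
    return p_click * cpc


def fm_score(
    b: float,
    w_a: Sequence[float], a: Sequence[float],
    w_c: Sequence[float], c: Sequence[float],
    v_a: Sequence[Sequence[float]], v_c: Sequence[Sequence[float]],
) -> float:
    """Bias + linear ad terms + linear context terms + ad-context interactions.

    v_a[i] and v_c[j] are the latent vectors for ad feature i and context
    feature j (names assumed for this sketch).
    """
    linear = dot(w_a, a) + dot(w_c, c)
    pairwise = sum(
        dot(v_a[i], v_c[j]) * a[i] * c[j]
        for i in range(len(a))
        for j in range(len(c))
    )
    return b + linear + pairwise


# Toy example: two ad features, one context feature, 2-dimensional latent vectors.
score = fm_score(
    b=0.5,
    w_a=[1.0, 2.0], a=[1.0, 0.0],
    w_c=[3.0], c=[2.0],
    v_a=[[1.0, 0.0], [0.0, 1.0]],
    v_c=[[2.0, 1.0]],
)
```

The double loop is quadratic in the number of features; it is written this way only to mirror the formula term by term.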

Challenge: Bidding and value estimation

Bidding: Factored value formulation
• Bidding one's true value is the optimal strategy in a second-price auction (Vickrey [1961])
• Value is factored based on a conversion funnel
  • View → Click → Convert → Value
Factored formulation for downstream (revenue) optimization
VPI = E[V|A] ∗ P(A|Click) ∗ P(Click|Impression)
Term                       Meaning             Model
E[V|A]                     Value per action    Linear/log-linear/Poisson regression
P(A|Click) or E[A|Click]   Actions per click   Logistic/Poisson regression
P(Click|Impression)        CTR                 Logistic regression
Fortunately, all of these regressions can be implemented using a
general setup.
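A worked instance of the factored formula may help. The sketch below reads VPI as value per impression (an assumption about the abbreviation) and uses made-up funnel numbers; in practice each factor would come from its own regression model.

```python
def value_per_impression(
    value_per_action: float,   # E[V|A]
    actions_per_click: float,  # P(A|Click) or E[A|Click]
    ctr: float,                # P(Click|Impression)
) -> float:
    """VPI = E[V|A] * P(A|Click) * P(Click|Impression)."""
    return value_per_action * actions_per_click * ctr


# Hypothetical funnel: a conversion is worth 40, 5% of clicks convert,
# and 2% of impressions are clicked.
vpi = value_per_impression(value_per_action=40.0, actions_per_click=0.05, ctr=0.02)
cpm_equivalent = 1000 * vpi  # value of a thousand such impressions
```

With these numbers each impression is worth about 0.04, or about 40 per thousand impressions.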

Value estimation: Generalized linear models
A generalized linear model specifies
• A linear predictor of the form η(x) = w^T x
• A mean estimate µ
• A link function g such that g(µ) = η(x), relating the
mean estimate to the linear predictor
This framework supports a variety of regression problems
Model                   Link
Linear regression       µ = w^T x
Log-linear regression   log(µ) = w^T x
Logistic regression     log(µ / (1 − µ)) = w^T x
Poisson regression      log(µ) = w^T x
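The "general setup" can be illustrated by inverting each link function to recover the mean from the linear predictor. This is a prediction-only sketch with hypothetical names (`INVERSE_LINKS`, `predict_mean`); fitting the weights, and the noise model that distinguishes log-linear from Poisson regression, are left out.

```python
import math
from typing import Callable, Dict, Sequence


def linear_predictor(w: Sequence[float], x: Sequence[float]) -> float:
    """eta(x) = w^T x."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Inverse link g^{-1}: maps the linear predictor eta back to the mean mu.
INVERSE_LINKS: Dict[str, Callable[[float], float]] = {
    "linear": lambda eta: eta,                             # mu = eta
    "log_linear": math.exp,                                # log(mu) = eta
    "logistic": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # log(mu / (1 - mu)) = eta
    "poisson": math.exp,                                   # log(mu) = eta
}


def predict_mean(model: str, w: Sequence[float], x: Sequence[float]) -> float:
    """Mean estimate mu = g^{-1}(w^T x) for the chosen regression type."""
    return INVERSE_LINKS[model](linear_predictor(w, x))
```

The same weights and features go through the same code path for every model; only the inverse link changes.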

Challenges: data issues
• Optimization objectives can differ
  • CPM, CPC, CPA, VPA etc.
• Downstream objectives may conflict with upstream objectives
  • A high-CTR impression/user may be a low-CVR impression/user
• Data arrives at different rates and volumes (attribution windows)
and matures on different time scales
  • CTR models have high data volume and granular features
  • CVR models have lower data volume
  • Need to handle delayed rewards

Optimization: types
• Maximize volume
  • Bid high enough to guarantee delivery
  • Targeting should be narrow to avoid inefficiency
  • Spend needs to be paced during the day
• Target cost (average cost bidding)
  • Advertiser specifies CPA, CPC
  • Bid = p(CTR) ∗ p(Conversion|Click) ∗ CPA ∗ pacing(t)
  • Pacing accounts for second price discount
• Target yield (revenue optimization)
  • Advertiser specifies yield on CPA, CPC
  • Bid = p(CTR) ∗ p(Conversion|Click) ∗ (CPA / yield) ∗ pacing(t)
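The two bid formulas can be written as small functions. This sketch reads the target-yield formula as CPA divided by yield, which is an assumption about the original slide layout; the function and parameter names are hypothetical, and `pacing` stands in for the value of pacing(t) at the current time.

```python
def target_cost_bid(
    p_click: float,
    p_conv_given_click: float,
    cpa: float,
    pacing: float = 1.0,
) -> float:
    """Bid = p(CTR) * p(Conversion|Click) * CPA * pacing(t)."""
    return p_click * p_conv_given_click * cpa * pacing


def target_yield_bid(
    p_click: float,
    p_conv_given_click: float,
    cpa: float,
    yield_target: float,
    pacing: float = 1.0,
) -> float:
    """Bid = p(CTR) * p(Conversion|Click) * (CPA / yield) * pacing(t)."""
    if yield_target <= 0:
        raise ValueError("yield_target must be positive")
    return p_click * p_conv_given_click * (cpa / yield_target) * pacing


# Hypothetical campaign: 2% CTR, 10% conversion rate, CPA target of 50.
cost_bid = target_cost_bid(0.02, 0.10, 50.0)
yield_bid = target_yield_bid(0.02, 0.10, 50.0, yield_target=2.0)
```

Under this reading, asking for a yield of 2 halves the bid relative to plain target-cost bidding.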

Optimization: summary
Type   Bidding
CPM    value
CPC    p(CTR) * value
CPA    p(CTR) * p(Conversion) * value
OCPM   E[value]
OCPC   p(CTR) * E[value]
OCPA   p(CTR) * p(Conversion) * E[value]

Challenge: Pricing (Exchange side)

Challenges: Pricing (Auctions)
• First price sealed bid
  • Highest bidder wins
  • Unstable
• Second price sealed bid
  • Highest bidder wins but pays second price
  • Stable
  • Incentive to bid true value
• English auction (ascending price)
  • Seller increases price until a single bidder remains
• Dutch auction (descending price)
  • Seller decreases price until a single bidder accepts it
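The two sealed-bid formats differ only in the price the winner pays, which a short sketch makes concrete. The function name and the bidder labels are hypothetical, and ties are broken by sort order for simplicity.

```python
from typing import Dict, Tuple


def run_sealed_bid_auction(bids: Dict[str, float], second_price: bool) -> Tuple[str, float]:
    """Return (winner, price paid) for a sealed-bid auction.

    First price: the winner pays its own bid.
    Second price: the winner pays the highest losing bid.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, top_bid = ranked[0]
    price = ranked[1][1] if second_price else top_bid
    return winner, price


bids = {"a": 3.0, "b": 5.0, "c": 4.0}
first = run_sealed_bid_auction(bids, second_price=False)
second = run_sealed_bid_auction(bids, second_price=True)
```

Here bidder "b" wins in both formats, paying 5.0 under first price and 4.0 under second price.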

Second price auction
• Assume your value is 100
• If you bid 100 and win, your net gain is 100 - cost
• If you bid > 100, you increase the chance of winning at a cost > 100, but do not
affect the outcome when the price is < 100
• If you bid < 100, you reduce your chance of winning without lowering the price you pay
• Optimal strategy: bid 100; your bid cannot affect the price you pay
• Model exposure fatigue
Vickrey-Clarke-Groves (VCG) auction
• Maximizes total social value (TSV)
• Each bidder pays its externality: (total value to the other bidders if this
bidder did not participate) - (total value to the other bidders when it does)
• Optimal strategy: all bidders bid their true value
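The argument for bidding the true value in a second-price auction can be checked numerically. The sketch below assumes a single item, a single competing top bid, and that ties go to the competitor; the function name and the grid of bids are hypothetical.

```python
def second_price_utility(value: float, bid: float, best_other_bid: float) -> float:
    """Net gain for one bidder: value minus price if it wins, else zero.

    The price paid on a win is the best competing bid, not the bidder's own bid.
    """
    if bid > best_other_bid:
        return value - best_other_bid
    return 0.0


VALUE = 100.0
competing_bids = [50.0, 90.0, 110.0, 150.0]
my_bids = [80.0, 100.0, 120.0]

# utilities[my_bid] lists the net gain against each competing bid.
utilities = {
    b: [second_price_utility(VALUE, b, other) for other in competing_bids]
    for b in my_bids
}

# Bidding the true value is never worse than any alternative on this grid.
truthful_is_weakly_best = all(
    second_price_utility(VALUE, VALUE, other) >= second_price_utility(VALUE, b, other)
    for b in my_bids
    for other in competing_bids
)
```

Underbidding at 80 forfeits the profitable win against a competing bid of 90, and overbidding at 120 wins against 110 at a loss of 10.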

Generalized second price auction
• Rank by bid: cost(i) = bid(i + 1)
• Rank by revenue: cost(i) = bid(i + 1) ∗ ctr(i + 1)/ctr(i)
• Bidder never pays more than its bid
• Better CTRs earn a discount
• Not identical to VCG
• Revenue may be higher than VCG
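The rank-by-revenue pricing rule can be sketched as follows. The tuple layout, the function name, and the reserve price charged to the last-ranked ad are assumptions for illustration; CTRs are taken to be strictly positive.

```python
from typing import List, Tuple

Ad = Tuple[str, float, float]  # (name, bid per click, predicted CTR)


def gsp_prices(ads: List[Ad], reserve: float = 0.0) -> List[Tuple[str, float]]:
    """Rank ads by bid * ctr and charge cost(i) = bid(i+1) * ctr(i+1) / ctr(i).

    The last-ranked ad has no ad below it and pays the reserve (assumed here).
    """
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    prices = []
    for i, (name, _bid, ctr) in enumerate(ranked):
        if i + 1 < len(ranked):
            _, next_bid, next_ctr = ranked[i + 1]
            cost = next_bid * next_ctr / ctr
        else:
            cost = reserve
        prices.append((name, cost))
    return prices


ads = [("a", 2.0, 0.10), ("b", 3.0, 0.05), ("c", 1.0, 0.08)]
prices = gsp_prices(ads)
```

Ad "a" outranks "b" despite the lower bid because of its higher CTR, and pays roughly 1.5 per click against a bid of 2.0.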

Challenge: Budgeting and pacing

Pacing
• Purpose of pacing
  • Budget pacing - distributes spend over the time period
  • KPI pacing - achieves a target KPI (control vs. optimization)
• Method of pacing
  • Throttling (allocation) - useful when bid adjustments are not
  allowed
  • Bid modification
  • Feedback control
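Throttling can be illustrated with a probabilistic participation rule. This is a simplified sketch, not the algorithm of Lee et al.: it assumes a uniform spend schedule over the period, a fixed adjustment step, and hypothetical function names.

```python
import random


def should_bid(pacing_rate: float, rng: random.Random) -> bool:
    """Throttling: enter the auction with probability pacing_rate."""
    return rng.random() < pacing_rate


def update_pacing_rate(
    rate: float,
    spent: float,
    budget: float,
    elapsed_fraction: float,
    step: float = 0.1,
) -> float:
    """Nudge the participation rate toward a uniform spend schedule.

    Ahead of schedule: participate less. Behind schedule: participate more.
    The result is clipped to [0, 1].
    """
    target_spend = budget * elapsed_fraction
    if spent > target_spend:
        rate -= step
    elif spent < target_spend:
        rate += step
    return min(1.0, max(0.0, rate))


# Halfway through the day with 60 of 100 spent: slow down.
slower = update_pacing_rate(0.5, spent=60.0, budget=100.0, elapsed_fraction=0.5)
```

The bid itself is never changed here, which is why throttling suits settings where bid adjustments are not allowed.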

Pacing types (Lee et al. [2013])

Feedback control (Zhang et al. [2016])

Explicit feedback control
The bid is computed using
b_a(t) = b(t) exp{φ(t)}
Here φ(t) is the control signal. Other approaches, such as
b_a(t) = b(t)(1 + φ(t)), were found to be ineffective.
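The two adjustment rules can be compared directly. The function names are hypothetical; the only point the sketch makes is a mathematical one, namely that the exponential form yields a positive bid for any control signal while the linear form can go negative.

```python
import math


def adjusted_bid_exponential(base_bid: float, phi: float) -> float:
    """b_a(t) = b(t) * exp(phi(t))."""
    return base_bid * math.exp(phi)


def adjusted_bid_linear(base_bid: float, phi: float) -> float:
    """b_a(t) = b(t) * (1 + phi(t)), the alternative mentioned above."""
    return base_bid * (1.0 + phi)


unchanged = adjusted_bid_exponential(2.0, 0.0)      # phi = 0 leaves the bid as is
strong_cut_exp = adjusted_bid_exponential(2.0, -3.0)  # still a positive bid
strong_cut_lin = adjusted_bid_linear(2.0, -3.0)       # negative, not a valid bid
```

A negative control signal of the same magnitude therefore needs clipping in the linear form but not in the exponential form.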

Controller
• PID controller:
e(t_k) = x_r − x(t_k)
φ(t_{k+1}) = λ_P e(t_k) + λ_I \sum_{j=1}^{k} e(t_j) Δt_j + λ_D Δe(t_k) / Δt_k
Here x_r is the reference (desired) KPI and x(t_k) is the KPI at time t_k.
λ_P, λ_I and λ_D are controller gains. All control factors remain the
same between updates.
• Water-level based controller
φ(t_{k+1}) = φ(t_k) + γ(x_r − x(t_k))
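Both controllers map a measured KPI to the next control signal φ. The sketch below is a minimal discrete-time version with hypothetical class names; it assumes a zero derivative term on the first update and a constant time step unless one is passed in.

```python
from typing import Optional


class PIDController:
    """phi(t_{k+1}) = gain_p * e + gain_i * sum(e * dt) + gain_d * de / dt."""

    def __init__(self, reference: float, gain_p: float, gain_i: float, gain_d: float):
        self.reference = reference
        self.gain_p = gain_p
        self.gain_i = gain_i
        self.gain_d = gain_d
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def update(self, measured: float, dt: float = 1.0) -> float:
        error = self.reference - measured  # e(t_k) = x_r - x(t_k)
        self.integral += error * dt        # running sum of e(t_j) * dt_j
        # No previous error on the first call, so the derivative term is zero.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.gain_p * error + self.gain_i * self.integral + self.gain_d * derivative


class WaterLevelController:
    """phi(t_{k+1}) = phi(t_k) + gamma * (x_r - x(t_k))."""

    def __init__(self, reference: float, gamma: float, phi: float = 0.0):
        self.reference = reference
        self.gamma = gamma
        self.phi = phi

    def update(self, measured: float) -> float:
        self.phi += self.gamma * (self.reference - measured)
        return self.phi


pid = PIDController(reference=1.0, gain_p=0.5, gain_i=0.1, gain_d=0.2)
phi_1 = pid.update(measured=0.0)  # error 1.0, integral 1.0, derivative 0
phi_2 = pid.update(measured=0.5)  # error 0.5, integral 1.5, derivative -0.5
```

The returned value would be held constant until the next update and fed into the bid adjustment.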

Subjective quality
Ad blindness
• Repeated exposure leads to learned blindness/ad blocking
• We need to predict when not showing ads is better
• Approaches
  • Thresholding based on scores
  • Utility function (lifetime value)
  • Learning from data
Ad diversity
• Similar search results lead to redundant information
  • Maximize overall utility (hard)
  • Maximize incremental utility (easier)
  • Trade off marginal utility against a diversity metric (easy)
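The "easy" option, trading utility against a diversity metric, can be sketched as a greedy selection. The scoring rule below is in the spirit of maximal marginal relevance, a technique the slides do not name; the function name, the `tradeoff` weight, and the category-based similarity in the example are all assumptions.

```python
from typing import Callable, Dict, List


def select_diverse(
    utilities: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    tradeoff: float = 0.5,
) -> List[str]:
    """Greedily pick k ads, penalizing similarity to ads already chosen."""
    selected: List[str] = []
    candidates = set(utilities)
    while candidates and len(selected) < k:

        def score(ad: str) -> float:
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max((similarity(ad, s) for s in selected), default=0.0)
            return tradeoff * utilities[ad] - (1.0 - tradeoff) * redundancy

        # Sort first so that ties are broken deterministically by name.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def same_category(ad_a: str, ad_b: str) -> float:
    """Toy similarity: 1.0 if the ads share a category prefix, else 0.0."""
    return 1.0 if ad_a.split("_")[0] == ad_b.split("_")[0] else 0.0


utilities = {"shoes_a": 1.0, "shoes_b": 0.9, "hat_a": 0.6}
picked = select_diverse(utilities, same_category, k=2)
```

The second slot goes to the hat ad even though the second shoe ad has higher standalone utility, because the shoe ad is redundant with the first pick.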

Summary
• Computational advertising involves multiple subfields of
computer science, machine learning, operations research,
economics and game theory
• The entire industry is driven exclusively by data
• Diverse ecosystem with lots of opportunities for data
scientists
[Figure: DISPLAY LUMAscape, © LUMA Partners LLC 2016]

References
AzureML. AzureML: Anatomy of a machine learning service.
Journal of Machine Learning Research, 50, 2016.
C. Karande, A. Mehta, and R. Srikant. Optimizing budget
constrained spend in search advertising. WSDM ’13, page 697,
2013.
Kc Lee, Ali Jalali, and Ali Dasdan. Real Time Bid Optimization
with Smooth Budget Delivery in Online Advertising. arXiv
preprint arXiv:1305.3011, pages 1–13, 2013.
H Brendan McMahan, Gary Holt, D Sculley, Michael Young,
Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene
Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin
Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, and Jeremy
Kubica. Ad click prediction: a view from the trenches.
Proceedings of the 19th ACM SIGKDD international conference
on Knowledge discovery and data mining, pages 1222–1230,
2013. doi: 10.1145/2487575.2488200.
Steffen Rendle. Factorization machines with libFM. ACM Trans.
Intell. Syst. Technol., 3(3):57:1–57:22, May 2012. ISSN
2157-6904.
Matthew Richardson, Ewa Dominowska, and Robert Ragno.
Predicting clicks: estimating the click-through rate for new ads.
In Proceedings of the 16th international conference on World
Wide Web, pages 521–530. ACM, 2007.
William Vickrey. Counterspeculation, auctions, and competitive
sealed tenders. The Journal of Finance, 16(1):8–37, 1961.
Weinan Zhang, Yifei Rong, Jun Wang, Tianchi Zhu, and Xiaofan
Wang. Feedback control of real-time display advertising.
Technical report, 2016.
