StackAdapt Machine Learning Pipeline

Presentation for School of Continuing Studies

Data Science / Engineering
Section I: Advertising Technology Landscape

About Me
- Name: Larkin Liu
- Role: Data Scientist @ StackAdapt since 2016
- Specialties: Apache Spark, Scala, Python, R
- Education: MASc in Industrial Engineering, specializing in Operations Research, University of Toronto
- Other Fun Facts:
  - Chinese / Canadian
  - Competitive MMA fighter and kickboxer.
  - I really like race cars.

Agenda
Increase Profitability of Campaigns
- Ad Tech Landscape
- ML Models
- Logistic Regression
- Bagging: Random Forest
- Boosting: AdaBoost (Gradient Boosted Trees, XGBoost)
- Survival Regression (Proportional Hazards, Accelerated Failure Time Model)
- (Natural Language Processing)
- AB Testing
- RTB Auction Strategy

Real Time Bidding
- Online advertising is bought and sold through a process known as Real Time Bidding (RTB).
- StackAdapt is a Demand Side Platform (DSP).
- DSPs interface with clients running advertising campaigns and face the Ad Exchange on their behalf.
- Our objective is to win valuable ad impressions for our clients' campaigns.

Overview (Objectives)
- The ad exchange runs a second-price auction: the highest bidder wins and pays the second-highest bid.
- We bid on advertisements that are valuable to our client.
- To accomplish this, we use ML models to predict the likelihood of a defined conversion.
- We set our bid price proportional to our predicted probability of a conversion.
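The bidding logic above can be sketched in a few lines of Python. This is a minimal illustration, not StackAdapt's production code: the names `bid_price`, `value_per_conversion`, and `max_bid`, and all numbers, are hypothetical.

```python
def bid_price(p_conversion, value_per_conversion, max_bid):
    """Bid proportional to the predicted conversion probability, capped at max_bid.

    value_per_conversion and max_bid are hypothetical campaign settings.
    """
    if not 0.0 <= p_conversion <= 1.0:
        raise ValueError("p_conversion must be a probability in [0, 1]")
    return min(p_conversion * value_per_conversion, max_bid)


def second_price_auction(bids):
    """Return (winner, win_price): the highest bidder pays the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    win_price = ranked[1][1]
    return winner, win_price


# Illustrative bids only; the competing DSPs are made up.
bids = {
    "our_dsp": bid_price(0.02, value_per_conversion=100.0, max_bid=5.0),
    "dsp_b": 1.5,
    "dsp_c": 0.9,
}
winner, win_price = second_price_auction(bids)
```

In this sketch the win price (what is actually paid) is lower than our bid price, which is why the two are tracked as separate terms.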

Key Terms
- KPI - Key Performance Indicator
- Win Price - the price at which the advertisement was won on the ad exchange (the actual cost).
- Bid Price - what the DSP bid for the advertisement.
- CPM - Cost per Mille, cost per 1000 impressions.
- eCPC - Effective cost per click (total cost/number of clicks)
- eCPE - Effective cost per engagement (total cost/number of engagements)
- eCPA - Effective cost per action (total cost/number of conversions)
- AB Testing - split testing of algorithms between a control group and a treatment group.
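As a quick worked example of the cost metrics above, the following sketch computes them from raw campaign totals. The function name `campaign_kpis` and the sample numbers are made up for illustration.

```python
def campaign_kpis(total_cost, impressions, clicks, engagements, conversions):
    """Compute the cost KPIs from the Key Terms list; None when the denominator is zero."""

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "CPM": ratio(total_cost * 1000, impressions),  # cost per 1000 impressions
        "eCPC": ratio(total_cost, clicks),             # cost per click
        "eCPE": ratio(total_cost, engagements),        # cost per engagement
        "eCPA": ratio(total_cost, conversions),        # cost per conversion
    }


# Hypothetical campaign totals.
kpis = campaign_kpis(
    total_cost=500.0, impressions=250_000, clicks=400, engagements=1000, conversions=25
)
```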

Expectation
Initially we believed that each optimizer we designed would have the desired effect on the intermediate KPIs (CTR, eCPC, eCPE, eCPA, etc.), which in turn affect the overall profit of each campaign.

Reality
In reality, we discovered that the effect of each optimizer on the various intermediate KPIs follows a more complex interaction scheme, which also depends on market dynamics.

Section II: ML Models

Logistic Regression
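We model the probability $p_i$ of a positive response for observation $i$, given predictor variables $x_{0,i}, x_{1,i}, \ldots, x_{m,i}$.
Univariate logistic regression model $F(x)$:
$$F(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
This can be rewritten in terms of the log odds, where $F(x)$ is interpreted as the probability of response = 1 ($p$):
$$\ln\frac{F(x)}{1 - F(x)} = \beta_0 + \beta_1 x$$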
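The relationship between the logistic function and the odds can be checked numerically. The sketch below uses the standard univariate form with an intercept `beta0` and slope `beta1`; those parameter names and the sample values are assumptions for illustration.

```python
import math


def logistic(x, beta0, beta1):
    """Univariate logistic model F(x): probability that the response equals 1."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def odds(p):
    """Odds of the response: p / (1 - p)."""
    return p / (1.0 - p)


def log_odds(p):
    """Log odds (logit) of p; linear in x under the logistic model."""
    return math.log(odds(p))


# With beta0 = -1 and beta1 = 0.5, the linear predictor at x = 3 is 0.5.
p = logistic(3.0, beta0=-1.0, beta1=0.5)
recovered = log_odds(p)  # equals beta0 + beta1 * x up to rounding
```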

Logistic Regression with Interaction (IX) Terms
- Basic logistic regression assumes that each predictor's effect on the log odds is additive and does not depend on the values of the other predictors. This is not the case in our data set.
- Interaction terms take into account the interaction between variables. For example, variables X and Z may not be independent, and the interaction between X and Z produces its own effect on the log odds.
- When deploying logistic regression to predict key KPIs, adding interaction terms is crucial for accurate prediction, because the variables are not independent and the interactions between them may have a key effect on the predicted KPIs.
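To make the role of an interaction term concrete, here is a small sketch of a two-variable logistic model with an X·Z term. The coefficient names (`b0`, `bx`, `bz`, `bxz`) and values are hypothetical, not fitted to any StackAdapt data.

```python
import math


def log_odds(x, z, b0, bx, bz, bxz=0.0):
    """Linear predictor with an optional X*Z interaction term."""
    return b0 + bx * x + bz * z + bxz * x * z


def probability(x, z, b0, bx, bz, bxz=0.0):
    """Predicted probability from the log odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(x, z, b0, bx, bz, bxz)))


def effect_of_x(z, bx, bxz=0.0):
    """Change in log odds per unit increase in x, evaluated at a given z."""
    return bx + bxz * z


# Hypothetical coefficients: without the interaction the effect of x is always bx;
# with it, the effect of x depends on z.
coefficients = dict(b0=-2.0, bx=0.4, bz=0.3, bxz=0.8)
delta = log_odds(1, 1, **coefficients) - log_odds(0, 1, **coefficients)
```

With `bxz = 0` the effect of X on the log odds is the same for every Z; a nonzero `bxz` lets the model express that X matters more (or less) at certain values of Z.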

AdaBoost
- Adaptive Boosting (AdaBoost) is a well-established boosting algorithm.
- Unlike bagging, which averages independently trained trees, it trains trees sequentially and produces a weighted linear combination of their results.
- Each weak classifier is trained on the entire dataset, with a weight attached to each observation.
- Misclassified observations are weighted up and correctly classified observations are weighted down, depending on the results of each weak classifier.
- The final classifier is a weighted linear combination of the weak classifiers.
- Boosting can overcome the inherent limitations of a specific class of classifiers, and can help mitigate the effect of class imbalance.
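The reweighting scheme described above can be sketched with one-dimensional decision stumps as the weak classifiers. This is a minimal, standard-library illustration of discrete AdaBoost with labels in {-1, +1}; the toy data set and function names are assumptions, and a production system would use a library implementation.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak classifier: predict `polarity` when x <= threshold, else the opposite."""
    return polarity if x <= threshold else -polarity


def best_stump(xs, ys, weights):
    """Pick the stump with the lowest weighted error on the whole data set."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Return a list of (alpha, threshold, polarity) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1.0 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points are weighted up, correct ones weighted down.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted linear combination of weak classifiers."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
one_round = adaboost(xs, ys, rounds=1)
three_rounds = adaboost(xs, ys, rounds=3)
```

On this toy set a single stump misclassifies three points, while the combination of three stumps classifies every training point correctly.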

AdaBoost
Illustrative example of boosting.

Random Forest
- Random Forest (Breiman 2001) is a well-established bagging algorithm that allows us to perform both classification and regression.
- An extension of the decision tree algorithm, RF combines a random sample of the data with a random sample of the features, and aggregates the results of many small, weak predictors.
- This approach makes RF much more robust and less prone to overfitting than a single decision tree.
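The sampling-and-aggregation idea can be sketched with the standard library alone. This is a simplified illustration, not a full Random Forest: each "tree" is a single-split stump, and the feature subset is drawn once per tree rather than at every split. All names and the toy data are assumptions.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Fit a one-split tree using only the allowed features.

    Returns (feature, threshold, left_label, right_label).
    """
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no usable split: fall back to the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees, n_features, seed=0):
    """Train each stump on a bootstrap sample of rows and a random feature subset."""
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]  # sample with replacement
        feature_ids = rng.sample(range(n_cols), n_features)
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature_ids)
        )
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = [left if row[f] <= thr else right for f, thr, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Toy, well-separated data with three features.
rows = [[0.05 * i, 0.3 - 0.05 * i, 0.1 + 0.02 * i] for i in range(6)]
rows += [[0.7 + 0.05 * i, 1.0 - 0.05 * i, 0.8 + 0.02 * i] for i in range(6)]
labels = [0] * 6 + [1] * 6
forest = fit_forest(rows, labels, n_trees=25, n_features=1, seed=0)
```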

Survival Regression
- Proportional Hazards Model
- Accelerated Failure Time Model
- Models were evaluated using the Akaike Information Criterion (AIC) and Root Mean Square Error (RMSE).
- Primarily used to model how long users remain on a site (time on site). The longer the time on site, the lower the probability that a user remains that long (the survival function decreases with time).
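To illustrate how a survival model and its AIC fit together, the sketch below fits a constant-hazard (exponential) model by maximum likelihood. This is a deliberately simpler stand-in for the Proportional Hazards and Accelerated Failure Time models named above, with no covariates; the durations and censoring flags are made up.

```python
import math


def fit_exponential(durations, observed):
    """Maximum-likelihood fit of a constant-hazard survival model.

    durations: time on site for each session.
    observed: 1 if the session end was observed, 0 if it was censored.
    Returns (rate, log_likelihood).
    """
    events = sum(observed)
    total_time = sum(durations)
    rate = events / total_time
    log_likelihood = events * math.log(rate) - rate * total_time
    return rate, log_likelihood


def survival(t, rate):
    """S(t): probability that a user is still on the site after time t."""
    return math.exp(-rate * t)


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical sessions in seconds; the last one is censored.
durations = [30.0, 60.0, 90.0, 120.0]
observed = [1, 1, 1, 0]
rate, log_lik = fit_exponential(durations, observed)
model_aic = aic(log_lik, n_params=1)
```

Comparing `model_aic` across candidate models fitted to the same data is how AIC would be used for model selection; the value on its own has no absolute meaning.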

Survival Modelling
- We used a Random Forest model with the following parameters:
  - m: 33% of the total number of features
  - No. of trees: 100
  - Max depth: 10 layers
- Average RMSE of 145 across 10-fold cross-validation (a 25% improvement over the survival models investigated earlier).
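The evaluation procedure, average RMSE over k folds, can be sketched independently of the model. In the sketch below the `fit` and `predict` callables are placeholders (a mean predictor stands in for the Random Forest), and the data is synthetic.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_rmse(xs, ys, fit, predict, k=10):
    """Average RMSE over k folds; fold f holds out indices f, f + k, f + 2k, ..."""
    n = len(ys)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        test_x = [x for i, x in enumerate(xs) if i in test_idx]
        test_y = [y for i, y in enumerate(ys) if i in test_idx]
        model = fit(train_x, train_y)
        scores.append(rmse(test_y, [predict(model, x) for x in test_x]))
    return sum(scores) / k


# Placeholder model: always predict the training mean.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


xs = list(range(20))
ys = [0.0, 10.0] * 10
average_rmse = k_fold_rmse(xs, ys, fit_mean, predict_mean, k=10)
```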

Section III: RTB Deployment

AB Testing
- Currently our tests run 50/50 splits (S = 0.5): 50% of bid requests go to the A group (control) and 50% go to the B group (experimental treatment).
- Our goal is to maximize profit and minimize eCPC, which we can achieve by deploying our ML models.
- However, the effect of any model on any specific campaign can vary.
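One common way to implement such a split is to hash a request identifier into a bucket, so assignment is deterministic and stateless. The sketch below is an assumption about how this could be done, not a description of StackAdapt's system; here `split` is taken to mean the share of requests sent to group B, and `salt` is a hypothetical per-experiment key.

```python
import hashlib


def assign_group(request_id, split=0.5, salt="ab-test"):
    """Deterministically assign a request to 'A' (control) or 'B' (treatment).

    split is the fraction of requests routed to B (an assumed convention).
    """
    digest = hashlib.sha256(f"{salt}:{request_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # roughly uniform in [0, 1)
    return "B" if bucket < split else "A"


# Share of B over a batch of made-up request ids.
assignments = [assign_group(f"request-{i}") for i in range(10_000)]
share_b = assignments.count("B") / len(assignments)
```

Because the assignment depends only on the id and the salt, the same request always lands in the same group, and changing `split` moves traffic between groups without any stored state.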

EMR-AB13-IX-5day-dailyUpdate
- Experimental Model Avg eCPC: 0.819
- Control Group eCPC: 0.833
- Experimental Model Profit: 2457.11
- Control Group Profit: 2773.32

EMR-AB14-mean_encoded_logistic_regression
- Experimental Model Avg eCPC: 1.246
- Control Group eCPC: 0.675
- Experimental Model Profit: 2285.23
- Control Group Profit: 1086.80
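The two test results above can be compared on a relative basis with a few lines of Python. The figures are copied from the slides; the helper names are illustrative.

```python
def relative_change(experimental, control):
    """Relative difference of the experimental arm versus the control arm."""
    return (experimental - control) / control


# (experimental, control) pairs taken from the two A/B tests above.
AB_RESULTS = {
    "EMR-AB13-IX-5day-dailyUpdate": {
        "ecpc": (0.819, 0.833),
        "profit": (2457.11, 2773.32),
    },
    "EMR-AB14-mean_encoded_logistic_regression": {
        "ecpc": (1.246, 0.675),
        "profit": (2285.23, 1086.80),
    },
}


def summarize(results):
    """Relative change per KPI for each test."""
    return {
        test_name: {
            kpi: relative_change(experimental, control)
            for kpi, (experimental, control) in kpis.items()
        }
        for test_name, kpis in results.items()
    }


SUMMARY = summarize(AB_RESULTS)
```

In AB13 the experimental model lowers eCPC by about 1.7% yet earns about 11.4% less profit; in AB14 it raises eCPC by about 84.6% yet earns about 110.3% more profit. A better eCPC does not by itself imply a better profit.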

But wait
- Models perform differently with regard to the various KPIs, and on a campaign-specific basis.
- Solution: a larger proportion of bid requests should go to the model with better KPI
performance.

RTB Optimizer
Our Min*/Max* framework is based on a PID controller: we adjust the split (S) in proportion to how far we are from our objective of attaining a minimum or maximum value.
- Proportional: Immediate Error
- Integral: Cumulative Error
- Derivative: Rate of Change
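A PID-style update of the split can be sketched as follows. This is a generic controller under stated assumptions, not the Min*/Max* framework itself: the gains, the clamping bounds, and the definition of the error signal (for example, the relative eCPC advantage of B over A) are all hypothetical.

```python
class SplitPID:
    """Adjust the traffic split S toward the better-performing group."""

    def __init__(self, kp, ki, kd, split=0.5, lower=0.05, upper=0.95):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.split = split
        self.lower, self.upper = lower, upper  # keep some traffic in both groups
        self.integral = 0.0
        self.previous_error = None

    def update(self, error, dt=1.0):
        """error > 0 means group B is outperforming A on the target KPI."""
        self.integral += error * dt                      # cumulative error
        if self.previous_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.previous_error) / dt  # rate of change
        self.previous_error = error
        adjustment = (
            self.kp * error             # proportional: immediate error
            + self.ki * self.integral   # integral: cumulative error
            + self.kd * derivative      # derivative: rate of change
        )
        self.split = min(self.upper, max(self.lower, self.split + adjustment))
        return self.split


# Hypothetical gains and error readings.
controller = SplitPID(kp=0.5, ki=0.1, kd=0.0)
first = controller.update(0.1)
second = controller.update(0.1)
```

Clamping the split keeps a minimum share of bid requests in each group, so both models continue to be measured even when one is clearly ahead.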

References
- Zhang, Weinan, “Optimal Real-Time Bidding for Display Advertising”, 2016
- Freund & Schapire, “Experiments with a New Boosting Algorithm”, 1996
