Bootstrapping of PySpark Models for Factorial A/B Tests

Ondrej Havlicek
Data Scientist

Ondřej Havlíček
• Senior data scientist
• Background
• Computer science, psychology,
neuroscience
• Focus
• Inferential statistics, machine learning, ETL
• Spark, Python, R
• A/B testing, recommendation, search, ...
• e-Commerce, social media, ...

Making data science and machine learning have a real impact on organizations.
• Units
• DataSentics PX: personalization for banking and insurance
• DS Innovate: AI/ML-driven innovation & startups
• DS TechScale: platforms for AI-intensive applications
• DS InRetail: improving the customer experience in retail/FMCG
• We are
• 50+ data science and machine learning specialists
• 30+ data engineering and cloud specialists
• 10+ product owners
• Partners & awards
• Gold partner & Partner of the Year 2020
• Professional partner
• 4th fastest growing in CE
• Rising stars award
• Products
• Optimize and automate the thousands/millions of small decisions you make every day
• Analyse positioning, out-of-stock, pricing and more from a photo
• AI choice assistant for e-commerce
• AI extension for your Adform

Agenda
1. Factorial A/B testing
2. Analysis of results
3. Bootstrapping
4. Performance tuning

A/B testing
• What
• A: Control version
• B: Experimental version
• Why
• The only way to improve KPIs consistently
• Evidence > HiPPO (highest paid person's opinion)
• Most tested ideas turn out to be wrong
• How
• Usually isolated tests, in parallel or one after another
Wikipedia: a user experience research methodology ... consist of a randomized
experiment with two variants, A and B. It includes application of statistical hypothesis
testing ... and determining which of the two variants is more effective.

Why factorial A/B testing?
• Isolated tests are limiting
• Few concurrent experiments or very long durations
• Solution: Factorial design
• Cross multiple tests orthogonally
• Each visitor assigned into a variant in all tests
• Allows running dozens of simultaneous tests
• Each test runs at all traffic
• Faster results
Source: https://hbr.org/2017/09/the-surprising-power-of-online-experiments
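
One way to cross tests orthogonally is to hash each visitor id together with the test name, so that every visitor lands in some variant of every test and the assignments in different tests are approximately independent. The slides do not say how assignment was actually implemented; the hashing scheme, `assign_variant`, `assign_all`, and the test names below are assumptions made for illustration only.

```python
import hashlib
from collections import Counter


def assign_variant(visitor_id: str, test_name: str, variants=("A", "B")) -> str:
    """Deterministically map a visitor to a variant of one test.

    Salting the hash with the test name makes the assignments in
    different tests approximately independent of each other.
    """
    digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def assign_all(visitor_id: str, test_names) -> dict:
    """Assign one visitor to a variant in every running test."""
    return {name: assign_variant(visitor_id, name) for name in test_names}


tests = ["T1", "T2"]  # hypothetical test names

# Count how many visitors fall into each cell of the 2 x 2 factorial design.
cells = Counter(
    tuple(assign_all(f"visitor-{i}", tests).values()) for i in range(10_000)
)
for cell, count in sorted(cells.items()):
    print(cell, count)
```

With two tests of two variants each, all four cells should receive roughly a quarter of the traffic, which is what lets every test use all visitors at once.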

Analysis of results
• What you often get
• Version B has a statistically significant effect on CR, p = 0.04
• What we ideally want
• Version B increases CR with 92.5% probability
• most likely by 1.8 %, 95% CI: [-0.3; 3.9]
[Figure: Results of Test 1]

Analysis of results
• How: effect size
• Big data: Spark GLM, e.g.:
• is_conversion ~ T1 + T2 + T1 * T2
• family = "binomial"
• link = "logit"
• How: uncertainty
• Std. errors generally not provided by Spark GLMs
• Bootstrapping
• A way to estimate distribution of some statistic
• “Poor man’s Bayes”, noninformative prior
[Figure: Results of Test 1]
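
Written out, the model formula on this slide corresponds to a logistic regression with an interaction term. The notation below is an assumption introduced here: T_1 and T_2 are taken to be 0/1 indicators of being in version B of Test 1 and Test 2, and the beta symbols name the GLM coefficients.

```latex
\operatorname{logit}\bigl(P(\text{is\_conversion} = 1)\bigr)
  = \beta_0 + \beta_1 T_1 + \beta_2 T_2 + \beta_{12} T_1 T_2,
\qquad
\operatorname{logit}(p) = \ln\frac{p}{1 - p}
```

Under this coding, the effect of Test 1 on the log-odds of conversion is \beta_1 for visitors with T_2 = 0 and \beta_1 + \beta_{12} for visitors with T_2 = 1. These coefficients are the statistics whose distribution the bootstrap estimates.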

Bootstrapping
• Iterate many times (hundreds or more):
• Randomly resample data with replacement
• Compute statistics of interest: GLM coefficients
# fraction=1.0 gives a resample of roughly, not exactly, the original size
df_resample = df.sample(withReplacement=True, fraction=1.0)
fitted_model = model.fit(df_resample)
stats = extract_stats(fitted_model)  # e.g. the GLM coefficients
• How in Spark?
• Bootstrapping is embarrassingly parallel: iterations are independent of each other
• Spark parallelizes the tasks of a single model fit, i.e. only within one iteration
• How to scale?
• Need to run many instances of model fitting in parallel

Bootstrapping of GLM in Spark in a parallel fashion
• Multithreading
• Prepare bootstrap iterations into batches:
• Each batch contains sequential iterations
• Each iteration performs a Spark action
• Stages have fewer tasks than cores
• Submit the batches in parallel using multithreading
• Tasks get scheduled in FIFO / FAIR fashion to the executors
[Diagram: Worker 1 and Worker 2 with four cores each; Batches 1 to 4, each a sequence of iterations (Iteration 1, 2, 3, 4, ...), run side by side. Inset: Iteration 1, Stage 1, with Tasks 1 to 4 placed on Core 1 and Core 2.]
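
Preparing the batches amounts to splitting the total number of iterations across threads. A minimal sketch follows, assuming the batch dictionaries carry the `batchnum` and `reps` keys used in the deck's own snippet; `make_batches` is a hypothetical helper. It spreads any remainder over the first batches, whereas plain floor division would silently run fewer iterations than requested (500 iterations on 8 threads would become 8 x 62 = 496).

```python
def make_batches(n_iterations: int, n_threads: int) -> list:
    """Split n_iterations into at most n_threads batches.

    Batch sizes differ by at most one, and empty batches are skipped.
    """
    base, remainder = divmod(n_iterations, n_threads)
    batches = []
    for i in range(n_threads):
        reps = base + (1 if i < remainder else 0)
        if reps > 0:
            batches.append({"batchnum": i + 1, "reps": reps})
    return batches


batches = make_batches(n_iterations=500, n_threads=8)
for batch in batches:
    print(batch)
```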

Bootstrapping of GLM in Spark
• Multithreading
import concurrent.futures
import math

ret_vals = []
# Note: flooring drops the remainder when n_iterations is not divisible by n_threads
batch_size = math.floor(n_iterations / n_threads)
batches = [{'batchnum': i + 1, 'reps': batch_size} for i in range(n_threads)]
with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as executor:
    # One future per batch; each batch runs its iterations sequentially
    future_run = {
        executor.submit(run_batch, df, model, batch['reps']): batch for batch in batches
    }
    for future in concurrent.futures.as_completed(future_run):
        try:
            batch_result = future.result()
            ret_vals.append(batch_result)
        ...
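
The snippet above is truncated and does not show `run_batch`. As a runnable illustration of the same submit-and-collect pattern, here is a sketch in which `run_batch` is a hypothetical stand-in that bootstraps the mean of an in-memory list instead of fitting a Spark model. The per-batch seed and the `failed` list are additions made for this example, not part of the talk's code.

```python
import concurrent.futures
import random
import statistics


def run_batch(data, reps, seed):
    """Run `reps` bootstrap iterations sequentially (stand-in for Spark fits)."""
    rng = random.Random(seed)  # a separate seed per batch keeps resamples distinct
    results = []
    for _ in range(reps):
        resample = rng.choices(data, k=len(data))
        results.append(statistics.fmean(resample))
    return results


data = [0] * 900 + [1] * 100  # synthetic conversions with a 10% rate
batches = [{"batchnum": i + 1, "reps": 25} for i in range(4)]

ret_vals, failed = [], []
with concurrent.futures.ThreadPoolExecutor(max_workers=len(batches)) as executor:
    future_run = {
        executor.submit(run_batch, data, batch["reps"], batch["batchnum"]): batch
        for batch in batches
    }
    for future in concurrent.futures.as_completed(future_run):
        batch = future_run[future]
        try:
            ret_vals.extend(future.result())
        except Exception as exc:  # record the failure, keep collecting the rest
            failed.append((batch["batchnum"], exc))

print(len(ret_vals), "estimates,", len(failed), "failed batches")
```

In this pure-Python stand-in the threads give no real speedup. The pattern pays off with Spark because each thread mostly waits for the cluster to finish the job it submitted.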

Performance: don’t waste resources
• How many parallel batches (threads)?
• n_threads = n_cores / n_tasks * n_tasks_per_core
• n_tasks: number of tasks (partitions) per stage; repartition so that each partition is ~100-200 MB
• n_tasks_per_core: an empirical question, ca. 2-4
• Check Ganglia UI
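
The rule of thumb above can be turned into a small calculation. This is a sketch under stated assumptions: `n_tasks_for` and `n_threads_for` are hypothetical helpers, the 150 MB target is just the midpoint of the range on the slide, and rounding down with a minimum of one thread is a choice made here, not something the slides specify.

```python
import math


def n_tasks_for(data_size_mb: float, target_partition_mb: float = 150.0) -> int:
    """Number of partitions (tasks per stage) for a given data size."""
    return max(1, math.ceil(data_size_mb / target_partition_mb))


def n_threads_for(n_cores: int, n_tasks: int, n_tasks_per_core: float) -> int:
    """n_threads = n_cores / n_tasks * n_tasks_per_core, rounded down, at least 1."""
    return max(1, math.floor(n_cores / n_tasks * n_tasks_per_core))


# Example: 2 workers x 4 cores, 300 MB of data, 3 concurrent tasks per core.
n_tasks = n_tasks_for(data_size_mb=300)
n_threads = n_threads_for(n_cores=8, n_tasks=n_tasks, n_tasks_per_core=3)
print(f"n_tasks = {n_tasks}, n_threads = {n_threads}")
```

The resulting number is only a starting point; the slide's advice to check actual cluster utilization still applies.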

Lessons learned
• Spark better suited for ML than inferential stats
• Bootstrapping helps
• You can do parallelization^2 in Spark
• Business users understand & like the outputs
• The core of factorial A/B testing is simple
• Many interesting challenges in reality :)
• Overlaps, interactions, funnels, outliers, zero-inflated metrics, variance reduction, ...

Thank you!
Want to know more? Drop me a line
ondrej.havlicek@datasentics.com
