Kyle Schmaus, Stitch Fix
Moment-Based Estimation
for Hierarchical Models in
Apache Spark
#DSSAIS17
Overview
• About Stitch Fix
• Hierarchical (Mixed Effects) Models
• Moment Based Estimation
• Spark Implementation & Application
Stitch Fix
[Diagram: machine learning and human curation; algorithmic recommendations]
• Algorithms Team: 80+ Data Scientists and Data Engineers
• Data integrates into every aspect of the business
• Our Blog: multithreaded.stitchfix.com
Hierarchical Models Motivation
ℙ(sale) = ?
Color
Cut
Size
Brand
Material
…
Shared Model
ℙ(sale) = $g^{-1}(x^T \beta)$, where $g(p) = \log \frac{p}{1 - p}$
Individual Model Per Group
ℙ(sale) = $g^{-1}(x^T \beta_1)$
ℙ(sale) = $g^{-1}(x^T \beta_2)$
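A minimal sketch of the two modelling choices, assuming a logit link as on the slides. The feature vector, the coefficient values, and the group names are made up for illustration; they are not from the talk.

```python
import math

def logit(p):
    """Link function g(p) = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def inv_logit(t):
    """Inverse link g^{-1}(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def sale_probability(x, beta):
    """P(sale) = g^{-1}(x . beta) for one feature vector x."""
    return inv_logit(sum(xj * bj for xj, bj in zip(x, beta)))

# Hypothetical feature vector: intercept plus two made-up features.
x = [1.0, 0.5, -1.0]

# Shared model: one coefficient vector for every group.
shared_beta = [0.2, 1.0, 0.3]

# Individual models: one coefficient vector per group (made-up values).
group_betas = {"group_1": [0.4, 0.8, 0.1], "group_2": [-0.3, 1.5, 0.6]}

p_shared = sale_probability(x, shared_beta)
p_by_group = {g: sale_probability(x, b) for g, b in group_betas.items()}
```

The shared model ignores group differences, while the per-group models cannot borrow strength across groups; the hierarchical model sits between the two.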
Hierarchical Models
• $\mathbb{E}[y_i] = g^{-1}(X_i \beta + Z_i u_i)$
• $u_i \sim N(0, \Sigma)$
• $i \in \{1, \dots, M\}$
• $\beta$, $u_i$, and $\Sigma$ are unknown
Simulation Results
• $i \in \{1, \dots, M = 100\}$
• $y_i = X_i \beta + Z_i u_i + \varepsilon_i$
• $\dim \beta = \dim u_i = 11 \times 1$
• $u_i \sim N(0, \Sigma)$
• $\varepsilon_i \sim N(0, I)$
• $\Sigma \sim \mathcal{W}^{-1}\left(\tfrac{1}{11} I, 11\right)$
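A rough sketch of how data of this shape could be simulated, using only the standard library. Several choices here are assumptions rather than the talk's setup: the random-effect covariance is fixed to a multiple of the identity instead of being drawn at random, the random-effect design equals the fixed-effect design, features are standard normal with an intercept column, and the group size of 20 is arbitrary.

```python
import random

def simulate_group(rng, beta, n_obs, sigma_u=1.0, sigma_eps=1.0):
    """Draw one group's data from y = x.beta + z.u + eps with z = x."""
    p = len(beta)
    # Assumption: Sigma = sigma_u^2 * I rather than a random covariance draw.
    u = [rng.gauss(0.0, sigma_u) for _ in range(p)]
    rows, ys = [], []
    for _ in range(n_obs):
        # Intercept column followed by standard normal features.
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(p - 1)]
        mean = sum(xj * (bj + uj) for xj, bj, uj in zip(x, beta, u))
        rows.append(x)
        ys.append(mean + rng.gauss(0.0, sigma_eps))
    return {"x": rows, "y": ys, "u": u}

rng = random.Random(0)
p = 11            # dim(beta) = dim(u_i) = 11, as on the slide
n_groups = 100    # M = 100, as on the slide
beta = [rng.gauss(0.0, 1.0) for _ in range(p)]
groups = [simulate_group(rng, beta, n_obs=20) for _ in range(n_groups)]
```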
[Figure: RMSE By N Observations Per Group. x-axis: log(n_obs); y-axis: rmse; series (model_type): individual model, mixed model, shared model]
Software Implementations
• R: lme4, nlme, mbest, …
• Python: statsmodels
• Julia: MixedModels
Likelihood Based Methods
Expectation-Maximization, Variational Approximations, or
Likelihood Maximization require an $O(Np^2)$ initial cost, then
a series of iterations each costing $O(Mp^3)$, where
• $N$ is the total number of observations
• $p$ is the number of fixed and random effects
• $M$ is the number of groups
Moment Based Methods
Using a moment-based approach laid out in Perry (2015)
and implemented in the mbest package, we can achieve
a non-iterative fit in $O(Np^2) + O(Mp^4)$ steps. This can be
trivially spread across multiple processors.
This improvement in computational efficiency is paid for by
sacrificing some statistical efficiency.
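One way to see why a non-iterative fit parallelizes easily: each group can be reduced to a small summary on its own, and the summaries are then combined in a single pass. The sketch below shows only that map-then-reduce pattern on toy data with the standard library; the summary statistics are deliberately simple and are not the estimator from the talk, and threads are used purely to illustrate the structure.

```python
from concurrent.futures import ThreadPoolExecutor

def group_summary(ys):
    """Per-group pass: count, sum, and sum of squares of the responses."""
    return (len(ys), sum(ys), sum(y * y for y in ys))

def combine(a, b):
    """Reduce step: summaries add component-wise."""
    return tuple(ai + bi for ai, bi in zip(a, b))

# Hypothetical toy data: three groups of responses.
groups = [[1.0, 2.0, 3.0], [4.0, 6.0], [5.0]]

# Map: each group is processed independently of the others.
with ThreadPoolExecutor(max_workers=4) as pool:
    summaries = list(pool.map(group_summary, groups))

# Reduce: one pass over the per-group summaries, no iteration to convergence.
total = (0, 0.0, 0.0)
for s in summaries:
    total = combine(total, s)

n, sum_y, sum_y2 = total
overall_mean = sum_y / n
```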
Moment Based Setup
$y_i = X_i \beta + Z_i u_i + \varepsilon_i$ for $i \in \{1, \dots, M\}$
$u_i \sim N(0, \Sigma)$
$\varepsilon_i \sim N(0, \sigma^2)$
$u_i$ and $\varepsilon_i$ independent
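Moment Based Setup
• Define $F_i \equiv [X_i \; Z_i]$ and $\eta_i^T \equiv [\beta^T \; u_i^T]$
• $y_i = X_i \beta + Z_i u_i + \varepsilon_i \implies y_i = F_i \eta_i + \varepsilon_i$
• Estimate $\hat{\eta}_i = (F_i^T F_i)^{\dagger} F_i^T y_i$, for each $i \in \{1, \dots, M\}$
• Note when $F_i$ is rank deficient, $\hat{\eta}_i$ is not an unbiased estimator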
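A small sketch of the per-group least-squares step for the easy case where the combined design has two columns and full column rank, so the pseudo-inverse reduces to an ordinary inverse. The function name and the toy data are hypothetical. When the fixed-effect and random-effect designs share columns, the combined design is rank deficient and this direct solve does not apply.

```python
def fit_group(f_rows, y):
    """Least-squares estimate for one group with a two-column design.

    Solves the 2x2 normal equations (F^T F) eta = F^T y directly, which
    matches the pseudo-inverse formula only when F^T F is invertible.
    """
    a = sum(r[0] * r[0] for r in f_rows)
    b = sum(r[0] * r[1] for r in f_rows)
    d = sum(r[1] * r[1] for r in f_rows)
    g0 = sum(r[0] * yi for r, yi in zip(f_rows, y))
    g1 = sum(r[1] * yi for r, yi in zip(f_rows, y))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("rank-deficient design: a pseudo-inverse is needed")
    # Inverse of [[a, b], [b, d]] applied to (g0, g1).
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

# Hypothetical toy groups: rows are [intercept, covariate].
groups = {
    "g1": ([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.0, 5.0]),
    "g2": ([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0]], [2.0, 1.0, -1.0]),
}
eta_hat = {name: fit_group(f, y) for name, (f, y) in groups.items()}
```

Each group is fit independently, so this step is the part that can be distributed across workers.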
Moment Based Setup
• $F_i = U_i D_i V_i^T$
• $X_i = U_i D_i V_{i1}^T$
• $Z_i = U_i D_i V_{i2}^T$
• $\hat{\phi} = \hat{\sigma}^2 = \frac{1}{N - r} \sum_{i=1}^{M} \lVert y_i - F_i \hat{\eta}_i \rVert^2$
• $r \equiv \sum_{i=1}^{M} r_i$
Estimate $\beta$
$\Omega \equiv \sum_{i=1}^{M} V_{i1} W_i V_{i1}^T$
$\hat{\beta} = \Omega^{-1} \sum_{i=1}^{M} V_{i1} W_i V_i^T \hat{\eta}_i$
is an unbiased estimator for $\beta$.
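Estimate $\Sigma$
Say we knew $\beta$, not just an estimate. Define
• $B \equiv \sum_{i=1}^{M} V_{i2} W_i (V_i^T \hat{\eta}_i - V_{i1}^T \beta)(V_i^T \hat{\eta}_i - V_{i1}^T \beta)^T W_i V_{i2}^T$
• $A \equiv \sum_{i=1}^{M} V_{i2} W_i D_i^{-2} W_i V_{i2}^T$
• $\Omega_2 \equiv \sum_{i=1}^{M} V_{i2} W_i V_{i2}^T \otimes V_{i2} W_i V_{i2}^T$
• $\operatorname{vec}\{\hat{\Sigma}\} = \Omega_2^{-1} \operatorname{vec}\{B - \sigma^2 A\}$
• $\hat{\Sigma}$ is an unbiased estimator for $\Sigma$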
Estimate $\Sigma$
• $\hat{\Sigma}$ is an unbiased estimator for $\Sigma$ if you know $\beta$ a priori. We don’t …
• Instead, we use $\hat{\beta}$. This means $\hat{\Sigma}$ is not an unbiased estimator in practice.
• It can be shown the bias is often negligible.
• We can project $\hat{\Sigma}$ to be positive semidefinite
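The slide does not say which projection is used. One common choice is the nearest positive semidefinite matrix in Frobenius norm, obtained by clipping negative eigenvalues to zero. The sketch below does this for a symmetric 2x2 matrix with a closed-form eigendecomposition; the function name and the input matrix are hypothetical.

```python
import math

def project_psd_2x2(s):
    """Project a symmetric 2x2 matrix onto the PSD cone.

    Uses the closed-form eigendecomposition, clips negative eigenvalues
    to zero, and rebuilds the matrix.
    """
    a, b, c = s[0][0], s[0][1], s[1][1]
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    lam1, lam2 = mean + radius, mean - radius
    if radius == 0.0:
        # Already a multiple of the identity; any orthonormal basis works.
        v1 = (1.0, 0.0)
    else:
        # Angle of the eigenvector belonging to the larger eigenvalue.
        theta = 0.5 * math.atan2(2.0 * b, a - c)
        v1 = (math.cos(theta), math.sin(theta))
    v2 = (-v1[1], v1[0])
    l1, l2 = max(lam1, 0.0), max(lam2, 0.0)
    return [
        [l1 * v1[i] * v1[j] + l2 * v2[i] * v2[j] for j in range(2)]
        for i in range(2)
    ]

# Hypothetical covariance estimate with one negative eigenvalue (3 and -1).
sigma_hat = [[1.0, 2.0], [2.0, 1.0]]
sigma_psd = project_psd_2x2(sigma_hat)
```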
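Estimate $u_i \mid \hat{\eta}_i$
• Using a Gaussian approximation for $p(\hat{\eta}_i \mid u_i)$ and $p(u_i)$, compute the posterior distribution
– $p(u_i \mid \hat{\eta}_i) \propto p(\hat{\eta}_i \mid u_i)\, p(u_i)$
• There exists a formula for $S_i(\Sigma)$ such that
– $\hat{E}(u_i \mid \hat{\eta}_i) = S_i V_{i2} (V_i^T \hat{\eta}_i - V_{i1}^T \beta)$
– $\widehat{\mathrm{Var}}(u_i \mid \hat{\eta}_i) = \hat{\sigma}^2 S_i$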
That was a lot of math! Read "Fast moment-based estimation for hierarchical models" on arXiv; it’s a great paper.
Moment Based Estimation Summary
• Estimate $\hat{\eta}_i$
• Use $\hat{\eta}_i$ to estimate $\hat{\sigma}^2$
• Use $\hat{\eta}_i$ to estimate $\hat{\beta}$
• Use $\hat{\eta}_i$, $\hat{\sigma}^2$, $\hat{\beta}$ to estimate $\hat{\Sigma}$
• Use $\hat{\eta}_i$, $\hat{\sigma}^2$, $\hat{\beta}$, $\hat{\Sigma}$ to estimate $u_i \mid \hat{\eta}_i$
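The five steps above can be sketched end to end in the simplest special case: a random-intercept model $y_{ij} = \beta + u_i + \varepsilon_{ij}$ with equal group sizes, where every quantity is a scalar. This is the classical one-way method-of-moments estimator, not the general algorithm in mbest; the function name, the simulated data, and the use of $M - 1$ in the between-group variance are assumptions of this sketch.

```python
import random

def moment_fit(groups):
    """Method-of-moments fit of y_ij = beta + u_i + eps_ij.

    Assumes every group has the same size n >= 2 and there are M >= 2 groups.
    """
    m = len(groups)
    n = len(groups[0])
    # Step 1: per-group estimates (here simply the group means).
    means = [sum(g) / n for g in groups]
    # Step 2: pooled within-group residual variance.
    rss = sum((y - mu) ** 2 for g, mu in zip(groups, means) for y in g)
    sigma2 = rss / (m * n - m)
    # Step 3: fixed effect as the average of the group estimates.
    beta = sum(means) / m
    # Step 4: between-group variance, corrected for noise and clipped at 0.
    between = sum((mu - beta) ** 2 for mu in means) / (m - 1)
    tau2 = max(between - sigma2 / n, 0.0)
    # Step 5: Gaussian posterior mean and variance of each u_i.
    denom = tau2 + sigma2 / n
    shrink = tau2 / denom if denom > 0 else 0.0
    u_mean = [shrink * (mu - beta) for mu in means]
    u_var = shrink * sigma2 / n
    return {"beta": beta, "sigma2": sigma2, "tau2": tau2,
            "u_mean": u_mean, "u_var": u_var}

# Simulated data with made-up true values: 200 groups of 10 observations.
rng = random.Random(1)
true_beta, true_tau, true_sigma = 2.0, 1.0, 1.0
data = []
for _ in range(200):
    u = rng.gauss(0.0, true_tau)
    data.append([true_beta + u + rng.gauss(0.0, true_sigma) for _ in range(10)])
fit = moment_fit(data)
```

Clipping the variance estimate at zero plays the role of the positive semidefinite projection in the scalar case, and the shrinkage factor pulls each group's estimate toward the overall mean.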
lme4 vs mbest
[Figure: RMSE By N Observations Per Group. x-axis: log(n_obs); y-axis: rmse; series (model_type): lmer, mhglm]
[Figure: Seconds To Train By N Observations Per Group. x-axis: log(n_obs); y-axis: log(seconds); series (model_type): lmer, mhglm]
Memory Constrained For Large N, M
• Current software is memory constrained on a single machine
• My MacBook Pro (2015) starts to run out of memory for
– M = 100,000
– N = 10,000,000
Spark Implementation
Application
Logistic Model with
• N = 120,000,000
• M = 4,000,000
• p = q = 7
• Executors = 8, 16 GB memory each
• Running time approximately 40 minutes
Thank You
github.com/stitchfix/MomentMixedModels