Best Practices for Hyperparameter Tuning with MLflow
Joseph Bradley
April 24, 2019
Spark + AI Summit
About me
Joseph Bradley
• Software engineer at Databricks
• Apache Spark committer & PMC member
About Databricks
TEAM: Started the Spark project (now Apache Spark) at UC Berkeley in 2009
PRODUCT: Unified Analytics Platform
MISSION: Making Big Data Simple
Try for free today: databricks.com
Hyperparameters
• Express high-level concepts, such as statistical assumptions
• Are fixed before training or are hard to learn from data
• Affect objective, test time performance, computational cost
E.g.:
• Linear Regression: regularization, # iterations of optimization
• Neural Network: learning rate, # hidden layers
Tuning hyperparameters
E.g.: Fitting a polynomial
Common goals:
• More flexible modeling process
• Reduced generalization error
• Faster training
• Plug & play ML
Challenges in tuning
Curse of dimensionality
Non-convex optimization
Computational cost
Unintuitive hyperparameters
Tuning in the Data Science workflow
Data → Training Data + Validation Data + Test Data
ML Model 1 / ML Model 2 / ML Model 3 → Final ML Model
Fit candidate models on the training data, compare them on the validation
data, and evaluate the selected final model once on the test data.
Tuning in the Data Science workflow
Building an ML Model involves featurization, model family selection, and
hyperparameter tuning.
“AutoML” includes hyperparameter tuning.
This talk
Popular methods for hyperparameter tuning
• Overview of methods
• Comparing methods
• Open-source tools
Tuning in practice with MLflow
• Instrument tuning
• Analyze results
• Productionize models
Beyond this talk
Overview of tuning methods
• Manual search
• Grid search
• Random search
• Population-based algorithms
• Bayesian algorithms
Manual search
Select hyperparameter settings to try based on human intuition.
2 hyperparameters:
• [0, ..., 5]
• {A, B, ..., F}
Expert knowledge tells us to try:
(2,C), (2,D), (2,E), (3,C), (3,D), (3,E)
[Figure: the six hand-picked points on the A–F × 0–5 grid]
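A minimal sketch of manual search as an explicit loop, here using scikit-learn's Ridge on a toy dataset; the dataset and candidate settings are illustrative, not from the talk:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# Hand-picked (alpha, solver) settings, analogous to the six chosen points above.
candidates = [(0.1, "auto"), (0.1, "svd"), (1.0, "auto"),
              (1.0, "svd"), (10.0, "auto"), (10.0, "svd")]

results = {}
for alpha, solver in candidates:
    model = Ridge(alpha=alpha, solver=solver)
    results[(alpha, solver)] = cross_val_score(model, X, y, cv=3).mean()

best = max(results, key=results.get)  # cross_val_score returns R^2; higher is better
print("Best setting:", best, "R^2:", round(results[best], 3))
```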
Grid Search
Try points on a grid defined by ranges and step sizes.
X-axis: {A,...,F}
Y-axis: 0–5, step = 1
[Figure: all 36 evenly spaced grid points on the A–F × 0–5 grid]
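A hedged sketch of grid search with scikit-learn's GridSearchCV, continuing the same toy Ridge example (value lists are illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)

# Grid defined by explicit value lists (analogous to ranges + step sizes).
param_grid = {
    "alpha": [0.01, 0.1, 1.0, 10.0, 100.0],
    "solver": ["auto", "svd", "cholesky"],
}

search = GridSearchCV(Ridge(), param_grid, cv=3)  # tries all 15 combinations
search.fit(X, y)
print(search.best_params_, search.best_score_)
```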
Random Search
Sample from distributions over ranges.
X-axis: Uniform({A,...,F})
Y-axis: Uniform([0,5])
[Figure: randomly sampled points on the A–F × 0–5 grid]
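The same example as a random search: RandomizedSearchCV samples each setting from a distribution rather than stepping through a grid (distributions here are illustrative):

```python
from scipy.stats import uniform
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

X, y = load_diabetes(return_X_y=True)

# Distributions over ranges instead of fixed grid points.
param_distributions = {
    "alpha": uniform(loc=0.001, scale=100),   # Uniform([0.001, 100.001])
    "solver": ["auto", "svd", "cholesky"],    # sampled uniformly from the list
}

search = RandomizedSearchCV(Ridge(), param_distributions,
                            n_iter=20, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)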
Population Based Algorithms
Start with random search, then iterate:
• Use the previous “generation” to inform the next generation
• E.g., sample from best performers & then perturb them
[Figure: successive generations of sampled points converging on the A–F × 0–5 grid]
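A minimal, self-contained sketch of the sample-the-best-and-perturb idea; the quadratic `loss` is a hypothetical stand-in for training a model and measuring validation loss:

```python
import random

def loss(x, y):
    # Hypothetical stand-in for "train with hyperparameters (x, y) and
    # return the validation loss"; minimum near (3.2, 2.7).
    return (x - 3.2) ** 2 + (y - 2.7) ** 2

def clamp(v):
    return min(5.0, max(0.0, v))  # keep points inside the [0, 5] range

# Generation 0: plain random search over [0, 5] x [0, 5].
population = [(random.uniform(0, 5), random.uniform(0, 5)) for _ in range(10)]

for generation in range(5):
    survivors = sorted(population, key=lambda p: loss(*p))[:3]  # best performers
    population = list(survivors)
    while len(population) < 10:  # refill by perturbing random survivors
        x, y = random.choice(survivors)
        population.append((clamp(x + random.gauss(0, 0.5)),
                           clamp(y + random.gauss(0, 0.5))))

best = min(population, key=lambda p: loss(*p))
print("Best point:", best, "loss:", loss(*best))
```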
Bayesian Optimization
Model the loss function: hyperparameters → loss
Iteratively search the space, trading off between exploration and exploitation.
[Figure: animation of a surrogate model of performance over the parameter space, refined after each evaluation]
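A hedged sketch of Bayesian-style optimization with Hyperopt's TPE algorithm, reusing the hypothetical loss from the population-based example:

```python
from hyperopt import Trials, fmin, hp, tpe

def objective(params):
    # Hypothetical stand-in for training a model and returning validation loss.
    return (params["x"] - 3.2) ** 2 + (params["y"] - 2.7) ** 2

space = {"x": hp.uniform("x", 0, 5),
         "y": hp.uniform("y", 0, 5)}

trials = Trials()  # records every evaluated point for later analysis
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=trials)
print(best)  # e.g. {'x': 3.19..., 'y': 2.71...}
```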
Comparing tuning methods

Method            Iterative/adaptive?  # evaluations for P params  Model of param space
Grid search       No                   O(c^P)                      none
Random search     No                   O(k)                        none
Population-based  Yes                  O(k)                        implicit
Bayesian          Yes                  O(k)                        explicit
Open-source tools for tuning

Tool             Grid    Random  Population-  Bayesian  PyPI downloads  GitHub  License
                 search  search  based                  (last month)    stars
scikit-learn     Yes     Yes     ---          ---       ---             ---     BSD
MLlib            Yes     ---     ---          ---       ---             ---     Apache 2.0
scikit-optimize  ---     ---     ---          Yes       49,189          1,278   BSD
Hyperopt         ---     Yes     ---          Yes       98,282          3,286   BSD
DEAP             ---     ---     Yes          ---       26,700          2,789   LGPL v3
TPOT             ---     ---     Yes          ---       9,057           5,609   LGPL v3
GPyOpt           ---     ---     ---          Yes       4,959           451     BSD

As of mid-April 2019
This talk
Popular methods for hyperparameter tuning
• Overview of methods
• Comparing methods
• Open-source tools
Tuning in practice with MLflow
• Instrument tuning
• Analyze results
• Productionize models
Beyond this talk
MLflow
Tracking
• Experiments
• Runs
• Parameters
• Metrics
• Tags & artifacts
Projects
• Directory or git repository
• Entry points
• Environments
Models
• Storage format
• Flavors
• Deployment tools
Organizing with MLflow
Training Data | Validation Data | Test Data
ML Model 1, ML Model 2, ML Model 3 → Final ML Model
MLflow hierarchy: Experiment → Main run → Child runs (one per candidate model)
Instrumenting tuning with MLflow
What to track in a run for a model
• Hyperparameters: all vs. ones being tuned
• Metric(s): training & validation, loss & objective, multiple objectives
• Tags: provenance, simple metadata
• Artifacts: serialized model, large metadata
Tip: Tune full pipeline, not 1 model.
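A minimal sketch of this instrumentation pattern, assuming `train_and_validate` is a hypothetical helper that fits a model and returns `(model, val_loss)`; all MLflow calls shown are standard tracking APIs:

```python
import mlflow
import mlflow.sklearn

# One parent run for the tuning session, one nested child run per model.
with mlflow.start_run(run_name="tuning-session"):
    for alpha in [0.01, 0.1, 1.0]:
        with mlflow.start_run(run_name=f"alpha={alpha}", nested=True):
            mlflow.log_param("alpha", alpha)
            mlflow.set_tag("model_family", "ridge")      # simple provenance
            model, val_loss = train_and_validate(alpha)  # hypothetical helper
            mlflow.log_metric("val_loss", val_loss)
            mlflow.sklearn.log_model(model, "model")     # model as an artifact
```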
Analyzing how tuning performs
Questions to answer
• Am I tuning the right hyperparameters?
• Am I exploring the right parts of the search space?
• Do I need to do another round of tuning?
Examining results
• Simple case: visualize param vs metric
• Challenges: multiple params and metrics, iterative experimentation
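For the simple case, a sketch of pulling runs back out and plotting param vs. metric; `mlflow.search_runs` requires MLflow 1.1+, and the experiment ID and column names here are illustrative:

```python
import mlflow
import matplotlib.pyplot as plt

# search_runs returns a pandas DataFrame with one row per run and
# params.* / metrics.* columns.
runs = mlflow.search_runs(experiment_ids=["1"])  # ID is illustrative

plt.scatter(runs["params.alpha"].astype(float), runs["metrics.val_loss"])
plt.xscale("log")
plt.xlabel("alpha")
plt.ylabel("validation loss")
plt.show()
```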
Moving models to production
Repeatable experiments via MLflow Projects
• Code checkpoints
• Environments
Model serialization via MLflow Models
• Flavors: TensorFlow, Keras, Spark, MLeap, ...
Deployment to prediction services
• Azure ML, AWS SageMaker, Spark UDF
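A hedged sketch of reloading a logged MLflow Model for scoring, assuming MLflow 1.x+; `input_df`, `spark`, `features_df`, and the feature column names are hypothetical placeholders:

```python
import mlflow.pyfunc

model_uri = "runs:/<run_id>/model"  # <run_id>: placeholder for a tracked run

# Load the logged model back for local batch scoring ...
model = mlflow.pyfunc.load_model(model_uri)
predictions = model.predict(input_df)  # input_df: a pandas DataFrame

# ... or score at scale with a Spark UDF.
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri)
scored = features_df.withColumn("prediction", predict_udf("f1", "f2", "f3"))
```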
Auto-tracking MLlib with MLflow
Training Data | Validation Data | Test Data
ML Model 1, ML Model 2, ML Model 3 → Final ML Model
Experiment → Main run → Child runs
In Databricks:
• CrossValidator & TrainValidationSplit
• 1 run per setting of hyperparameters
• Avg metrics for CV folds
(demo)
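The MLlib tuning code being auto-tracked looks like this standard CrossValidator sketch; `train_df` is a hypothetical DataFrame with "features" and "label" columns, and the auto-tracking itself is a Databricks feature, not extra code:

```python
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression()
grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1, 1.0])
        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])
        .build())

cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(),
                    numFolds=3)

# On Databricks, fit() is auto-tracked to MLflow: one child run per
# hyperparameter setting, with metrics averaged across CV folds.
cv_model = cv.fit(train_df)
```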
This talk
Popular methods for hyperparameter tuning
• Overview of methods
• Comparing methods
• Open-source tools
Tuning in practice with MLflow
• Instrument tuning
• Analyze results
• Productionize models
Beyond this talk
Advanced topics
Efficient tuning
• Parallelizing hyperparameter search
• Early stopping
• Transfer learning
Fancy tuning
• Multi-metric optimization
• Conditional/awkward parameter spaces
Check out Maneesh Bhide’s talk "Advanced Hyperparameter Optimization for
Deep Learning" to hear about early stopping, multi-metric, & conditionals.
Thursday @ 3:30pm, Room 3014
Advanced topics
Efficient tuning: parallelizing hyperparameter search
Challenge in analyzing results: multiple parameters or multiple metrics
Hyperopt + Apache Spark + MLflow integration
• Hyperopt: general tuning library for ML in Python
• Spark integration: parallelize model tuning in batches
• MLflow integration: track runs, analogous to the MLlib + MLflow integration
(demo)
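A sketch of the Hyperopt + Spark integration; at the time of this talk the code was still being open-sourced, but `SparkTrials` ships with later Hyperopt releases. `train_and_validate` is a hypothetical helper returning a validation loss:

```python
from hyperopt import SparkTrials, fmin, hp, tpe

def objective(params):
    # Hypothetical: train with these hyperparameters, return validation loss.
    return train_and_validate(**params)

space = {"alpha": hp.loguniform("alpha", -5, 2)}

# SparkTrials evaluates trials in parallel as Spark jobs; on Databricks,
# each trial is also tracked to MLflow automatically.
spark_trials = SparkTrials(parallelism=4)
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=32, trials=spark_trials)
```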
Getting started
MLflow: https://mlflow.org
MLlib tuning
• Databricks auto-tracking with MLflow in private preview now, public preview mid-May
Hyperopt
• Distributed tuning via Apache Spark: working to open-source the code
• Databricks auto-tracking with MLflow in public preview mid-May
Thank You!
Questions?
AMA @ DevLounge Theater
Thursday @ 10:30-11am
Thanks to Maneesh Bhide for
material for this talk!