Talk #2: Optimize Training and Tuning for Deep Learning
SigOpt Talk Series: Tuning for Systematic Trading
Tobias Andreasen — Machine Learning Engineer
Tuesday, April 21, 2020
Abstract

SigOpt provides an extensive set of advanced features that help you, the expert, save time while increasing model performance through experimentation. Today we continue this talk series by discussing how to best utilize your infrastructure, reduce experiment time, and accelerate training for deep learning models.
Motivation

1. Overview of SigOpt
2. Recap on Bayesian optimization
3. How to continuously and efficiently utilize your project’s allotted compute infrastructure
4. How to tune models with expensive training costs
1. Overview of SigOpt
Accelerate and amplify the impact of modelers everywhere
Solution: Experiment, optimize and analyze at scale

Experiment Insights: track, analyze and reproduce any model to improve the productivity of your modeling.
Optimization Engine: automate hyperparameter tuning to maximize the performance and impact of your models.
Enterprise Platform: standardize experimentation across any combination of library, infrastructure, model or task; available on-premise, hybrid, or multi-cloud.
SigOpt Features

Experiment Insights: reproducibility; intuitive web dashboards; cross-team permissions and collaboration; advanced experiment visualizations; usage insights; parameter importance analysis.

Optimization Engine: multimetric optimization; continuous, categorical, or integer parameters; constraints and failure regions; up to 10k observations and 100 parameters; multitask optimization and high parallelism; training monitor and automated early stopping.

Enterprise Platform: infrastructure agnostic; REST API; parallel resource scheduler; black-box interface that tunes without accessing any data; client libraries for Python, Java, R, and MATLAB.
2. Recap on Bayesian Optimization
Black-Box Optimization

[Diagram: behind your firewall, training data feeds an AI/ML/DL or simulation model, and a model evaluation or backtest against testing data produces an objective metric. The metric is reported to SigOpt over the REST API, which returns new configurations (parameters or hyperparameters), yielding better results. SigOpt's components in the loop: Experiment Insights (track, organize, analyze and reproduce any model), Enterprise Platform (built to fit any stack and scale with your needs), and Optimization Engine (explore and exploit with a variety of techniques).]
Sequential Model-Based Optimization (SMBO)

[Figures: a graphical depiction of the iterative process. Each iteration builds a statistical model of the objective from the observations collected so far, then chooses the next point to evaluate by maximizing the acquisition function; the new observation is folded back into the model and the cycle repeats. A minimal sketch of this loop appears below.]
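To make the loop concrete, here is a minimal, self-contained SMBO sketch in Python. It uses a Gaussian-process surrogate from scikit-learn and the expected-improvement acquisition function; the toy 1D objective, candidate grid, and iteration count are assumptions for illustration only, not SigOpt's engine.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Toy 1D function to minimize; stands in for a real training or backtest metric.
    return np.sin(3 * x) + 0.5 * x

def expected_improvement(X_cand, gp, y_best):
    # EI for minimization: expected amount by which each candidate beats y_best.
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 5.0, 500).reshape(-1, 1)
X = list(rng.uniform(0.0, 5.0, size=(3, 1)))  # a few random initial observations
y = [objective(x[0]) for x in X]

for _ in range(15):
    # 1) Build a statistical model from all observations so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.array(X), np.array(y))
    # 2) Choose the next point by maximizing the acquisition function.
    ei = expected_improvement(candidates, gp, min(y))
    x_next = candidates[np.argmax(ei)]
    # 3) Evaluate it and fold the observation back into the model.
    X.append(x_next)
    y.append(objective(x_next[0]))

print("best value found:", min(y))
```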
SigOpt Blog Posts: Intuition Behind Bayesian Optimization
Some Relevant Blog Posts
● Intuition Behind Covariance Kernels
● Approximation of Data
● Likelihood for Gaussian Processes
● Profile Likelihood vs. Kriging Variance
● Intuition behind Gaussian Processes
● Dealing with Troublesome Metrics
To find more blog posts, visit:
https://sigopt.com/blog/
3. How to continuously and efficiently utilize your project’s allotted compute infrastructure
Continuously and efficiently utilize infrastructure

Utilize compute via asynchronous parallel optimization. SigOpt natively handles parallel function evaluation with the primary goal of minimizing overall wall-clock time. Parallelism also provides:
• Faster time-to-results — minimized overall wall-clock time
• Full resource utilization — asynchronous parallel optimization
• Scaling with infrastructure — optimization across the number of available compute resources
This is essential for increasing research productivity: it lowers time-to-results and scales with the available infrastructure. (A minimal worker-loop sketch follows the diagrams below.)
[Diagrams: the same black-box optimization loop, now driven by workers. Each worker behind your firewall trains and evaluates the model, reports the objective metric to SigOpt over the REST API, and receives a new configuration to try. The picture scales from a single worker up to many workers (Worker #1, Worker #2, ..., Worker #100) pulling suggestions in parallel.]
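As a concrete illustration, here is a minimal sketch of this worker loop using the classic SigOpt Python client (sigopt-python). The token, experiment name, parameter ranges, budget, and evaluate_model are placeholder assumptions; parallel_bandwidth declares how many suggestions will be open at once, and every worker simply runs the same loop while the API coordinates them.

```python
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")  # placeholder token

# Create the experiment once; parallel_bandwidth = number of concurrent workers.
experiment = conn.experiments().create(
    name="CNN sentiment tuning",  # hypothetical experiment name
    parameters=[
        dict(name="log_learning_rate", type="double", bounds=dict(min=-5, max=-1)),
        dict(name="batch_size", type="int", bounds=dict(min=16, max=256)),
    ],
    metrics=[dict(name="validation_accuracy", objective="maximize")],
    observation_budget=100,
    parallel_bandwidth=10,
)

def evaluate_model(assignments):
    # Placeholder: train with these hyperparameters and return the metric value.
    raise NotImplementedError

def run_worker(experiment_id):
    # Each of the N workers runs this identical loop against the same experiment.
    experiment = conn.experiments(experiment_id).fetch()
    while experiment.progress.observation_count < experiment.observation_budget:
        suggestion = conn.experiments(experiment_id).suggestions().create()
        value = evaluate_model(suggestion.assignments)
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id,
            value=value,
        )
        experiment = conn.experiments(experiment_id).fetch()
```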
Parallel function evaluations: find the best set of suggestions

Parallel function evaluations are a way of efficiently maximizing a function while using all available compute resources [Ginsbourger et al., 2008; Garcia-Barcos et al., 2019]:
• Choosing points by jointly maximizing criteria over the entire set of open resources
• Asynchronously evaluating over a collection of points
• Fixing points which are currently being evaluated while sampling new ones (see the sketch below)

[Figure: jointly optimizing multiple next points to sample, shown for 1D and 2D acquisition functions.]
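One simple way to realize the third bullet is the "constant liar" heuristic from the batch Bayesian optimization literature (one of the strategies surveyed around Ginsbourger et al., 2008); SigOpt's actual joint criterion is proprietary, so treat this as an illustrative stand-in. It reuses the expected_improvement helper from the SMBO sketch above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def constant_liar_batch(X_obs, y_obs, candidates, batch_size):
    # Pick `batch_size` points to evaluate in parallel (minimization).
    # Pending points are "fixed" by pretending they already returned the best
    # value seen so far (the lie), so successive picks spread out rather than
    # piling onto the same optimum of the acquisition function.
    X, y = list(X_obs), list(y_obs)
    lie = min(y)
    batch = []
    for _ in range(batch_size):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(np.array(X), np.array(y))
        ei = expected_improvement(candidates, gp, min(y))  # helper from the SMBO sketch
        x_next = candidates[np.argmax(ei)]
        batch.append(x_next)
        X.append(x_next)  # fix the in-flight point...
        y.append(lie)     # ...with its lied-about outcome
    return batch
```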
Parallel optimization: different parallel bandwidth leads to different search

[Figure: the same statistical model annotated with the next point(s) to evaluate for parallel bandwidth = 1 through 5. Parallel bandwidth represents the number of available compute resources.]

More exploration, more exploitation: faster wall clock.
Use Case: Fast CNN Tuning with AWS GPU Instances

● Category: NLP
● Task: Sentiment Analysis
● Model: CNN
● Data: Rotten Tomatoes Movie Reviews
● Analysis: Predicting Positive vs. Negative Sentiment
● Result: 400x speedup

Learn more: https://aws.amazon.com/blogs/machine-learning/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/
4. How to tune models with expensive training costs
Expensive Training Cost

How to efficiently minimize the time to optimize any function: SigOpt’s multitask feature is an efficient way for modelers to tune models with expensive training costs, with the benefits of:
• Faster time-to-market — the ability to bring expensive models into production faster
• Reduction in infrastructure cost — intelligently leverage infrastructure while reducing cost
Through novel research, SigOpt helps the user lower the overall time-to-market while reducing the overall compute budget. (A sketch of creating a multitask experiment follows below.)
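Here is a minimal sketch of creating a multitask experiment with the classic SigOpt Python client; the experiment name, parameters, budget, and the specific task costs are illustrative assumptions. Each task is a named fraction of the full training cost, and the full-cost task has cost 1.0.

```python
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")  # placeholder token

experiment = conn.experiments().create(
    name="Image classifier, multitask",  # hypothetical experiment name
    parameters=[
        dict(name="log_learning_rate", type="double", bounds=dict(min=-5, max=-1)),
        dict(name="depth", type="int", bounds=dict(min=2, max=8)),
    ],
    metrics=[dict(name="accuracy", objective="maximize")],
    observation_budget=120,
    tasks=[
        dict(name="cheapest", cost=0.1),   # e.g. one tenth of the epochs
        dict(name="cheap", cost=0.3),
        dict(name="expensive", cost=1.0),  # the full training run
    ],
)
```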
[Diagrams: the black-box optimization loop as before (training data, model, evaluation or backtest, objective metric, REST API, new configurations, better results), repeated to emphasize that every iteration now carries an expensive training cost.]
Using cheap or free information to speed learning

SigOpt allows the user to define lower-cost functions in order to quickly optimize expensive functions:
• Cheaper-cost functions can be flexible (fewer epochs, subsampled data, other custom features)
• Use cheaper tasks earlier in the tuning process to explore
• Inform more expensive tasks later by exploiting what we learn
• In the process, reduce the full time required to tune an expensive model
(A worker-loop sketch that acts on the suggested task follows below.)
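Continuing the worker loop from the parallelism sketch, each suggestion of a multitask experiment carries a task, which the worker maps to a cheaper or full-cost training run. The epoch counts and the evaluate_model trainer are illustrative assumptions, and the task names are the ones defined in the creation sketch above.

```python
# Inside the worker loop of a multitask experiment (names from the sketches above).
suggestion = conn.experiments(experiment.id).suggestions().create()

# Map the suggested task to a training budget: cheap tasks explore early,
# expensive tasks exploit what the cheaper runs have taught the optimizer.
epochs_by_task = {"cheapest": 5, "cheap": 15, "expensive": 50}  # illustrative
epochs = epochs_by_task[suggestion.task.name]

value = evaluate_model(suggestion.assignments, epochs=epochs)  # placeholder trainer

conn.experiments(experiment.id).observations().create(
    suggestion=suggestion.id,
    value=value,
)
```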
Using cheap or free information to speed learning

We can build better models using inaccurate data to help point the actual optimization in the right direction at less cost:
• Using a warm start through multi-task learning logic [Swersky et al., 2014]
• Combining good anytime performance with active learning [Klein et al., 2018]
• Accepting data from multiple sources without priors [Poloczek et al., 2017]
Use Case: Image Classification on a Budget

● Category: Computer Vision
● Task: Image Classification
● Model: CNN
● Data: Stanford Cars Dataset
● Analysis: Architecture Comparison
● Result: 2.4% accuracy gain with a much shallower model

Learn more: https://mlconf.com/blog/insights-for-building-high-performing-image-classification-models/
Next Talk: Efficient Approaches to Training
Automated Early Stopping, Convergence Monitoring
Register for this talk: https://tuning.sigopt.com/tuning-for-systematic-trading
Questions?
Tobias Andreasen | tobias@sigopt.com
For more information visit: https://sigopt.com/research/
5. Next talk: How should one think about convergence? (and other approaches to efficient model training techniques)
Future talks

• Convergence
• Implementation
• Infrastructure
• Implementation
• Use cases
• Experiment transfor
• Metric Management
• Parameter Importance

Visit us at OpML ’20
Think about convergence

The best model is found through convergence.