Data Con LA 2022 - Democratizing AI Across Clouds: Low-Cost, Easy-to-Deploy Machine Learning

Deep learning defines the future
Healthcare, Logistics, Banking
● 76% of enterprises prioritize AI and machine learning in their 2021 IT budgets
● 70% of these companies are small-to-medium-sized businesses that rely on public clouds to run AI jobs
● They typically focus on business logic, not infrastructure management (e.g., which resources to use and how expensive they are)
https://www.forbes.com/sites/louiscolumbus/2021/01/17/76-of-enterprises-prioritize-ai--machine-learning-in-2021-it-budgets/?sh=378288e5618a

Problem 1: Extremely High Costs
Newer models bring higher accuracy (a competitive advantage) and higher costs, e.g., GPT-2 ($43K) => GPT-3 ($12M)
70% of AI businesses use the cloud, and 40% are concerned about the expense
Companies spend 6- to 8-digit dollar amounts per year on cloud infrastructure for ML tasks
AI costs are massive… and they are only getting worse

A Short-term Goal: Cheap and Reliable AI for All
● Goals
○ Lower costs
○ Zero developer effort, compatible with existing jobs/pipelines
○ Guaranteed accuracy and performance SLAs
● Affordable variants of resources available
● Certain GPUs better for certain models
● Multi-tenant smart networking
● Serverless threads
● Spot instances

Our Product: ML Platform as a Service
[Figure: platform architecture with the labels Popular Platforms, AIOps Tools, Breeze Runtime, Framework Interface, Scheduler, Breeze Virtual Cloud, Spot, Demand, Schedule, and Lower Tier]

Checkpointing Cannot Handle Many Failures
Takeaway: less than 50% of the time is spent doing useful work (blue)
Blue: training is progressing
Orange: the cluster made progress, but it was wasted
Red: the cluster is restarting from a checkpoint
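To make the figure's color categories concrete, here is a toy accounting model, assuming training restarts from the most recent checkpoint after every preemption. The function `checkpoint_breakdown`, its parameters, and the run lengths in the example are hypothetical, not measurements from the talk; checkpoint-save cost and preemptions during a restart are ignored.

```python
from dataclasses import dataclass


@dataclass
class TimeBreakdown:
    useful: float      # progress that survives (blue in the figure)
    wasted: float      # progress lost since the last checkpoint (orange)
    restarting: float  # time spent restarting from a checkpoint (red)

    def useful_fraction(self) -> float:
        total = self.useful + self.wasted + self.restarting
        return self.useful / total if total else 0.0


def checkpoint_breakdown(runs: list[float], ckpt_interval: float,
                         restart_cost: float) -> TimeBreakdown:
    """Toy model of checkpoint/restart training on preemptible instances.

    `runs` are hypothetical stretches of training between preemptions; every
    run except the last ends in a preemption. A checkpoint is saved every
    `ckpt_interval` units of progress (save cost ignored), and each preemption
    costs `restart_cost` units to reload from the latest checkpoint.
    """
    useful = wasted = restarting = 0.0
    for i, run in enumerate(runs):
        if i == len(runs) - 1:
            useful += run  # the final run finishes without a preemption
            continue
        kept = (run // ckpt_interval) * ckpt_interval  # progress up to the last checkpoint
        useful += kept
        wasted += run - kept
        restarting += restart_cost
    return TimeBreakdown(useful, wasted, restarting)


if __name__ == "__main__":
    # Hypothetical numbers: frequent preemptions, a checkpoint every 10 units.
    b = checkpoint_breakdown([9, 14, 7, 19, 5], ckpt_interval=10, restart_cost=6)
    print(b, f"useful fraction = {b.useful_fraction():.2f}")  # ... useful fraction = 0.32
```

When preemptions arrive about as often as checkpoints are taken, much of each run is discarded and restarts add further dead time, which is the pattern the takeaway describes.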

How Can We Provide Fast Recovery and Accuracy?
Introduce redundancy into the pipeline
● Feasible because we are using discounted resources
● Slight over-provisioning maintains performance

Can we do it more intelligently?
● Duplicate every layer in the pipeline so that at least two copies of its weights always exist within the system
● Place each copy on the previous node to exploit data locality
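A minimal sketch of this placement rule, assuming one pipeline stage per node and wrap-around for stage 0 (whose "previous node" is taken to be the last node). The helpers `replica_node`, `placement`, and `recover` are hypothetical illustrations, not BreezeML's implementation.

```python
def replica_node(stage: int, num_nodes: int) -> int:
    """Node holding the redundant copy of `stage`: the previous node, wrapping around."""
    return (stage - 1) % num_nodes


def placement(num_nodes: int) -> dict[int, list[str]]:
    """Node i runs pipeline stage i and also stores a replica of stage i + 1."""
    nodes: dict[int, list[str]] = {n: [] for n in range(num_nodes)}
    for s in range(num_nodes):
        nodes[s].append(f"stage{s}")
        nodes[replica_node(s, num_nodes)].append(f"stage{s}-replica")
    return nodes


def recover(num_nodes: int, failed: set[int]) -> dict[int, int]:
    """Map each stage to the surviving node that should serve it.

    Falls back to the replica when the primary node is preempted; raises if
    both copies are gone (the case where a checkpoint restore is still needed).
    """
    plan = {}
    for s in range(num_nodes):
        if s not in failed:
            plan[s] = s
        elif (r := replica_node(s, num_nodes)) not in failed:
            plan[s] = r
        else:
            raise RuntimeError(f"both copies of stage {s} lost")
    return plan


if __name__ == "__main__":
    print(placement(4))
    print(recover(4, failed={2}))  # stage 2 is now served by node 1
```

Any single preemption leaves a live copy of every stage; losing two adjacent nodes removes both copies of one stage, which is where a fallback such as checkpointing would still be required.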

Redundancy Provides Resilience
Redundant stages enable recovery more quickly than checkpointing ✅
High performance and memory overheads if done naively! ❌

Pipelines Have Bubbles
● Each mini-batch is split into micro-batches
● Micro-batch gradients are accumulated to get the full-batch gradients
[Figure: pipeline schedule with idle "bubble" slots]
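Accumulation works because, for a mean loss, weighting each micro-batch gradient by its share of the mini-batch and summing reproduces the full mini-batch gradient. A small sketch with a one-parameter linear model and mean squared error (the model, data, and function names are illustrative assumptions):

```python
import math


def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: list[float], ys: list[float],
                     micro_size: int) -> float:
    """Split the mini-batch into micro-batches and accumulate their gradients.

    Each micro-batch gradient is weighted by its share of the mini-batch, so the
    sum equals the full-batch gradient even when the last micro-batch is smaller.
    """
    n = len(xs)
    total = 0.0
    for start in range(0, n, micro_size):
        mx = xs[start:start + micro_size]
        my = ys[start:start + micro_size]
        total += grad_mse(w, mx, my) * len(mx) / n
    return total


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
    full = grad_mse(1.5, xs, ys)
    accum = accumulated_grad(1.5, xs, ys, micro_size=4)  # micro-batches of 4 and 2
    print(full, accum, math.isclose(full, accum))
```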

Using Pipeline Bubbles to Hide Overhead
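The slide does not spell out the schedule, so as an assumption this sketch uses a GPipe-style schedule (all forward passes, then all backward passes) in which every step takes one time slot. It counts the idle slots each stage has; that idle budget is where extra work, such as redundant computation, could be hidden. For this schedule the idle fraction is (p - 1) / (m + p - 1) for p stages and m micro-batches. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Optional


def gpipe_schedule(p: int, m: int) -> List[List[Optional[str]]]:
    """grid[stage][slot] for a GPipe-style schedule (all forwards, then all backwards).

    Simplifying assumption: every forward and backward step takes one time slot.
    """
    slots = 2 * (m + p - 1)
    grid: List[List[Optional[str]]] = [[None] * slots for _ in range(p)]
    for s in range(p):
        for j in range(m):
            grid[s][s + j] = f"F{j}"                          # forward wave
            grid[s][(m + p - 1) + (p - 1 - s) + j] = f"B{j}"  # backward wave
    return grid


def bubble_fraction(grid: List[List[Optional[str]]]) -> Fraction:
    """Fraction of all stage-slots that are idle (None)."""
    idle = sum(row.count(None) for row in grid)
    return Fraction(idle, len(grid) * len(grid[0]))


if __name__ == "__main__":
    p, m = 4, 8
    grid = gpipe_schedule(p, m)
    for s, row in enumerate(grid):
        print(f"stage {s}: " + " ".join(x or "--" for x in row))
    print(bubble_fraction(grid), Fraction(p - 1, m + p - 1))  # 3/11 3/11
```

Every stage ends up with 2(p - 1) idle slots, so more micro-batches shrink the relative bubble but never remove it.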

Improvements Provided by BreezeML
● 2-3x more cost-effective for popular training jobs
● Red line: using on-demand AWS instances

Problem 2: Highly Diverse Software and Hardware
[Figure: Team 1 to Team 5 use different frameworks (TF, PyTorch, JAX) on top of the Breeze Cloud-Neutral AI Platform, which spans on-prem and cloud resources (AWS Cloud, GCP Cloud) with heterogeneous hardware (TPU, CPU, GPU, Graviton CPU)]

Breeze Multi-cloud Platform
● Users create a dataflow graph and annotate its tasks
● We partition the dataflow graph and run each task on the most appropriate resources across clouds to satisfy users' constraints
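A minimal sketch of the idea, assuming each task is annotated with the hardware types it accepts and whether it may run on spot capacity. The classes, the greedy cost rule, the egress penalty, and all prices are hypothetical; the talk does not describe BreezeML's actual partitioner.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Resource:
    cloud: str
    kind: str              # hardware type, e.g. "CPU", "GPU", "TPU"
    price_per_hour: float  # hypothetical prices, for illustration only
    spot: bool


@dataclass(frozen=True)
class Task:
    name: str
    kinds: frozenset[str]    # user annotation: acceptable hardware types
    allow_spot: bool = True  # user annotation: may this task use spot capacity?


def place(tasks, deps, catalog, egress_penalty=0.5):
    """Greedily place tasks in dataflow (topological) order.

    A task's cost is the resource price plus `egress_penalty` for every upstream
    task placed on a different cloud (a crude stand-in for data-transfer cost).
    """
    by_name = {t.name: t for t in tasks}
    order = TopologicalSorter()
    for t in tasks:
        order.add(t.name, *deps.get(t.name, ()))
    plan = {}
    for name in order.static_order():
        task = by_name[name]
        ups = deps.get(name, ())

        def cost(r):
            return r.price_per_hour + egress_penalty * sum(plan[u].cloud != r.cloud for u in ups)

        candidates = [r for r in catalog
                      if r.kind in task.kinds and (task.allow_spot or not r.spot)]
        if not candidates:
            raise ValueError(f"no resource satisfies the constraints of {name!r}")
        plan[name] = min(candidates, key=cost)
    return plan


if __name__ == "__main__":
    catalog = [
        Resource("aws", "GPU", 3.0, spot=False),
        Resource("aws", "GPU", 1.0, spot=True),
        Resource("gcp", "TPU", 0.8, spot=True),
        Resource("gcp", "CPU", 0.1, spot=True),
        Resource("aws", "CPU", 0.2, spot=False),
    ]
    tasks = [
        Task("preprocess", frozenset({"CPU"})),
        Task("train", frozenset({"GPU", "TPU"})),
        Task("evaluate", frozenset({"CPU"}), allow_spot=False),
    ]
    deps = {"train": {"preprocess"}, "evaluate": {"train"}}
    for name, r in place(tasks, deps, catalog).items():
        print(name, r)
```

Placing tasks in dataflow order is the simplest choice that lets upstream placements inform downstream ones; a real scheduler would also weigh runtime, data volume, and preemption risk.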

Years of Research and Development
[NSDI 2023] Thorpe et al., “Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs”
[NSDI 2022] Cangialosi et al., “Privid: Practical, Privacy-Preserving Video Analytics Queries”
[ICLR 2022] Zhang et al., “GradSign: Model Performance Inference with Theoretical Insights”
[MLSys 2022] Dogga et al., “Revelio: ML-Generated Debugging Queries for Distributed Systems”
[OSDI 2021] Thorpe et al., “Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads”
[OSDI 2021] Wang et al., “PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections”
[MLSys 2021] Ding et al., “IOS: Inter-Operator Scheduler for CNN Acceleration”
[SIGCOMM 2020] Li et al., “Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics”
[SoCC 2019] Dogga et al., “A System-Wide Debugging Assistant Powered by Natural Language Processing”
[SOSP 2019] Jia et al., “TASO: Optimizing Deep Learning Computation with Automated Generation of Graph Substitutions”

GTM and Customer Integration
Windmill API Server
● We control the backend
● http://windmill.breezeml.ai/apis/
Free trial, academics, and small businesses
On-site Deployment
● PyTorch/TensorFlow/Ray plugin
● K8S plugin
● Deploy at the user's site
Enterprises that control the backend themselves
Partner with Cloud Providers
● AWS
● GCP
● Azure
● Oracle Cloud
Enterprises on special deals with cloud providers

Monetization Schemes
Licensing
Deploy our system in the customer's cloud environment
● License fee: fixed-price, per-year license
Cloud Service
Subscribe to BreezeML's cloud service:
● Service charge: $1000/year
● Cut from savings: ($ savings per job) × (# jobs) × 20%
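The cloud-service pricing above reduces to simple arithmetic. A sketch, assuming the yearly service charge and the 20% cut are added together; the function name and the example customer's numbers are hypothetical:

```python
from decimal import Decimal

SERVICE_CHARGE = Decimal("1000")  # $/year, from the slide
SAVINGS_CUT = Decimal("0.20")     # 20% of savings, from the slide


def annual_cloud_service_charge(savings_per_job: Decimal, num_jobs: int) -> Decimal:
    """Service charge plus a 20% cut of total savings (assumes the two are added)."""
    return SERVICE_CHARGE + savings_per_job * num_jobs * SAVINGS_CUT


if __name__ == "__main__":
    # Hypothetical customer: saves $500 per job on 40 jobs in a year.
    print(annual_cloud_service_charge(Decimal("500"), 40))  # 5000.00
```

`Decimal` is used instead of `float` so that dollar amounts stay exact.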

BreezeML Enables Low-Cost, Cross-Cloud AI
● Reduce the burden of running on low-cost spot instances while maintaining high performance and reliability
● Allow developers to leverage the increasingly heterogeneous cloud environment

Thank you!
http://breezeml.ai
