OS for AI: Elastic Microservices & the Next Gen of ML

OS for AI
Jon Peck
Making state-of-the-art algorithms
discoverable and accessible to everyone
Full-Spectrum Developer & Advocate
jpeck@algorithmia.com
@peckjon
bit.ly/nordic-ai

The Problem: ML is in a huge growth phase, and it is
difficult and expensive for DevOps to keep up
Initially:
● A few models, a couple frameworks, 1-2 languages
● Dedicated hardware or VM Hosting
● IT Team for DevOps
● High time-to-deploy, manual discoverability
● Few end-users, heterogeneous APIs (if any)
Pretty soon...
● > 5,000 algorithms (50k versions) on many runtimes / frameworks
● > 60k algorithm developers: heterogeneous, largely unpredictable
● Each algorithm: 1 to 1,000 calls/second, a lot of variance
● Need auto-deploy, discoverability, low (15ms) latency
● Common API, composability, fine-grained security

The Need: an “Operating System for AI”
AI/ML scalable infrastructure on demand + marketplace
● Function-as-a-service for Machine & Deep Learning
● Discoverable, live inventory of AI via APIs
● Anyone can contribute & use
● Composable, Monetizable
● Every developer on earth can make their app intelligent

An Operating System for AI
What did the evolution of OS look like?
● Punch Cards
● Unix (1970s): Multi-tenancy, Composability
● DOS: Hardware Abstraction
● GUI (Win/Mac): Accessibility
● iOS/Android: Built-in App Store (Discoverability)
General-purpose computing had a long evolution, as we learned what the common
problems were / what abstractions to build. AI is in the earlier stages of that evolution.
An Operating System:
• Provides common functionality needed by many programs
• Standardizes conventions to make systems easier to work with
• Presents a higher level abstraction of the underlying hardware

Use Case
Jian Yang made an app to recognize food “SeeFood”
© HBO All Rights Reserved

Use Case
He deployed his trained model to a GPU-enabled server

Use Case
The app is a hit!
SeeFood
Productivity

Use Case
… and now his server is overloaded.
[Diagram: xN clients, one GPU-enabled Server]

Characteristics of AI
• Two distinct phases: training and inference
• Lots of processing power
• Heterogeneous hardware (CPU, GPU, FPGA, TPU, etc.)
• Limited by compute rather than bandwidth
• “TensorFlow is open source, scaling it is not.”

TRAINING
OWNER: Data Scientists
● Long compute cycle
● Fixed load (Inelastic)
● Stateful
● Single user
● Metal or VM
Analogous to dev tool chain. Building and iterating over a model is similar to building an app.

INFERENCE
OWNER: DevOps
● Short compute bursts
● Elastic
● Stateless
● Multiple users
● Containers, Kubernetes
Analogous to an OS. Running concurrent models requires task scheduling.

Microservices & Serverless Computing => ML Hosting
MICROSERVICES: the design of a system as independently deployable, loosely coupled services.
ADVANTAGES
• Maintainable, Scalable
• Software & Hardware Agnostic
• Rolling deployments
SERVERLESS: the encapsulation, starting, and stopping of singular functions per request, with a just-in-time-compute model.
ADVANTAGES
• Elasticity, Cost Efficiency
• Concurrency
• Improved Latency

Why Serverless - Cost Efficiency
[Chart: Calls per Second (Max calls/s, Avg calls/s) and GPU Server Instances by time of day, 12 AM to 10 PM]
Jian Yang’s “SeeFood” is most active during lunchtime.

Traditional Architecture - Design for Maximum
[Chart: Calls per Second (Max calls/s, Avg calls/s) and GPU Server Instances by time of day, 12 AM to 10 PM]
40 machines 24 hours. $648 * 40 = $25,920 per month

Autoscale Architecture - Design for Local Maximum
[Chart: Calls per Second (Max calls/s, Avg calls/s) and GPU Server Instances by time of day, 12 AM to 10 PM]
19 machines 24 hours. $648 * 19 = $12,312 per month

Serverless Architecture - Design for Minimum
[Chart: Calls per Second (Max calls/s, Avg calls/s) and GPU Server Instances by time of day, 12 AM to 10 PM]
Avg. of 21 calls / sec, or equivalent of 6 machines. $648 * 6 = $3,888 per month
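The monthly figures on the three architecture slides all come from the same arithmetic: machine count times the $648 monthly price per GPU server. A minimal sketch of that calculation follows; the function and variable names are illustrative and not from the talk.

```python
# Monthly price of one GPU server instance, as quoted on the slides.
MONTHLY_COST_PER_MACHINE = 648  # USD


def monthly_cost(machines: int) -> int:
    """Cost of paying for `machines` GPU servers for a full month."""
    return machines * MONTHLY_COST_PER_MACHINE


# Machine counts taken from the three architecture slides.
architectures = {
    "traditional (design for maximum)": 40,
    "autoscale (design for local maximum)": 19,
    "serverless (design for minimum)": 6,
}

costs = {name: monthly_cost(n) for name, n in architectures.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:,} per month")
```

The serverless figure uses the machine-equivalent of the average load (6) rather than any peak, which is where the saving comes from.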

Why Serverless - Concurrency
[Diagram: requests passing through a Load Balancer to multiple GPU-enabled Servers]

Why Serverless - Improved Latency
Portability = Low Latency

Almost there! We also need:
GPU Memory Management, Job Scheduling, Cloud Abstraction,
Discoverability, Authentication, Logging, etc.

Elastic Scale
User
Web Load Balancer
API Load Balancer
Web Servers
API Servers
Cloud Region #1
Worker xN
Docker(algorithm#1)
..
Docker(algorithm#n)
Cloud Region #2
Worker xN
Docker(algorithm#1)
..
Docker(algorithm#n)

Elastic Scaling with
Intelligent Orchestration
Knowing that:
● Algorithm A always calls Algorithm B
● Algorithm A consumes X CPU, X Memory, etc
● Algorithm B consumes X CPU, X Memory, etc
Therefore we can slot them in a way that:
● Reduces network latency
● Increases cluster utilization
● Builds dependency graphs
[Diagram: dependency graph of FoodClassifier with FruitClassifier and VeggieClassifier, labelled Runtime Abstraction]
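One way to picture the slotting idea is a greedy placement that prefers a worker already hosting a caller or callee of the algorithm being placed. This is only a sketch under stated assumptions: the `Worker` class, the `slot` function, and all CPU/memory numbers are hypothetical and do not describe the actual scheduler.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare workers by identity, not by field values
class Worker:
    cpu_free: float
    mem_free: float
    algorithms: list = field(default_factory=list)

    def fits(self, need):
        return need["cpu"] <= self.cpu_free and need["mem"] <= self.mem_free

    def place(self, name, need):
        self.cpu_free -= need["cpu"]
        self.mem_free -= need["mem"]
        self.algorithms.append(name)


def slot(calls, needs, workers):
    """Greedy placement: try workers hosting a graph neighbour first,
    then fall back to any other worker with enough free resources."""
    placement = {}
    for name, need in needs.items():
        # Neighbours in the dependency graph: callees plus callers.
        neighbours = set(calls.get(name, []))
        neighbours |= {a for a, callees in calls.items() if name in callees}
        preferred = [placement[n] for n in neighbours if n in placement]
        candidates = preferred + [w for w in workers if w not in preferred]
        target = next((w for w in candidates if w.fits(need)), None)
        if target is None:
            raise RuntimeError(f"no worker has room for {name}")
        target.place(name, need)
        placement[name] = target
    return placement


# Hypothetical call graph and resource needs.
calls = {"FoodClassifier": ["FruitClassifier", "VeggieClassifier"]}
needs = {
    "FoodClassifier": {"cpu": 1, "mem": 2},
    "FruitClassifier": {"cpu": 1, "mem": 2},
    "VeggieClassifier": {"cpu": 1, "mem": 2},
}
workers = [Worker(cpu_free=2, mem_free=4), Worker(cpu_free=4, mem_free=8)]
placement = slot(calls, needs, workers)
```

Here FoodClassifier and FruitClassifier end up on the same worker, so calls between them stay local; VeggieClassifier spills to the second worker only because the first is full.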

Composability
Composability is critical for AI workflows because of data
processing pipelines and ensembles.
[Diagram: Fruit or Veggie Classifier feeding Fruit Classifier and Veggie Classifier]
cat file.csv | grep foo | wc -l
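The Unix pipeline and the classifier ensemble are both compositions of small, single-purpose steps. A rough Python analogue is sketched below; the `pipe` helper and the keyword-lookup "classifiers" are hypothetical stand-ins, not trained models or platform APIs.

```python
from functools import reduce


def pipe(*stages):
    """Compose stages left to right, like a Unix pipeline."""
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)


def grep(pattern):
    """Keep only the lines that contain `pattern`."""
    return lambda lines: [line for line in lines if pattern in line]


# Rough analogue of: cat file.csv | grep foo | wc -l
count_foo = pipe(str.splitlines, grep("foo"), len)


# Stand-in classifiers (simple lookups, for illustration only).
def fruit_or_veggie(item):
    return "fruit" if item in {"apple", "banana"} else "veggie"


def fruit_classifier(item):
    return f"fruit:{item}"


def veggie_classifier(item):
    return f"veggie:{item}"


def food_classifier(item):
    """Route the input to a specialised model based on the first model's answer."""
    if fruit_or_veggie(item) == "fruit":
        return fruit_classifier(item)
    return veggie_classifier(item)


print(count_foo("foo,1\nbar,2\nfoo,3"))
print(food_classifier("apple"))
```

Each stage only needs to agree on its input and output shape, which is what lets models written by different authors be chained.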

Cloud Abstraction - Storage
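# No storage abstraction: the code is tied to the S3 API
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="bucket-name", Key="records.csv")
data = obj["Body"].read()  # raw bytes of the object

# With storage abstraction: one call, backend chosen by the URI scheme
# (`client` is the platform's API client, created elsewhere)
data = client.file("blob://records.csv").get()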
s3://foo/bar
blob://foo/bar
hdfs://foo/bar
dropbox://foo/bar
etc.
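The idea behind the URI list can be sketched as a small dispatcher that picks a backend from the URI scheme. Everything named here (`StorageClient`, `InMemoryBackend`, `register`, `get`) is hypothetical and only illustrates the pattern; real backends would wrap S3, HDFS, Dropbox, and so on.

```python
from urllib.parse import urlparse


class InMemoryBackend:
    """Toy backend standing in for S3, HDFS, Dropbox, etc."""

    def __init__(self, objects):
        self._objects = objects

    def read(self, path):
        return self._objects[path]


class StorageClient:
    """Routes each read to the backend registered for the URI scheme."""

    def __init__(self):
        self._backends = {}

    def register(self, scheme, backend):
        self._backends[scheme] = backend

    def get(self, uri):
        parsed = urlparse(uri)
        backend = self._backends.get(parsed.scheme)
        if backend is None:
            raise ValueError(f"no backend registered for scheme {parsed.scheme!r}")
        # "s3://foo/bar" -> netloc "foo", path "/bar" -> "foo/bar"
        return backend.read(parsed.netloc + parsed.path)


storage = StorageClient()
storage.register("s3", InMemoryBackend({"foo/bar": b"from s3"}))
storage.register("blob", InMemoryBackend({"foo/bar": b"from blob"}))

print(storage.get("s3://foo/bar"))
print(storage.get("blob://foo/bar"))
```

Calling code stays the same when data moves between providers; only the URI changes.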

Cloud Abstraction
Service          AWS                     Google Cloud      Azure           OpenStack
Compute          EC2                     CE                VM              Nova
Autoscaling      Autoscaling Group       Autoscaler        Scale Set       Heat Scaling Policy
Load Balancing   Elastic Load Balancer   Load Balancer     Load Balancer   LBaaS
Remote Storage   Elastic Block Store     Persistent Disk   File Storage    Block Storage
Partial Source: Sam Ghods, KubeCon 2016

An Operating System for AI: the “AI Layer”
Shell & Services
● Discoverability, Authentication, Instrumentation, etc.
Kernel
● Runtime Abstraction: Support any programming language or framework, including interoperability between mixed stacks.
● Elastic Scale: Prioritize and automatically optimize execution of concurrent short-lived jobs.
● Cloud Abstraction: Provide portability to algorithms, including public clouds or private clouds.

Discoverability: an App Store for AI

Algorithmia’s OS for AI: discover a model
1. Discover a model
● AppStore-like interface
● Categorized, tagged, rated
● Well-described (purpose, source, API)

Algorithmia’s OS for AI: execute a model
2. Execute from any language
● Raw JSON, or lang stubs
● Common syntax
● Autoscaled elastic cloud-exec
● Secure, isolated
● Concurrent, orchestrated
● 15ms overhead
● Hardware agnostic
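"Raw JSON" here means a model call is just an HTTP POST with a JSON body, so any language with an HTTP client can make it. The sketch below builds such a request with the Python standard library without sending it. The host, path layout, algorithm name, and `Bearer` auth scheme are placeholders assumed for illustration, not the platform's documented API.

```python
import json
import urllib.request


def build_call(base_url, algorithm, payload, api_key):
    """Build (but do not send) a JSON POST request for one model call."""
    return urllib.request.Request(
        url=f"{base_url}/{algorithm}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # placeholder auth scheme
        },
        method="POST",
    )


# All values below are hypothetical.
request = build_call(
    base_url="https://api.example.com/v1/algo",
    algorithm="demo/FoodClassifier",
    payload={"image": "blob://lunch.jpg"},
    api_key="YOUR_API_KEY",
)

print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

A language stub would wrap exactly this kind of request behind a function call with the same syntax in every language.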

Algorithmia’s OS for AI: add a model
3. Add new models
● Many languages, frameworks
● Instant JSON API
● Call other models seamlessly (regardless of lang)
● Granular permissions
● GPU environments
● Namespaces & versioning
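To make "Instant JSON API" and "call other models" concrete, here is a toy, in-process sketch: each model is a function registered under a namespaced, versioned name, and every call crosses a JSON boundary. The `REGISTRY`, `model`, and `call` names and the `owner/name/version` format are hypothetical, not the platform's real interface.

```python
import json

REGISTRY = {}  # in-process stand-in for the platform's model catalog


def model(name):
    """Register a function under a namespaced, versioned name."""
    def decorator(fn):
        REGISTRY[name] = fn
        return fn
    return decorator


def call(name, json_body):
    """JSON in, JSON out: roughly what a generated endpoint does per request."""
    result = REGISTRY[name](json.loads(json_body))
    return json.dumps(result)


@model("demo/FruitClassifier/1.0.0")
def fruit_classifier(data):
    # Toy rule standing in for a trained model.
    return {"label": "banana" if data["color"] == "yellow" else "apple"}


@model("demo/FoodClassifier/1.0.0")
def food_classifier(data):
    # Calls another model by name, exchanging only JSON, so the callee
    # could be written in any language.
    inner = json.loads(call("demo/FruitClassifier/1.0.0", json.dumps(data)))
    return {"kind": "fruit", **inner}


print(call("demo/FoodClassifier/1.0.0", '{"color": "yellow"}'))
```

Pinning the version in the name lets a caller keep using 1.0.0 while newer versions are published alongside it.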

Thank you!
Jon Peck, Developer Advocate
jpeck@algorithmia.com
@peckjon
bit.ly/nordic-ai
FREE STUFF
$50 free at Algorithmia.com
signup code: NORDIC18
WE ARE HIRING
algorithmia.com/jobs
● Seattle or Remote
● Bright, collaborative env
● Unlimited PTO
● Dog-friendly
