OS for AI
Jon Peck, Full-Spectrum Developer & Advocate
Making state-of-the-art algorithms discoverable and accessible to everyone
jpeck@algorithmia.com
@peckjon
bit.ly/nordic-ai
The Problem: ML is in a huge growth phase, and it is difficult and expensive for DevOps to keep up
Initially:
● A few models, a couple of frameworks, 1-2 languages
● Dedicated hardware or VM hosting
● IT team for DevOps
● High time-to-deploy, manual discoverability
● Few end users, heterogeneous APIs (if any)
Pretty soon...
● > 5,000 algorithms (50k versions) on many runtimes / frameworks
● > 60k algorithm developers: heterogeneous, largely unpredictable
● Each algorithm: 1 to 1,000 calls/second, a lot of variance
● Need auto-deploy, discoverability, low (15ms) latency
● Common API, composability, fine-grained security
The Need: an “Operating System for AI”
AI/ML scalable infrastructure on demand + marketplace
● Function-as-a-service for Machine & Deep Learning
● Discoverable, live inventory of AI via APIs
● Anyone can contribute & use
● Composable, Monetizable
● Every developer on earth can make their app intelligent
An Operating System for AI
What did the evolution of OS look like?
● Punch Cards
● 1970s Unix: Multi-tenancy, Composability
● DOS: Hardware Abstraction
● GUI (Win/Mac): Accessibility
● iOS/Android: Built-in App Store (Discoverability)
General-purpose computing had a long evolution, as we learned what the common
problems were and which abstractions to build. AI is in the earlier stages of that evolution.
An Operating System:
• Provides common functionality needed by many programs
• Standardizes conventions to make systems easier to work with
• Presents a higher level abstraction of the underlying hardware
Use Case
● Jian Yang made an app to recognize food: “SeeFood” (© HBO All Rights Reserved)
● He deployed his trained model to a GPU-enabled server
● The app is a hit!
● … and now his server is overloaded
Characteristics of AI
• Two distinct phases: training and inference
• Lots of processing power
• Heterogeneous hardware (CPU, GPU, FPGA, TPU, etc.)
• Limited by compute rather than bandwidth
• “TensorFlow is open source, scaling it is not.”
TRAINING
● Long compute cycle
● Fixed load (Inelastic)
● Stateful
● Single user
● OWNER: Data Scientists
● Metal or VM
Analogous to dev tool chain. Building and iterating over a model is similar to building an app.
INFERENCE
● Short compute bursts
● Stateless
● Elastic
● Multiple users
● OWNER: DevOps
● Containers, Kubernetes
Analogous to an OS. Running concurrent models requires task scheduling.
Microservices & Serverless Computing => ML Hosting
MICROSERVICES: the design of a system as independently deployable, loosely coupled services.
ADVANTAGES
• Maintainable, Scalable
• Software & Hardware Agnostic
• Rolling deployments
SERVERLESS: the encapsulation, starting, and stopping of singular functions per request, with a just-in-time-compute model.
ADVANTAGES
• Elasticity, Cost Efficiency
• Concurrency
• Improved Latency
Why Serverless - Cost Efficiency
[Chart: max and avg calls per second, and GPU server instances, by time of day (12 AM to 10 PM)]
Jian Yang’s “SeeFood” is most active during lunchtime.
Traditional Architecture - Design for Maximum
[Chart: max and avg calls per second, and GPU server instances, by time of day (12 AM to 10 PM)]
40 machines, 24 hours. $648 * 40 = $25,920 per month
Autoscale Architecture - Design for Local Maximum
[Chart: max and avg calls per second, and GPU server instances, by time of day (12 AM to 10 PM)]
19 machines, 24 hours. $648 * 19 = $12,312 per month
Serverless Architecture - Design for Minimum
[Chart: max and avg calls per second, and GPU server instances, by time of day (12 AM to 10 PM)]
Avg. of 21 calls / sec, or equivalent of 6 machines. $648 * 6 = $3,888 per month
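The cost comparison across the three architectures can be reproduced with a few lines of arithmetic. The sketch below takes the $648 per machine per month and the machine counts (40, 19, 6) from the slides; the function and variable names are illustrative only.

```python
# Monthly cost of the three provisioning strategies described on the slides.
# The per-machine price and the machine counts come from the slides;
# everything else (names, structure) is illustrative.
MONTHLY_COST_PER_GPU_MACHINE = 648  # USD per machine per month


def monthly_cost(machines: int, unit_cost: int = MONTHLY_COST_PER_GPU_MACHINE) -> int:
    """Cost of paying for `machines` GPU servers for one month."""
    return machines * unit_cost


strategies = {
    "traditional (design for maximum)": 40,
    "autoscale (design for local maximum)": 19,
    "serverless (design for minimum)": 6,
}

costs = {name: monthly_cost(count) for name, count in strategies.items()}
baseline = costs["traditional (design for maximum)"]

for name, cost in costs.items():
    # Fraction of the always-on, peak-sized bill that this strategy avoids.
    saved = 1 - cost / baseline
    print(f"{name}: ${cost:,} per month, {saved:.1%} below traditional")
```

The serverless figure treats the average load as if it were served by 6 always-busy machines, which is the simplification the slide itself makes.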
Why Serverless - Concurrency
[Diagram: users, load balancer, GPU-enabled servers]
Why Serverless - Improved Latency
Portability = Low Latency
Almost there! We also need:
GPU Memory Management, Job Scheduling, Cloud Abstraction,
Discoverability, Authentication, Logging, etc.
Elastic Scale
User
Web Load Balancer
API Load Balancer
Web Servers
API Servers
Cloud Region #1
Worker xN
Docker(algorithm#1)
..
Docker(algorithm#n)
Cloud Region #2
Worker xN
Docker(algorithm#1)
..
Docker(algorithm#n)
Elastic Scaling with Intelligent Orchestration
Knowing that:
● Algorithm A always calls Algorithm B
● Algorithm A consumes X CPU, X Memory, etc.
● Algorithm B consumes X CPU, X Memory, etc.
Therefore we can slot them in a way that lets us:
● Reduce network latency
● Increase cluster utilization
● Build dependency graphs
[Diagram: FoodClassifier, FruitClassifier, VeggieClassifier; Runtime Abstraction]
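One way to picture this kind of dependency-aware slotting is a greedy placement that prefers the worker already hosting an algorithm's caller or callee. This is a minimal sketch under stated assumptions, not the scheduler the slides describe: the `Algo` and `Worker` types, the resource units, the `Translator` algorithm, and the greedy strategy are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Algo:
    name: str
    cpu: float  # CPU cores required (hypothetical units)
    mem: float  # memory in GB (hypothetical units)


@dataclass
class Worker:
    name: str
    cpu: float  # free CPU cores
    mem: float  # free memory in GB
    placed: list = field(default_factory=list)

    def fits(self, algo: Algo) -> bool:
        return algo.cpu <= self.cpu and algo.mem <= self.mem

    def place(self, algo: Algo) -> None:
        self.cpu -= algo.cpu
        self.mem -= algo.mem
        self.placed.append(algo.name)


def schedule(algos, calls, workers):
    """Greedy placement: try workers that already host a caller or callee
    of the algorithm first, then fall back to the first worker with room."""
    location = {}
    for algo in algos:
        neighbours = {b for a, b in calls if a == algo.name}
        neighbours |= {a for a, b in calls if b == algo.name}
        # Stable sort: workers hosting a neighbour come first, rest keep order.
        candidates = sorted(
            workers, key=lambda w: not any(n in w.placed for n in neighbours)
        )
        target = next((w for w in candidates if w.fits(algo)), None)
        if target is None:
            raise RuntimeError(f"no worker has room for {algo.name}")
        target.place(algo)
        location[algo.name] = target.name
    return location


workers = [Worker("worker-1", cpu=4, mem=8), Worker("worker-2", cpu=4, mem=8)]
algos = [
    Algo("Translator", cpu=3, mem=4),  # unrelated algorithm, for contrast
    Algo("FoodClassifier", cpu=2, mem=2),
    Algo("FruitClassifier", cpu=1, mem=2),
    Algo("VeggieClassifier", cpu=1, mem=2),
]
calls = [
    ("FoodClassifier", "FruitClassifier"),
    ("FoodClassifier", "VeggieClassifier"),
]
location = schedule(algos, calls, workers)
print(location)
```

In this example plain first-fit would send FruitClassifier to worker-1, which still has one free core; the call-graph preference keeps it next to FoodClassifier on worker-2, so calls between them stay on one machine.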
Composability
Composability is critical for AI workflows because of data
processing pipelines and ensembles.
[Diagram: Fruit or Veggie Classifier, feeding Fruit Classifier and Veggie Classifier]
Unix analogy: cat file.csv | grep foo | wc -l
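Both forms of composition on this slide, the Unix-style linear pipe and the classifier that routes to a more specific classifier, can be sketched with plain functions. The classifiers below are keyword lookups standing in for real models, and every name (`pipe`, `compose_router`, the word sets) is hypothetical.

```python
from functools import reduce
from typing import Callable, Dict


def pipe(*stages: Callable) -> Callable:
    """Chain stages left to right, like a Unix pipeline."""
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)


# Rough equivalent of: cat file.csv | grep foo | wc -l
count_foo = pipe(
    lambda text: text.splitlines(),                         # cat file.csv
    lambda lines: [line for line in lines if "foo" in line],  # grep foo
    len,                                                    # wc -l
)

# Toy stand-ins for trained models.
FRUITS = {"apple", "banana"}


def fruit_or_veggie(item: str) -> str:
    """Top-level model: decides which specialist should handle the item."""
    return "fruit" if item in FRUITS else "veggie"


def fruit_classifier(item: str) -> str:
    return f"fruit:{item}"


def veggie_classifier(item: str) -> str:
    return f"veggie:{item}"


def compose_router(router: Callable[[str], str],
                   branches: Dict[str, Callable[[str], str]]) -> Callable[[str], str]:
    """Build one callable that routes each input to the branch the router picks."""
    def pipeline(item: str) -> str:
        return branches[router(item)](item)
    return pipeline


food_classifier = compose_router(
    fruit_or_veggie,
    {"fruit": fruit_classifier, "veggie": veggie_classifier},
)

print(count_foo("foo,1\nbar,2\nfoo,3"))  # prints 2
print(food_classifier("apple"))          # prints fruit:apple
print(food_classifier("kale"))           # prints veggie:kale
```

Because every stage takes one value and returns one value, the composed `food_classifier` can itself be used as a stage in a larger pipeline.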
Cloud Abstraction - Storage
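# No storage abstraction: code is tied to one provider's SDK (here, AWS S3 via boto3)
import boto3
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="bucket-name", Key="records.csv")
data = obj["Body"].read()  # raw bytes of the object
# With storage abstraction: the URI scheme selects the backend
# (`client` is the platform's data client as shown on the slide; its setup is not shown)
data = client.file("blob://records.csv").get()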
s3://foo/bar
blob://foo/bar
hdfs://foo/bar
dropbox://foo/bar
etc.
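A storage abstraction of this kind usually comes down to dispatching on the URI scheme. The sketch below is an assumption about how such a layer could look, not the platform's implementation: the `register` and `read` helpers, the in-memory `mem://` backend, and the sample data are hypothetical, and real `s3://` or `hdfs://` backends would be registered the same way.

```python
from pathlib import Path
from urllib.parse import urlparse

# Registry mapping a URI scheme to the function that reads from that backend.
_BACKENDS = {}


def register(scheme):
    """Decorator that registers a reader function for one URI scheme."""
    def decorator(fn):
        _BACKENDS[scheme] = fn
        return fn
    return decorator


# Hypothetical in-memory store, standing in for a remote service.
_MEMORY_STORE = {"foo/bar": b"id,name\n1,apple\n"}


@register("mem")
def _read_memory(parsed):
    # For mem://foo/bar, netloc is "foo" and path is "/bar".
    return _MEMORY_STORE[parsed.netloc + parsed.path]


@register("file")
def _read_local(parsed):
    # For file:///tmp/records.csv, path is "/tmp/records.csv".
    return Path(parsed.path).read_bytes()


def read(uri: str) -> bytes:
    """Read the bytes behind `uri`, whichever backend its scheme names."""
    parsed = urlparse(uri)
    try:
        backend = _BACKENDS[parsed.scheme]
    except KeyError:
        raise ValueError(f"no backend registered for scheme {parsed.scheme!r}") from None
    return backend(parsed)


data = read("mem://foo/bar")
print(data)
```

Calling code only ever sees `read(uri)`, so moving data from one provider to another changes the URI string rather than the algorithm.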
Cloud Abstraction
Service          AWS                    Google Cloud      Azure           OpenStack
Compute          EC2                    CE                VM              Nova
Autoscaling      Autoscaling Group      Autoscaler        Scale Set       Heat Scaling Policy
Load Balancing   Elastic Load Balancer  Load Balancer     Load Balancer   LBaaS
Remote Storage   Elastic Block Store    Persistent Disk   File Storage    Block Storage
Partial Source: Sam Ghods, KubeCon 2016
An Operating System for AI: the “AI Layer”
Shell & Services
● Discoverability, Authentication, Instrumentation, etc.
Kernel
● Runtime Abstraction: Support any programming language or framework, including interoperability between mixed stacks.
● Elastic Scale: Prioritize and automatically optimize execution of concurrent short-lived jobs.
● Cloud Abstraction: Make algorithms portable across public and private clouds.
Discoverability: an App Store for AI
Algorithmia’s OS for AI: discover a model
1. Discover a model
● AppStore-like interface
● Categorized, tagged, rated
● Well-described (purpose, source, API)
Algorithmia’s OS for AI: execute a model
2. Execute from any language
● Raw JSON, or lang stubs
● Common syntax
● Autoscaled elastic cloud-exec
● Secure, isolated
● Concurrent, orchestrated
● 15ms overhead
● Hardware agnostic
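To make "raw JSON" concrete, the sketch below builds (but does not send) an HTTP request that posts a JSON payload to a hosted model. The host `api.example.com`, the URL layout, the `Authorization` header format, the algorithm path, and the payload fields are all assumptions for illustration; the slides do not specify them.

```python
import json
import urllib.request


def build_call(base_url: str, algo_path: str, payload, api_key: str) -> urllib.request.Request:
    """Build a POST request carrying `payload` as JSON to one hosted model.

    The URL layout and auth header are placeholders, not a documented API.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{algo_path}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Simple {api_key}",  # hypothetical auth scheme
        },
    )


request = build_call(
    base_url="https://api.example.com/v1/algo",   # placeholder host
    algo_path="demo/FoodClassifier",              # hypothetical model name
    payload={"image_url": "https://example.com/lunch.jpg"},
    api_key="YOUR_API_KEY",
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would perform the call. A language stub would wrap the same request behind a function call so that every model shares one calling convention.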
Algorithmia’s OS for AI: add a model
3. Add new models
● Many languages, frameworks
● Instant JSON API
● Call other models seamlessly (regardless of lang)
● Granular permissions
● GPU environments
● Namespaces & versioning
Jon Peck, Developer Advocate
Thank you!
jpeck@algorithmia.com
@peckjon
bit.ly/nordic-ai
FREE STUFF
$50 free at Algorithmia.com
signup code: NORDIC18
WE ARE HIRING
algorithmia.com/jobs
● Seattle or Remote
● Bright, collaborative env
● Unlimited PTO
● Dog-friendly

OS for AI: Elastic Microservices & the Next Gen of ML
