How Clarifai uses NATS and Kubernetes for Machine Learning
Jack Li
10 May 2017
Who am I?
• Senior Infrastructure Engineer @ Clarifai
• Duke/GaTech grad
• Role:
  • Backend systems implementation
  • Core infrastructure: machines, cluster, all databases
  • DevOps
Overview
• Introduction to Clarifai
• Kubernetes & NATS @ Clarifai
• Experience with NATS and Lessons Learned
• Q&A
Clarifai is a market leader in visual recognition technology
Proven, award-winning technology
Leading computer vision expert
Top VC investors
Matt Zeiler, CEO & Founder, Machine Learning PhD
Neural Network History
1980s: Geoffrey Hinton, Yann LeCun
2009: 30x speedup
• more data
• bigger models
• faster iterations
ImageNet Challenge
Convolutional Nets
Extracting Features from Layers
Zeiler et al., "Visualizing and Understanding Convolutional Networks," ECCV 2014
Easy-to-use API
Here’s how you train AI with Custom Training
Oreo cookie probability as training examples are added: 0.500, 0.588, 0.645, 0.766, 0.897, 0.912, 0.941, 0.956, 0.995
With a few more examples, the AI gets more accurate.
Microservices at Clarifai
• Transition from the v1 architecture to a v2 microservice architecture
• Benefits:
  • Decouple functionality
  • Scale individual services independently
  • Parallel development and testing
• Cost: IPC/intra-node communication is replaced by network calls, which add latency and serialization overhead
Kubernetes at Clarifai
• The v1 architecture was deployed with a collection of Ansible scripts and AWS AMIs
• Transition from virtual machine AMIs to Docker images
• Facilitated by our move to microservices
Many advantages to Kubernetes:
1. Faster operations through CI/CD
2. Improved automation
3. Simpler management tooling (Helm)
4. Improved security
5. Increased productivity
Microservices and Messaging
• Communication among microservices required messaging middleware
• Already using gRPC to define services
• Problems: not easily extensible, and requires client-side compilation of protobufs
Requirements:
1. Lightweight
2. Easy to configure and deploy on Kubernetes
3. Message persistence
4. Message queueing
5. Active community
Deciding on the Messaging System
• Considered several options: Amazon SQS, Kafka, RabbitMQ, and NATS
• NATS:
1. Lightweight: Docker image only a few MB; memory footprint in production under 50 MB
2. Great documentation; easy to configure and deploy to a Kubernetes cluster
3. Message persistence with a file store (Kubernetes volumes)
4. Message queueing with exact source ordering (and at-least-once delivery with NATS Streaming)
5. Active Slack community
6. High messaging throughput and minimal latency
7. Written in Go
8. Message replay
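Items 3, 4 and 8 fit together: NATS Streaming keeps each channel's messages in publish order with a per-channel sequence number, and a subscriber can choose where to start reading, for example from the beginning or from a given sequence. The sketch below is a hypothetical in-memory model of that idea, not the NATS Streaming client API; `ChannelLog`, `publish`, and `replay` are illustrative names, and the file store backed by Kubernetes volumes is stood in for by a Python list.

```python
from typing import Iterator, List, Tuple


class ChannelLog:
    """Hypothetical in-memory model of a persisted channel (not the NATS API).

    Every published message gets the next sequence number, so readers see
    messages in publish order and can replay from any earlier position.
    """

    def __init__(self) -> None:
        self._messages: List[bytes] = []

    def publish(self, data: bytes) -> int:
        """Append a message and return its sequence number (starting at 1)."""
        self._messages.append(data)
        return len(self._messages)

    def replay(self, start_seq: int = 1) -> Iterator[Tuple[int, bytes]]:
        """Yield (sequence, data) pairs in publish order, starting at start_seq."""
        if start_seq < 1:
            raise ValueError("sequence numbers start at 1")
        for seq in range(start_seq, len(self._messages) + 1):
            yield seq, self._messages[seq - 1]


log = ChannelLog()
for job in (b"resize:img1", b"tag:img1", b"resize:img2"):
    log.publish(job)

# A subscriber that restarts can resume from the first sequence it has not handled.
print(list(log.replay(start_seq=2)))  # [(2, b'tag:img1'), (3, b'resize:img2')]
```

Because the sequence number is assigned once at publish time, ordering and replay fall out of the same structure: a restarted worker only needs to remember one integer.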
Architecture @ Clarifai with NATS
Use Cases of NATS @ Clarifai
• Job queue for microservice workers
• Both fast and slow subscribers
• Trigger certain actions in services
• Message persistence during rolling continuous deployments
• Controlling the flow of messages to certain services (“pausing” subscriptions)
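The job-queue use case relies on queue-group style delivery: several workers share one subscription, and each message goes to only one of them. The sketch below is a hypothetical model of that pattern plus a "pause" switch; the slides do not say how Clarifai implemented pausing, so this is not their method. `QueueGroup`, `pause`, and `resume` are illustrative names, and the round-robin choice of worker is an assumption made for deterministic output (the real server picks among queue members in its own way).

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Set


class QueueGroup:
    """Hypothetical model of a work queue shared by several workers (not the NATS API).

    Each job is handed to exactly one active worker. Paused workers are skipped,
    and jobs wait in a backlog while every worker is paused.
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], None]] = {}
        self._order: List[str] = []
        self._paused: Set[str] = set()
        self._next = 0
        self.backlog: Deque[str] = deque()

    def subscribe(self, name: str, handler: Callable[[str], None]) -> None:
        self._handlers[name] = handler
        self._order.append(name)

    def pause(self, name: str) -> None:
        self._paused.add(name)

    def resume(self, name: str) -> None:
        self._paused.discard(name)
        self._drain()  # hand over anything that queued up while paused

    def publish(self, job: str) -> None:
        self.backlog.append(job)
        self._drain()

    def _drain(self) -> None:
        while self.backlog:
            worker = self._pick()
            if worker is None:
                return  # every worker is paused: keep jobs queued
            self._handlers[worker](self.backlog.popleft())

    def _pick(self) -> Optional[str]:
        # Round-robin over workers, skipping paused ones.
        for _ in range(len(self._order)):
            name = self._order[self._next % len(self._order)]
            self._next += 1
            if name not in self._paused:
                return name
        return None


done: Dict[str, List[str]] = {"worker1": [], "worker2": []}
group = QueueGroup()
group.subscribe("worker1", done["worker1"].append)
group.subscribe("worker2", done["worker2"].append)
group.publish("job1")
group.publish("job2")
group.pause("worker1")
group.pause("worker2")
group.publish("job3")      # nobody is active, so job3 waits in the backlog
group.resume("worker2")    # job3 is delivered to worker2
print(done)  # {'worker1': ['job1'], 'worker2': ['job2', 'job3']}
```

Pausing here only withholds new deliveries; jobs are never dropped, which matches the goal of keeping messages around during rolling deployments.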
Monitoring NATS Streaming Subscriptions
Results
• Integrated NATS into our backend in three weeks, starting with one service
• In production for 5 months; currently used by five different services and growing
• 100% uptime with NATS
• 100k+ messages sent through NATS per day
Lessons Learned
• Problems with NATS mainly stemmed from how we were using it
• Manual acking caused many issues and was harder to get right; we reverted to automatic acking for many subscriptions
• Hard to monitor when there are many subscriptions (inject metrics and monitor the queue using a separate service)
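The acking lesson is easier to see with a model of at-least-once delivery: a delivered message stays pending until it is acknowledged, and once its ack wait expires it is delivered again. The sketch below is a hypothetical simulation, not the NATS Streaming client; `AtLeastOnceSubscription`, `ack_wait`, `manual_ack`, and `tick` are illustrative names, and time is passed in explicitly so the example is deterministic. It shows how one code path that forgets to ack causes repeated redeliveries in manual mode, and the size of `pending` is the kind of per-subscription metric a separate monitoring service could export.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, Callable[[], None]], None]


class AtLeastOnceSubscription:
    """Hypothetical model of ack-based redelivery (not the NATS Streaming API).

    A delivered message stays pending until it is acked; once its ack deadline
    passes, tick() delivers it again.
    """

    def __init__(self, handler: Handler, ack_wait: float, manual_ack: bool) -> None:
        self.handler = handler
        self.ack_wait = ack_wait
        self.manual_ack = manual_ack
        self.pending: Dict[int, Tuple[str, float]] = {}  # seq -> (data, ack deadline)
        self.deliveries: List[int] = []  # every delivery attempt, for monitoring

    def deliver(self, seq: int, data: str, now: float) -> None:
        self.pending[seq] = (data, now + self.ack_wait)
        self.deliveries.append(seq)
        self.handler(data, lambda: self.ack(seq))
        if not self.manual_ack:
            self.ack(seq)  # auto mode: ack as soon as the handler returns

    def ack(self, seq: int) -> None:
        self.pending.pop(seq, None)

    def tick(self, now: float) -> None:
        """Redeliver every pending message whose ack deadline has passed."""
        expired = [(s, d) for s, (d, deadline) in self.pending.items() if deadline <= now]
        for seq, data in expired:
            self.deliver(seq, data, now)


def handler(data: str, ack: Callable[[], None]) -> None:
    if data != "bad":  # bug: the "bad" path never acks
        ack()


manual = AtLeastOnceSubscription(handler, ack_wait=30.0, manual_ack=True)
manual.deliver(1, "good", now=0.0)
manual.deliver(2, "bad", now=0.0)
manual.tick(now=31.0)
manual.tick(now=62.0)
print(manual.deliveries, len(manual.pending))  # [1, 2, 2, 2] 1
```

Switching to `manual_ack=False` removes the forgotten-ack bug, at a price: a handler that returns without finishing its work is still treated as done. In this model, a handler that raises leaves its message pending, so it is redelivered either way.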
Feature Requests
• Dead-letter queue
• Flushing/invalidating messages from specific subscriptions
• Clustering
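A dead-letter queue, the first feature request above, would stop a message that always fails from being redelivered forever. The following is a minimal sketch of such a policy, assuming a synchronous handler that signals failure by raising: each message is retried up to `max_deliveries` times and then parked in a separate list. `process_with_dead_letter` and `max_deliveries` are hypothetical names; this is not a NATS feature or API.

```python
from typing import Callable, Dict, List, Tuple


def process_with_dead_letter(
    messages: List[str],
    handler: Callable[[str], None],
    max_deliveries: int = 3,
) -> Tuple[List[str], List[str]]:
    """Hypothetical dead-letter policy; returns (processed, dead_lettered).

    Each message is retried until the handler succeeds or it has been
    delivered max_deliveries times; messages that never succeed are parked.
    """
    processed: List[str] = []
    dead: List[str] = []
    for msg in messages:
        for _attempt in range(max_deliveries):
            try:
                handler(msg)
            except Exception:
                continue  # count as a failed delivery and retry
            processed.append(msg)
            break
        else:
            dead.append(msg)  # retries exhausted: park it instead of blocking the queue
    return processed, dead


attempts: Dict[str, int] = {}


def flaky(msg: str) -> None:
    attempts[msg] = attempts.get(msg, 0) + 1
    if msg == "poison" or (msg == "flaky" and attempts[msg] < 2):
        raise RuntimeError(f"could not process {msg}")


print(process_with_dead_letter(["ok", "flaky", "poison"], flaky))
# (['ok', 'flaky'], ['poison'])
```

The parked messages can then be inspected or replayed by hand, which also covers part of the second request (removing specific messages from a subscription's backlog).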
E-mail: jack.li@clarifai.com
Questions?
