ML Infra at an
Early-Stage
Feature Services
Nick Handel, Head of Data Science
March 2019
“Machine Learning is
99% Infrastructure”
- Many people
Unfortunately the Infra
is really hard...
Where should you
start?
It will look
something like this
Big tech companies are
building incredible
infrastructure
Source: Hidden Technical Debt in Machine Learning Systems
Source: Meet Michelangelo: Uber’s Machine Learning Platform
Source: Bighead - Airbnb’s End-to-End Machine Learning Platform
What about the rest of us?
● Public solutions are lagging
○ Big Cloud providers aren’t providing end-to-end solutions
○ There is no enterprise solution that goes end-to-end
○ There is no widely-adopted open source solution
● The option set for the rest of us:
○ Buy pieces and combine
■ Requires engineering and money
■ Some pieces of infra didn’t have solutions: feature stores
○ Build
■ Requires engineering and may lead to tech debt with scale
Data is at the center of ML Infra
Connect to a range of
data sources
Monitor raw and
transformed data,
Monitor for feature drift
Collect and transform
features for testing
new model ideas
Share model outputs
as features in other
models
Cache production features
for training and validation
of point in time
correctness
Transform data
consistently between
inference and training
Backfill historical features
to test new ideas offline
(Not easy)
Validate raw and
transformed data (types,
ranges, etc.)
Extract
Data
Build
Features
Train
Models
Monitor
Models
Serve
Models
Collect features for many
subjects (users, devices,
markets, etc.)
(duh)
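A minimal sketch of the "validate raw and transformed data (types, ranges, etc.)" step. The `FieldRule` class, the `validate_record` helper and the field names are hypothetical and only illustrate the idea; the deck does not show the actual validation code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FieldRule:
    """Expected type and optional inclusive range for one field (hypothetical)."""
    expected_type: type
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def validate_record(record: Dict[str, Any], rules: Dict[str, FieldRule]) -> List[str]:
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for name, rule in rules.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        # Type check first; range checks only make sense on the right type.
        if not isinstance(value, rule.expected_type):
            problems.append(
                f"{name}: expected {rule.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
            continue
        if rule.min_value is not None and value < rule.min_value:
            problems.append(f"{name}: {value} below minimum {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            problems.append(f"{name}: {value} above maximum {rule.max_value}")
    return problems


# Hypothetical rules for two features.
RULES = {
    "sms_count_30d": FieldRule(int, min_value=0),
    "avg_balance": FieldRule(float, min_value=0.0, max_value=1e9),
}

good = validate_record({"sms_count_30d": 12, "avg_balance": 40.5}, RULES)
bad = validate_record({"sms_count_30d": -1, "avg_balance": "x"}, RULES)
```

The same rules can be run on raw inputs and on transformed outputs, so a bad record is caught on whichever side it appears.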
1. Start basic
2. Build (or buy) a Feature Service
3. Mature the pieces that are
important to your business
The Feature Service
Simple Definition: Service for computing and managing ML data
In order of importance…
1. Framework
○ Reusable code
○ Consistency
○ Ease of development
2. Computation Engine
○ Service that builds features
○ Backfills new features for old inferences
3. Cache
○ Stores derived features
Defining a Feature Service
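The three pieces above (framework, computation engine, cache) can be sketched in a few lines. This is an illustration under stated assumptions, not the actual service: the `FeatureService` class, its method names and the in-memory dict standing in for the cache are all hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

FeatureFn = Callable[[Dict[str, Any]], float]
CacheKey = Tuple[str, str, str]  # (subject_id, inference_id, feature_name)


class FeatureService:
    """Hypothetical sketch: registry = framework, compute/backfill = engine, dict = cache."""

    def __init__(self) -> None:
        self._registry: Dict[str, FeatureFn] = {}
        self._cache: Dict[CacheKey, float] = {}

    def register(self, name: str) -> Callable[[FeatureFn], FeatureFn]:
        """Framework: one reusable definition per feature."""
        def decorator(fn: FeatureFn) -> FeatureFn:
            self._registry[name] = fn
            return fn
        return decorator

    def compute(self, subject_id: str, inference_id: str, raw: Dict[str, Any]) -> Dict[str, float]:
        """Computation engine: build every registered feature and cache it."""
        row = {}
        for name, fn in self._registry.items():
            key = (subject_id, inference_id, name)
            if key not in self._cache:
                self._cache[key] = fn(raw)
            row[name] = self._cache[key]
        return row

    def backfill(self, name: str, raw_by_inference: Dict[Tuple[str, str], Dict[str, Any]]) -> int:
        """Backfill one (typically new) feature for old inferences; returns rows written."""
        fn = self._registry[name]
        written = 0
        for (subject_id, inference_id), raw in raw_by_inference.items():
            key = (subject_id, inference_id, name)
            if key not in self._cache:
                self._cache[key] = fn(raw)
                written += 1
        return written

    def read(self, subject_id: str, inference_id: str) -> Dict[str, float]:
        """Cache read path used for training and development."""
        return {n: v for (s, i, n), v in self._cache.items()
                if s == subject_id and i == inference_id}


service = FeatureService()


@service.register("sms_count")
def sms_count(raw: Dict[str, Any]) -> float:
    return float(len(raw["sms"]))


raw_old = {"sms": ["a", "b"], "calls": [1, 2, 3]}
at_inference = service.compute("user-1", "inf-1", raw_old)


# A feature added later is backfilled for the old inference.
@service.register("call_count")
def call_count(raw: Dict[str, Any]) -> float:
    return float(len(raw["calls"]))


written = service.backfill("call_count", {("user-1", "inf-1"): raw_old})
training_row = service.read("user-1", "inf-1")
```

Because inference and training both go through the same registered functions, the transform code cannot diverge between the two paths.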
Feature
Repository
DynamoDB
Architecture
Write
Read
Inference
Training
Development
Feature
Service
Flask App
Write
Read
And for training
Life of a Feature
Inference Training Training
Model
Iteration
Feature
Iteration
Feature Repository
DynamoDB
Feature
Iteration
Validate point in time
correctness by
running training path
on previously
computed features
Calculate
and cache
features in
production
Use cached
features for
model
development
And for testing
new features
Calculate
features in
production
Train with new
features and
save them to
the cache
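One way to read "validate point in time correctness by running training path on previously computed features": recompute each feature from raw events as of the inference time and compare it with what was cached in production. A minimal sketch, assuming a hypothetical running-total feature and hypothetical helper names:

```python
import math
from datetime import datetime
from typing import List, Tuple

Event = Tuple[datetime, float]          # (event time, amount)
CachedValue = Tuple[datetime, float]    # (inference time, cached feature value)


def total_amount_as_of(events: List[Event], as_of: datetime) -> float:
    """Training-path feature: only events at or before `as_of` may be used."""
    return sum(amount for ts, amount in events if ts <= as_of)


def find_mismatches(cached: List[CachedValue], events: List[Event],
                    tolerance: float = 1e-9) -> List[Tuple[datetime, float, float]]:
    """Return (inference time, cached value, recomputed value) for every disagreement."""
    mismatches = []
    for as_of, cached_value in cached:
        recomputed = total_amount_as_of(events, as_of)
        if not math.isclose(recomputed, cached_value, abs_tol=tolerance):
            mismatches.append((as_of, cached_value, recomputed))
    return mismatches


events = [
    (datetime(2019, 1, 1), 10.0),
    (datetime(2019, 1, 10), 5.0),
    (datetime(2019, 2, 1), 20.0),
]
cached = [
    (datetime(2019, 1, 15), 15.0),  # matches the point-in-time recomputation
    (datetime(2019, 1, 20), 35.0),  # includes the Feb 1 event, i.e. future data leaked in
]
mismatches = find_mismatches(cached, events)
```

Any mismatch points either to leakage of future data or to a transform that differs between the inference and training paths.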
Flexible methods for
merge, join and concat
Everything is built on ABCs with
automated testing
As flexible as Python
Custom one-off
transforms
Features are built on versioned
extracts and transforms
Chain of
transformations
Multiple Features from
a single extract
Feature Definition
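A hedged sketch of what "built on ABCs" with "versioned extracts and transforms" could look like. All class names (`Extract`, `Transform`, `Feature`, `SmsExtract`, `OnlyIncoming`, `Count`) and the version-string format are assumptions made for illustration; the deck does not show the real definitions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Extract(ABC):
    """Pulls raw rows for one subject. Bump `version` when the logic changes."""
    version = "1"

    @abstractmethod
    def run(self, subject_id: str) -> List[Dict[str, Any]]:
        ...


class Transform(ABC):
    """One step in a chain of transformations."""
    version = "1"

    @abstractmethod
    def apply(self, data: Any) -> Any:
        ...


class Feature:
    """A feature is an extract followed by a chain of transforms."""

    def __init__(self, name: str, extract: Extract, transforms: List[Transform]) -> None:
        self.name = name
        self.extract = extract
        self.transforms = transforms

    @property
    def version(self) -> str:
        # The feature version is derived from the versions of its parts.
        steps = [self.extract] + list(self.transforms)
        return "/".join(f"{type(s).__name__}@{s.version}" for s in steps)

    def compute(self, subject_id: str) -> Any:
        data = self.extract.run(subject_id)
        for transform in self.transforms:
            data = transform.apply(data)
        return data


class SmsExtract(Extract):
    """Hypothetical in-memory extract standing in for a real data source."""

    def __init__(self, rows_by_subject: Dict[str, List[Dict[str, Any]]]) -> None:
        self.rows_by_subject = rows_by_subject

    def run(self, subject_id: str) -> List[Dict[str, Any]]:
        return self.rows_by_subject.get(subject_id, [])


class OnlyIncoming(Transform):
    def apply(self, data: Any) -> Any:
        return [row for row in data if row["direction"] == "in"]


class Count(Transform):
    version = "2"

    def apply(self, data: Any) -> Any:
        return len(data)


extract = SmsExtract({"u1": [{"direction": "in"}, {"direction": "out"}, {"direction": "in"}]})

# Multiple features from a single extract.
sms_total = Feature("sms_total", extract, [Count()])
sms_incoming = Feature("sms_incoming", extract, [OnlyIncoming(), Count()])
```

The abstract base classes make a missing `run` or `apply` fail at instantiation time, which fits the automated-testing point above.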
Defining Features
● Python is approachable and fast enough for our
inference needs (<10s)
● Keeps it simple
Versions
● Easy to manage at our stage
● Consistent transforms
● Different versions for different models
Transforms
● Reusable!
● Organized: Filter, Map, Reduce
Testing
● Code works
● Production models don’t break
Feature Definition
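The "Filter, Map, Reduce" organization of transforms can be sketched with three small callables and a `chain` helper. These names and the transaction fields are hypothetical; the point is only that each step is reusable across features.

```python
from functools import reduce
from typing import Any, Callable, List


class Filter:
    """Keep only the rows that satisfy a predicate."""

    def __init__(self, predicate: Callable[[Any], bool]) -> None:
        self.predicate = predicate

    def __call__(self, rows: List[Any]) -> List[Any]:
        return [row for row in rows if self.predicate(row)]


class Map:
    """Apply a function to every row."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __call__(self, rows: List[Any]) -> List[Any]:
        return [self.fn(row) for row in rows]


class Reduce:
    """Fold all rows into a single value, starting from `initial`."""

    def __init__(self, fn: Callable[[Any, Any], Any], initial: Any) -> None:
        self.fn = fn
        self.initial = initial

    def __call__(self, rows: List[Any]) -> Any:
        return reduce(self.fn, rows, self.initial)


def chain(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose steps left to right into one transform."""
    def run(data: Any) -> Any:
        for step in steps:
            data = step(data)
        return data
    return run


# Reusable pieces shared by two features.
AMOUNT = Map(lambda t: t["amount"])
SUM = Reduce(lambda acc, x: acc + x, 0.0)

total_credits = chain(Filter(lambda t: t["type"] == "credit"), AMOUNT, SUM)
total_debits = chain(Filter(lambda t: t["type"] == "debit"), AMOUNT, SUM)

transactions = [
    {"type": "credit", "amount": 100.0},
    {"type": "debit", "amount": 30.0},
    {"type": "credit", "amount": 50.0},
]
credits = total_credits(transactions)
debits = total_debits(transactions)
```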
Validate input and
output data of features Store transformed
features at the point of
inference for records
Track metrics on
features and monitor
for drift
Where we are today
Extract
Data
Build
Features
Train
Models
Monitor
Models
Serve
Models
Common Feature
Transformation Code
Features
accessible by
SQL
Backfill historical
features at specific
points in time (100%!!)
Enable Training on much
larger datasets with
previously computed features
Share model outputs as
features in other models
(learned features)
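For "track metrics on features and monitor for drift", one common metric is the Population Stability Index (PSI). The deck does not say which metric is used, so PSI is a swapped-in example here, and the `psi` function, the bin count and the `ALERT_THRESHOLD` value are assumptions to tune.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` against the `expected` baseline.

    Bins are equal-width over the baseline range; values outside that range
    are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("baseline has no spread; cannot build bins")
    width = (hi - lo) / bins

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    e = fractions(expected)
    a = fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


ALERT_THRESHOLD = 0.25  # hypothetical default; tune per feature

baseline = [float(v) for v in range(100)]   # e.g. cached training-time values
shifted = [v + 50.0 for v in baseline]      # e.g. recent production values

no_drift = psi(baseline, baseline)
drift = psi(baseline, shifted)
should_alert = drift > ALERT_THRESHOLD
```

Storing features at the point of inference is what makes this possible: the baseline and the recent window both come from the cache.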
Prediction:
Feature stores will be the
centerpiece of everyone's ML
Infra in 3 years
The Team
Dave Bernthal
Dennis Van Der Staay
Spencer Barton
Ting Ting Liu
Thank You!
Nick Handel
nick@branch.co
@nick_handel
Appendix
Branch’s ML Problem
● Long Feedback Signals
○ Problem: We make loans and get signal back between 28 and 1 year
later
○ Solution: Make it possible to reconstruct
● Feature Drift
○ Problem: The way people use their mobile phones in developing
markets changes constantly
○ Solution: Store features and adjust for feature drift
● Many data sources and types
○ Problem: We collect data from a variety of sources and types (raw
text, network data, event streams, location, etc.)
○ Solution: Build a system for feature construction that unifies
pipelines from different sources and types of transformations
● Learned Features
○ Model Storage is easy
○ Model Serving isn’t trivial
● Monitoring
○ Concept drift is one of our primary ML challenges
● Auto ML
○ Input labels and output model for production…
○ You already have the features!
What’s Next
