Building Production Platform for Large-Scale Recommendation Applications
Xu Ning
Snap
About Me
● Director of Engineering, ML Platform at Snap
● (prev.) Uber Michelangelo ML Platform, Horovod Project
● (prev.) big data and infrastructure at Uber, Facebook, Akamai, Microsoft Bing
Recommendation application examples
● Search and Ads
● Short Videos
● Feeds
Example architecture of recommendation systems
“Embedding-based Retrieval with Two-Tower Models in Spotlight”, Snap Eng Blog, 6/6/2023
[Figure: multi-stage recommendation funnel. Candidate counts shown: 100 millions, thousands, 10s of thousands, hundreds, a pageful. Techniques shown: approximate nearest neighbor search (aka “vector search”); two towers, dot product; Wide-and-deep, DeepFM, DCN, DLRM, Transformers; rule-based; list-wise LTR.]
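The retrieval stage of the funnel above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Snap's implementation: each "tower" is a single hypothetical linear layer, document embeddings are precomputed offline, and exact brute-force top-k stands in for the approximate nearest neighbor ("vector search") index a 100-million-item corpus would actually need.

```python
import heapq
import random


def tower(features: list[float], weights: list[list[float]]) -> list[float]:
    """Toy tower: a single linear layer. Real towers are deep networks."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(user_emb: list[float], doc_embs: dict[str, list[float]], k: int) -> list[str]:
    """Exact (brute-force) top-k by dot product, best first.
    An ANN ("vector search") index approximates this when the corpus is huge."""
    return heapq.nlargest(k, doc_embs, key=lambda doc_id: dot(user_emb, doc_embs[doc_id]))


rng = random.Random(0)
EMB_DIM, USER_DIM, DOC_DIM = 4, 6, 5
user_w = [[rng.gauss(0, 1) for _ in range(USER_DIM)] for _ in range(EMB_DIM)]
doc_w = [[rng.gauss(0, 1) for _ in range(DOC_DIM)] for _ in range(EMB_DIM)]

# Document embeddings are computed offline by the document tower and indexed ahead of time.
doc_embs = {f"doc{i}": tower([rng.random() for _ in range(DOC_DIM)], doc_w) for i in range(1000)}
# Only the user tower runs at request time.
user_emb = tower([rng.random() for _ in range(USER_DIM)], user_w)
candidates = retrieve_top_k(user_emb, doc_embs, k=10)
```

Because the score is a plain dot product between independently computed embeddings, the document side can be indexed once and reused across all requests; the later ranking stages can afford richer user-document interactions because they see far fewer candidates.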
Example architecture of recommendation systems
“Machine Learning for Snapchat Ad Ranking”, Snap Eng Blog, 2/11/2022
Multiple ranking paths compete at auction
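A minimal sketch of candidates from several ranking paths meeting in one auction, assuming a generic expected-value (eCPM-style) rule of bid × predicted action probability. The `Candidate` fields and path names are hypothetical, and real ad auctions add pricing rules, pacing, and other constraints not shown here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ad_id: str
    path: str        # which ranking path produced this candidate (hypothetical label)
    bid: float       # advertiser bid per action, e.g. per click
    p_action: float  # model-predicted probability of that action

    @property
    def expected_value(self) -> float:
        # Expected value per impression: bid x predicted action rate.
        return self.bid * self.p_action


def run_auction(paths: dict[str, list[Candidate]], slots: int) -> list[Candidate]:
    """Merge candidates from all ranking paths and keep the top `slots` by expected value.
    An ad surfaced by several paths keeps only its highest-valued copy."""
    best: dict[str, Candidate] = {}
    for candidates in paths.values():
        for c in candidates:
            if c.ad_id not in best or c.expected_value > best[c.ad_id].expected_value:
                best[c.ad_id] = c
    return sorted(best.values(), key=lambda c: c.expected_value, reverse=True)[:slots]


paths = {
    "path_a": [Candidate("ad1", "path_a", 2.0, 0.05), Candidate("ad2", "path_a", 1.0, 0.20)],
    "path_b": [Candidate("ad3", "path_b", 5.0, 0.01), Candidate("ad1", "path_b", 2.0, 0.08)],
}
winners = run_auction(paths, slots=2)
print([(c.ad_id, c.path) for c in winners])  # [('ad2', 'path_a'), ('ad1', 'path_b')]
```

Scoring every path in the same currency (expected value) is what lets independently built paths compete fairly for the same slots.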
Example recommendation models
1. “Embedding-based Retrieval with Two-Tower Models in Spotlight”, Snap Eng Blog, 6/6/2023
2. “Machine Learning for Snapchat Ad Ranking”, Snap Eng Blog, 2/11/2022
Light ranking (“L1”) and heavy ranking (“L2”)
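The L1/L2 split is a cascade: a cheap model scores everything and prunes, and the expensive model only sees the survivors. The sketch below assumes two hypothetical scoring functions (simple integer hashes standing in for models) and counts calls to show where the compute goes.

```python
from typing import Callable, Iterable

Scorer = Callable[[int], float]


def cascade_rank(candidates: Iterable[int], light: Scorer, heavy: Scorer,
                 l1_keep: int, page_size: int) -> list[int]:
    """Two-stage ranking: a cheap L1 model prunes, an expensive L2 model orders the survivors."""
    # L1: score every candidate with the light model and keep the best `l1_keep`.
    survivors = sorted(candidates, key=light, reverse=True)[:l1_keep]
    # L2: run the heavy model only on the survivors, then cut to one page.
    return sorted(survivors, key=heavy, reverse=True)[:page_size]


calls = {"light": 0, "heavy": 0}


def light_model(doc_id: int) -> float:
    calls["light"] += 1
    return float((doc_id * 37) % 101)  # hypothetical cheap score


def heavy_model(doc_id: int) -> float:
    calls["heavy"] += 1
    return float((doc_id * 53) % 97)  # hypothetical expensive score


page = cascade_rank(range(10_000), light_model, heavy_model, l1_keep=200, page_size=10)
print(calls)  # {'light': 10000, 'heavy': 200}
```

The heavy model runs 50× fewer times than it would without the L1 stage, which is what makes architectures like DCN, DLRM, or Transformers affordable at the last stages.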
Unique technical challenges in recommendation systems
● Data intensive
● Large model size and freshness
● High fanout inference
RecSys is data intensive
Volume
● DeepSeek-V3 was pretrained on 14.8 trillion tokens (≈60 TB)
● A recommendation model at Snap is trained on 1 PB of data (and continues to be incrementally trained over time)
● Training typically runs for a single epoch to prevent overfitting
Variety
● Types: counter, categorical, ID, ID list, embeddings, sequence (array of objects)
● Aggregation dimensions: by entity, by cohort, by category, etc.
Velocity
● Trillions of events processed per day in feature pipelines
● Events become available for serving within minutes
“Introducing Bento, Snap's ML Platform”, Snap Eng Blog, 1/28/2025
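A quick back-of-the-envelope check of the volume and velocity figures on this slide. It assumes decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes) and takes 1 trillion events/day as a conservative lower bound for "trillions"; both are assumptions, not numbers from the talk.

```python
# Figures from the slide; EVENTS_PER_DAY is an assumed lower bound for "trillions".
LLM_CORPUS_BYTES = 60e12        # ~60 TB LLM pretraining corpus
RECSYS_TRAINING_BYTES = 1e15    # 1 PB used to train one Snap recommendation model
EVENTS_PER_DAY = 1e12           # assumption: 1 trillion events/day
SECONDS_PER_DAY = 24 * 60 * 60

volume_ratio = RECSYS_TRAINING_BYTES / LLM_CORPUS_BYTES
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

print(f"RecSys training data is ~{volume_ratio:.1f}x the LLM corpus")   # ~16.7x
print(f"Feature pipelines ingest >= {events_per_second:,.0f} events/s") # >= 11,574,074 events/s
```

Even at the lower bound, the feature pipelines must sustain more than ten million events per second around the clock.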
Example: Snap’s Robusta real-time feature platform
“Speed Up Feature Engineering for Recommendation Systems”, Snap Eng Blog, 9/29/2022
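To make "counter" features aggregated "by entity" concrete, here is a hypothetical sliding-window counter: the number of events per entity in the last N seconds. It is an illustration of the feature type only, not Robusta's API or design, and it assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Per-entity event count over the last `window_s` seconds (hypothetical illustration)."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._events: dict[str, deque[float]] = defaultdict(deque)

    def add(self, entity: str, ts: float) -> None:
        self._events[entity].append(ts)  # assumes in-order timestamps

    def count(self, entity: str, now: float) -> int:
        q = self._events[entity]
        # Evict events that fell out of the window (now - window_s, now].
        while q and q[0] <= now - self.window_s:
            q.popleft()
        return len(q)


views = SlidingWindowCounter(window_s=300)  # 5-minute window
for ts in (0, 60, 200, 310):
    views.add("story_42", ts)
print(views.count("story_42", now=320))  # 3
```

Storing every raw timestamp does not scale to trillions of events per day; a production pipeline would more likely keep pre-aggregated per-bucket counts (e.g. per minute) and sum the buckets inside the window, trading a little precision for bounded memory.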
Model size: “scaling law” before it became a buzzword popularized by LLMs
[Figure: model size of Meta’s recommendation model (2024) compared with Meta’s Llama 3.1 405B LLM (2024)]
“Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters”, Lian et al, 2021
Training large RecSys models
“Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters”, Lian et al, 2021
“Monolith: Real Time Recommendation System With Collisionless Embedding Table”, Liu et al, 2022
[Figure: DeepFM model, annotated “99% of weights”]
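Why almost all the weights of a model like DeepFM sit in one place can be seen by simply counting parameters. Every size below is an illustrative assumption (not from the talk or the papers): two large ID vocabularies, two small categorical fields, 64-wide embeddings, and a modest MLP for the deep component. DeepFM's first-order (one weight per ID) term is omitted for brevity.

```python
# All sizes below are illustrative assumptions.
EMBED_DIM = 64
vocab_sizes = {"user_id": 500_000_000, "item_id": 100_000_000, "country": 250, "hour_of_day": 24}
# Deep component: an MLP over the concatenated per-field embeddings.
mlp_sizes = [len(vocab_sizes) * EMBED_DIM, 1024, 512, 256, 1]

# One EMBED_DIM-wide row per distinct ID in every field (shared by the FM and deep parts).
embedding_params = sum(v * EMBED_DIM for v in vocab_sizes.values())
# Weights + biases of each fully connected layer.
dense_params = sum(i * o + o for i, o in zip(mlp_sizes, mlp_sizes[1:]))

fraction = embedding_params / (embedding_params + dense_params)
fp32_gb = embedding_params * 4 / 1e9
print(f"embedding share of weights: {fraction:.4%}")  # 99.9976%
print(f"embedding tables in fp32: {fp32_gb:.1f} GB")
```

The dense MLP here is under a million parameters while the ID tables hold about 38 billion, which is why large RecSys training systems treat sparse embeddings and dense layers very differently.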
How fresh is fresh enough?
“Monolith: Real Time Recommendation System With Collisionless Embedding Table”, Liu et al, 2022
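One way to keep a serving copy fresh without shipping a terabyte-scale model on every update is to push only the embedding rows that changed since the last sync, in the spirit of the online training discussed in the Monolith paper. The sketch below is hypothetical: the class and function names are illustrative, plain SGD stands in for the real optimizer, and new rows start at zero rather than a random initialization.

```python
class TrainerEmbeddingTable:
    """Hypothetical training-side embedding table that remembers which rows changed."""

    def __init__(self) -> None:
        self.rows: dict[int, list[float]] = {}
        self._dirty: set[int] = set()

    def apply_gradient(self, feature_id: int, grad: list[float], lr: float = 0.01) -> None:
        # Plain SGD step; new IDs start from a zero row.
        row = self.rows.setdefault(feature_id, [0.0] * len(grad))
        for i, g in enumerate(grad):
            row[i] -= lr * g
        self._dirty.add(feature_id)

    def export_delta(self) -> dict[int, list[float]]:
        """Copy only rows updated since the previous export, then reset tracking."""
        delta = {fid: list(self.rows[fid]) for fid in self._dirty}
        self._dirty.clear()
        return delta


def sync_to_serving(trainer: TrainerEmbeddingTable, serving: dict[int, list[float]]) -> int:
    """Push one delta to the serving copy; returns the number of rows shipped."""
    delta = trainer.export_delta()
    serving.update(delta)
    return len(delta)


trainer = TrainerEmbeddingTable()
serving: dict[int, list[float]] = {}
trainer.apply_gradient(7, [1.0, -1.0])
trainer.apply_gradient(9, [0.5, 0.5])
print(sync_to_serving(trainer, serving))  # 2
trainer.apply_gradient(7, [1.0, 0.0])
print(sync_to_serving(trainer, serving))  # 1 (only row 7 changed)
```

Because only a small fraction of IDs is touched between syncs, the delta stays small and the sync interval can be short, which is what makes minute-level freshness plausible.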
High Fanout Inference
Compiling model for inference: user feature broadcast
● Train: (user_feature, document_feature) → label
● Inference: one user_feature with many documents: user_feature, [document_feature_1, ..., document_feature_N]
○ user_feature must be broadcast to every document, either at model compilation or in the inference server
Document feature fetching
● Each request may need to fetch tens of thousands of document features
○ 1 TB/s read volume
Externalized embedding serving
● A 1 TB model cannot fit in a single inference host's memory
● Embeddings are served from an in-memory database / serving parameter server
“Introducing Bento, Snap's ML Platform”, Snap Eng Blog, 1/28/2025
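A minimal sketch of the client side of externalized embedding serving, assuming embeddings are hash-partitioned across several in-memory shards. Plain dicts stand in for the remote in-memory database or parameter server, and the class name and zero-vector fallback for unknown IDs are hypothetical choices.

```python
import zlib
from collections import defaultdict


class ShardedEmbeddingStore:
    """Hypothetical client for embeddings split across in-memory shards."""

    def __init__(self, num_shards: int, dim: int) -> None:
        self.dim = dim
        self.shards: list[dict[int, list[float]]] = [{} for _ in range(num_shards)]

    def _shard_of(self, feature_id: int) -> int:
        # Stable hash so every client routes a given ID to the same shard.
        return zlib.crc32(feature_id.to_bytes(8, "little", signed=True)) % len(self.shards)

    def put(self, feature_id: int, vector: list[float]) -> None:
        self.shards[self._shard_of(feature_id)][feature_id] = vector

    def multi_get(self, feature_ids: list[int]) -> list[list[float]]:
        """Group IDs by shard so each shard gets one batched lookup, then restore request order."""
        by_shard: dict[int, list[int]] = defaultdict(list)
        for fid in set(feature_ids):
            by_shard[self._shard_of(fid)].append(fid)
        found: dict[int, list[float]] = {}
        for shard_idx, ids in by_shard.items():
            shard = self.shards[shard_idx]  # in production: one RPC per shard, issued in parallel
            found.update({fid: shard[fid] for fid in ids if fid in shard})
        # Unknown IDs fall back to a zero vector.
        return [found[fid] if fid in found else [0.0] * self.dim for fid in feature_ids]


store = ShardedEmbeddingStore(num_shards=4, dim=4)
for fid in range(100):
    store.put(fid, [float(fid)] * 4)
vectors = store.multi_get([3, 99, 1000])  # 1000 was never stored
```

Deduplicating IDs and batching per shard matters at high fanout: a request touching tens of thousands of documents turns into a handful of batched shard calls rather than one call per ID.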
Inference and online feature fetching for RecSys
“Introducing Bento, Snap's ML Platform”, Snap Eng Blog, 1/28/2025
Closing words
● Recommendation systems have unique platform technology and operational challenges due to their scale and complexity.
● RecSys platforms are highly customized, and there is no clear cloud or open-source out-of-the-box (OOTB) solution at scale.
○ Kuaishou Persia (unmaintained), ByteDance Monolith (unmaintained)
○ Very challenging to adopt
● More on how Snap powers its recommendation applications:
https://eng.snap.com/introducing-bento
🍱
Snap ML Platform is hiring!
● Senior Principal Machine Learning Engineer, ML Platform
● Principal Machine Learning Engineer, ML Training Platform
● Principal Machine Learning Engineer, ML Inference Platform
● Principal Software Engineer, Machine Learning Infrastructure
● Manager, Software Engineering, Machine Learning Infrastructure, AI Training Platform
● Manager, Software Engineering, Full Stack
● Machine Learning Engineer, 5+ Years of Experience
● Machine Learning Engineer, 3+ Years of Experience
● Staff Machine Learning Engineer, 8+ Years of Experience
● Staff Software Engineer, ML Infrastructure, 9+ Years of Experience
● Software Engineer, ML Infrastructure, 6+ Years of Experience
● Software Engineer, ML Infrastructure, 2+ Years of Experience
https://careers.snap.com/
AI/ML Infra Meetup | Building Production Platform for Large-Scale Recommendation Applications