Machine Learning Infrastructure at Condé Nast
James Evers & Max Cantor
Scaling Production Machine Learning Pipelines with Databricks
Digital Scale at Condé Nast
- 110M+ Monthly Unique Visitors
- 800M+ Monthly Page Views
- 100K+ Monthly New or Revised Pieces of Content
Landscape of MLOps
- Datasets
- Modeling
- Inference
- Tracking
MLOps Across the Industry
- Michelangelo & Ludwig (Uber)
- FBLearner (Facebook)
- Noether (Spotify)
- Bighead (Airbnb)
- Metaflow (Netflix)
MLOps at Condé Nast
Motivations
- Recommendations & Personalization
- Advertisement & User Segmentation
- Content Understanding
- Audience Forecasting
- Search Optimization
Challenges
● Facts of the media industry
○ Low on-site times (engagement and bounce rate)
○ One-and-done visitors (cold start)
○ Dependence on social media
● Business vs Engineering interests
● Legacy infrastructure requirements
● Staffing & overall technical resources
Spire Workflow Stages
- Dataprep: Features, Labeled Data
- Train: Model
- Predict: Scores
- Thresholds
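The stages above can be sketched as a chain of plain functions. This is only an illustrative toy under stated assumptions, not Spire's actual code: the function names (`dataprep`, `train`, `predict`, `apply_threshold`), the row layout, and the centroid-style scoring rule are all hypothetical and chosen to show how each stage's output feeds the next.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical types: per-user feature values, binary labels, and a scoring function.
Features = Dict[str, List[float]]
Labels = Dict[str, int]
Model = Callable[[List[float]], float]


def dataprep(rows: List[Tuple[str, float, int]]) -> Tuple[Features, Labels]:
    """Dataprep: turn raw (user_id, value, label) rows into features and labeled data."""
    features: Features = {}
    labels: Labels = {}
    for user_id, value, label in rows:
        features.setdefault(user_id, []).append(value)
        labels[user_id] = label
    return features, labels


def train(features: Features, labels: Labels) -> Model:
    """Train: fit a toy model that places a user between the negative and positive centroids."""
    pos = mean(mean(v) for u, v in features.items() if labels[u] == 1)
    neg = mean(mean(v) for u, v in features.items() if labels[u] == 0)
    span = (pos - neg) or 1.0  # avoid dividing by zero when the centroids coincide

    def model(values: List[float]) -> float:
        raw = (mean(values) - neg) / span
        return max(0.0, min(1.0, raw))  # clamp the score to [0, 1]

    return model


def predict(model: Model, features: Features) -> Dict[str, float]:
    """Predict: score every user with the trained model."""
    return {user_id: model(values) for user_id, values in features.items()}


def apply_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Thresholds: keep the users whose score meets the cutoff."""
    return sorted(u for u, s in scores.items() if s >= threshold)
```

Keeping each stage a function of the previous stage's output means a scheduler could run, retry, or cache the stages separately.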
Aleph (Features, Labeled Data)
- User-level behavioral feature space
- Content-derived feature generation
- Vetted feature pipelines for models
Kalos (Model)
- Model interface standardization
- Model serialization
- Model versioning & tracking
- Hyperparameter tuning
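A minimal sketch of what a standardized model interface with serialization and version tracking might look like. The slides do not show Kalos's API, so every name here (`BaseModel`, `MODEL_REGISTRY`, `load_model`, `MeanModel`) is a hypothetical stand-in, and JSON is used only to keep the example dependency-free.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence

# Hypothetical registry mapping a serialized type name back to its class.
MODEL_REGISTRY: Dict[str, type] = {}


class BaseModel(ABC):
    """Common interface that every model is assumed to implement."""

    version = "0.1.0"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        MODEL_REGISTRY[cls.__name__] = cls  # register each subclass by name

    @abstractmethod
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "BaseModel":
        ...

    @abstractmethod
    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        ...

    @abstractmethod
    def get_params(self) -> dict:
        ...

    @abstractmethod
    def set_params(self, params: dict) -> None:
        ...

    def to_json(self) -> str:
        """Serialize the model type, version, and learned parameters."""
        return json.dumps(
            {"type": type(self).__name__, "version": self.version, "params": self.get_params()}
        )


def load_model(blob: str) -> BaseModel:
    """Rebuild a model from its serialized form, restoring the recorded version."""
    payload = json.loads(blob)
    model = MODEL_REGISTRY[payload["type"]]()
    model.set_params(payload["params"])
    model.version = payload["version"]
    return model


class MeanModel(BaseModel):
    """Toy model that always predicts the mean of the training targets."""

    def __init__(self) -> None:
        self.mean_ = 0.0

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

    def get_params(self) -> dict:
        return {"mean_": self.mean_}

    def set_params(self, params: dict) -> None:
        self.mean_ = params["mean_"]
```

Because callers depend only on `fit`, `predict`, and the serialization methods, different model types can be swapped in without changing the surrounding pipeline code.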
ML as a Software Product
Execution Environments
- Astronomer / Airflow
- Databricks
- Local Development
Spire Command-Line Interface
- Development
- User Queries
- Models
- Execution
API Wrappers
- Bridges Airflow Operators and Spark Jobs
- Close integration with Databricks APIs
- Development iteration, speed, and stability in processes
- Spire versioning
- Spire releases
- GitHub/repository
- CI/CD
Release Management
Future
- Model reduction
- Model architecture enhancement
- Extend abstraction flexibility to feature sets
- In-house library consolidation
Thank you!
