Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King
LUYANG WANG, Burger King Corporation
KAI HUANG, Intel Corporation
Agenda
LUYANG WANG
▪ Food recommendation use case
▪ TxT model in detail
KAI HUANG
▪ AI on big data
▪ Distributed training pipeline with Ray on Apache Spark
Food Recommendation Use Case
[Diagram: guest ordering flow. Labels: Guest arrives, ODMB, Checks Menu Board, Cashier enters order, Checks Menu Board, Guest completes order]
Use Case Challenges
Challenges
▪ Lack of user identifiers
▪ Same-session food compatibilities
▪ Other variables in our use case: location, weather, time, etc.
▪ Deployment challenges
Solutions
▪ Session-based recommendation model
▪ Able to take complex context features into consideration
▪ Able to be deployed anywhere, on both edge and cloud
Transformer Cross Transformer (TxT)
TxT Model Overview
Model Components
Sequence Transformer
▪ Takes the item order sequence as input
Context Transformer
▪ Takes multiple context features as input
Latent Cross Joint Training
▪ Element-wise product of both transformer outputs
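The latent cross step above can be sketched in a few lines of plain Python. This is a toy illustration, not the TxT implementation: the vector values, the `item_embeddings` matrix, and the dot-product-plus-softmax scoring head are all assumptions made for the example; the slides only state that the two transformer outputs are combined with an element-wise product.

```python
import math

def latent_cross(seq_out, ctx_out):
    """Combine the two transformer outputs with an element-wise product."""
    if len(seq_out) != len(ctx_out):
        raise ValueError("both outputs must have the same dimension")
    return [s * c for s, c in zip(seq_out, ctx_out)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def score_items(joint, item_embeddings):
    """Assumed scoring head: dot product with each item embedding, then softmax."""
    logits = [sum(j * w for j, w in zip(joint, emb)) for emb in item_embeddings]
    return softmax(logits)

# Hypothetical outputs of the sequence and context transformers (dimension 3).
seq_out = [0.5, -1.0, 2.0]
ctx_out = [2.0, 0.5, 1.0]
joint = latent_cross(seq_out, ctx_out)  # [1.0, -0.5, 2.0]

# Hypothetical embeddings for three menu items.
item_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
probs = score_items(joint, item_embeddings)
best_item = probs.index(max(probs))
```

Because the product is element-wise, the context vector acts as a per-dimension gate on the sequence representation, which is why both outputs must share the same dimension.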
Model Comparison
[Figure: RNN Latent Cross and TxT]
Offline Evaluation
[Figure: Offline Training Loss]
Offline Training Result
Model               Top1 Accuracy   Top3 Accuracy
RNN                 29.98%          46.24%
Contextual ItemCF   32.18%          48.37%
RNN Latent Cross    33.10%          49.98%
TxT                 34.52%          52.37%
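For readers unfamiliar with the Top1 and Top3 metrics in the table, here is a minimal sketch of how top-k accuracy is usually computed. The scores and labels are made-up toy data, not Burger King results, and the function name `top_k_accuracy` is chosen for this example only.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true item is among the k highest-scored items."""
    if not labels:
        raise ValueError("labels must not be empty")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k items with the highest predicted score.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)

# Hypothetical predicted scores over four menu items for four sessions.
scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.2, 0.2, 0.1],
]
# Hypothetical item actually ordered next in each session.
labels = [1, 2, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
```

Top3 accuracy is always at least as high as Top1, since a Top1 hit is also a Top3 hit.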
Online Performance
A/B Testing Result
Model                        Conversion Rate Gain   Add-on Sales Gain
RNN Latent Cross (control)   -                      -
TxT                          +7.5%                  +4.7%
Inference Performance
[Figure: Inference Latency (ms), RNN Latent Cross vs. TxT; bar labels 18 and 20]
Model Training Architecture
[Figure: Previous vs. Current training architecture]
AI on Big Data
Accelerating Data Analytics + AI Solutions At Scale
BigDL: Distributed Deep Learning Framework for Apache Spark
https://github.com/intel-analytics/BigDL
Analytics Zoo: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
https://github.com/intel-analytics/analytics-zoo
We develop Project Orca in Analytics Zoo, based on Spark and Ray, to let users easily scale out a single-node Python notebook across large clusters by providing:
▪ Data-parallel preprocessing for Python AI (supporting common Python libraries such as Pandas, NumPy, PIL, TensorFlow Dataset, PyTorch DataLoader, etc.)
▪ Sklearn-style APIs for transparently distributed training and inference (supporting TensorFlow, PyTorch, Keras, MXNet, Horovod, etc.)
https://github.com/intel-analytics/analytics-zoo/tree/master/pyzoo/zoo/orca
Ray
https://github.com/ray-project/ray/
Ray is a fast and simple framework for building and running distributed applications.
Ray Core provides an easy Python interface for parallelism through remote functions and actors.
Ray is packaged with several high-level libraries to accelerate machine learning workloads:
▪ Tune: Scalable Experiment Execution and Hyperparameter Tuning
▪ RLlib: Scalable Reinforcement Learning
▪ RaySGD: Distributed Training Wrappers
Distributed Training Pipeline on Big Data
RayOnSpark
Seamlessly integrate Ray applications into Spark data processing pipelines.
▪ Runtime cluster environment preparation.
▪ Create a SparkContext on the driver node and use Spark to perform data cleaning, ETL, and preprocessing tasks.
▪ The RayContext on the Spark driver launches Ray across the cluster.
▪ Similar to RaySGD, we implement a lightweight shim layer around native MXNet modules for easy deployment on a YARN cluster.
▪ Each MXNet worker takes its local data partition of the Spark RDD or DataFrame from the plasma object store used by Ray.
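The idea that each worker trains on its own local data partition can be illustrated with a single-process toy sketch of data-parallel training. This does not use Spark, Ray, or MXNet, and none of the names below come from RayOnSpark; the one-parameter linear model, the round-robin partitioning, and the gradient averaging scheme are assumptions chosen to keep the example small.

```python
def partition(data, num_workers):
    """Split the dataset round-robin into one partition per worker."""
    return [data[i::num_workers] for i in range(num_workers)]

def local_gradient(w, part):
    """Gradient of mean squared error for the model y = w * x on one partition."""
    return sum(2 * (w * x - y) * x for x, y in part) / len(part)

def train(data, num_workers, lr=0.01, epochs=100):
    """Data-parallel gradient descent: local gradients, then a weighted average."""
    parts = partition(data, num_workers)
    total = sum(len(p) for p in parts)
    w = 0.0
    for _ in range(epochs):
        # On a real cluster each worker would compute this in parallel
        # on the partition it holds locally.
        grads = [local_gradient(w, p) for p in parts]
        # Average the local gradients, weighted by partition size.
        g = sum(gr * len(p) for gr, p in zip(grads, parts)) / total
        w -= lr * g
    return w

# Hypothetical dataset generated from y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = train(data, num_workers=4)
```

Only the gradients are combined across workers in this sketch; the training samples themselves never leave their partition, which mirrors the "local data partition" point above.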
End-to-end Distributed Training Pipeline
▪ Minimal code changes and learning effort are needed to scale training from a single node to big data clusters.
▪ The entire pipeline runs on a single cluster; no extra data transfer is needed.
▪ Project Orca provides a user-friendly interface for the pipeline.
import mxnet as mx
from mxnet.gluon.loss import SoftmaxCrossEntropyLoss
from zoo.orca import init_orca_context
from zoo.orca.learn.mxnet import Estimator

# init_orca_context unifies SparkContext and RayContext.
sc = init_orca_context(cluster_mode="yarn", num_nodes=num_nodes, cores=cores, memory=memory)
# Use sc to load data and do data preprocessing.
# train_config, txt, train_rdd, and val_rdd are defined earlier in the pipeline (not shown).
mxnet_estimator = Estimator(train_config, model=txt, loss=SoftmaxCrossEntropyLoss(),
                            metrics=[mx.metric.Accuracy(), mx.metric.TopKAccuracy(3)])
mxnet_estimator.fit(data=train_rdd, validation_data=val_rdd, epochs=..., batch_size=...)
Conclusion
Context-Aware Fast Food Recommendation at Burger King with RayOnSpark
https://arxiv.org/abs/2010.06197
https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d
For more details on RayOnSpark:
https://databricks.com/session_na20/running-emerging-ai-applications-on-big-data-platforms-with-ray-on-apache-spark
More information on Analytics Zoo:
https://github.com/intel-analytics/analytics-zoo
https://analytics-zoo.github.io/
