Scaling machine learning to
millions of users
with Apache Beam
Tatiana Al-Chueyr
Principal Data Engineer @ BBC Datalab
Online, 4 August 2021
@tati_alchueyr
● Brazilian living in London, UK since 2014
● Principal Data Engineer at the BBC (Datalab team)
● Graduated in Computer Engineering at Unicamp
● Software developer for 18 years
● Passionate about open-source
Apache Beam user since early 2019
BBC.datalab.hummingbirds
The knowledge in this presentation is the result of lots of teamwork within one squad of a larger team and an even broader organisation.
current & previous squad team members:
Darren Mundy, David Hollands, Richard Bownes, Marc Oppenheimer, Bettina Hermant, Tatiana Al-Chueyr, Jana Eggink
some business context
business context goal
to personalise the experience of millions of users of BBC Sounds
to build a replacement for an external third-party recommendation engine
business context numbers
BBC Sounds has approximately
● 200,000 podcast and music episodes
● 6.5 million users
The personalised rails (e.g. Recommended for You) display:
● 9 episodes (smartphones) or
● 12 episodes (web)
business context problem visualisation
it is similar to finding the best match among 20,000 items per user, 65 million times
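The two framings describe the same amount of work, just factored differently; a quick sanity check on the arithmetic:

```python
# Scale of the matching problem: score every episode for every user.
episodes = 200_000
users = 6_500_000
total_scores = episodes * users  # 1.3 trillion (user, episode) scores

# The slide's rephrasing is the same product, factored differently:
assert 20_000 * 65_000_000 == total_scores
print(f"{total_scores:,}")  # 1,300,000,000,000
```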
business context product rules
The recommendations must also comply with the BBC
product and editorial rules, such as:
● Diversification: no more than one item per brand
● Recency: no news episodes older than 24 hours
● Narrative arc: next drama series episode
● Language: Gaelic items to Gaelic listeners
● Availability: only available content
● Exclusion: shipping forecast and soap-opera
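Non-personalised rules like these can be expressed as simple predicates and applied as filters before any per-user work. A minimal sketch in Python; the record fields (`genre`, `published`, `available`) are illustrative stand-ins, not the BBC's actual schema:

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

# Hypothetical episode records; field names are illustrative only.
episodes = [
    {"id": "a", "genre": "news", "published": now - timedelta(hours=30), "available": True},
    {"id": "b", "genre": "drama", "published": now - timedelta(hours=2), "available": True},
    {"id": "c", "genre": "drama", "published": now - timedelta(hours=2), "available": False},
]

def recency_ok(ep, now):
    # Recency rule: no news episodes older than 24 hours.
    return ep["genre"] != "news" or now - ep["published"] <= timedelta(hours=24)

def available(ep):
    # Availability rule: only currently available content.
    return ep["available"]

candidates = [ep for ep in episodes if recency_ok(ep, now) and available(ep)]
print([ep["id"] for ep in candidates])  # ['b']
```

Because these predicates don't depend on the user, they shrink the candidate set once, up front, instead of 6.5 million times.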
technology & architecture
overview
technology overview
● Python
● Google Cloud Platform
● Apache Airflow
● Apache Beam (Dataflow Runner)
● LightFM Factorisation Machine model
architecture overview
[Diagram] User activity and Content metadata each pass through an Extract & Transform step, producing User activity features and Content metadata features. These feed Train, which outputs the Model Artefacts used by Predict to generate Predictions; Apply rules then yields the Filtered Predictions. The pipeline consumes historical data to serve future sessions.
risk analysis predict on the fly
[Diagram] Two candidate designs:
A. On the fly: a model API predicts & applies rules per request, using user activity and content metadata.
B. Precompute: an API retrieves pre-computed recommendations from cached recs.
SLA goal: 1500 reqs/s at < 60 ms
risk analysis predict on the fly

Concurrent load tests        On the fly    Precomputed    Precomputed
requests/s                   50            50             1500
Success percentage           63.88%        100%           100%
Latency p50 (success)        323.78 ms     1.68 ms        4.75 ms
Latency p95 (success)        939.28 ms     3.21 ms        57.53 ms
Latency p99 (success)        979.24 ms     4.51 ms        97.49 ms
Max successful requests/s    23            50             1500

Machine type: c2-standard-8, Python 3.7, Sanic workers: 7, Prediction threads: 1, vCPU cores: 7, Memory: 15 Gi, Deployment replicas: 1
risk analysis predict on the fly
[Diagram repeated] A. On the fly: a model API predicts & applies rules per request. B. Precompute: an API retrieves pre-computed recommendations from cached recs. SLA goal: 1500 reqs/s at < 60 ms
risk analysis precompute recommendations
cost estimate: ~ US$ 10.00 per run
Estimate of time (seconds) to precompute recommendations;
analysis using c2-standard-30 (30 vCPU, 120 GB RAM) and LightFM
risk analysis sorting recommendations
sorting 100k predictions per user in pure Python did not seem efficient
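One alternative to a full sort is a bounded top-k selection, which keeps only the k best scores per user. A small sketch using Python's standard library; the 100k scores here are synthetic:

```python
import heapq
import random

random.seed(7)
scores = [random.random() for _ in range(100_000)]  # one user's raw predictions

# A full sort is O(n log n); keeping only the top k is O(n log k).
top_1k = heapq.nlargest(1_000, scores)

# Same result as sorting everything and slicing, at a fraction of the work.
assert top_1k == sorted(scores, reverse=True)[:1_000]
```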
architecture overview
[Diagram repeated] User activity / Content metadata → Extract & Transform → features → Train → Model Artefacts → Predict → Predictions → Apply rules → Filtered Predictions (historical data → future).
architecture overview
[Diagram repeated] The same flow, highlighting the steps where we used Apache Beam (historical data → future).
architecture overview
User activity data Content metadata
Business Rules, part I - Non-personalised
- Recency
- Availability
- Excluded Masterbrands
- Excluded genres
Business Rules, part II - Personalised
- Already seen items
- Local radio (if not consumed previously)
- Specific language (if not consumed previously)
- Episode picking from a series
- Diversification (1 episode per brand/series)
Precomputed
recommendations
Machine Learning model
training
Predict recommendations
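As an illustration of the personalised rules, the diversification step (at most one episode per brand/series) can be sketched as a single rank-preserving pass. A minimal example; the episode dicts and the `brand` field are hypothetical stand-ins for the real data model:

```python
def diversify(ranked_episodes, max_per_brand=1):
    """Keep at most `max_per_brand` episodes per brand, preserving rank order."""
    counts = {}
    kept = []
    for ep in ranked_episodes:  # assumed sorted by score, best first
        brand = ep["brand"]
        if counts.get(brand, 0) < max_per_brand:
            counts[brand] = counts.get(brand, 0) + 1
            kept.append(ep)
    return kept

ranked = [
    {"id": "e1", "brand": "archers"},
    {"id": "e2", "brand": "archers"},   # dropped: brand already represented
    {"id": "e3", "brand": "news_quiz"},
]
print([ep["id"] for ep in diversify(ranked)])  # ['e1', 'e3']
```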
precompute recommendations
pipeline evolution
pipeline 1.0 design & arguments
August 2020
apache-beam[gcp]==2.15.0
--runner=DataflowRunner
--machine_type=n1-standard-1 (1 vCPU & 3.75 GB RAM)
--num_workers=10
--autoscaling_algorithm=NONE
pipeline 1.0 design
August 2020
pipeline 1.0 error when running in dev & prod
August 2020
Workflow failed. Causes: S05:Read non-cold start
users/Read+Retrieve user ids+Predict+Keep best scores+Sort
scores+Process predictions+Group activity history and
recommendations/pair_with_recommendations+Group activity
history and recommendations/GroupByKey/Reify+Group activity
history and recommendations/GroupByKey/Write failed., The job
failed because a work item has failed 4 times. Look in previous log
entries for the cause of each one of the 4 failures. For more
information, see
https://cloud.google.com/dataflow/docs/guides/common-errors.
The work item was attempted on these workers:
beamapp-al-cht01-08141052-08140353-1tqj-harness-0k4v
Root cause: The worker lost contact with the service.,
beamapp-al-cht01-08141052-08140353-1tqj-harness-0k4v
Root cause: The worker lost contact with the service.,
beamapp-al-cht01-08141052-08140353-1tqj-harness-ffqv
Root cause: The worker lost contact with the service.,
beamapp-al-cht01-08141052-08140353-1tqj-harness-cjht
Root cause: The worker lost contact with the service.
pipeline 1.0 data analysis
August 2020
pipeline 1.0 attempts to fix (i)
September 2020
1. Change machine type to a larger one
○ --machine_type=custom-1-6656 (1 vCPU, 6.5 GB RAM) - 6.5 GB RAM/core
○ --machine_type=m1-ultramem-40 (40 vCPU, 961 GB RAM) - 24 GB RAM/core
2. Refactor the pipeline
3. Reshuffle => too expensive for the operation we were doing
○ Shuffle service
○ Reshuffle function
4. Increase the number of workers
○ --num_workers=40
pipeline 1.0 attempts to fix (ii)
September 2020
5. Control the parallelism in Dataflow so the VM wouldn’t starve out of memory
[Diagram] Worker nodes (VMs), each running one or more SDK worker processes with one or more harness threads.
--number_of_worker_harness_threads=1
--experiments=use_runner_v2
(or)
--sdk_worker_parallelism
--experiments=no_use_multiple_sdk_containers
--experiments=beam_fn_api
pipeline 1.0 attempts to fix (iii)
https://stackoverflow.com/questions/63705660/optimising-gcp-costs-for-a-memory-intensive-dataflow-pipeline
pipeline 1.0 attempts to fix (iii)
https://twitter.com/tati_alchueyr/status/1301152715498758146
https://cloud.google.com/blog/products/data-analytics/ml-inference-in-dataflow-pipelines
pipeline 2.0 design & arguments
apache-beam==2.24
--runner=DataflowRunner
--machine_type=custom-30-460800-ext
--num_workers=40
--autoscaling_algorithm=NONE
September 2020
pipeline 2.0 business outcomes
September 2020
● +59% increase in interactions in the Recommended for You rail
● +103% increase in interactions for under-35s
pipeline 2.0 issues
● but costs were high...
£ 279.31 per run
September 2020
pipeline 2.0 issues
OSError: [Errno 28] No space left on device
March 2021
pipeline 2.0 issues
"If a batch job uses Dataflow Shuffle, then the default is 25 GB; otherwise, the default is 250 GB."
March 2021
pipeline 2.0 issues
apache-beam==2.24
--runner=DataflowRunner
--machine_type=custom-30-460800-ext
--num_workers=40
--autoscaling_algorithm=NONE
--experiments=shuffle_mode=appliance
March 2021
cost savings plan
April 2021
1. Administer pain relief (timebox: 1 week)
➔ Attempt shared memory
➔ Attempt FlexRS
2. Hook up to bypass (timebox: 2 weeks)
➔ Mid-week delta (only compute mid-week for users with activity since Sunday’s run)
3. Heart surgery (timebox: 1 month)
➔ Split pipeline
➔ Major refactor
➔ SCANN vs LightFM.score()
➔ etc.
pipeline 3.0 design
apache-beam==2.24
--runner=DataflowRunner
--machine_type=custom-30-460800-ext
--num_workers=40
--autoscaling_algorithm=NONE
--experiments=shuffle_mode=appliance
April 2021
pipeline 3.0 shared memory & FlexRS strategy
● Used production-representative data (model, auxiliary data structures)
● Ran the pipeline for 0.5% of users, so the iterations would be cheap
○ 100% users: £ 266.74
○ 0.5% users: £ 80.54
● Attempts
○ Shared model using custom-30-460800-ext (15 GB/vCPU)
○ Shared model using custom-30-299520-ext (9.75 GB/vCPU)
○ Shared model using custom-6-50688-ext (8.25 GB/vCPU)
■ 0.5% users: £ 18.46 => -77.5% cost reduction!
May 2021
pipeline 3.0 shared memory & FlexRS results
● However, when we tried to run the same pipeline for 100% of users, it would take
hours and not complete.
● It was very inefficient and cost more than the initial implementation.
May 2021
pipeline 4.0 heart surgery
● Split compute predictions from applying rules
● Keep the interfaces to a minimum
○ between these two pipelines
○ between steps within the same pipeline
June 2021
pipeline 4.1 precompute recommendations
apache-beam==2.29
--runner=DataflowRunner
--machine_type=n1-highmem-16
--flexrs_goal=COST_OPTIMIZED
--max_num_workers=64
--number_of_worker_harness_threads=7
--experiments=use_runner_v2
+ Batching
+ Shared memory
https://cloud.google.com/blog/products/data-analytics/ml-inference-in-dataflow-pipelines
July 2021
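The batching and shared-memory ideas above can be sketched framework-agnostically: load the expensive model artefact once per worker process and score users in batches rather than one at a time. In this illustrative sketch `DummyModel` stands in for the real LightFM artefact, and `lru_cache` stands in for Beam's per-worker shared-handle mechanism:

```python
from functools import lru_cache
from itertools import islice

class DummyModel:
    """Stand-in for the LightFM model artefact; loading it is the expensive step."""
    def predict(self, user_ids):
        return {u: 0.5 for u in user_ids}

@lru_cache(maxsize=1)
def get_model():
    # Load once per worker process and reuse across bundles,
    # mirroring the shared-memory idea from the blog post above.
    return DummyModel()

def batched(iterable, size):
    # Yield fixed-size batches so each model call amortises its overhead.
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

users = range(10)
predictions = {}
for batch in batched(users, 4):
    predictions.update(get_model().predict(batch))

assert len(predictions) == 10
```

Batching amortises per-call overhead across users, and the cached loader keeps one model copy per process instead of one per element.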
pipeline 4.1 precompute recommendations
Cost to run for 3.5 million users:
● 100k episodes: £ 48.92 / run
● 300 episodes: £ 3.40
● 18 episodes: £0.74
July 2021
pipeline 4.2 apply business rules
apache-beam==2.29
--runner=DataflowRunner
--machine_type=n1-standard-1
--experiments=use_runner_v2
+ Implemented rules natively
+ Created minimal interfaces and
views of the data
July 2021
pipeline 4.2 apply business rules
Cost to run for 3.5 million users:
● £ 0.15 - 0.83 per run
July 2021
pipeline 4.0 heart surgery
● We reduced the cost of the most expensive run of the pipeline
from £ 279.31 per run to less than £ 50
● An 82% cost reduction
July 2021
takeaways
1. plan based on your data
2. an expensive machine learning pipeline is better than none
3. reducing the scope is a good starting point for saving money
○ Apply non-personalised rules before iterating per user
○ Sort the top 1k recommendations per user as opposed to 100k
4. using custom machine types might limit other cost savings
○ Such as FlexRS (Dataflow's schedulable, discounted preemptible instances)
5. using shared memory may not lead to cost savings
6. minimal interfaces lead to more predictable behaviours in Dataflow
7. splitting the pipeline can be a solution to costs
Thank you!
@tati_alchueyr
