Scaling machine learning to
millions of users
with Apache Beam
Tatiana Al-Chueyr
Principal Data Engineer @ BBC Datalab
Online, 4 August 2021
@tati_alchueyr
● Brazilian living in London, UK since 2014
● Principal Data Engineer at the BBC (Datalab team)
● Graduated in Computer Engineering at Unicamp
● Software developer for 18 years
● Passionate about open-source
Apache Beam user since early 2019
BBC.datalab.hummingbirds
The knowledge in this presentation is the result of lots of teamwork within one squad of a larger team and an even broader organisation.
current & previous squad team members:
Darren Mundy, David Hollands, Richard Bownes, Marc Oppenheimer, Bettina Hermant, Tatiana Al-Chueyr, Jana Eggink
some business context
business context goal
to personalise the experience of millions of users of BBC Sounds
to build a replacement for an external third-party recommendation engine
business context numbers
BBC Sounds has approximately
● 200,000 podcast and music episodes
● 6.5 million users
The personalised rails (e.g. Recommended for You) display:
● 9 episodes (smartphones) or
● 12 episodes (web)
business context problem visualisation
it is similar to finding the best match among 20,000 items per user, 65 million times
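The two framings describe the same amount of work, just factored differently; a quick sanity check on the arithmetic:

```python
# Scale of the matching problem: score every episode for every user.
episodes = 200_000
users = 6_500_000
total_scores = episodes * users  # 1.3 trillion (user, episode) scores

# The slide's rephrasing is the same product, factored differently:
assert 20_000 * 65_000_000 == total_scores
print(f"{total_scores:,}")  # 1,300,000,000,000
```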
business context product rules
The recommendations must also comply with the BBC
product and editorial rules, such as:
● Diversification: no more than one item per brand
● Recency: no news episodes older than 24 hours
● Narrative arc: next drama series episode
● Language: Gaelic items to Gaelic listeners
● Availability: only available content
● Exclusion: shipping forecast and soap-opera
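Non-personalised rules like these can be expressed as simple predicates and applied as filters before any per-user work. A minimal sketch in Python; the record fields (`genre`, `published`, `available`) are illustrative stand-ins, not the BBC's actual schema:

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

# Hypothetical episode records; field names are illustrative only.
episodes = [
    {"id": "a", "genre": "news", "published": now - timedelta(hours=30), "available": True},
    {"id": "b", "genre": "drama", "published": now - timedelta(hours=2), "available": True},
    {"id": "c", "genre": "drama", "published": now - timedelta(hours=2), "available": False},
]

def recency_ok(ep, now):
    # Recency rule: no news episodes older than 24 hours.
    return ep["genre"] != "news" or now - ep["published"] <= timedelta(hours=24)

def available(ep):
    # Availability rule: only currently available content.
    return ep["available"]

candidates = [ep for ep in episodes if recency_ok(ep, now) and available(ep)]
print([ep["id"] for ep in candidates])  # ['b']
```

Because these predicates don't depend on the user, they shrink the candidate set once, up front, instead of 6.5 million times.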
technology & architecture
overview
technology overview
● Python
● Google Cloud Platform
● Apache Airflow
● Apache Beam (Dataflow Runner)
● LightFM Factorisation Machine model
architecture overview
[Diagram] User activity and Content metadata each pass through an Extract & Transform step, producing User activity features and Content metadata features. These feed Train, which outputs the Model Artefacts used by Predict to generate Predictions; Apply rules then yields the Filtered Predictions. The pipeline consumes historical data to serve future sessions.
risk analysis predict on the fly
[Diagram] Two candidate designs:
A. On the fly: a model API predicts & applies rules per request, using user activity and content metadata.
B. Precompute: an API retrieves pre-computed recommendations from cached recs.
SLA goal: 1500 reqs/s at < 60 ms
risk analysis predict on the fly

Concurrent load tests        On the fly    Precomputed    Precomputed
requests/s                   50            50             1500
Success percentage           63.88%        100%           100%
Latency p50 (success)        323.78 ms     1.68 ms        4.75 ms
Latency p95 (success)        939.28 ms     3.21 ms        57.53 ms
Latency p99 (success)        979.24 ms     4.51 ms        97.49 ms
Max successful requests/s    23            50             1500

Machine type: c2-standard-8, Python 3.7, Sanic workers: 7, Prediction threads: 1, vCPU cores: 7, Memory: 15 Gi, Deployment replicas: 1
risk analysis predict on the fly
[Diagram repeated] A. On the fly: a model API predicts & applies rules per request. B. Precompute: an API retrieves pre-computed recommendations from cached recs. SLA goal: 1500 reqs/s at < 60 ms
risk analysis precompute recommendations
cost estimate: ~ US$ 10.00 per run
Estimate of time (seconds) to precompute recommendations;
analysis using c2-standard-30 (30 vCPU, 120 GB RAM) and LightFM
risk analysis sorting recommendations
sorting 100k predictions per user in pure Python did not seem efficient
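One alternative to a full sort is a bounded top-k selection, which keeps only the k best scores per user. A small sketch using Python's standard library; the 100k scores here are synthetic:

```python
import heapq
import random

random.seed(7)
scores = [random.random() for _ in range(100_000)]  # one user's raw predictions

# A full sort is O(n log n); keeping only the top k is O(n log k).
top_1k = heapq.nlargest(1_000, scores)

# Same result as sorting everything and slicing, at a fraction of the work.
assert top_1k == sorted(scores, reverse=True)[:1_000]
```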
architecture overview
[Diagram repeated] User activity / Content metadata → Extract & Transform → features → Train → Model Artefacts → Predict → Predictions → Apply rules → Filtered Predictions (historical data → future).
architecture overview
[Diagram repeated] The same flow, highlighting the steps where we used Apache Beam (historical data → future).
architecture overview
User activity data Content metadata
Business Rules, part I - Non-personalised
- Recency
- Availability
- Excluded Masterbrands
- Excluded genres
Business Rules, part II - Personalised
- Already seen items
- Local radio (if not consumed previously)
- Specific language (if not consumed previously)
- Episode picking from a series
- Diversification (1 episode per brand/series)
Precomputed
recommendations
Machine Learning model
training
Predict recommendations
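As an illustration of the personalised rules, the diversification step (at most one episode per brand/series) can be sketched as a single rank-preserving pass. A minimal example; the episode dicts and the `brand` field are hypothetical stand-ins for the real data model:

```python
def diversify(ranked_episodes, max_per_brand=1):
    """Keep at most `max_per_brand` episodes per brand, preserving rank order."""
    counts = {}
    kept = []
    for ep in ranked_episodes:  # assumed sorted by score, best first
        brand = ep["brand"]
        if counts.get(brand, 0) < max_per_brand:
            counts[brand] = counts.get(brand, 0) + 1
            kept.append(ep)
    return kept

ranked = [
    {"id": "e1", "brand": "archers"},
    {"id": "e2", "brand": "archers"},   # dropped: brand already represented
    {"id": "e3", "brand": "news_quiz"},
]
print([ep["id"] for ep in diversify(ranked)])  # ['e1', 'e3']
```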
precompute recommendations
pipeline evolution
pipeline 1.0 design & arguments
August 2020
apache-beam[gcp]==2.15.0
--runner=DataflowRunner
--machine_type=n1-standard-1 (1 vCPU & 3.75 GB RAM)
--num_workers=10
--autoscaling_algorithm=NONE
pipeline 1.0 design
August 2020
pipeline 1.0 error when running in dev & prod
August 2020
Workflow failed. Causes: S05:Read non-cold start
users/Read+Retrieve user ids+Predict+Keep best scores+Sort
scores+Process predictions+Group activity history and
recommendations/pair_with_recommendations+Group activity
history and recommendations/GroupByKey/Reify+Group activity
history and recommendations/GroupByKey/Write failed., The job
failed because a work item has failed 4 times. Look in previous log
entries for the cause of each one of the 4 failures. For more
information, see
https://cloud.google.com/dataflow/docs/guides/common-errors.
The work item was attempted on these workers:
beamapp-al-cht01-08141052-08140353-1tqj-harness-0k4v
Root cause: The worker lost contact with the service.,
beamapp-al-cht01-08141052-08140353-1tqj-harness-0k4v
Root cause: The worker lost contact with the service.,
beamapp-al-cht01-08141052-08140353-1tqj-harness-ffqv
Root cause: The worker lost contact with the service.,
beamapp-al-cht01-08141052-08140353-1tqj-harness-cjht
Root cause: The worker lost contact with the service.
pipeline 1.0 data analysis
August 2020
pipeline 1.0 attempts to fix (i)
September 2020
1. Change machine type to a larger one
○ --machine_type=custom-1-6656 (1 vCPU, 6.5 GB RAM) - 6.5 GB RAM/core
○ --machine_type=m1-ultramem-40 (40 vCPU, 961 GB RAM) - 24 GB RAM/core
2. Refactor the pipeline
3. Reshuffle => too expensive for the operation we were doing
○ Shuffle service
○ Reshuffle function
4. Increase the number of workers
○ --num_workers=40
pipeline 1.0 attempts to fix (ii)
September 2020
5. Control the parallelism in Dataflow so the VM wouldn’t starve out of memory
[Diagram] Worker nodes (VMs), each running one or more SDK worker processes with one or more harness threads.
--number_of_worker_harness_threads=1
--experiments=use_runner_v2
(or)
--sdk_worker_parallelism
--experiments=no_use_multiple_sdk_containers
--experiments=beam_fn_api
pipeline 1.0 attempts to fix (iii)
https://stackoverflow.com/questions/63705660/optimising-gcp-costs-for-a-memory-intensive-dataflow-pipeline
pipeline 1.0 attempts to fix (iii)
https://twitter.com/tati_alchueyr/status/1301152715498758146
https://cloud.google.com/blog/products/data-analytics/ml-inference-in-dataflow-pipelines
pipeline 2.0 design & arguments
apache-beam==2.24
--runner=DataflowRunner
--machine_type=custom-30-460800-ext
--num_workers=40
--autoscaling_algorithm=NONE
September 2020
pipeline 2.0 business outcomes
September 2020
● +59% increase in interactions in the Recommended for You rail
● +103% increase in interactions for under-35s
pipeline 2.0 issues
● but costs were high...
£ 279.31 per run
September 2020
pipeline 2.0 issues
OSError: [Errno 28] No space left on device
March 2021
pipeline 2.0 issues
"If a batch job uses Dataflow Shuffle, then the default is 25 GB; otherwise, the default is 250 GB."
March 2021
pipeline 2.0 issues
apache-beam==2.24
--runner=DataflowRunner
--machine_type=custom-30-460800-ext
--num_workers=40
--autoscaling_algorithm=NONE
--experiments=shuffle_mode=appliance
March 2021
cost savings plan
April 2021
1. Administer pain relief (timebox: 1 week)
➔ Attempt shared memory
➔ Attempt FlexRS
2. Hook up to bypass (timebox: 2 weeks)
➔ Mid-week delta (only compute mid-week for users with activity since Sunday’s run)
3. Heart surgery (timebox: 1 month)
➔ Split pipeline
➔ Major refactor
➔ SCANN vs LightFM.score()
➔ etc.
pipeline 3.0 design
apache-beam==2.24
--runner=DataflowRunner
--machine_type=custom-30-460800-ext
--num_workers=40
--autoscaling_algorithm=NONE
--experiments=shuffle_mode=appliance
April 2021
pipeline 3.0 shared memory & FlexRS strategy
● Used production-representative data (model, auxiliary data structures)
● Ran the pipeline for 0.5% of users, so the iterations would be cheap
○ 100% users: £ 266.74
○ 0.5% users: £ 80.54
● Attempts
○ Shared model using custom-30-460800-ext (15 GB/vCPU)
○ Shared model using custom-30-299520-ext (9.75 GB/vCPU)
○ Shared model using custom-6-50688-ext (8.25 GB/vCPU)
■ 0.5% users: £ 18.46 => -77.5% cost reduction!
May 2021
pipeline 3.0 shared memory & FlexRS results
● However, when we tried to run the same pipeline for 100% of users, it would take
hours and not complete.
● It was very inefficient and cost more than the initial implementation.
May 2021
pipeline 4.0 heart surgery
● Split compute predictions from applying rules
● Keep the interfaces to a minimum
○ between these two pipelines
○ between steps within the same pipeline
June 2021
pipeline 4.1 precompute recommendations
apache-beam==2.29
--runner=DataflowRunner
--machine_type=n1-highmem-16
--flexrs_goal=COST_OPTIMIZED
--max_num_workers=64
--number_of_worker_harness_threads=7
--experiments=use_runner_v2
+ Batching
+ Shared memory
https://cloud.google.com/blog/products/data-analytics/ml-inference-in-dataflow-pipelines
July 2021
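The batching and shared-memory ideas above can be sketched framework-agnostically: load the expensive model artefact once per worker process and score users in batches rather than one at a time. In this illustrative sketch `DummyModel` stands in for the real LightFM artefact, and `lru_cache` stands in for Beam's per-worker shared-handle mechanism:

```python
from functools import lru_cache
from itertools import islice

class DummyModel:
    """Stand-in for the LightFM model artefact; loading it is the expensive step."""
    def predict(self, user_ids):
        return {u: 0.5 for u in user_ids}

@lru_cache(maxsize=1)
def get_model():
    # Load once per worker process and reuse across bundles,
    # mirroring the shared-memory idea from the blog post above.
    return DummyModel()

def batched(iterable, size):
    # Yield fixed-size batches so each model call amortises its overhead.
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

users = range(10)
predictions = {}
for batch in batched(users, 4):
    predictions.update(get_model().predict(batch))

assert len(predictions) == 10
```

Batching amortises per-call overhead across users, and the cached loader keeps one model copy per process instead of one per element.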
pipeline 4.1 precompute recommendations
Cost to run for 3.5 million users:
● 100k episodes: £ 48.92 / run
● 300 episodes: £ 3.40
● 18 episodes: £0.74
July 2021
pipeline 4.2 apply business rules
apache-beam==2.29
--runner=DataflowRunner
--machine_type=n1-standard-1
--experiments=use_runner_v2
+ Implemented rules natively
+ Created minimal interfaces and
views of the data
July 2021
pipeline 4.2 apply business rules
Cost to run for 3.5 million users:
● £ 0.15 - 0.83 per run
July 2021
pipeline 4.0 heart surgery
● We reduced the cost of the most expensive run of the pipeline
from £ 279.31 per run to less than £ 50
● An 82% cost reduction
July 2021
takeaways
1. plan based on your data
2. an expensive machine learning pipeline is better than none
3. reducing the scope is a good starting point for saving money
○ Apply non-personalised rules before iterating per user
○ Sort the top 1k recommendations per user as opposed to 100k
4. using custom machine types might limit other cost savings
○ Such as FlexRS (Dataflow's schedulable, discounted preemptible instances)
5. using shared memory may not lead to cost savings
6. minimal interfaces lead to more predictable behaviours in Dataflow
7. splitting the pipeline can be a solution to costs
Thank you!
@tati_alchueyr
