Enabling Presto Caching at Uber with Alluxio

Enable Presto® Caching at Uber with Alluxio
Zhongting Hu: TLM @ Uber Data Analytics
Beinan Wang: Software Engineer @ Alluxio

Data informs every decision at Uber
Marketplace
Pricing
Community
Operations
Growth Marketing Data Science
Compliance
Eats

Presto @ Uber-scale
12K Monthly Active Users
400K Queries/day
2 Data Centers
6K Nodes
14 Clusters
50PB HDFS data processed/day

Workloads
Interactive
Ad hoc queries
Batch
Scheduled

Data: From On-Premise to Cloud
● What
○ BI (Application)
○ Analytics (Compute)
○ Storage
● How
○ Feature Compatibility
○ Performance Measurement
○ Security / Compliance
○ Tech Debt?
● Why
○ Cost Efficiency
○ Usability / Scalability / Reliability

Alluxio Local Caching -- High Level Architecture
Runs as a local library inside the Presto worker
Key <-> Value:
HDFS File Path as the key
https://prestodb.io/blog/2020/06/16/alluxio-datacaching

Key Problems -- Data
● Data Characteristics
○ Mostly partitioned by date
○ Hudi incremental updates on files
○ Staging directories / partitions from the ETL framework
● Cache Data Hit Ratio
○ 3+ PB of distinct data accessed per day
○ ~10% is frequently accessed data
○ ~3% is hot data
● Data Cache Filtering
○ Offline query analytics on table (and partition) access
○ Onboard the hot data into the cache
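
The cache filtering idea above can be pictured as a small admission filter: count accesses per (table, partition), keep the pairs above a threshold, and cache only reads that hit one of them. The following is a minimal sketch under assumptions, not the production implementation. The access log format, the table names, the threshold, and the function names are all hypothetical.

```python
from collections import Counter


def build_hot_set(access_log, min_accesses):
    """Return the (table, partition) pairs accessed at least min_accesses times."""
    counts = Counter(access_log)
    return {key for key, n in counts.items() if n >= min_accesses}


def should_cache(table, partition, hot_set):
    """Cache filter: admit a read only if its table partition was onboarded as hot."""
    return (table, partition) in hot_set


# Hypothetical access log, as offline query analytics might produce it.
log = [
    ("trips", "2021-06-01"), ("trips", "2021-06-01"), ("trips", "2021-06-01"),
    ("trips", "2021-05-01"),
    ("eats_orders", "2021-06-01"), ("eats_orders", "2021-06-01"),
]
hot = build_hot_set(log, min_accesses=2)
```

With this log, only partitions read at least twice are admitted, so a one-off scan of an old partition does not evict hot data.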

Key Problems -- Apache Hadoop® HDFS Latency
● DataNodes can introduce random latency
● In real production environments, wall time is mostly spent reading data

Key Problems -- HDFS Latency, Cont
● Reading from the local cache gives much more predictable latency
● Fixing a bug in NameNode listing (ListLocatedStatus API)

Key Problems -- Presto Soft Affinity Scheduling
● Compute Preferred workers
○ Split overrides getPreferredNodes() to return the 2 preferred workers
○ Simple mod-based algorithm
○ Try the preferred workers one by one and assign the split to the first one that is not busy
○ If both workers are busy, then select the least busy worker (with cacheable = false)
● Define Busy worker
○ Max splits per node: node-scheduler.max-splits-per-node
○ Max pending splits per task: node-scheduler.max-pending-splits-per-task
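
The scheduling rules above can be sketched as follows. This is an illustration under assumptions, not Presto's scheduler code: the hash function (CRC32), the choice of the next worker as the second preferred one, and all function names are hypothetical, and "busy" is modeled only by a max-splits-per-node limit (the pending-splits-per-task limit is left out).

```python
import zlib


def preferred_workers(split_path, workers):
    """Mod-based choice of two preferred workers for a split."""
    h = zlib.crc32(split_path.encode("utf-8"))
    first = h % len(workers)
    second = (first + 1) % len(workers)  # assumption: the neighbor is the backup
    return [workers[first], workers[second]]


def is_busy(worker, assigned_splits, max_splits_per_node):
    """A worker is busy once it holds max_splits_per_node splits."""
    return assigned_splits[worker] >= max_splits_per_node


def assign_split(split_path, workers, assigned_splits, max_splits_per_node):
    """Return (worker, cacheable) for one split and record the assignment."""
    for worker in preferred_workers(split_path, workers):
        if not is_busy(worker, assigned_splits, max_splits_per_node):
            assigned_splits[worker] += 1
            return worker, True
    # Both preferred workers are busy: fall back to the least busy worker
    # and mark the split as not cacheable.
    fallback = min(workers, key=lambda w: assigned_splits[w])
    assigned_splits[fallback] += 1
    return fallback, False
```

Marking the fallback assignment as not cacheable keeps a worker that only serves the split by accident from filling its cache with data it is unlikely to see again.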

Key Problems -- Soft Affinity with Consistent Hash
● Change from simple mod-based node selection to consistent hashing
● 10 virtual nodes, original 400-node cluster
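
To show why consistent hashing helps here, the sketch below builds a hash ring with virtual nodes and looks up the worker for a file path. It is a generic consistent-hashing illustration, not the code used in Presto. The MD5-based hash, the worker names, the file paths, and reading "10 virtual nodes" as 10 ring positions per worker are all assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Stable 64-bit hash of a string (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes, virtual_nodes=10):
        # Each node is placed on the ring virtual_nodes times.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(virtual_nodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        """Return the first node clockwise from the key's ring position."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical 400-worker cluster and a batch of file paths.
nodes = [f"worker-{i}" for i in range(400)]
keys = [f"/warehouse/table/file-{i}" for i in range(2000)]
ring = ConsistentHashRing(nodes)
ring_after_loss = ConsistentHashRing(nodes[:-1])  # the last worker leaves
moved = [k for k in keys if ring.node_for(k) != ring_after_loss.node_for(k)]
```

Every path in `moved` was owned by the worker that left; all other paths keep their worker, so their cached data stays useful. With mod-based selection, a change in the worker count can remap most paths at once.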

Current Status and next steps
● Initial testing is finished and shows a great improvement on queries
● TPC-DS testing with sf10k in progress
● Historical table/partition analytics to set up cache filters
● Dashboarding, monitoring, metadata integrations

Persistent File Level Metadata for Local Cache
● Prevent stale caching
○ The underlying data files might be changed by third-party frameworks. (This is rare for Hive tables but very common for Hudi tables.)
● Scoped quota management
○ Should each table have its own cache quota?
● Metadata should be recoverable after server restart

File Level Metadata -- High Level Approach
● Implement a file-level metadata store that keeps the last modified time and the scope of each cached data file.
● The file-level metadata store should be persisted on disk so that the data survives a restart.
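
A minimal sketch of such a store, assuming a single JSON file on disk as a stand-in for the real persistence format. The class name, method names, and the example scope string are hypothetical. The store records the last modified time and scope per file, treats a cached file as stale when its current last modified time differs, and reloads its entries after a restart.

```python
import json
import os
import tempfile


class FileMetadataStore:
    """Maps file_id -> {"last_modified": int, "scope": str}, persisted as JSON."""

    def __init__(self, path):
        self._path = path
        self._entries = {}
        if os.path.exists(path):
            # Recover the entries written before the last shutdown.
            with open(path, "r", encoding="utf-8") as f:
                self._entries = json.load(f)

    def put(self, file_id, last_modified, scope):
        self._entries[file_id] = {"last_modified": last_modified, "scope": scope}
        self._flush()

    def is_fresh(self, file_id, current_last_modified):
        """True only if the file is cached and unchanged since it was cached."""
        entry = self._entries.get(file_id)
        return entry is not None and entry["last_modified"] == current_last_modified

    def _flush(self):
        # Write to a temp file, then rename, so a crash cannot leave a torn file.
        tmp = self._path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self._entries, f)
        os.replace(tmp, self._path)


store_path = os.path.join(tempfile.mkdtemp(), "metadata.json")
store = FileMetadataStore(store_path)
store.put("file-1", last_modified=1620000000, scope="schema.table")
restarted = FileMetadataStore(store_path)  # simulates a server restart
```

After the simulated restart, `restarted.is_fresh("file-1", 1620000000)` still holds, while a newer modification time for the same file marks the cached copy as stale.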

Cache data and Metadata Structure
root_path/page_size(ulong)/bucket(uint)/file_id(str)/
    timestamp1/
        Page_file1 (The filename is a ulong)
        Page_file2
        …
        Page_fileN
    timestamp2/
        Page_file1 (The filename is a ulong)
        Page_file2
        …
        Page_fileN
    metadata (stores FileInfo in protobuf format)
        Contains timestamp and scope
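
The layout above can be read as a path-building rule. The sketch below composes the path of one page file and of the metadata file. It is an illustration only: the function names are hypothetical, the bucket is taken as an input rather than derived, the page file name is assumed to be the page index (byte offset divided by page size), and the metadata file is assumed to sit next to the timestamp directories.

```python
from pathlib import PurePosixPath


def page_index(offset, page_size):
    """Index of the page that contains the given byte offset."""
    return offset // page_size


def file_dir(root, page_size, bucket, file_id):
    """Directory that holds all cached versions of one data file."""
    return PurePosixPath(root) / str(page_size) / str(bucket) / file_id


def page_path(root, page_size, bucket, file_id, timestamp, index):
    """Path of one cached page of the file version identified by timestamp."""
    return file_dir(root, page_size, bucket, file_id) / str(timestamp) / str(index)


def metadata_path(root, page_size, bucket, file_id):
    """Path of the per-file metadata (timestamp and scope)."""
    return file_dir(root, page_size, bucket, file_id) / "metadata"


one_mb = 1024 * 1024
example = page_path("/cache", one_mb, 7, "abc", 1620000000,
                    page_index(3 * one_mb + 5, one_mb))
```

Keeping each file version under its own timestamp directory means a changed file never overwrites pages of the old version in place; the old directory can be dropped as a unit.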

Metadata Awareness -- Cache Context (New in 2.6.1)

Per Query Metrics Aggregation on Presto Side

Future Work
● Performance Tuning
● Semantic Cache
● More efficient deserialization
