DATA ORCHESTRATION SUMMIT
Speeding Up Spark Performance Using
Alluxio in China Unicom
Ce Zhang | China Unicom Big Data Engineer
Profile
▪Ce Zhang
• China Unicom Big Data Engineer
• Alluxio Contributor
• Work with Spark, Flink, Alluxio, etc.
Agenda
• Background
• Alluxio at Unicom
• Alluxio 2.X at Unicom
DATA ORCHESTRATION SUMMIT 2020
Background
Why Unicom needs Alluxio
Background
We have workloads on different compute engines for different business requirements.
• Handle batch jobs with strict deadlines
• The cluster size has reached a bottleneck
• Handle N+1 batch jobs
• Huge resource consumption and slow speed
• Handle traditional data analysis jobs
• Stand-alone performance bottlenecks and high cost
Background
Building a unified computing platform with the Spark ecosystem
• Spark is more scalable than Greenplum
• Spark consumes fewer resources and is faster than Hive
• Spark is cheaper than Oracle and handles much larger data
• A unified computing platform has better resource utilization
[Diagram: Spark ecosystem on HDFS]
Background
The challenges of building a unified computing platform
• Spark performance does not match the speed of Greenplum
• Ad-hoc queries are too slow for business users
• Lack of stability: iterative computing tasks with multiple rounds are prone to failure
[Diagram: Spark ecosystem on HDFS]
Background
The unified computing platform architecture
• Speed up data read and write performance
• Improve data write stability
• Improve Spark stability by taking over the Spark cache
[Diagram: a metadata and data cache layer above HDFS]
Alluxio at Unicom
Alluxio usage and business improvement in Unicom
Alluxio in Unicom-Architecture
[Architecture diagram. Components shown: TianGong Cloud Portal; TianYan unified monitoring platform; Jupyter, Hue, Zeppelin; Presto, Flink, Spark, Hive; Yarn; Alluxio; HBase, HDFS; Oracle, DRDS, Kafka, text files]
Alluxio in Unicom
Iterative computing: speed up downstream SQL reads and improve data write stability
[Diagram: a chain of Spark SQL jobs in which each job reads from a source and writes to a sink, and sinks become sources for downstream jobs]
• Most sinks are reused downstream, which yields cache hits
• Sink data is stored in Alluxio memory
Alluxio in Unicom
Share data between Spark jobs
• Alluxio takes over the role of the Spark cache
[Diagram: Spark SQL jobs sharing Data through Alluxio]
df.write.parquet("/Data")
spark.read.parquet("/Data")
Alluxio in Unicom-Use replicas
Having multiple replicas significantly improves hot data access speed
[Diagram: Spark SQL jobs reading replicated blocks from Alluxio, which sits above HDFS]
./bin/alluxio fs setReplication --max 3 --min 1 /foo
./bin/alluxio fs distributedLoad --replication 2 /foo
./bin/alluxio fs pin /foo
FileSystem#free
Alluxio in Unicom-Optimize memory usage
We want to limit the number of replicas under a high-concurrency access pattern
[Diagram: four Alluxio clients read the same block through four Alluxio workers, so the cluster holds 4 copies of the block]
Alluxio in Unicom-Optimize memory usage
Solution-DeterministicHashPolicy
[Diagram: four Alluxio clients read the same block, but only two of the four Alluxio workers cache it, so the cluster holds 2 copies]
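The effect of a deterministic hash policy can be sketched with a small simulation. This is an illustrative model under stated assumptions, not Alluxio's implementation: the worker names, the CRC32 hash, and the function names are all hypothetical. The idea it shows is that every client maps a block ID to the same small set of workers, so concurrent reads cannot create more copies than the configured number of shards.

```python
import random
import zlib


def candidate_workers(block_id, workers, shards):
    """Return the fixed set of workers allowed to cache this block.

    Every client computes the same answer because the result depends only
    on the block ID and the sorted worker list (assumption: a CRC32 hash).
    """
    ordered = sorted(workers)
    start = zlib.crc32(str(block_id).encode()) % len(ordered)
    count = min(shards, len(ordered))
    return [ordered[(start + i) % len(ordered)] for i in range(count)]


def simulate_reads(block_id, workers, shards, num_clients, seed=0):
    """Each client reads through one random candidate; return workers holding a copy."""
    rng = random.Random(seed)
    holders = set()
    for _ in range(num_clients):
        holders.add(rng.choice(candidate_workers(block_id, workers, shards)))
    return holders


# Hypothetical four-worker cluster, as in the slide's diagram.
workers = ["worker-1", "worker-2", "worker-3", "worker-4"]
holders = simulate_reads(block_id=42, workers=workers, shards=2, num_clients=100)
print(f"{len(holders)} workers hold a copy of block 42")
```

With `shards=2`, the number of copies stays at or below 2 no matter how many clients read the block at once.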
Alluxio in Unicom-Data load balance
Adjust the data distribution strategy for load balance
MostAvailableFirstPolicy
[Diagram: three Alluxio clients and three Alluxio workers, each worker shown with its used and available space]
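A most-available-first placement can be sketched as follows. This is a simplified model, not Alluxio's code: the worker names, capacities, and block size are hypothetical, and ties are broken by dictionary order. Each block goes to the worker that currently has the most free space.

```python
def most_available_first(available, block_size):
    """Pick the worker with the most free space that can still hold the block."""
    fitting = {w: free for w, free in available.items() if free >= block_size}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def write_blocks(available, block_size, num_blocks):
    """Place blocks one at a time; return the placements and remaining free space."""
    free = dict(available)
    placements = []
    for _ in range(num_blocks):
        target = most_available_first(free, block_size)
        if target is None:
            break  # the cluster is full
        free[target] -= block_size
        placements.append(target)
    return placements, free


# Hypothetical free space per worker, in arbitrary units.
start = {"worker-1": 10, "worker-2": 40, "worker-3": 25}
placements, free = write_blocks(start, block_size=5, num_blocks=6)
print(placements)
print(free)
```

In this model the emptiest worker absorbs new writes until its free space drops to the level of the others, so free space evens out over time.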
Alluxio in Unicom-Data load balance
Adjust the data distribution strategy for load balance
RoundRobinPolicy
[Diagram: one Alluxio client and three Alluxio workers, each worker shown with its used and available space; steps labeled 1 and 2]
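A round-robin placement can be sketched in the same spirit. Again this is a simplified model with hypothetical worker names and capacities, not Alluxio's implementation: blocks are handed to workers in a fixed rotation, skipping any worker that cannot hold the block.

```python
from itertools import cycle


def round_robin_writes(available, block_size, num_blocks):
    """Place blocks on workers in rotation, skipping workers that are too full."""
    free = dict(available)
    order = cycle(sorted(free))
    placements = []
    for _ in range(num_blocks):
        # Try each worker at most once per block before giving up.
        for _ in range(len(free)):
            worker = next(order)
            if free[worker] >= block_size:
                free[worker] -= block_size
                placements.append(worker)
                break
        else:
            break  # no worker can hold another block
    return placements, free


# Hypothetical free space per worker, in arbitrary units.
start = {"worker-1": 10, "worker-2": 40, "worker-3": 25}
placements, free = round_robin_writes(start, block_size=5, num_blocks=6)
print(placements)
print(free)
```

Compared with a most-available-first choice, rotation spreads consecutive writes across all workers regardless of how much space each has left, which balances request load rather than free space.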
Alluxio in Unicom-Performance improvements
Alluxio brings better performance with increased stability
Metrics                      Min     25%     Med     75%     Max
GC Time                      2s      5s      5s      6s      8s
GC Time (+Alluxio)           54ms    0.9s    1s      2s      3s
Task Time                    1.1min  1.4min  1.5min  1.6min  2.0min
Task Time (+Alluxio)         1s      16s     19s     22s     31s
Job Time                     NA      NA      NA      NA      2.1min
Job Time (+Alluxio)          NA      NA      NA      NA      33s
Job Restart Rate             NA      NA      NA      NA      100%
Job Restart Rate (+Alluxio)  NA      NA      NA      NA      0
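The speedups implied by the table can be worked out with a few lines of arithmetic. The sketch below takes the median GC time, median task time, and job time from the table; the helper name `to_seconds` and the dictionary labels are introduced here for illustration only.

```python
def to_seconds(value):
    """Convert strings such as '54ms', '19s', or '2.1min' to seconds."""
    # Check 'ms' before 's' because both end with the letter s.
    for suffix, factor in (("ms", 0.001), ("min", 60.0), ("s", 1.0)):
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"unrecognized duration: {value}")


# Values copied from the table above.
baseline = {"GC time (median)": "5s", "Task time (median)": "1.5min", "Job time": "2.1min"}
with_alluxio = {"GC time (median)": "1s", "Task time (median)": "19s", "Job time": "33s"}

speedups = {
    name: to_seconds(baseline[name]) / to_seconds(with_alluxio[name])
    for name in baseline
}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x faster")
```

On these figures, median GC time improves about 5x, median task time about 4.7x, and job time about 3.8x.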
Alluxio in Unicom-The value it brings to business
▪Core batch applications
• 50,000 Spark jobs -> 350,000 Spark jobs: the workload has grown to seven times its original size
• 60% increase in the number of users
• Business speed-up of 100%
▪Core ad-hoc business
• Processes 10TB of incremental data
• Minute-level queries on a 2.5TB table
▪Oracle
• User-level detail statistics
Alluxio 2.X at Unicom
New features in Alluxio 2.X bring significant business improvements at Unicom
Alluxio 2.X-RAFT High Availability
RAFT-based high availability gets rid of external dependencies
[Diagram, before: an Alluxio master and standby masters that depend on a distributed quorum (Zookeeper) and distributed storage (i.e. HDFS)]
[Diagram, after: an Alluxio master and standby masters coordinated through RAFT]
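As general background on Raft rather than anything specific to this deployment, a Raft group stays available as long as a majority of its members are reachable. The sketch below shows that arithmetic; the function names and the cluster sizes are illustrative assumptions, since the slides do not state how many masters Unicom runs.

```python
def quorum_size(num_masters):
    """Smallest number of masters that forms a majority."""
    return num_masters // 2 + 1


def tolerated_failures(num_masters):
    """How many masters can fail while a majority remains."""
    return num_masters - quorum_size(num_masters)


# Hypothetical cluster sizes, for illustration only.
for n in (3, 4, 5):
    print(f"{n} masters: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

The arithmetic also shows why odd cluster sizes are the usual choice: a fourth master raises the quorum size without raising the number of failures tolerated.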
Alluxio 2.X-Enhanced metadata operations
Storing metadata in RocksDB instead of the Java heap brings improvements
▪Improved performance and stability of recursive operations on 100GB-plus directories
• Java heap: operating on a large directory leads to OOM errors
• RocksDB: operations on a TB-level directory complete in seconds
▪Reduced impact of file inconsistency
• A TB-level directory containing more than 10,000 files can be fixed within 10 seconds
• Unrelated users are not affected
Alluxio 2.X-Spark write wait time optimization
Blocking wait times when Spark writes a large number of files have improved
• 90% reduction in write blocking wait time for files over 100GB
[Diagram: Spark SQL writing to an Alluxio worker]
/foo/_temporary
/foo/part-00001
/foo/part-00002
……
Alluxio 2.X-Multi-version HDFS mount
[Diagram: UserA and UserB each access Path1 and Path2 under a single alluxio:/// namespace; Alluxio mounts hdfs:/// from an HDFS 2.7 cluster and hdfs:/// from an HDFS 3.1 cluster, each containing Path1 and Path2]
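How a single namespace can front two HDFS clusters of different versions can be sketched as a mount-table lookup. This is a toy model, not Alluxio's code: the mount points `/hdfs27` and `/hdfs31` and the namenode addresses are hypothetical, and only the two HDFS versions come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    alluxio_path: str  # mount point inside the Alluxio namespace
    ufs_uri: str       # root of the mounted under-storage
    hdfs_version: str  # HDFS client version used for this mount


# Hypothetical mount points and namenode addresses, for illustration only.
MOUNTS = [
    Mount("/hdfs27", "hdfs://namenode-a:8020", "2.7"),
    Mount("/hdfs31", "hdfs://namenode-b:8020", "3.1"),
]


def resolve(path, mounts=MOUNTS):
    """Map an Alluxio path to (UFS URI, HDFS version) via the longest matching mount."""
    best = None
    for m in mounts:
        if path == m.alluxio_path or path.startswith(m.alluxio_path + "/"):
            if best is None or len(m.alluxio_path) > len(best.alluxio_path):
                best = m
    if best is None:
        raise KeyError(f"no mount covers {path}")
    return best.ufs_uri + path[len(best.alluxio_path):], best.hdfs_version


print(resolve("/hdfs27/Path1"))
print(resolve("/hdfs31/Path2"))
```

In this model, users only see paths in one namespace, and the mount entry decides which cluster and which client version serve each request.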
Summary
▪Alluxio accelerates Unicom's Spark batch processing and ad-hoc services, ensuring business stability and efficiency
• Sharing Spark data through Alluxio speeds up business pipelines and reduces the chance of job retries
• Pinning, replica control, and read/write policies improve performance
• Applications improve resource usage by actively freeing data
• Read and write policies can also keep the Alluxio cluster load balanced
▪Alluxio 2.X
• RAFT-based high availability greatly improves the performance and availability of metadata operations
• Multiple versions of HDFS can be mounted
Join Alluxio Community
Join Slack channel
alluxio.io/slack
Wechat Public Account
Join meetup groups near you
alluxio-open-source-community/
