Alluxio: A Unified Entry Point for Distributed Storage Systems
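Outline
•  How Suning uses HDFS and the problems encountered
•  Solutions based on multiple HDFS clusters
•  Implementing Alluxio Proxy, a unified entry point for distributed storage systems
•  The future of Alluxio at Suning
•  Q&A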
Suning's Big Data Platform
Data sources
Storage layer
Compute layer
Service layer
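HDFS at Suning
Cluster 1
•  528 DataNodes * 40 TB/node
•  DFS Used:
–  1.3 PB
–  130 million files and directories
–  130 million blocks
Cluster 2
•  100 DataNodes * 40 TB/node
•  DFS Used:
–  1 PB
–  50 million files and directories
–  50 million blocks
Other: HDFS clusters built for HBase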
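The cluster figures above can be turned into a rough sizing estimate. The sketch below multiplies out the raw capacity and adds up the namespace objects (files, directories, and blocks) that each NameNode keeps in memory. The value of `BYTES_PER_OBJECT` is an assumption: roughly 150 bytes per object is a commonly quoted rule of thumb for NameNode heap, not a number from this talk.

```python
# Rough sizing from the per-cluster figures on this slide.
# BYTES_PER_OBJECT is an assumed rule of thumb, not a measured value from the talk.
BYTES_PER_OBJECT = 150

clusters = {
    "cluster1": {"datanodes": 528, "tb_per_node": 40,
                 "files_dirs": 130_000_000, "blocks": 130_000_000},
    "cluster2": {"datanodes": 100, "tb_per_node": 40,
                 "files_dirs": 50_000_000, "blocks": 50_000_000},
}


def summarize(cluster):
    """Return raw capacity, namespace object count, and an estimated heap size."""
    raw_tb = cluster["datanodes"] * cluster["tb_per_node"]
    objects = cluster["files_dirs"] + cluster["blocks"]
    heap_gb = objects * BYTES_PER_OBJECT / 1e9
    return {"raw_tb": raw_tb, "objects": objects, "heap_gb": heap_gb}


summary = {name: summarize(c) for name, c in clusters.items()}

for name, s in summary.items():
    print(f"{name}: raw capacity {s['raw_tb']} TB, "
          f"{s['objects']:,} namespace objects, "
          f"about {s['heap_gb']:.1f} GB of heap under the assumed rule of thumb")
```

Under that assumption the larger cluster needs tens of gigabytes of NameNode heap for metadata alone, and all of it sits behind a single NameNode, which is the scaling pressure the following slides describe.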
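Problems with a Single HDFS Cluster
•  Under high concurrency, RPC latency on the HDFS NameNode becomes very high:
–  Clients hit "could not complete file" errors, which causes jobs to fail
–  DataNode "Last contact" times grow long, most noticeably when the NameNode restarts
Under high concurrency, HDFS does not scale out well enough.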
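Multi-Cluster HDFS Solutions
•  Split the HDFS cluster into several clusters. Issues to consider:
–  The underlying HDFS clusters must be transparent to users
–  Cross-cluster data access
–  The dimension along which to split the clusters
•  Solutions from the HDFS community:
–  Federation + viewFs
–  HDFS Router
•  Suning's solution:
–  Alluxio Proxy: use Alluxio's Unified Namespace feature and make Alluxio the unified entry point for multiple HDFS clusters or other storage clusters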
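The routing idea behind a unified namespace can be sketched as a longest-prefix lookup in a mount table. This is a minimal illustration only, not Alluxio's implementation or API: the `resolve` function, the mount points, and the NameNode host names are all hypothetical.

```python
from typing import Dict

# Hypothetical mount table: path in the unified namespace -> root URI in an
# underlying cluster. Host names and mount points are made up for illustration.
MOUNT_TABLE: Dict[str, str] = {
    "/": "hdfs://cluster1-nn:8020/",
    "/user/team_b": "hdfs://cluster2-nn:8020/user/team_b",
    "/hbase": "hdfs://hbase-nn:8020/hbase",
}


def resolve(path: str, table: Dict[str, str] = MOUNT_TABLE) -> str:
    """Map an absolute path in the unified namespace to a URI in the owning cluster."""
    best = None
    for mount_point in table:
        prefix = mount_point.rstrip("/")
        # Match whole path components only, so /user/team_b does not
        # capture /user/team_bx.
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    remainder = path[len(best.rstrip("/")):].lstrip("/")
    root = table[best].rstrip("/")
    return f"{root}/{remainder}" if remainder else root


print(resolve("/user/team_b/logs/a.txt"))
print(resolve("/data/events/part-0"))
```

Users only ever see paths such as `/user/team_b/...`; which cluster serves a path is decided by the table, so moving a directory to another cluster means changing one entry rather than every client.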
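Community Solutions
•  Federation + viewFs:
– Solves the horizontal scaling problem of HDFS
– Routing is implemented through client-side configuration, which makes large clusters hard to operate and manage
•  Router:
– Released in HDFS 2.9.0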
Alluxio – Unify data at memory speed
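Problems Encountered with Alluxio
•  Metadata from all the HDFS clusters ends up in the Alluxio Master, so the Alluxio Master runs into a memory bottleneck
–  In our tests, the same metadata took about twice as much memory in Alluxio as in HDFS
•  Connections between Alluxio clients and the Master are long-lived
•  Alluxio does not support the append operation
•  Client compatibility problems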
Alluxio Proxy Architecture (diagram)
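Metadata Volume on the Alluxio Master
•  Solution: each system manages the metadata only for the data in its own storage space
–  Alluxio Master: manages metadata only for data cached in the Alluxio space
–  HDFS NameNode: manages metadata for data in the HDFS space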
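The split of responsibilities above can be illustrated with a toy model in which the cache layer records metadata only for paths it has cached and defers everything else to the underlying store. The class and method names are hypothetical and the dictionaries merely stand in for Alluxio Master and HDFS NameNode state; this is a sketch of the idea, not of either system's internals.

```python
class LayeredNamespace:
    """Toy model: each layer keeps metadata only for data in its own space."""

    def __init__(self, ufs_metadata):
        # Stands in for the Alluxio Master: entries exist only for cached paths.
        self.cache_metadata = {}
        # Stands in for the HDFS NameNode: the full namespace of the cluster.
        self.ufs_metadata = ufs_metadata

    def cache(self, path):
        """Record a path as cached; only now does the cache layer track it."""
        self.cache_metadata[path] = dict(self.ufs_metadata[path])

    def get_status(self, path):
        """Answer from the cache layer if possible, otherwise ask the underlying store."""
        if path in self.cache_metadata:
            return "cache", self.cache_metadata[path]
        if path in self.ufs_metadata:
            return "ufs", self.ufs_metadata[path]
        raise FileNotFoundError(path)


ns = LayeredNamespace({
    "/warehouse/hot.parquet": {"length": 1024},
    "/warehouse/cold.parquet": {"length": 4096},
})
ns.cache("/warehouse/hot.parquet")

print(ns.get_status("/warehouse/hot.parquet")[0])
print(ns.get_status("/warehouse/cold.parquet")[0])
print(len(ns.cache_metadata), "of", len(ns.ufs_metadata), "paths tracked by the cache layer")
```

The memory held by the cache layer then grows with the amount of cached data rather than with the combined size of every mounted cluster's namespace.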
Use Alluxio to Unify Storage Systems in Suning
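The Long-Lived Connection Problem on the Alluxio Master
•  Solution:
– The client proactively closes its connection
– In our tests a client reconnect took < 1 ms, which is acceptable for Suning's use cases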
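One way to picture a client that closes its connection proactively is a scope that connects, issues its requests, and always disconnects on exit. The sketch below uses a stand-in `FakeMaster` that only counts open connections; it is not the Alluxio client API, and all names in it are hypothetical.

```python
from contextlib import contextmanager


class FakeMaster:
    """Stand-in for a master process; it only counts open connections."""

    def __init__(self):
        self.open_connections = 0

    def connect(self):
        self.open_connections += 1
        return self

    def disconnect(self):
        self.open_connections -= 1

    def get_status(self, path):
        return {"path": path}


@contextmanager
def short_lived_connection(master):
    """Open a connection for one unit of work and always close it afterwards."""
    conn = master.connect()
    try:
        yield conn
    finally:
        master.disconnect()


master = FakeMaster()
with short_lived_connection(master) as conn:
    status = conn.get_status("/user/team_b/part-0")
    open_during = master.open_connections
open_after = master.open_connections

print(open_during, open_after)
```

The trade-off is one reconnect per unit of work in exchange for the master not holding a connection for every idle client.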
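Append Support in the Client
•  Because management is layered, with each system managing the metadata for the data in its own space, the client can write straight through to the underlying distributed file system to perform append operations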
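Client-Side Compatibility
•  In practice, Alluxio Proxy is delivered on the client as a plugin, and the whole process is invisible to users
•  Because it is deployed on the client, its dependencies can conflict with those of related components, which causes jobs to fail
•  Solution: shade all of the jars that the Alluxio runtime depends on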
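Alluxio Proxy Summary
•  Uses Alluxio's Unified Namespace feature to provide a unified entry point for multiple HDFS clusters
–  The mount table is stored on the Alluxio Master, which simplifies operations and management
–  The Alluxio Master has an HA mechanism
•  On top of routing, hot data is cached to speed up computation
•  Temporary data that does not need to be persisted is kept directly in Alluxio memory, which reduces the frequent creation and deletion of NameNode metadata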
Alluxio Proxy at Suning
Alluxio cluster size
•  2 masters + 3 workers
•  Alluxio is currently used only for routing across multiple HDFS clusters
•  The HDFS clusters are split by user
Integrated components
•  Hadoop (HDFS + YARN)
•  Hive
•  Spark
•  Flink
•  Druid
•  Sqoop
•  HBase
•  Flume
•  OLAP
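Plans for Alluxio Proxy
•  Promote Alluxio Proxy as the unified storage entry point for distributed systems
•  Make use of Alluxio's caching
•  Participate actively in the community's development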
Q&A
Thanks
