Sheepdog Status Report
Sheepdog Summit 2015
Liu Yuan
Agenda
Introduction - Sheepdog Overview
Past and Now - Sheepdog Community
Work in Progress – Problems and Solutions
Sheepdog Overview
Introduction
What is Sheepdog
• Distributed object storage system in user space
– Manages disks and nodes
• Aggregates capacity and performance (IOPS + throughput)
• Hides hardware failures
• Grows or shrinks the cluster dynamically
– Secures data
• Provides redundancy mechanisms (replication and erasure coding) for high availability
• Protects data with auto-healing and auto-rebalancing mechanisms
– Provides interfaces (in a single cluster)
• Virtual volumes for QEMU VMs, iSCSI TGT (best supported)
• RESTful containers (OpenStack Swift and Amazon S3 compatible, in progress)
• Storage for OpenStack Cinder, Glance, Nova (in progress)
• POSIX files via NFS (in progress)
• Linux block device
Disks and Nodes Management
[Figure: three sheep nodes, each acting as both gateway and store over its local disks (1TB/2TB); a 4TB disk can be hot-plugged, and a failed disk (X) is automatically unplugged on EIO.]
Private hash ring: local rebalance across the disks of one node
Global consistent hash ring and P2P global rebalance across nodes
No metadata servers! ZooKeeper provides membership management and the message queue
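With a consistent hash ring, any node can compute where an object lives from the membership list alone, which is why no metadata server is needed. A minimal Python sketch follows, assuming an MD5-based 64-bit hash and 64 virtual nodes per sheep (Sheepdog's actual hash function and virtual-node count differ); `HashRing` and `locate` are hypothetical names. The same structure applied to the disks of one node would model the private hash ring used for local rebalance.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Assumption: MD5 truncated to 64 bits; Sheepdog uses its own hash function.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node owns several points on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def locate(self, oid, copies=3):
        """Walk clockwise from the object's hash, collecting distinct nodes."""
        copies = min(copies, len({n for _, n in self._ring}))
        result = []
        i = bisect.bisect(self._ring, (_hash(oid),))
        while len(result) < copies:
            node = self._ring[i % len(self._ring)][1]
            if node not in result:
                result.append(node)
            i += 1
        return result


ring = HashRing(["sheep1", "sheep2", "sheep3"])
before = {f"obj{i}": ring.locate(f"obj{i}", copies=1)[0] for i in range(10000)}
ring.add_node("sheep4")
after = {oid: ring.locate(oid, copies=1)[0] for oid in before}
moved = [oid for oid in before if before[oid] != after[oid]]
print(f"{len(moved) / len(before):.1%} moved; new owners: {sorted({after[o] for o in moved})}")
```

In this model only the objects whose ring segments the new node takes over change owner (roughly a quarter here), so rebalance traffic stays roughly proportional to the capacity added rather than touching every object.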
Data Management
[Figure: full replication keeps a complete copy of each object on each of three sheep; erasure coding spreads an object's data and parity strips across six sheep.]
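Erasure coding splits an object into data strips plus parity strips, so the object survives the loss of some sheep while using far less space than full replication. The sketch below uses the simplest possible code, a single XOR parity strip (as in RAID-5), which tolerates exactly one lost strip; production systems use Reed-Solomon-style codes (for example from Intel's ISA-L library) that support several parity strips. `encode` and `rebuild` are hypothetical names.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(obj: bytes, k: int) -> list:
    """Split obj into k equal data strips (zero-padded) plus one XOR parity strip."""
    strip = -(-len(obj) // k)  # ceiling division
    padded = obj.ljust(strip * k, b"\0")
    data = [padded[i * strip:(i + 1) * strip] for i in range(k)]
    return data + [reduce(xor_bytes, data)]


def rebuild(strips: list, lost: int) -> bytes:
    """Recover the strip at index `lost` by XOR-ing all surviving strips."""
    return reduce(xor_bytes, (s for i, s in enumerate(strips) if i != lost))


obj = b"sheepdog spreads each object across several sheep"
strips = encode(obj, k=4)  # 4 data strips + 1 parity strip, one per sheep
recovered = rebuild(strips, lost=1)  # pretend the sheep holding strip 1 died
restored = b"".join(strips[:4])[: len(obj)]
print(recovered == strips[1], restored == obj)
print(f"storage overhead: {len(strips) / 4:.2f}x vs 3.00x for 3 full replicas")
```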
Interfaces
[Figure: a Sheepdog cluster of sheep nodes exposes objects, LUNs, volumes, files, and block devices, consumed through HTTP, iSCSI, QEMU, NFS, and SBD, and by OpenStack Glance, Nova, and Cinder.]
Use Patterns
[Figure: three deployment patterns built from Sheepdog (SD) nodes: VMs running inside the Sheepdog cluster, with a VM alongside each SD node; HTTP object storage with Nginx in front of the SD nodes; and an iSCSI backend serving a LUN device pool.]
Sheepdog Community
Past and Now
People
• 2009.9 Kazutaka Morita: open-sourced Sheepdog
• 2011.9 People from Taobao: added features, fixed bugs, redesigned
• 2012.4 Christoph Hellwig from Nebula: stayed for around half a year
• More production use around the world: Valerio, Andy, startups in China and Japan
• 2014 People from Intel: added isa-l for erasure coding
• 2015 People from China Mobile: making Sheepdog better
Patches
[Chart: Patches Per Year, 2009 to 2015 (y-axis: 0 to 1200 patches)]
● Patch activity peaked in 2012 and 2013 and has declined recently.
● It is always easier to open-source code than to build a community around it.
● China Mobile is committed to releasing all of its patches to the community.
Comparison with Ceph and GlusterFS
Pros:
Simplicity is Sheepdog's biggest advantage:
Sheepdog: 20k+ lines in user space
Ceph: 400k+ lines in user space and 20k+ in kernel
GlusterFS: 330k+ lines in user space
Cons:
● No company behind it
● Inactive community
● Few users and few developers
But Sheepdog is not technically inferior! Simple doesn't mean bad!
Sheepdog-ng
Why?
We forked Sheepdog in May because our stress tests hit endless crashes and panics. I discussed with the NTT developers a redesign that removes shared state between sheep nodes. They asked me to fork Sheepdog instead, simply because they don't use ZooKeeper; this is how they have always answered users asking about features they don't use (e.g., object cache).
http://lists.wpkg.org/pipermail/sheepdog/2015-May/067736.html
The technical reason:
Either share nothing, or share more and more state and face overwhelming complexity.
The non-technical reason:
The community is not as friendly and open as it used to be. We want to build a truly community-based project.
Subscribe to the mailing list: send an email to sheepdog-ng+subscribe@googlegroups.com
Problems and Solutions
Work in Progress
iSCSI Target Scalability
[Figure: with STGT, LUN1 and LUN2 share a main thread and talk to sheep synchronously, so the maximum number of in-flight requests equals the number of worker threads. With the new target, each LUN gets its own thread and talks to sheep asynchronously, so in-flight requests are unlimited.]
Problems:
● The OS tends to issue more and more requests (blk-mq, scsi-mq)
● A single LUN can saturate STGT; it does not scale at all
● STGT takes too many resources
● Multipath support is not very good
Solution – Rewrite
● Switch from sync to async, with fewer threads and file descriptors
● Tailor the target for Sheepdog
● Add I/O rebalancing and cache support
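The key difference between the two designs is how many requests can be outstanding at sheep at once. The toy model below, written with Python's asyncio, is not the new target's code: sheep I/O is replaced by a fixed-latency sleep, a semaphore stands in for STGT's worker pool, and all names are hypothetical. It measures the peak number of in-flight requests in each design.

```python
import asyncio


class InflightCounter:
    """Tracks how many requests are outstanding at sheep at once."""

    def __init__(self):
        self.current = 0
        self.peak = 0


async def sheep_io(counter: InflightCounter, latency: float = 0.01):
    # Stand-in for a request to the sheep daemon (assumption: fixed latency).
    counter.current += 1
    counter.peak = max(counter.peak, counter.current)
    await asyncio.sleep(latency)
    counter.current -= 1


async def sync_target(nreq: int, nworkers: int) -> int:
    """STGT-like model: each worker blocks on one request, so in-flight <= nworkers."""
    counter = InflightCounter()
    workers = asyncio.Semaphore(nworkers)

    async def handle():
        async with workers:
            await sheep_io(counter)

    await asyncio.gather(*(handle() for _ in range(nreq)))
    return counter.peak


async def async_target(nreq: int) -> int:
    """New-target model: requests are submitted without waiting for a free worker."""
    counter = InflightCounter()
    await asyncio.gather(*(sheep_io(counter) for _ in range(nreq)))
    return counter.peak


print("sync peak in-flight:", asyncio.run(sync_target(64, nworkers=4)))
print("async peak in-flight:", asyncio.run(async_target(64)))
```

In the worker-pool model, throughput is capped at roughly `nworkers / latency` no matter how many queues blk-mq exposes; in the async model a single LUN can keep as many requests in flight as the OS issues.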
Performance Degradation
[Figure: with the dynamic hash ring (DHR), a node failure (X) makes I/O hang until recovery completes, after which I/O resumes; with the static hash ring (SHR), the I/O to the failed node is simply dropped.]
Problem with the default Dynamic Hash Ring (DHR)
● If an object is being recovered, we have to wait!
● What makes it worse, recovery I/O competes with user I/O for bandwidth and CPU
● Neither slow nor fast recovery is satisfactory
Solution – Static Hash Ring (SHR)
A node failure won't change the hash ring. This trades data reliability for performance: we don't recover an object if some of its redundant data is missing.
Useful for small clusters that mostly deal with single-node events.
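To make the trade-off concrete, the sketch below compares the two rings after one node fails, using a toy consistent hash ring (MD5 hash, 32 virtual nodes per sheep, 3 copies; all assumptions, and the helper names are hypothetical). Under DHR the ring is rebuilt without the failed node, so every object that had a replica there gets a new replica set and must be recovered; under SHR the ring is unchanged and I/O simply skips the failed copy.

```python
import bisect
import hashlib


def h(key: str) -> int:
    # Assumption: MD5-based 64-bit hash; Sheepdog's real hash differs.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(nodes, vnodes=32):
    """Sorted (hash, node) points of a consistent hash ring."""
    return sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))


def place(ring, oid, copies=3):
    """First `copies` distinct nodes clockwise from the object's hash."""
    out, i = [], bisect.bisect(ring, (h(oid),))
    while len(out) < copies:
        node = ring[i % len(ring)][1]
        if node not in out:
            out.append(node)
        i += 1
    return out


nodes = [f"sheep{i}" for i in range(1, 6)]
failed = "sheep2"
static_ring = build_ring(nodes)                               # SHR: never rebuilt
dynamic_ring = build_ring([n for n in nodes if n != failed])  # DHR: rebuilt on failure
oids = [f"obj{i}" for i in range(1000)]

# DHR: objects whose replica set changed must be recovered, and their I/O waits.
dhr_recover = [o for o in oids if place(static_ring, o) != place(dynamic_ring, o)]

# SHR: placement is unchanged; the copy on the failed node is skipped (I/O dropped),
# so those objects run with one fewer replica instead of being recovered.
shr_targets = {o: [n for n in place(static_ring, o) if n != failed] for o in oids}
degraded = [o for o in oids if len(shr_targets[o]) < 3]

print(f"DHR: {len(dhr_recover)} objects need recovery before their I/O resumes")
print(f"SHR: no recovery; {len(degraded)} objects run with a missing replica")
```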
Live Patching
[Figure: before patching, A calls B, which calls C; after patching, calls from A go to B`, a replacement for B loaded on the fly by Linux's dynamic loader, and B` calls C.]
Sheep tracer
Similar to Linux's ftrace, it virtually adds a constructor and destructor to every function.
This mechanism relies on the 5-byte space (a.k.a. mcount) that GCC injects beforehand.
Based on the tracer, we can replace any function in the sheep daemon on the fly.
Useful for one-liner bug fixes, but limited to function-level granularity.
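Python has no mcount call sites, but the idea of routing every call through a patchable hook can be shown with a decorator. In this analog (not Sheepdog's mechanism, which rewrites the GCC-injected call site in machine code and loads B` through the dynamic loader), `traced` plays the tracer's role by recording entry and exit, and `live_patch` redirects a function to its replacement at runtime; all names are hypothetical.

```python
import functools

_patches = {}   # hypothetical patch table: function name -> replacement
trace_log = []  # entry/exit events recorded by the tracer hooks


def traced(fn):
    """Wrap fn in a trampoline, loosely analogous to the mcount call site."""
    @functools.wraps(fn)
    def trampoline(*args, **kwargs):
        trace_log.append(("enter", fn.__name__))  # "constructor" hook
        try:
            target = _patches.get(fn.__name__, fn)
            return target(*args, **kwargs)
        finally:
            trace_log.append(("exit", fn.__name__))  # "destructor" hook
    return trampoline


def live_patch(name, replacement):
    """Redirect future calls of `name` to `replacement` without restarting."""
    _patches[name] = replacement


@traced
def c(x):
    return x + 1


@traced
def b(x):
    return c(x) * 2  # hypothetical one-line bug: should multiply by 3


@traced
def a(x):
    return b(x)


def b_fixed(x):  # B`: the replacement function
    return c(x) * 3


print(a(1))  # 4: original B
live_patch("b", b_fixed)
print(a(1))  # 6: B` now serves calls from A
```

As on the slide, the patch granularity is a whole function: fixing one line of B means shipping a complete replacement B`.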
NFS Server
Current status:
Just a toy: files must be smaller than 4 MB, NFSv3 is not fully supported, and there is virtually no file system code (inode, dentry, and free space management still need to be implemented)
TODOs
- finish stubs
- add extent-based file allocation
- add a B-tree or hash-based KV store to manage dentries
- implement a multi-threaded SunRPC to replace the poorly performing glibc RPC
- implement NFSv4
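The 4 MB file-size limit presumably reflects each file living in a single object. Extent-based allocation would instead map byte ranges of a file to ranges inside several objects. The sketch below is one possible design, not Sheepdog's: `Extent`, `ExtentMap`, and the object IDs are hypothetical, extents are assumed not to overlap, and the 4 MB object size is an assumption taken from the stated limit.

```python
import bisect
from dataclasses import dataclass

OBJECT_SIZE = 4 * 1024 * 1024  # assumption: 4 MB objects


@dataclass
class Extent:
    file_off: int  # starting byte offset within the file
    length: int    # number of bytes covered
    oid: int       # hypothetical ID of the backing Sheepdog object
    obj_off: int   # offset inside that object


class ExtentMap:
    """Sorted, non-overlapping extents of one file."""

    def __init__(self):
        self._starts = []
        self._extents = []

    def add(self, ext: Extent):
        i = bisect.bisect(self._starts, ext.file_off)
        self._starts.insert(i, ext.file_off)
        self._extents.insert(i, ext)

    def lookup(self, off: int):
        """Return (oid, obj_off) holding file byte `off`, or None for a hole."""
        i = bisect.bisect(self._starts, off) - 1
        if i >= 0:
            e = self._extents[i]
            if off < e.file_off + e.length:
                return e.oid, e.obj_off + (off - e.file_off)
        return None


# A 10 MB file spread over three objects.
fmap = ExtentMap()
size = 10 * 1024 * 1024
for n, start in enumerate(range(0, size, OBJECT_SIZE)):
    fmap.add(Extent(start, min(OBJECT_SIZE, size - start), oid=100 + n, obj_off=0))
print(fmap.lookup(5 * 1024 * 1024))  # (101, 1048576)
```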
Cinder - Block Storage
– Supported since day 1
Glance - Image Storage
– Support merged in the Havana release
Nova - Ephemeral Storage
– Not yet started
Swift - Object Storage
– Swift API compatibility in progress
Final Goal - Unified Storage
– Copy-on-write anywhere?
– Data dedup?
[Figure: OpenStack Cinder, Glance, Nova, and Swift all backed by one Sheepdog cluster as unified storage.]
Plan to rewrite the driver with libsheepdog.so
Enjoy yourself in Suzhou
