Backing up thousands of containers
OR
How to fail miserably at copying data
OpenFest 2015
Talk about backup systems... Why?
➢First backup system built in 1999
➢Since then, 10 different systems
➢But why build your own?
➢ Simple: SCALE
➢I'm very proud of the design of the last two systems my team and I built
Backup considerations
➢Storage capacity
➢Number of backup copies
➢HDD and RAID speeds
➢...but almost never the network
Networking...
➢Typical transfer speed over 1Gbit/s: ~24MB/s
➢Typical transfer speed over 10Gbit/s: ~110MB/s
➢Restoring an 80% full 2TB drive:
➢ ~21h over 1Gbit/s at 24MB/s
➢ ~4.5h over 10Gbit/s at 110MB/s
➢Overlapping backups on the same network equipment
➢Overlapping backups and restores
➢Switch uplinks
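A quick sanity check of the restore arithmetic, as a minimal Python sketch. It assumes decimal units (1 TB = 1,000,000 MB) and a constant throughput for the whole transfer; the function name is illustrative. A plain size/throughput division gives about 18.5h at 24MB/s and about 4h at 110MB/s, a bit below the ~21h and ~4.5h quoted on the slide; the slide does not say what accounts for the difference.

```python
def restore_hours(drive_tb: float, fill_ratio: float, mb_per_s: float) -> float:
    """Hours needed to copy the used part of a drive at a constant rate.

    Assumes decimal units: 1 TB = 1,000,000 MB.
    """
    used_mb = drive_tb * fill_ratio * 1_000_000
    return used_mb / mb_per_s / 3600


if __name__ == "__main__":
    # The two "typical" rates from the slide, for an 80% full 2TB drive.
    for link, rate in (("1Gbit/s", 24), ("10Gbit/s", 110)):
        print(f"{link} at {rate}MB/s: {restore_hours(2, 0.8, rate):.1f}h")
```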
Architecture of container backups
➢Designed for 100,000 containers
➢Back up each container at least once a day
➢30 incremental copies
➢Now I'll explain HOW :)
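To get a feel for what "100,000 containers, at least once a day, 30 copies" implies, Little's law (concurrent jobs = arrival rate × average job duration) gives a rough number of backup slots that must be busy at any moment. This is a sketch, not the talk's own sizing method; the 10-minute average backup duration used below is a hypothetical figure.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def required_slots(containers: int, avg_backup_minutes: float, runs_per_day: int = 1) -> int:
    """Concurrent backup slots needed to finish every run within one day.

    Little's law: concurrency = arrival rate * average time in the system.
    """
    arrivals_per_second = containers * runs_per_day / SECONDS_PER_DAY
    return math.ceil(arrivals_per_second * avg_backup_minutes * 60)


def snapshots_kept(containers: int, copies: int) -> int:
    """Total number of incremental snapshots held across all backup nodes."""
    return containers * copies


if __name__ == "__main__":
    print(required_slots(100_000, avg_backup_minutes=10))  # hypothetical 10 min average
    print(snapshots_kept(100_000, 30))
```

With those assumptions the fleet needs about 695 backups running in parallel around the clock and holds 3,000,000 snapshots, which is why per-machine slot limits and LVM metadata limits come up later in the talk.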
Host machine architecture
➢We use LVM
➢A RAID array that exposes a single drive
➢A single Physical Volume (PV) on that drive
➢A single Volume Group (VG) on top of that PV
➢Thin provisioning inside the VG (a thin pool)
➢Each container has its own Logical Volume (LV)
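The host layout maps onto a handful of LVM commands. The sketch below only builds the command lines (it does not run them), so it is safe to try; the device path /dev/sdb, the names vg0 and pool, the container names and all sizes are hypothetical. In LVM, thin provisioning is done through a thin-pool LV inside the VG, and each container then gets a thin LV allocated from that pool on demand.

```python
import shlex


def host_layout_commands(raid_dev: str, vg: str, pool: str, pool_size: str,
                         containers: dict[str, str]) -> list[list[str]]:
    """Build (but do not run) the LVM commands for the host layout.

    raid_dev   -- the single block device exposed by the RAID controller
    vg, pool   -- names for the volume group and its thin pool
    containers -- container LV name -> virtual size, e.g. {"ct101": "20G"}
    """
    cmds = [
        ["pvcreate", raid_dev],                       # one PV on the RAID drive
        ["vgcreate", vg, raid_dev],                   # one VG on that PV
        ["lvcreate", "--type", "thin-pool", "-L", pool_size, "-n", pool, vg],
    ]
    for name, size in containers.items():
        # One thin LV per container; blocks come from the pool as they are written.
        cmds.append(["lvcreate", "-V", size, "--thin", "-n", name, f"{vg}/{pool}"])
    return cmds


if __name__ == "__main__":
    for cmd in host_layout_commands("/dev/sdb", "vg0", "pool", "1T",
                                    {"ct101": "20G", "ct102": "50G"}):
        print(shlex.join(cmd))
```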
Backup node architecture
➢Again we use LVM
➢A RAID array that exposes a single drive
➢5 equally sized Physical Volumes on it
➢On each PV we create a VG with a thin pool
➢Each container has a single LV
➢Each incremental backup is a new snapshot of that LV
➢When the maximum number of incremental backups is reached, we remove the oldest snapshot LV
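The snapshot rotation on a backup node can be sketched as follows. It is a minimal illustration that builds the LVM commands without running them, assuming snapshots are tracked oldest-first and named `<container>-<n>` (a hypothetical scheme); the limit of 30 comes from the slides. Snapshotting a thin LV with `lvcreate -s` needs no size, because the snapshot shares the thin pool with its origin.

```python
from collections import deque


def snapshot_commands(vg: str, lv: str, existing: deque, new_name: str,
                      max_copies: int = 30) -> list[list[str]]:
    """Take a new thin snapshot of vg/lv, then drop the oldest beyond max_copies.

    `existing` holds snapshot names oldest-first and is updated in place.
    """
    cmds = [["lvcreate", "-s", "-n", new_name, f"{vg}/{lv}"]]  # thin snapshot, no -L needed
    existing.append(new_name)
    while len(existing) > max_copies:
        oldest = existing.popleft()
        cmds.append(["lvremove", "-y", f"{vg}/{oldest}"])
    return cmds


if __name__ == "__main__":
    snaps = deque(f"ct101-{i}" for i in range(30))   # 30 existing copies
    for cmd in snapshot_commands("vg1", "ct101", snaps, "ct101-30"):
        print(" ".join(cmd))
```

Creating the new snapshot before removing the oldest means there are never fewer than 30 copies, at the cost of briefly holding 31.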
For now, there is nothing really new or very interesting here.
So let me start with the fun part.
➢We use rsync (nothing revolutionary here)
➢We need the size of the deleted files
➢ https://github.com/kyupltd/rsync/tree/deleted-stats
➢Restore files directly into clients' containers, without SSH-ing into them
➢ https://github.com/kyupltd/rsync/tree/mount-ns
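Recent rsync versions report how many files were deleted in their `--stats` summary, but not how much data that was, which is what the deleted-stats branch above adds. A minimal sketch of collecting those numbers: it parses every "Label: number" line rather than hard-coding the label the patched branch prints, since that label is not shown in the talk. The sample output in the usage note is illustrative, not captured from a real run.

```python
import re

# Matches "Number of deleted files: 12" or "Total file size: 1,234,567 bytes"
# (thousands separators allowed); skips fractional values such as "0.001 seconds".
_STAT_LINE = re.compile(r"^(?P<key>[A-Za-z ]+?):\s+(?P<value>[\d,]+)(?![\d.])")


def parse_rsync_stats(output: str) -> dict[str, int]:
    """Turn the summary printed by `rsync --stats` into {label: number}."""
    stats = {}
    for line in output.splitlines():
        m = _STAT_LINE.match(line.strip())
        if m:
            stats[m.group("key")] = int(m.group("value").replace(",", ""))
    return stats
```

For example, feeding it the lines `Number of deleted files: 2` and `Total file size: 1,234,567 bytes` yields `{"Number of deleted files": 2, "Total file size": 1234567}`; the backup worker could store such numbers per run.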
Backup system architecture
➢ One central database
➢ Public/private IP addresses
➢ Maximum slots per machine
➢ Gearman as the messaging layer
➢ Scheduler for backups
➢ Backup worker
The Scheduler
➢ Check whether the container needs to be backed up
➢ Get the last backup timestamp
➢ Check whether the host node has available backup slots
➢ Schedule a 'start-backup' Gearman job on the backup node
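The scheduler's checks can be sketched as one pass over the containers. This is an illustration under assumptions: the data classes stand in for rows of the central database, all field names are hypothetical, and `submit()` stands in for a Gearman client call (no real Gearman API is used). Processing the least recently backed up containers first is also an assumption; the slide only says the timestamp is read. Releasing a slot when a job finishes is left out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

DAY = 24 * 60 * 60


@dataclass
class HostNode:
    name: str
    max_slots: int        # "Maximum slots per machine"
    running: int = 0      # backups currently reading from this host


@dataclass
class Container:
    name: str
    host: HostNode
    backup_node: str
    last_backup: Optional[float] = None   # Unix timestamp of the last good backup


def schedule(containers: list[Container],
             submit: Callable[[str, str, str], None],
             now: Optional[float] = None) -> list[str]:
    """One scheduler pass; returns the names of the containers it queued.

    submit(backup_node, task, payload) is a hypothetical stand-in for Gearman.
    """
    now = time.time() if now is None else now
    queued = []
    # Never-backed-up and oldest backups first.
    for ct in sorted(containers, key=lambda c: c.last_backup or 0):
        if ct.last_backup is not None and now - ct.last_backup < DAY:
            continue                      # already backed up within the last day
        if ct.host.running >= ct.host.max_slots:
            continue                      # host node has no free backup slot
        submit(ct.backup_node, "start-backup", ct.name)
        ct.host.running += 1
        queued.append(ct.name)
    return queued
```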
start-backup worker
➢ Runs on each backup node
➢ Started as many times as the backup server can handle
➢ Handles the actual backup:
➢ creates snapshots
➢ monitors rsync
➢ removes snapshots
➢ updates the database
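A minimal sketch of one start-backup job, with the four steps passed in as hypothetical callables since the talk does not show the real helpers. "Monitoring" rsync is reduced here to waiting for its exit status. The `try`/`finally` makes sure the snapshot is removed and the result recorded even if rsync fails or raises.

```python
from typing import Callable


def run_backup(container: str,
               create_snapshot: Callable[[str], str],
               run_rsync: Callable[[str], int],
               remove_snapshot: Callable[[str], None],
               update_db: Callable[[str, bool], None]) -> bool:
    """Snapshot, rsync from the snapshot, clean up, record the result.

    All four callables are hypothetical hooks, not the talk's actual code.
    """
    snapshot = create_snapshot(container)
    ok = False
    try:
        ok = run_rsync(snapshot) == 0     # rsync exits with 0 on success
    finally:
        remove_snapshot(snapshot)         # never leak snapshots, even on errors
        update_db(container, ok)
    return ok
```

Leaking snapshots is costly here: every leftover LV adds to the VG metadata and keeps pool blocks allocated.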
No problems... they say :)
➢ We lost ALL of our backups from TWO nodes
➢ Corrupted VG metadata
➢ The VG metadata area is not big enough for more than 2000 LVs
➢ Create the VGs a little bit smaller than the total size of the PV
➢ Split the storage into separate VGs, so you lose less when one breaks
No problems... they say :)
➢ LV creation becomes sluggish because LVM tries to scan for devices in /dev
➢ obtain_device_list_from_udev = 1
➢ write_cache_state = 0
➢ specify the devices to scan: scan = [ "/dev" ] (all three settings live in the devices section of lvm.conf)
➢ lvmetad and dmeventd break...
➢ when they break, they corrupt the metadata of all currently opened containers
➢ lvcreate leaks file descriptors
➢ once lvmetad or dmeventd run out of FDs, everything breaks
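The slide does not say how FD exhaustion was detected. One way to watch for it on Linux is to compare a daemon's open descriptors in `/proc/<pid>/fd` with the soft "Max open files" limit in `/proc/<pid>/limits`. A minimal sketch; the 80% threshold is a hypothetical choice, and reading another process's `/proc` entries generally requires root.

```python
import os


def fd_usage(pid: int) -> tuple[int, int]:
    """Return (open FDs, soft 'Max open files' limit) for a Linux process.

    A limit of -1 means 'unlimited'.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line.split()[3]   # "Max open files <soft> <hard> files"
                return open_fds, (-1 if soft == "unlimited" else int(soft))
    raise RuntimeError(f"no 'Max open files' entry for pid {pid}")


def pids_named(name: str) -> list[int]:
    """PIDs whose /proc/<pid>/comm equals name, e.g. 'dmeventd'."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue                      # process exited or is not readable
    return pids


def near_fd_limit(pid: int, threshold: float = 0.8) -> bool:
    used, limit = fd_usage(pid)
    return limit > 0 and used >= threshold * limit


if __name__ == "__main__":
    for daemon in ("lvmetad", "dmeventd"):
        for pid in pids_named(daemon):
            used, limit = fd_usage(pid)
            flag = "  <-- close to the limit" if near_fd_limit(pid) else ""
            print(f"{daemon}[{pid}]: {used}/{limit} FDs{flag}")
```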
Then the Avatar came
➢ We wanted to reduce the restore time from 4h to under 1h, ideally under 30min
➢ So instead of backing up whole containers...
➢ We now back up accounts
➢ Soon we will be able to do distributed restores:
➢ a single host node's backup
➢ from multiple backup nodes
➢ to multiple host nodes
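Per-account backups make it possible to split one host node's restore across several machines. The sketch below uses greedy largest-first balancing (a standard scheduling heuristic, not something the talk specifies) to assign accounts to target host nodes, then estimates wall-clock time as the busiest target's share at a fixed rate. Account names, sizes, the four targets and the 110MB/s per-target rate are hypothetical, and it assumes the backup nodes and switch uplinks can feed all targets at once.

```python
import heapq


def plan_restore(account_sizes_gb: dict[str, float], targets: list[str]) -> dict[str, list[str]]:
    """Assign each account to a target host node, largest accounts first.

    Each account goes to the currently least-loaded target (LPT heuristic).
    """
    heap = [(0.0, t) for t in targets]           # (GB assigned so far, target)
    heapq.heapify(heap)
    plan = {t: [] for t in targets}
    for account, size in sorted(account_sizes_gb.items(), key=lambda kv: -kv[1]):
        load, target = heapq.heappop(heap)
        plan[target].append(account)
        heapq.heappush(heap, (load + size, target))
    return plan


def parallel_restore_hours(plan: dict[str, list[str]],
                           account_sizes_gb: dict[str, float], mb_per_s: float) -> float:
    """Wall-clock hours if every target restores in parallel at mb_per_s."""
    busiest_gb = max(sum(account_sizes_gb[a] for a in accts) for accts in plan.values())
    return busiest_gb * 1000 / mb_per_s / 3600


if __name__ == "__main__":
    sizes = {"a": 400, "b": 300, "c": 300, "d": 200, "e": 200, "f": 200}  # 1.6TB total
    plan = plan_restore(sizes, ["h1", "h2", "h3", "h4"])
    print(plan, f"{parallel_restore_hours(plan, sizes, 110):.2f}h")
```

With these made-up numbers the busiest target receives 500GB, roughly 1.26h at 110MB/s, against about 4h for the same 1.6TB restored to a single machine at that rate.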
Layered backups
➢ Sparse File
➢ Physical Volume
➢ Volume Group
➢ ThinPool
➢ Logical Volume
➢ Snapshot6, Snapshot5, Snapshot4, Snapshot3, Snapshot2, Snapshot1, Snapshot0
➢ Loop mount
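A sparse file can only become an LVM Physical Volume through a loop device, so one plausible reading of the diagram is: sparse file → loop device → PV → VG → thin pool → LV → snapshots. A minimal sketch of the bottom of that stack, assuming a hypothetical file path, loop device and VG name; it really creates the sparse file (to show that no space is allocated up front) but only builds the LVM commands. Leaving 10% of the VG outside the pool follows the later advice to keep the thin pool smaller than the VG; the exact figure is an assumption.

```python
import os
import tempfile


def make_sparse_file(path: str, size_bytes: int) -> tuple[int, int]:
    """Create a sparse file; return (apparent size, bytes actually allocated)."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)              # extends the file without writing data
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512   # st_blocks counts 512-byte units


def stack_commands(path: str, loop_dev: str, vg: str) -> list[list[str]]:
    """Dry-run commands that turn the sparse file into an LVM stack."""
    return [
        ["losetup", loop_dev, path],                       # file -> block device
        ["pvcreate", loop_dev],                            # Physical Volume
        ["vgcreate", vg, loop_dev],                        # Volume Group
        ["lvcreate", "--type", "thin-pool", "-l", "90%VG", "-n", "pool", vg],  # ThinPool
    ]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "account.img")   # hypothetical file name
    apparent, allocated = make_sparse_file(path, 10 * 1024**3)  # 10 GiB
    print(f"apparent {apparent} bytes, allocated {allocated} bytes")
    for cmd in stack_commands(path, "/dev/loop7", "vgacct"):
        print(" ".join(cmd))
```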
Issues here
➢ We can't keep a machine up for more than 19 hours because of an LVM kernel bug
➢ Kernels 2.6 through 4.3 crash when discarding data
➢ Removing old snapshots does not discard the data
➢ LVM unmounts a volume when dmeventd reaches its FD limit
➢ It does umount -l, the bastard
Issues here
➢ LVM dmeventd tries to extend the volume, but if there are no free extents it silently does umount -l on your LV
➢ Monitor your thin pool metadata
➢ Make your thin pool smaller than the VG and always plan to keep a few spare PEs (physical extents) for extending the pool
➢ kabbi__ irc.freenode.net #lvm
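To follow the "monitor your thin pool metadata" and "keep spare PEs" advice, a sketch that parses `lvs` and `vgs` reports. The report fields `data_percent`, `metadata_percent` and `vg_free_count` are standard lvs/vgs fields; the 80% threshold and the rule of flagging any pool whose VG has no free extents are hypothetical choices. Thin volumes leave `metadata_percent` empty, so rows with a metadata value are treated as pools (an assumption that ignores cache pools).

```python
import subprocess

LVS_FIELDS = "vg_name,lv_name,data_percent,metadata_percent"


def parse_lvs(output: str) -> list[dict]:
    """Parse `lvs --noheadings --separator , -o LVS_FIELDS` output."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        vg, lv, data, meta = (part.strip() for part in line.split(","))
        rows.append({"vg": vg, "lv": lv,
                     "data": float(data) if data else None,
                     "meta": float(meta) if meta else None})
    return rows


def parse_vg_free(output: str) -> dict[str, int]:
    """Parse `vgs --noheadings --separator , -o vg_name,vg_free_count` output."""
    free = {}
    for line in output.splitlines():
        if line.strip():
            vg, count = (part.strip() for part in line.split(","))
            free[vg] = int(count)
    return free


def pools_at_risk(rows: list[dict], vg_free: dict[str, int], threshold: float = 80.0) -> list[str]:
    """Thin pools that are filling up or can no longer be extended."""
    risky = []
    for r in rows:
        if r["meta"] is None:             # thin volumes report no metadata usage
            continue
        full = r["meta"] >= threshold or (r["data"] or 0.0) >= threshold
        no_room = vg_free.get(r["vg"], 0) == 0   # no spare PEs to extend into
        if full or no_room:
            risky.append(f"{r['vg']}/{r['lv']}")
    return risky


def run_report(cmd: list[str]) -> str:
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    rows = parse_lvs(run_report(["lvs", "--noheadings", "--separator", ",", "-o", LVS_FIELDS]))
    free = parse_vg_free(run_report(["vgs", "--noheadings", "--separator", ",",
                                     "-o", "vg_name,vg_free_count"]))
    for pool in pools_at_risk(rows, free):
        print(f"WARNING: thin pool {pool} needs attention")
```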
Any Questions?