Persistent Storage with Docker in
production - Which solution and why?
Cheryl Hung @oicheryl
© 2013-2017 StorageOS Ltd. All rights reserved.
Cheryl
@oicheryl
Why do I need storage?
@oicheryl
Why do I need storage? A stupid question, right? Well yes, but applications
typically have more than one storage requirement.
Application binaries need ephemeral, high-performance storage.
Application data (what most people think of: databases, message queues) needs
dedicated, persistent performance – for example block storage for databases,
replication for high availability, snapshots for point-in-time copies, and
encryption.
Configuration needs to be shared and persistent, typically on a filesystem.
For backup, you want it to be cost efficient, so you’d be looking for
compression and deduplication, and maybe backing up to the cloud.
You probably want more than one storage solution. So why is this difficult with containers?
Why do I need storage?
App binaries · App data · Config · Backup
@oicheryl
Why is this tricky with containers?
@oicheryl
First of all, does anyone not know what containers are?
Applications have become loosely coupled and stateless, designed to scale and to manage
failure – it is no longer economical to remediate state.
So why is this difficult with containers?
@oicheryl
No pets
You know the cattle/pets analogy? Don’t treat your servers like pets that you have to
lovingly name and take care of; treat them like cattle – when they get ill, you take them out
back and shoot them.
In other words, you don’t want special storage pets, you want ordinary commodity hardware or
cloud instances.
@oicheryl
Data follows
Secondly, data needs to follow containers around. When a node fails, you want to reschedule
that container to another node and have the data follow it.
You want to avoid mapping containers to individual hosts - then you’ve lost your portability and
mobility.
@oicheryl
Humans are fallible
Thirdly, humans are fallible. Don’t rely on someone running through a playbook; they will
screw things up.
You want to manage storage through APIs and you want that integrated with Docker and
Kubernetes. So let’s take a closer look at Docker.
Docker container layers
@oicheryl
Docker containers comprise a layered image plus a writable ‘container layer’. Here the base
image is Ubuntu, and on top sits the thin read/write container layer, where new or modified
data is stored.
When a container is deleted, its writable layer is removed, leaving just the underlying image
layers behind.
This is good because sharing layers makes images smaller and lack of state makes it easy to
move containers around.
This is bad because generally you want your app to do something useful in the real world.
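As a quick illustration (a minimal sketch, assuming a Linux Docker host; the file and container names are just examples), anything written to the container layer disappears with the container:

```
docker run --name demo ubuntu bash -c 'echo hello > /greeting'   # writes into the container layer
docker rm demo                        # deleting the container discards its writable layer – and /greeting
docker run --rm ubuntu cat /greeting  # a fresh container sees only the image: "No such file or directory"
```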
So let’s look at the options with Docker.
Docker local volumes
@oicheryl
Docker volumes allow you to share data between the host and containers through local
volumes.
Here you run docker volume create, which creates a volume called mydata, mount
that into the container at /data, and write a file. When the container exits, the data persists
under /var/lib/docker/volumes.
On the plus side, you can’t get faster than writing to your local host. The downside is
that because the data is tied to that host, it doesn’t follow you around, and if that host
goes down your data is inaccessible. Plus there’s no locking, so you have to be careful
with consistency, and it’s subject to the noisy-neighbour problem.
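Concretely, the sequence just described looks something like this (a sketch; mydata and the file name are examples, and the /var/lib/docker path is the default on Linux):

```
docker volume create mydata
docker run --rm -v mydata:/data ubuntu bash -c 'echo persisted > /data/file.txt'
# The container has exited, but the data survives on the host:
sudo cat /var/lib/docker/volumes/mydata/_data/file.txt   # -> persisted
```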
To extend storage, Docker has volume plugins, but first I want to introduce the eight principles of cloud native storage.
Eight principles of Cloud Native Storage
What is Cloud Native?
@oicheryl
Horizontally scalable
Built to handle failures
Resilient and survivable
Minimal operator overhead
Decoupled from the underlying platform
What do I mean by cloud native?
Horizontally scalable
Built to handle failures, so no single point of failure
Resilient and survivable, in other words it should be self healing
Minimal operator overhead - it should be API-driven and automatable
Decoupled from the underlying platform and hardware.
How does that apply to storage?
Eight principles of Cloud Native Storage
1. Platform agnostic
@oicheryl
The storage platform should be able to run anywhere and not have proprietary
dependencies that lock an application to a particular platform or a cloud provider.
Additionally, it should be capable of being scaled out in a distributed topology just as
easily as it could be scaled up based on application requirements. Upgrades and
scaling of the storage platform should be implemented as a non-disruptive operation
to the running applications.
2. API driven
@oicheryl
Storage resources and services should be easy to provision, consume, move
and manage via an API, and should provide integration with application runtime and
orchestrator platforms.
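For instance, a volume can be created through the Docker Engine API rather than by a human at a terminal (a sketch against the local Docker socket; the volume name is illustrative):

```
curl --unix-socket /var/run/docker.sock \
     -H 'Content-Type: application/json' \
     -d '{"Name": "apivol", "Driver": "local"}' \
     http://localhost/volumes/create
```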
3. Declarative and composable
@oicheryl
Storage resources should be declared and composed just like all other resources
required by applications and services, allowing storage resources and services to be
deployed and provisioned as part of application instantiation through orchestrators.
4. Application centric
@oicheryl
Storage should be presented to and consumed by applications and not by operating
systems or hypervisors. It is no longer desirable to present storage to operating
system instances and then, later, have to map applications to operating system
instances to link to storage (whether on-premises or in a cloud provider, on VMs or
bare metal). Storage needs to be able to follow an application as it scales, grows,
and moves between platforms and clouds.
5. Agile
@oicheryl
The platform should be able to dynamically react to changes in the environment and
be able to move application data between locations, dynamically resize volumes for
growth, take point-in-time copies of data for data retention or to facilitate rapid
recovery of data, and integrate naturally into dynamic, rapidly changing application
environments.
6. Natively secure
@oicheryl
Storage services should integrate inline security features such as encryption and
RBAC, and not depend on secondary products to secure application data.
7. Performant
@oicheryl
The storage platform should be able to offer deterministic performance in complex
distributed environments and scale efficiently using a minimum of compute resources.
8. Consistently available
@oicheryl
The storage platform should manage data distribution with a predictable, proven data
model to ensure the high availability, durability and consistency of application data. During
failure conditions, data recovery processes should be application independent and not
affect normal operations.
Storage landscape
So that’s the eight principles of cloud native storage. Now we’ll take a look at a few
different paradigms, a popular example of each, and score them against those eight
principles.
Warning in advance: this is really high level, but I hope I can put these into the context
of running them with Docker.
Centralised file system: NFS
@oicheryl
Your classic NAS or network attached storage is NFS. You take one NFS server and
export the local filesystem over the network to a number of clients. Who here uses
NFS?
It doesn’t really follow any of the eight principles: it’s hard to scale horizontally, it’s not
integrated into Docker or Kubernetes natively, and it’s a single point of failure, so
there are no availability guarantees, although there are commercial options for failover.
First designed in 1984 – definitely not cloud native. I’ve scored NFS 0/8.
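That said, you can still consume an existing NFS export from Docker via the built-in local driver (a sketch; the server address and export path are placeholders):

```
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.10,rw \
  --opt device=:/export/data \
  nfsvol
docker run --rm -v nfsvol:/data ubuntu ls /data
```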
Storage array: Dell EMC
@oicheryl
Next up is the classic hardware storage array like Dell EMC. This image is totally
unfair of course but it does show the complexity.
It’s even less platform agnostic, since you have to buy the hardware from a specific
vendor, and it’s typically easier to scale up than out. It’s also absurdly expensive, has
long lead times, is inefficient (no thin provisioning), and is definitely not application centric.
But it’s optimised for deterministic performance, which is why many enterprises that
use databases still use storage arrays. Anybody using storage arrays from Dell EMC,
HPE, NetApp, Hitachi and so on?
Also not very cloud native – I’ve given it 2/8.
Distributed: Ceph
@oicheryl
Jumping ahead, let’s talk about distributed storage like Ceph, which is a distributed
object store maintained by Red Hat. Distributed architectures typically trade off
performance against consistency, so you get better performance if you don’t need
strong consistency.
So how cloud native is Ceph? Distributed architectures are usually designed with
scaling in mind, and although it’s not natively integrated with Docker or K8s you can
look into a project called Rook for the latter.
The big downsides of Ceph are that it’s complicated to set up, and failures are
expensive. Because data is distributed across all nodes in the cluster, any failure
requires a rebuild involving the whole cluster. The distributed architecture also means
that one write fans out between 13 and 40 times, which limits the performance you can
get from your cluster.
4/8. By the way, if you want to look at the numbers I’m giving, these slides are on
oicheryl.com.
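To give a flavour of the moving parts, provisioning a block device by hand with the standard Ceph CLI looks roughly like this (a sketch against an existing cluster; pool and image names are examples):

```
ceph osd pool create rbdpool 128             # a pool with 128 placement groups
rbd create rbdpool/dockervol --size 15360    # a 15 GB RADOS block device
sudo rbd map rbdpool/dockervol               # exposes it as e.g. /dev/rbd0 on this host
sudo mkfs.ext4 /dev/rbd0 && sudo mount /dev/rbd0 /mnt/dockervol
```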
Public cloud: AWS EBS
@oicheryl
Now if you’re a hipster and you want to join all the cool kids on public cloud, then EBS,
which stands for Elastic Block Store, is a popular option.
EBS is pretty nice: everything is scalable, highly consistent and high performance. On
the downside, you have a maximum of 40 EBS volumes you can mount per EC2
instance, which limits how many containers you can run per EC2 instance, and when
you move containers between hosts, mounting physical block devices to nodes takes
at least 45 seconds.
I’ll also mention S3, which is Amazon’s eventually consistent object store and
seemingly powers half the internet. Lots of users choose a single Availability Zone
because it’s cheap, but S3 only guarantees 99.9% monthly uptime – that’s 43 minutes
of downtime a month – so S3 outages take out half the internet too, which might be
a good thing if you’re as addicted to Reddit as I am. Great for backups and
non-critical data, not so great for business data.
How are we doing on the cloud native front? Well, it’s scalable, highly consistent
and high performance, which is great. But you’re locked into Amazon as a cloud
provider, which is obviously how they like it; it gets pretty expensive, as I’m sure
some of you know; and there are privacy issues around moving sensitive data to the
cloud.
6/8
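The provisioning dance behind that looks something like this (a sketch with the AWS CLI; the IDs are placeholders):

```
aws ec2 create-volume --availability-zone eu-west-1a --size 100 --volume-type gp2
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/xvdf
# ...then wait for the attach to complete (often tens of seconds) before mkfs/mounting
```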
Volume plugin: StorageOS
@oicheryl
Volume plugins are Docker’s way of extending storage capabilities, and StorageOS is
an example of a distributed block storage platform which is deployed with Docker.
To use StorageOS you could create a Docker volume with the storageos driver, set
the size to 15 GB and, in this case, tell it to create two replicas of the volume on other
nodes. This gets you high availability (if the node with the master volume goes
down, you can promote one of the replicas to a new master), plus all the volumes are
accessible from any node, so if your container goes down you can spin it up
anywhere without worrying about which host it’s on. And because it’s not a distributed
filesystem, the StorageOS scheduler can place the master volume on the same node as
the container, meaning your reads are local, fast and deterministic.
Given this was built with those principles in mind, I’m obviously giving it an 8/8, but
there are some downsides; for instance, right now it assumes your cluster is
geographically close, so cross-availability-zone replication would be slow.
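A sketch of the volume creation just described (the option names follow the pattern of the StorageOS Docker plugin documentation, but may vary between releases):

```
docker volume create --driver storageos \
  --opt size=15 \
  --opt storageos.feature.replicas=2 \
  dbvol
docker run -d --name db -v dbvol:/var/lib/mysql mysql
```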
Plugin framework: REX-Ray
@oicheryl
I mentioned Dell EMC before as a hardware vendor; they are also the developers of
REX-Ray, which I want to mention because superficially it looks like another storage
plugin.
REX-Ray doesn’t provide storage itself; it’s a framework which supports a number of
different storage systems. It’s not really cloud native storage – it’s just a connector to
existing storage options. So I’m not going to give it a score.
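For example, pairing it with a backing store such as EBS looks roughly like this (a sketch; the plugin name and credential settings are from memory of the REX-Ray docs, so treat them as indicative):

```
docker plugin install rexray/ebs \
  EBS_ACCESSKEY=AKIA... EBS_SECRETKEY=...   # placeholder credentials
docker volume create --driver rexray/ebs --opt size=20 rexvol
```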
Conclusion
Of course, you can run lots of these things in combination. You could run StorageOS
on EBS, REX-Ray on top of Ceph, or NFS from VMs. There’s no one solution, and
often it’s a bit of trial and (expensive) error. But hopefully those eight principles have
given you a way to evaluate what you need against what you’re currently using.
If you’re interested in learning more, standards are continuing to
improve and the K8s Storage Special Interest Group and CNCF
Storage Working Group are proposing a Container Storage
Interface to make it easier to move between storage options.
K8S Storage SIG & CNCF Storage WG:
https://github.com/cncf/wg-storage
The objective is to define an industry-standard “Container
Storage Interface” (CSI) that will enable storage providers
(SP) to develop a plugin once and have it work across a
number of container orchestration (CO) systems.
@oicheryl
Thanks
Slides at oicheryl.com
© 2013-2017 StorageOS Ltd. All rights reserved.