Joel Jacobson
Scaling DataStax in Docker
How it started
© DataStax, All Rights Reserved. 2
Internal project at dotCloud
Pivoted to Docker Inc.
Execution using libcontainer
Huge adoption
What is Docker?
and why is it important?
3 key concepts
Images
Registries
Containers
Example Dockerfile image
Why are containers important?
Speeding up application development
Better resource utilization
Mobility
Faster provisioning
Microservices
Why are containers important?
[Diagram: monolithic application. Components shown: Web UI, REST API, Billing, Customer, Payments, Service X, Service Y, DB adapter, MySQL, external services.]
Why are containers important?
[Diagram: microservices architecture. Components shown: Web UI; Billing, Customer, Payments, Service X and Service Y, each with its own REST API; Cassandra, Spark and Solr; external services.]
Why are containers important?
Why are containers important?
DataStax Enterprise in Docker
Why are containers important?
Build once, deploy anywhere
Flexibility for sharing binaries and libraries across applications
Managing, maintaining and deploying becomes turnkey
Officially supported since DSE 4.8
DSE processes
Core DSE JVM
One or more Spark executor processes
Single Spark worker process
Multiple processes for the Hadoop stack
Ad hoc processes (Spark job server, SparkSQL CLI, etc.)
OpsCenter agent
DataStax Enterprise configuration
Cassandra configuration (seeds, cluster_name, etc.)
Where to manage Cassandra data
Optimal JVM heap size
Optimal garbage collector
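A minimal sketch of passing the cluster settings at start-up. The speaker notes name two environment variables, CLUSTER_NAME and SEEDS; the image name "my-dse-image", the cluster name and the helper functions below are hypothetical. The script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: the image name and example values are placeholders.
# CLUSTER_NAME and SEEDS are the variable names given in the speaker notes.

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Start one detached DSE container with cluster-specific settings.
start_dse_node() {
  image="$1"         # hypothetical image name
  cluster_name="$2"  # cluster to create or connect to
  seeds="$3"         # comma-separated list of seed IP addresses
  run docker run -d \
    -e "CLUSTER_NAME=$cluster_name" \
    -e "SEEDS=$seeds" \
    "$image"
}

start_dse_node my-dse-image demo_cluster 127.0.0.2,127.0.0.3
```

Keeping these values outside the image means the same image can be reused for every node and every cluster.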
DataStax Enterprise configuration
Default capability limits of Docker break mlockall
Add -XX:+AlwaysPreTouch to the JVM arguments
ulimits inherited from Docker daemon
Disable swap on host OS
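The settings on this slide could be applied roughly as follows. This is a sketch under assumptions: "my-dse-image" is a placeholder, and handing the JVM flag to the container through a JVM_OPTS environment variable assumes the image's start-up script honours that variable. The --ulimit flag overrides the value otherwise inherited from the Docker daemon. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: "my-dse-image" is a placeholder, and passing JVM flags through a
# JVM_OPTS environment variable is an assumption about how the image is built.

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Host side: disable swap (needs root when actually executed).
disable_host_swap() {
  run swapoff -a
}

# Container side: set the locked-memory ulimit explicitly instead of
# inheriting the daemon's value, and pre-touch the heap at JVM start-up.
start_tuned_dse_node() {
  image="$1"
  run docker run -d \
    --ulimit memlock=-1:-1 \
    -e "JVM_OPTS=-XX:+AlwaysPreTouch" \
    "$image"
}

disable_host_swap
start_tuned_dse_node my-dse-image
```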
Networking
Default networking (via Linux bridge) not recommended
Instead use docker run --net=host
Use pipework or weave for consistent IP addresses
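One way to keep the networking choice explicit is to derive it from the target environment. The speaker notes say the default bridge is fine for development and testing, while production should use host networking. The environment names, the image name and the helper functions below are hypothetical, and the script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: environment names and "my-dse-image" are placeholders.

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Map an environment name to docker networking arguments.
network_args() {
  case "$1" in
    production)  echo "--net=host" ;;  # host networking, one DSE node per host
    development) echo "" ;;            # default bridge is acceptable here
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

start_dse_node_net() {
  image="$1"
  env_name="$2"
  net="$(network_args "$env_name")" || return 1
  # $net is intentionally unquoted so an empty value adds no argument.
  run docker run -d $net "$image"
}

start_dse_node_net my-dse-image production
```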
Storage
Everything in /var/lib/cassandra:
commitlog
saved_caches
data directories
Use supported filesystem
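A quick pre-flight check of the filesystem behind the data directory might look like the sketch below. The speaker notes give xfs and ext4 as the usual supported filesystems, so the allow-list here is an assumption based on that remark rather than an official support matrix. The helper names are hypothetical, and `df --output` requires GNU coreutils.

```shell
#!/bin/sh
# Sketch only: the allow-list (xfs, ext4) follows the speaker notes.

# Succeed if the given filesystem type is on the allow-list.
is_supported_fs() {
  case "$1" in
    xfs|ext4) return 0 ;;
    *)        return 1 ;;
  esac
}

# Report the filesystem type backing a directory (GNU coreutils df).
fs_type_of() {
  df --output=fstype "$1" | tail -n 1 | tr -d ' '
}

# Check the data directory if it exists on this host.
data_dir="${DATA_DIR:-/var/lib/cassandra}"
if [ -d "$data_dir" ]; then
  fs="$(fs_type_of "$data_dir")"
  if is_supported_fs "$fs"; then
    echo "$data_dir: $fs (ok)"
  else
    echo "$data_dir: $fs (not xfs/ext4)"
  fi
fi
```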
Storage
Data volumes can be shared and reused among containers
Changes are made directly
Changes to a volume will not be included when you update an image
Data volumes persist if container is deleted
Storage
docker run -v <some root dir>/<dse_image_name>-data:/data \
  -v <some root dir>/<dse_image_name>-conf:/conf \
  -v <some root dir>/<dse_image_name>-logs:/logs \
  -d <dse_image_name>
DSE Docker Demo
Futures
Splitting up DSE processes into separate containers
Integration with Kubernetes, Mesos
Deployment model on public/private clouds
Summary
Configure OS and JVM
Map storage volumes
Avoid bridge/NAT networking
Test. Test. Test.
Useful Information
Links and information
Datastax.com
http://www.datastax.com/wp-content/uploads/resources/DataStax-WP-Best_Practices_Running_DSE_Within_Docker.pdf
github.com/joeljacobson/dse-docker
academy.datastax.com
Thank you



Editor's Notes

  • #2: Hi, I’m Joel, I like cats.
  • #3: dotCloud was a PaaS provider that built Docker to automate the deployment of containers. Docker containers use an execution environment called libcontainer, which is an interface to various Linux kernel isolation features, like namespaces and cgroups. Docker gives you this level of abstraction. Namespaces and cgroups are two of the main kernel technologies that the current wave of software containerization, Docker included, rides on. Put simply, cgroups are a metering and limiting mechanism: they control how much of a system resource (CPU, memory) you can use. Namespaces, on the other hand, limit what you can see. Thanks to namespaces, processes have their own view of the system’s resources. This architecture allows multiple containers to run in complete isolation from one another while sharing the same Linux kernel. Because a Docker container instance doesn’t require a dedicated OS, it is much more portable and lightweight than a virtual machine.
  • #4: I would like to spend a few minutes discussing what Docker is. Most of you will have at least heard of it, and I’d like to talk about why it is important.
  • #5: An image is the build component of a container. It is a read-only template from which one or more container instances can be launched. Conceptually, it’s similar to an AMI. Registries are used to store images. Registries can be local or remote. When we launch a container, Docker first searches the local registry for the image. If it’s not found locally, then it searches a public remote registry, called Docker Hub. Finally, a container is a running instance of an image. Docker uses containers to execute and run the software contained in the image.
  • #6: Here is an example Dockerfile, which includes all of the instructions for building the Docker image. Take the time to get this right from the beginning.
  • #7: Developers can add new application features more quickly by taking advantage of automated building, testing, integration, and packaging, at the speed of containers. Idle containers don’t take up compute, memory, or I/O resources. You can move workloads between private and public clouds more quickly: instead of moving gigabytes between clouds, you move megabytes. Containerized applications can boot and restart in seconds, compared to minutes for virtual machines. Instead of building one application (a monolithic architecture), developers build a suite of components, called microservices, which come together over the network. Each component is written in the best programming language for the task, and each component can be deployed and scaled independently of the others.
  • #8: At the core of the application is the business logic, which is implemented by modules that define services, domain objects, and events. Surrounding the core are adapters that interface with the external world. Examples of adapters include database access components, messaging components that produce and consume messages, and web components that either expose APIs or implement a UI. Despite having a logically modular architecture, the application is packaged and deployed as a monolith. 
  • #9: Many organizations, such as eBay and Netflix, have adopted a microservices architecture pattern. Instead of building a single, monolithic application, the idea is to split your application into a set of smaller, interconnected services. Each microservice is a mini-application that has its own architecture consisting of business logic along with various adapters. Some microservices would expose an API that’s consumed by other microservices or by the application’s clients. Other microservices might implement a web UI. At runtime, each instance is often a cloud VM or a Docker container.
  • #10: Looking at the evolution of deployment and applications: from 1 day, to 15 minutes, to 10 seconds. Only one host OS to manage. Small learning curve.
  • #11: Rise of the container between 2013 and 2015, spearheaded by Docker.
  • #14: A typical DSE node runs the following processes on a single instance within the cluster: a single core DSE JVM, including Apache Cassandra, integrated DSE Search, and Spark Master (for HA); one or more Spark executor processes; a single Spark Worker process; multiple processes for the integrated Hadoop stack; multiple processes which may be started in an ad hoc manner (e.g. Spark Job server, SparkSQL CLI, etc.); and a single OpsCenter agent responsible for monitoring all processes on that DSE instance. Container 2: all the JVMs running on a single DSE node (uniformly deployed across each machine within the cluster). The OpsCenter daemon is (logically) separate from the cluster and there is usually one instance for the entire deployment.
  • #15: To provide cluster-specific configuration, the following environment variables should be provided via the docker run command: (a) CLUSTER_NAME: the name of the cluster to create/connect to; (b) SEEDS: the comma-separated list of seed IP addresses, e.g. SEEDS=127.0.0.2,127.0.0.3
  • #16: mlockall is used to prevent swapping and page faults. The simplest workaround is to add -XX:+AlwaysPreTouch to the JVM arguments and disable swap on the host OS. All containers by default inherit ulimits from the Docker daemon. DSE containers should have them set to unlimited or reasonably high values (e.g. for mem_locked_memory and max_memory_size). *Check*
  • #17: Docker’s default networking (via Linux bridge) is not recommended for production use, as it slows down networking considerably, by up to 50%. Development and testing benefit from running DSE clusters on a single Docker host, and for such scenarios the default networking is just fine. For production, use host networking (docker run --net=host) or a plugin that can manage IP ranges across clusters of hosts. Host networking limits the number of DSE nodes per Docker host to one, but this is the recommended configuration for production. Using Docker doesn’t mean putting everything on one host: think about the disks! Use pipework or Weave if consistent IP address allocation is needed.
  • #18: Data volumes are required for the commitlog, saved_caches, and data directories (everything in /var/lib/cassandra). The data volume must use a supported file system (usually xfs or ext4).
  • #19: A data volume is a specially-designated directory within one or more containers that bypasses the Union filesystem. Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization. Data volumes can be shared and reused among containers. Changes to a data volume are made directly. Changes to a data volume will not be included when you update an image. Data volumes persist even if the container itself is deleted.
  • #21: All of this works great for test/dev/prop environments.
  • #22: Deploying DSE within Docker isn’t trivial, but with adequate guidance and pre-production validation, it’s not that difficult. As the container ecosystem evolves, it is expected that future DSE releases will have additional guidelines to make the most of DSE installations under Docker. Some future areas that DataStax is investigating are: further splitting up of DSE processes into separate containers (e.g. running Spark executors and the DSE core JVM within a single container, and all other DSE processes within separate containers); integration of container-based deployment with workload management infrastructure components such as Kubernetes, Mesos, etc.; and enabling the deployment model on a variety of public and private clouds.
  • #23: Using volumes for data storage is a must for durability and performance. Avoid bridge/NAT networking and run containers with --net=host. This provides the simplest way to connect to the outside world and guarantees a stable IP address to the guest. Host networking also has the lowest overhead performance-wise, so your cluster should perform nearly as well as it does on bare metal.
  • #25: DataStax acknowledges that containers have rapidly become one of the building blocks, guidelines and examples to reduce the amount of time required to run DSE in Docker.