AdamCloud (Part 2): Lessons learned from Docker
Sébastien Bonami, IT Engineering Student, and David Lauzon, Researcher
École de technologie supérieure (ÉTS)
Presented at Big Data Montreal #32 + DevOps Montreal, January 12th, 2015
Plan
● AdamCloud Project
● Docker Introduction
● Lessons learned from Docker
o Dockerfiles
o Data Storage
o Networking
o Monitoring
● Conclusion
AdamCloud Project
Brief overview
AdamCloud Goal
● Main goal: provide a portable infrastructure for processing genomics data
● Requirements:
o Several software tools must be chained in a pipeline
o Centralized configuration for multiple environments
o Simple installation procedure for new students
Potential solution
● For genomics: the Adam project, developed at Berkeley AmpLab
o Snap, Adam, Avocado
o (uses Spark, HDFS)
● For infrastructure:
o Docker?
Adam Genomic Pipeline
[Diagram] Sequencer machine (hardware) → Fastq file (up to 250 GB) → Snap → Sam file → Adam → Parquet file → Avocado → Parquet file (~10 MB)
Snap, Adam and Avocado are the AmpLab genomics projects; Fastq, Sam and Parquet are the file formats.
AdamCloud - Environments
3 different environments
● Development (laptop)
o All services on a single host
● Demo
o Mac mini cluster
● Testing
o ÉTS servers (for larger genomes)
Docker Introduction
From now on, we leave AdamCloud aside and talk about Docker itself.
For simplicity, the examples use MySQL.
Docker Introduction - Key Concepts
● Dockerfile
o Text file, size = ~ KB
o Installation & config instructions
● Image
o Composed of many read-only layers
o Typical size = ~ hundred(s) of MB
o Can have multiple versions (akin to Git tags)
● Container
o Shares the image’s read-only layers
o 1 private writable layer (copy-on-write)
o Initial size = 0 bytes
o Can be stopped, started, paused, etc.
● Docker Hub Registry (Internet)
o Free public hosting
● Operations: build (Dockerfile → Image), run (Image → Container), commit (Container → Image), push / pull (Image ↔ Docker Hub Registry)
Docker Introduction - How does it work?
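[Diagram] Container 1, Container 2, ... run alongside the Docker Daemon and the Docker Storage Backend, on top of the Host OS Kernel, which runs on the Hardware.
● Docker Daemon: sets up and manages the LXC containers.
● Docker Storage Backend: stores the image and container data layers locally.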
Lesson 0: Playing with Docker
Install Docker:
$ sudo sh -c "echo deb https://get.docker.com/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
$ sudo apt-get update && sudo apt-get install -y --force-yes lxc-docker
Start a container:
$ docker run -ti --rm=true ubuntu bash
root@e0a1dad9f7fa:/# whoami; hostname
root
e0a1dad9f7fa
This creates a new interactive (-i) container with a tty (-t) from the image ubuntu, starts a bash shell, and automatically removes the container when it exits (--rm=true).
You are now “inside” the container with the ID e0a1dad9f7fa.
Dockerfiles
Dockerfiles - MySQL Example (1/3)
$ mkdir mysql-docker/
$ vi mysql-docker/Dockerfile
# Contents of file mysql-docker/Dockerfile [1]
# Pull base image (from Docker Hub)
FROM ubuntu:14.04
# Install MySQL
RUN apt-get update
RUN apt-get install -y mysql-server
[1] Source: https://registry.hub.docker.com/u/dockerfile/mysql/dockerfile/
Dockerfiles - MySQL Example (2/3)
# Contents of file mysql-docker/Dockerfile (continued)
# Configure MySQL: listening interface, log error, etc.
RUN sed -i 's/^\(bind-address\s.*\)/# \1/' /etc/mysql/my.cnf
RUN sed -i 's/^\(log_error\s.*\)/# \1/' /etc/mysql/my.cnf
RUN echo "mysqld_safe &" > /tmp/config
RUN echo "mysqladmin --silent --wait=30 ping || exit 1" >> /tmp/config
RUN echo "mysql -e 'GRANT ALL PRIVILEGES ON *.* TO \"root\"@\"%\" WITH GRANT OPTION;'" >> /tmp/config
RUN bash /tmp/config && rm -f /tmp/config
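To see what the two sed expressions do, they can be tried outside Docker on a throwaway file. This is only a sketch: the sample file below is a hypothetical stand-in for /etc/mysql/my.cnf, and it assumes GNU sed (for -i and \s).

```shell
# Hypothetical sample config standing in for /etc/mysql/my.cnf
cnf=$(mktemp)
printf '%s\n' \
  '[mysqld]' \
  'bind-address = 127.0.0.1' \
  'log_error = /var/log/mysql/error.log' \
  'port = 3306' > "$cnf"

# Capture the whole matching line in \( ... \) and re-emit it behind "# ",
# which comments the setting out instead of deleting it
sed -i 's/^\(bind-address\s.*\)/# \1/' "$cnf"
sed -i 's/^\(log_error\s.*\)/# \1/' "$cnf"

cat "$cnf"
```

With bind-address commented out, MySQL no longer restricts itself to the loopback interface, which is what lets the host reach the server inside the container.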
Dockerfiles - MySQL Example (3/3)
# Contents of file mysql-docker/Dockerfile (continued)
# Define default command
CMD ["mysqld_safe"]
# Expose guest port. Not required, but facilitates management
# NEVER expose the public port in the Dockerfile
EXPOSE 3306
Dockerfiles - Building MySQL image
$ docker build -t mysql-image mysql-docker/
Sending build context to Docker daemon 2.56 kB
[...]
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
[...]
Lesson 1: Dialog-less installs
# Contents of file mysql-docker/Dockerfile (showing differences)
[...]
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server
[...]
$ docker build -t mysql-image mysql-docker/
[...]
Successfully built d5cb85b206a4
That’s our image ID.
$ docker run -d mysql-image
5f3695d8f5e4dfc836156f645dbf6b647e264e58a25b4e2a9724b7522591b9bc
That’s our container ID (we can use a prefix as long as it is unique).
Lesson 1: Testing the connectivity
Finding the IP address of our container:
$ docker inspect 5f3695d8f5e4 |grep IPAddress |cut -d'"' -f4
172.17.0.102
From the host, we can now connect to our MySQL box inside the container using the Docker network bridge:
$ mysql -uroot -h 172.17.0.102 -e "SHOW DATABASES;"
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
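The grep/cut pipeline used to find the IP address can be checked without a running container. The sketch below feeds it a hypothetical three-line excerpt shaped like `docker inspect` output (the real output is a much larger JSON document).

```shell
# Hypothetical excerpt of `docker inspect` output
sample='    "Gateway": "172.17.42.1",
    "IPAddress": "172.17.0.102",
    "IPPrefixLen": 16,'

# grep keeps the IPAddress line; splitting that line on double quotes gives
# field 1 = leading spaces, 2 = IPAddress, 3 = ": ", 4 = the address itself
ip=$(printf '%s\n' "$sample" | grep IPAddress | cut -d'"' -f4)
echo "$ip"
```

Text parsing like this depends on the exact layout of the output. `docker inspect` also accepts a Go template through `--format`, for example `docker inspect --format '{{ .NetworkSettings.IPAddress }}' 5f3695d8f5e4`, which may be more robust.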
Lesson 2: Layers
Lesson 2: Layers - Docker History
$ docker history mysql-image
IMAGE          CREATED             CREATED BY                                      SIZE
d5cb85b206a4   41 minutes ago      /bin/sh -c #(nop) EXPOSE map[3306/tcp:{}]       0 B
a3fcf7ad0e46   41 minutes ago      /bin/sh -c #(nop) CMD [mysqld_safe]             0 B
e495928f5148   41 minutes ago      /bin/sh -c bash /tmp/config && rm -f /tmp/con   5.245 MB
e81232406a48   41 minutes ago      /bin/sh -c echo "mysql -e 'GRANT ALL PRIVILEG   131 B
3ed871742259   41 minutes ago      /bin/sh -c echo "mysqladmin --silent --wait=3   59 B
7383675c6559   41 minutes ago      /bin/sh -c echo "mysqld_safe &" > /tmp/config   14 B
dfa40ac0f314   45 minutes ago      /bin/sh -c sed -i 's/^\(log_error\s.*\)/# \1/   3.509 kB
01a7a7904f29   45 minutes ago      /bin/sh -c sed -i 's/^\(bind-address\s.*\)/#    3.507 kB
2709eaa06d42   About an hour ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   130.2 MB
6ca9716f2565   About an hour ago   /bin/sh -c apt-get update                       20.8 MB
86ce37374f40   6 weeks ago         /bin/sh -c #(nop) CMD [/bin/bash]               0 B
dc07507cef42   6 weeks ago         /bin/sh -c apt-get update && apt-get dist-upg   0 B
78e82ee876a2   6 weeks ago         /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
3f45ca85fedc   6 weeks ago         /bin/sh -c rm -rf /var/lib/apt/lists/*          0 B
61cb619d86bc   6 weeks ago         /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.8 kB
5bc37dc2dfba   6 weeks ago         /bin/sh -c #(nop) ADD file:d11cc4a4310c270539   192.5 MB
511136ea3c5a   19 months ago                                                       0 B
17 layers! Every Dockerfile instruction creates a layer:
● ~200 MB for Ubuntu
● ~20 MB for apt-get update
● ~130 MB for installing MySQL
Time to clean up?
Lesson 2: Layers - What are they?
● Think of a layer as a directory of files (or blocks)
● All these “physical” layers are combined into a “logical” file system for each individual container
o Union file system
o Copy-on-write
o Like a stack: higher layers may override lower layers
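The stack behaviour can be imitated with plain directories. This is a toy model, not how any Docker storage backend is implemented: the layer names and the lookup helper are made up for illustration, and real union file systems also handle deletions, which this sketch ignores.

```shell
# Three hypothetical "layers", each a plain directory (lowest layer: base)
root=$(mktemp -d)
mkdir -p "$root/base/etc" "$root/mysql/etc" "$root/container/etc"
echo "from base image"  > "$root/base/etc/hostname"
echo "from base image"  > "$root/base/etc/my.cnf"
echo "from mysql layer" > "$root/mysql/etc/my.cnf"

# Resolve a path like a union file system: search from the top layer down
lookup() {
  for layer in container mysql base; do
    if [ -e "$root/$layer/$1" ]; then
      cat "$root/$layer/$1"
      return 0
    fi
  done
  return 1
}

lookup etc/my.cnf      # the mysql layer overrides the base layer
lookup etc/hostname    # only the base layer has this file

# Copy-on-write: a change made by the container lands in its private top layer,
# and the read-only layers below stay untouched
echo "edited in container" > "$root/container/etc/my.cnf"
lookup etc/my.cnf
cat "$root/mysql/etc/my.cnf"
```

The last two commands show the point of copy-on-write: the container sees its own edit, while the image layer still holds the original file and can be shared with other containers.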
Lesson 2: Layers - Purpose (1/4)
● Blazing fast container instantiation
o To create a new instance from an image, Docker simply creates a
new empty read-write layer
Great, but we could achieve this goal with a single layer per image + 1 layer per container.
Why 17 layers?
Lesson 2: Layers - Purpose (2/4)
● Faster image modification
o Changing/adding a Dockerfile instruction causes only the modified
layer(s) and those following it to be rebuilt
How often do you plan on changing your Dockerfiles?
Lesson 2: Layers - Purpose (3/4)
● Faster distribution
o When distributing the image (via docker push) and downloading it (via docker pull, or docker build), only the affected layer(s) are sent
Lesson 2: Layers - Purpose (4/4)
● Minimize disk space
o All containers on the same Docker host that descend from the same image hierarchy share layers
o The Ubuntu Docker image is 200 MB
o 1000 containers based on Ubuntu only take 200 MB in total (+ the additional packages they require)
Will you have multiple variants (config and/or versions) of MySQL on the same machine?
How many MySQL servers will you have on the same machine?
Lesson 2: Layers - Layer Genocide
$ cp -r mysql-docker/ mysql-docker-grouped
$ vi mysql-docker-grouped/Dockerfile
In this example, all our MySQL containers will be the same.
Therefore, we only need a single layer.
Lesson 2: Layers - Combine multiple RUN instructions
# Contents of file mysql-docker-grouped/Dockerfile
[...]
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server && \
    sed -i 's/^\(bind-address\s.*\)/# \1/' /etc/mysql/my.cnf && \
    sed -i 's/^\(log_error\s.*\)/# \1/' /etc/mysql/my.cnf && \
    echo "mysqld_safe &" > /tmp/config && \
    echo "mysqladmin --silent --wait=30 ping || exit 1" >> /tmp/config && \
    echo "mysql -e 'GRANT ALL PRIVILEGES ON *.* TO \"root\"@\"%\" WITH GRANT OPTION;'" >> /tmp/config && \
    bash /tmp/config && rm -f /tmp/config
[...]
Lesson 2: Layers - Docker History
$ docker build -t mysql-image-grouped mysql-docker-grouped/
[...]
Successfully built 11ccd4cc6c82
$ docker history mysql-image-grouped
IMAGE          CREATED             CREATED BY                                      SIZE
11ccd4cc6c82   About an hour ago   /bin/sh -c #(nop) EXPOSE map[3306/tcp:{}]       0 B
59c9467d3360   About an hour ago   /bin/sh -c #(nop) CMD [mysqld_safe]             0 B
0993d316210d   About an hour ago   /bin/sh -c apt-get update && DEBIAN_FRONT       151 MB
86ce37374f40   6 weeks ago         /bin/sh -c #(nop) CMD [/bin/bash]               0 B
dc07507cef42   6 weeks ago         /bin/sh -c apt-get update && apt-get dist-upg   0 B
78e82ee876a2   6 weeks ago         /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
3f45ca85fedc   6 weeks ago         /bin/sh -c rm -rf /var/lib/apt/lists/*          0 B
61cb619d86bc   6 weeks ago         /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.8 kB
5bc37dc2dfba   6 weeks ago         /bin/sh -c #(nop) ADD file:d11cc4a4310c270539   192.5 MB
511136ea3c5a   19 months ago                                                       0 B
Freed 7 layers! Our Dockerfile now only adds 3 layers on top of the base image: RUN, CMD, EXPOSE.
Lesson 3: Staying fit
Lesson 3: Staying fit - Compacting layers
$ cp -r mysql-docker-grouped/ mysql-docker-cleaned
$ vi mysql-docker-cleaned/Dockerfile
Some commands, like apt-get update, create temporary files that can be safely discarded after use.
We can save space and create smaller images by deleting those files.
Lesson 3: Staying fit - Removing temporary files
# Contents of file mysql-docker-cleaned/Dockerfile (partial)
[...]
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server && \
    rm -fr /var/lib/apt/lists/* && \
[...]
$ docker build -t mysql-image-cleaned mysql-docker-cleaned/
[...]
Successfully built 032798b8e064
Remember: you’ll need to run apt-get update again the next time you want to install something.
Lesson 3: Staying fit - Local Docker images
$ docker images
REPOSITORY            TAG      IMAGE ID       CREATED       VIRTUAL SIZE
mysql-image-cleaned   latest   032798b8e064   2 hours ago   322.8 MB
mysql-image-grouped   latest   11ccd4cc6c82   2 hours ago   343.6 MB
mysql-image           latest   d5cb85b206a4   3 hours ago   348.9 MB
ubuntu                14.04    86ce37374f40   6 weeks ago   192.7 MB
Not counting the shared ubuntu base image, the cleaned image occupies 17% less space than the original mysql-image (sizes shown are virtual sizes) [1].
MySQL is small; the impact can be much bigger for other applications.
[1] ((348-192) - (322-192)) / (348-192) = 17%
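The 17% figure can be recomputed from the unrounded sizes in the `docker images` listing. A small sketch using awk for the floating-point arithmetic; the variable names are only for illustration.

```shell
# Virtual sizes in MB, copied from the `docker images` output
base=192.7; original=348.9; cleaned=322.8

# Compare what each image adds on top of the shared ubuntu base
saving=$(awk -v b="$base" -v o="$original" -v c="$cleaned" \
  'BEGIN { printf "%.1f", ((o - b) - (c - b)) / (o - b) * 100 }')
echo "cleaned image adds ${saving}% less than the original"
```

Subtracting the base size first matters: measured against the full virtual size (348.9 MB), the same 26.1 MB saving would look much smaller.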
Lesson 3: Staying fit - Smallest Docker base images
Image:Tag              Size
scratch                0.0 B
busybox:ubuntu-14.04   5.6 MB
debian:7               85.0 MB
ubuntu:14.04           192.7 MB
centos:7               210.0 MB
fedora:21              241.3 MB
Lesson 3: Staying fit - docker diff
● Shows the differences between a container and its image
o Useful to see which files have been modified/created when writing your Dockerfile
Lesson 4: Fixed as “worksforme”
Lesson 4: Reproducibility - Package version
● Your Dockerfile may build a different image in a few months than it does today
RUN apt-get install -y mysql-server
Specifying the package version explicitly is better:
RUN apt-get install -y mysql-server=5.5.40-0ubuntu0.14.04.1
Lesson 4: Reproducibility - Dependency version
The previous solution should be enough…
But if you need a higher guarantee of reproducibility:
A. Specify the package version for the dependencies as well
B. And / or use a cache proxy, maven proxy, etc.
RUN apt-get install -y libaio1=0.3.109-4 mysql-common=5.5.40-0ubuntu0.14.04.1 \
    libmysqlclient18=5.5.40-0ubuntu0.14.04.1 libwrap0=7.6.q-25 libdbi-perl=1.630-1 \
    libdbd-mysql-perl=4.025-1 libterm-readkey-perl=2.31-1 \
    mysql-client-core-5.5=5.5.40-0ubuntu0.14.04.1 mysql-client-5.5=5.5.40-0ubuntu0.14.04.1 \
    mysql-server-core-5.5=5.5.40-0ubuntu0.14.04.1 psmisc=22.20-1ubuntu2 \
    mysql-server-5.5=5.5.40-0ubuntu0.14.04.1 libhtml-template-perl=2.95-1 \
    mysql-server=5.5.40-0ubuntu0.14.04.1 tcpd=7.6.q-25
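Writing such a pinned list by hand is error-prone. One possible approach, sketched here as an assumption rather than the authors' method, is to generate it from "package version" pairs. The pin_packages helper is hypothetical; on a Debian/Ubuntu system the pairs could come from dpkg-query, as noted in the comment.

```shell
# Turn "package version" pairs into apt's package=version syntax.
# On a Debian/Ubuntu system the pairs could be produced with:
#   dpkg-query -W -f='${Package} ${Version}\n' mysql-server libaio1
pin_packages() {
  awk 'NF == 2 { printf "%s%s=%s", sep, $1, $2; sep = " " } END { print "" }'
}

# Sample input using two versions quoted in the RUN instruction above
pins=$(printf '%s\n' \
  'libaio1 0.3.109-4' \
  'mysql-server 5.5.40-0ubuntu0.14.04.1' | pin_packages)
echo "RUN apt-get install -y $pins"
```

Keep in mind that a pinned version only builds for as long as the repository still serves it, which is one reason for option B (a cache proxy).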
Lesson 5: Prototry
“A quick and dirty attempt to develop a working model of software. The original intent is to rewrite the ProtoTry, using lessons learned, but schedules never permit. Also known as legacy code.” [1]
[1] Michael Duell, Ailments of Unsuitable Project-Disoriented Software, http://www.fsfla.org/~lxoliva/fun/prog/resign-patterns
Lesson 5: Prototry - Docker Hub Registry
● Before writing your own Dockerfile, try an existing build from someone else
o https://registry.hub.docker.com/
o Official builds
o Trusted (automated) builds
o Other builds
● For advanced setups, see these images:
o jenkins
o dockerfile/java
Lesson 5: Prototry - Using other people’s images
PROs:
● Faster to get started
● Better tested
CONs:
● You may end up with a mixed stack to support
o e.g. different versions of Java
o Ubuntu vs Debian vs CentOS
● Not all sources use all the best practices described in this presentation
For medium to large organisations / heavy Docker users: best to fork and write your own Dockerfiles
Lesson 5: Prototry - Potential image hierarchy
● myorg-base: FROM ubuntu:14.04 (organization-wide tools, e.g. vim, etc.)
o myorg-java: FROM myorg-base:1.0 (OpenJDK | OracleJDK)
▪ java-app3: FROM myorg-java:oracle-jdk7
o myorg-python: FROM myorg-base:1.0 (install Python 2.7)
▪ python-app1: FROM myorg-python:2.7
▪ python-app2: FROM myorg-python:2.7
Lesson 6: Volume Design Patterns
Lesson 6: Inside Container Pattern
● Nothing to do - that’s the default Docker behavior
o Application data is stored along with the infrastructure (container) data
● If the container is restarted, data is still there
● If the container is deleted, data is gone
Lesson 6: Host Directory Pattern
● A directory on the host
● To share data across containers on the same host
● For example, put the source code on the host and mount it inside the container with the “-v” flag
Lesson 6: Data-Only Container Pattern
● Runs on a bare-bones image
● VOLUME instruction in the Dockerfile or “-v” flag at run time
● Just use the “--volumes-from” flag to mount all of its volumes in another container
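The host directory and data-only container patterns could look like the commands below. This is a sketch with made-up names: /home/me/project, /src, /data and app-data are hypothetical. To keep it safe to paste, it only prints the commands by default; setting DOCKER=docker in the environment runs them against a real daemon.

```shell
# Dry run by default: each command is printed instead of executed
DOCKER=${DOCKER:-echo docker}

# Host directory pattern: mount a host directory into the container
host_dir=$($DOCKER run --rm -v /home/me/project:/src ubuntu:14.04 ls /src)

# Data-only container pattern: a bare-bones container that only declares a volume...
data_only=$($DOCKER run --name app-data -v /data busybox true)

# ...and another container that mounts every volume of the data-only container
consumer=$($DOCKER run --rm --volumes-from app-data ubuntu:14.04 ls /data)

printf '%s\n' "$host_dir" "$data_only" "$consumer"
```

The data-only container does not need to keep running: its volume stays available to `--volumes-from` as long as the container itself is not deleted.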
Lesson 7: Storage backend
Lesson 7: Storage backend - Overview
● Options:
o VFS
o AUFS (default, docker < 0.7)
o DeviceMapper
▪ Direct LVM
▪ Loop LVM (default in Red Hat)
o Btrfs (experimental)
o OverlayFS (experimental)
● Red Hat [1] says the fastest backends are:
1. OverlayFS
2. Direct LVM
3. Btrfs
4. Loop LVM
● Look up your current Docker backend:
$ docker info |grep Driver
[1] http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
Lesson 7: Storage backend - VFS & AUFS
● Both are very basic (NOT for PROD)
● Both store each layer as a separate directory with
regular files
● VFS
o No Copy-on-Write (CoW)
● AUFS
o Original Docker backend
o File-level Copy-on-Write (CoW)
VFS & AUFS can be useful to understand how Docker works.
Do not use in PROD.
Lesson 7: Storage backend - DeviceMapper (1/2)
● Already used by the Linux kernel for LVM2 (logical volume management)
o Block-level Copy-on-Write (CoW)
o Unused blocks do not use space
● Uses thin pool provisioning to implement CoW snapshots
o Each pool requires 2 block devices: data & metadata
o By default, uses loopback mounts on sparse regular files (Loop LVM):
# ls -alhs /var/lib/docker/devicemapper/devicemapper
506M -rw-------. 1 root root 100G Sep 10 20:15 data
1.1M -rw-------. 1 root root 2.0G Sep 10 20:15 metadata
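The listing shows a 100G file that only occupies 506M on disk: that is what “sparse” means. The effect can be reproduced with a throwaway file. A sketch assuming GNU coreutils (truncate) and a file system that supports sparse files; the 100 MB size is arbitrary.

```shell
# Create a 100 MB sparse file: the size is recorded, but no data blocks are written
f=$(mktemp)
truncate -s 100M "$f"

apparent_bytes=$(wc -c < "$f")     # size the file claims to have
used_kb=$(du -k "$f" | cut -f1)    # space actually allocated on disk

echo "apparent: $apparent_bytes bytes, allocated: $used_kb KB"
ls -ls "$f"                        # first column = allocated blocks, as in the listing above
rm -f "$f"
```

Blocks are only allocated when data is written, which is convenient for getting started but adds a loopback layer between Docker and the disk.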
Lesson 7: Storage backend - DeviceMapper (2/2)
● In production:
o Use real block devices! (Direct LVM)
o Ideally, data & metadata each on its own spindle
o Additional configuration is required (Docker does not do that for you)
Lesson 7: Storage backend - Btrfs & OverlayFS
Btrfs:
● Requires /var/lib/docker to be on a Btrfs file system
● Block-level Copy-on-Write (CoW) using Btrfs’s snapshotting
● Each layer stored as a Btrfs subvolume
● No SELinux
OverlayFS:
● Supports page cache sharing (claims a huge RAM saving)
● Lower FS contains the base image (XFS or EXT4)
● Upper FS contains the deltas
● No SELinux
Lesson 8: Networking
Docker
● Ethernet bridge “docker0” created when the Docker daemon starts
● Virtual subnet on the host (default: 172.17.42.1/16)
● Each container has a pair of virtual Ethernet interfaces
● You can remove “docker0” and use your own bridge if you want
Weave
Why Weave?
● Docker’s built-in functionality does not provide a solution for connecting containers on multiple hosts
● Weave creates a virtual network that makes a distributed environment possible (common in the real world)
Weave
How does it work?
● Virtual routers establish TCP connections to each other with a handshake
● These connections are duplex
● Uses “pcap” to capture packets
● Excludes traffic between local containers
Weave
[Diagram] Host A and Host B each run a Weave container next to their application containers (Container 1, Container 2, Container 3).
Weave - images
Image:Tag                 Size
zettio/weave:0.8.0        11 MB
zettio/weavedns:0.8.0     9.4 MB
zettio/weavetools:0.8.0   3.7 MB
Weave - getting started
● First host: weave-01
$ sudo weave launch
$ sudo weave run 10.0.0.1/24 -ti --name ubuntu-01 ubuntu:14.04
● Second host: weave-02
$ sudo weave launch weave-01
$ sudo weave run 10.0.0.2/24 -ti --name ubuntu-02 ubuntu:14.04
“weave launch” starts the weave router in a container; “weave launch weave-01” starts the weave router in a container and peers it with the first host.
The first argument of “weave run” is the container’s IP address in CIDR notation.
Note: “weave run” invokes “docker run -d” (running as a daemon)
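In CIDR notation, the number after the slash says how many leading bits identify the network. The two containers above can reach each other directly because 10.0.0.1/24 and 10.0.0.2/24 fall in the same network. The helper below is a hypothetical illustration written in plain shell arithmetic; it is not part of Weave.

```shell
# Print the network that an IPv4 address in CIDR notation belongs to
network_of() {
  ip=${1%/*}; bits=${1#*/}
  IFS=. read -r a b c d <<EOF
$ip
EOF
  addr=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))   # keep the first $bits bits
  net=$(( addr & mask ))
  echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))/$bits"
}

network_of 10.0.0.1/24   # container ubuntu-01
network_of 10.0.0.2/24   # container ubuntu-02
```

Both calls print the same network, so the containers are on one virtual subnet even though they run on different hosts.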
Weave - testing the connectivity (1/2)
$ sudo weave status
weave router 0.8.0
Our name is 7a:ab:c1:21:f9:3b
Sniffing traffic on &{15 65535 ethwe 56:40:66:0b:a4:c6 up|broadcast|multicast}
MACs:
56:40:66:0b:a4:c6 -> 7a:ab:c1:21:f9:3b (2015-01-11 22:27:39.23846091 +0000 UTC)
7a:ab:c1:21:f9:3b -> 7a:ab:c1:21:f9:3b (2015-01-11 22:27:40.142183122 +0000 UTC)
a2:60:ab:8b:1f:b6 -> 7a:ab:c1:21:f9:3b (2015-01-11 22:27:40.716414595 +0000 UTC)
7a:5a:98:6e:92:2e -> 7a:5a:98:6e:92:2e (2015-01-11 22:28:53.204010927 +0000 UTC)
1e:b4:78:1e:dd:23 -> 7a:5a:98:6e:92:2e (2015-01-11 22:28:53.42594994 +0000 UTC)
Peers:
Peer 7a:ab:c1:21:f9:3b (v1) (UID 17511927952474106279)
-> 7a:5a:98:6e:92:2e [192.168.1.30:47638]
Peer 7a:5a:98:6e:92:2e (v1) (UID 8527109358448991597)
-> 7a:ab:c1:21:f9:3b [192.168.1.195:6783]
Routes:
unicast:
7a:5a:98:6e:92:2e -> 7a:5a:98:6e:92:2e
7a:ab:c1:21:f9:3b -> 00:00:00:00:00:00
broadcast:
7a:ab:c1:21:f9:3b -> [7a:5a:98:6e:92:2e]
7a:5a:98:6e:92:2e -> []
Reconnects:
(output from the first host, weave-01)
● “Sniffing traffic on”: virtual interface used by Weave
● MACs: containers and host points
● Peers: connected peers
Weave - testing the connectivity (2/2)
● Second host: weave-02
$ sudo docker attach ubuntu-02
$ ping -c 4 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=4.22 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=1.20 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=1.73 ms
64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=2.02 ms
--- 10.0.0.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3008ms
rtt min/avg/max/mdev = 1.206/2.299/4.226/1.150 ms
It pings!
Lesson 9: Monitoring
cAdvisor
● New tool from Google
● Specialized for Docker containers
PROs:
● Great web interface
● Docker image available (18 MB) to try it in seconds
● Stats can be exported to InfluxDB (data mining remains to be done)
CONs:
● Needs more maturity
● Missing metrics
o No data for Disk I/O
● Only keeps the last 60 metrics locally (not configurable)
Monitoring with cAdvisor
[Screenshot: cAdvisor web interface]
Conclusion
AdamCloud - The next steps
● Docker + Weave = success
● Open-source the project and merge it upstream into the AmpLab genomic pipeline
● Support for Amazon EC2 environments
● Improve administration of Docker containers
o Monitoring, orchestration, provisioning
Docker Conclusion
● 1 Docker container = 1 background daemon
● Container isolation is not like a VM
● Use explicit image versions and keep track of them
● Docker is less interesting for multi-tenant use cases (no SSH in the containers)
● Docker is FAST and VERSATILE
● cAdvisor is an interesting monitoring tool, but limited
● Docker is perfect for short-lived apps (no long-term data persistence)
● Data-intensive apps should review the Docker docs carefully. Start by looking at Direct LVM.
References
● Jonathan Bergknoff - Building good docker images, http://jonathan.bergknoff.com/journal/building-good-docker-images
● Michael Crosby - Dockerfile Best Practices, http://crosbymichael.com/dockerfile-best-practices.html
● Michael Crosby - Dockerfile Best Practices - take 2, http://crosbymichael.com/dockerfile-best-practices-take-2.html
● Nathan Leclaire - The Dockerfile is not the source of truth for your image, http://nathanleclaire.com/blog/2014/09/29/the-dockerfile-is-not-the-source-of-truth-for-your-image/
● Docker Documentation - Understanding Docker, https://docs.docker.com/introduction/understanding-docker/
● Docker Documentation - Docker User Guide, https://docs.docker.com/userguide/
● Docker Documentation - Dockerfile Reference, https://docs.docker.com/reference/builder/
● Docker Documentation - Command Line (CLI) User Guide, https://docs.docker.com/reference/commandline/cli/
● Docker Documentation - Advanced networking, http://docs.docker.com/articles/networking/
● Project Atomic - Supported Filesystems, http://www.projectatomic.io/docs/filesystems/
● Red Hat Developer Blog - Comprehensive Overview of Storage Scalability in Docker, http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
● Linux Kernel Documentation - DeviceMapper Thin Provisioning, https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt
● weave - the Docker network, http://zettio.github.io/weave/
● GitHub - google/cadvisor, https://github.com/google/cadvisor

More Related Content

PPTX
BDM29: AdamCloud Project - Part I
PDF
Debugging & Tuning in Spark
PDF
Introduction to Spark Internals
PDF
DTCC '14 Spark Runtime Internals
ODP
Apache Spark Internals
PDF
Why your Spark job is failing
PDF
Introduction to spark
PPTX
Apache Spark overview
BDM29: AdamCloud Project - Part I
Debugging & Tuning in Spark
Introduction to Spark Internals
DTCC '14 Spark Runtime Internals
Apache Spark Internals
Why your Spark job is failing
Introduction to spark
Apache Spark overview

What's hot (20)

PDF
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
PPTX
Why your Spark Job is Failing
PDF
Apache Spark RDDs
PDF
PDF
Apache Spark
PPTX
20130912 YTC_Reynold Xin_Spark and Shark
PDF
Introduction to Apache Spark
PPTX
Apache Spark RDD 101
PDF
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰
PPT
Linux containers and docker
PDF
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
PPT
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
PDF
Introduction to Spark
PDF
Apache Spark in Depth: Core Concepts, Architecture & Internals
PPTX
Terraform Modules Restructured
PPTX
Apache spark Intro
PDF
Spark overview
PPTX
Tuning and Debugging in Apache Spark
PDF
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
PDF
BDM25 - Spark runtime internal
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Why your Spark Job is Failing
Apache Spark RDDs
Apache Spark
20130912 YTC_Reynold Xin_Spark and Shark
Introduction to Apache Spark
Apache Spark RDD 101
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰
Linux containers and docker
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Introduction to Spark
Apache Spark in Depth: Core Concepts, Architecture & Internals
Terraform Modules Restructured
Apache spark Intro
Spark overview
Tuning and Debugging in Apache Spark
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
BDM25 - Spark runtime internal
Ad

Viewers also liked (20)

PPTX
BDM26: Spark Summit 2014 Debriefing
PPTX
BDM8 - Near-realtime Big Data Analytics using Impala
PPTX
BDM24 - Cassandra use case at Netflix 20140429 montrealmeetup
PPTX
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
PDF
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
PDF
หนังสือภาษาไทย Spark Internal
PDF
Unified Big Data Processing with Apache Spark
PDF
QCon2016--Drive Best Spark Performance on AI
PPTX
Introduction to Spark - DataFactZ
PDF
Fun[ctional] spark with scala
POTX
Apache Spark Streaming: Architecture and Fault Tolerance
PPTX
Resilient Distributed DataSets - Apache SPARK
PPTX
Apache Spark
PDF
Cassandra Data Maintenance with Spark
PDF
Make 2016 your year of SMACK talk
PDF
Apache Spark: What's under the hood
ODP
Spark Deep Dive
PDF
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
PDF
Spark and the Future of Advanced Analytics by Thomas Dinsmore
PDF
Data processing platforms with SMACK: Spark and Mesos internals
BDM26: Spark Summit 2014 Debriefing
BDM8 - Near-realtime Big Data Analytics using Impala
BDM24 - Cassandra use case at Netflix 20140429 montrealmeetup
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
หนังสือภาษาไทย Spark Internal
Unified Big Data Processing with Apache Spark
QCon2016--Drive Best Spark Performance on AI
Introduction to Spark - DataFactZ
Fun[ctional] spark with scala
Apache Spark Streaming: Architecture and Fault Tolerance
Resilient Distributed DataSets - Apache SPARK
Apache Spark
Cassandra Data Maintenance with Spark
Make 2016 your year of SMACK talk
Apache Spark: What's under the hood
Spark Deep Dive
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Spark and the Future of Advanced Analytics by Thomas Dinsmore
Data processing platforms with SMACK: Spark and Mesos internals
Ad

Similar to BDM32: AdamCloud Project - Part II (20)

PDF
手把手帶你學Docker 03042017
PDF
Introduction to Docker
PDF
時代在變 Docker 要會:台北 Docker 一日入門篇
PDF
手把手帶你學 Docker 入門篇
PDF
Docker workshop 0507 Taichung
PPTX
Dockerizing a Symfony2 application
PDF
桃園市教育局Docker技術入門與實作
PPTX
Docker for Web Developers: A Sneak Peek
PPTX
Real World Experience of Running Docker in Development and Production
PDF
Docker for mere mortals
PPTX
Introduction to Docker
PDF
Docker Essentials Workshop— Innovation Labs July 2020
PPTX
ABCs of docker
PDF
PuppetConf 2016: The Challenges with Container Configuration – David Lutterko...
PDF
Challenges of container configuration
ODP
Linux containers & Devops
PDF
Shipping Applications to Production in Containers with Docker
PDF
Puppet at Opera Sofware - PuppetCamp Oslo 2013
PDF
Docker dDessi november 2015
PDF
Introduction to Docker
手把手帶你學Docker 03042017
Introduction to Docker
時代在變 Docker 要會:台北 Docker 一日入門篇
手把手帶你學 Docker 入門篇
Docker workshop 0507 Taichung
Dockerizing a Symfony2 application
桃園市教育局Docker技術入門與實作
Docker for Web Developers: A Sneak Peek
Real World Experience of Running Docker in Development and Production
Docker for mere mortals
Introduction to Docker
Docker Essentials Workshop— Innovation Labs July 2020
ABCs of docker
PuppetConf 2016: The Challenges with Container Configuration – David Lutterko...
Challenges of container configuration
Linux containers & Devops
Shipping Applications to Production in Containers with Docker
Puppet at Opera Sofware - PuppetCamp Oslo 2013
Docker dDessi november 2015
Introduction to Docker

Recently uploaded (20)

PDF
How Creative Agencies Leverage Project Management Software.pdf
PPTX
Materi-Enum-and-Record-Data-Type (1).pptx
PDF
top salesforce developer skills in 2025.pdf
PDF
How to Migrate SBCGlobal Email to Yahoo Easily
PPT
JAVA ppt tutorial basics to learn java programming
PPTX
Materi_Pemrograman_Komputer-Looping.pptx
DOCX
The Five Best AI Cover Tools in 2025.docx
PDF
Softaken Excel to vCard Converter Software.pdf
PDF
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
PPTX
Introduction to Artificial Intelligence
PDF
PTS Company Brochure 2025 (1).pdf.......
PPTX
Essential Infomation Tech presentation.pptx
PDF
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
PDF
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
PDF
Design an Analysis of Algorithms II-SECS-1021-03
PPTX
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
PPTX
VVF-Customer-Presentation2025-Ver1.9.pptx
PDF
AI in Product Development-omnex systems
PPT
Introduction Database Management System for Course Database
PPTX
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
How Creative Agencies Leverage Project Management Software.pdf
Materi-Enum-and-Record-Data-Type (1).pptx
top salesforce developer skills in 2025.pdf
How to Migrate SBCGlobal Email to Yahoo Easily
JAVA ppt tutorial basics to learn java programming
Materi_Pemrograman_Komputer-Looping.pptx
The Five Best AI Cover Tools in 2025.docx
Softaken Excel to vCard Converter Software.pdf
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
Introduction to Artificial Intelligence
PTS Company Brochure 2025 (1).pdf.......
Essential Infomation Tech presentation.pptx
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
Design an Analysis of Algorithms II-SECS-1021-03
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
VVF-Customer-Presentation2025-Ver1.9.pptx
AI in Product Development-omnex systems
Introduction Database Management System for Course Database
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises

BDM32: AdamCloud Project - Part II

  • 1. AdamCloud (Part 2): Lessons learned from Docker Sébastien Bonami, IT Engineering Student and David Lauzon, Researcher École de technologie supérieure (ÉTS) Presented at Big Data Montreal #32 + DevOps Montreal January 12th 2015 1
  • 2. Plan ● AdamCloud Project ● Docker Introduction ● Lessons learned from Docker o Dockerfiles o Data Storage o Networking o Monitoring ● Conclusion 2
  • 4. AdamCloud Goal ● Main goal: provide a portable infrastructure for processing genomics data ● Requirements: o A series of softwares must be chained in a pipeline o Centralize configuration for multiple environments o Simple installation procedure for new students 4
  • 5. Potential solution ● For genomics: Adam project developed at Berkeley AmpLab o Snap, Adam, Avocado o (uses Spark, HDFS) ● For infrastructure: o Docker ? 5
  • 6. Adam Genomic Pipeline 6 Fastq File (up to 250 GB) Sam File Parquet File Parquet File (~10MB) Sequencer Machine Snap AvocadoAdam Hardware AmpLab Genomics Projects File Formats
  • 7. AdamCloud - Environments 3 different environments ● Development (laptop) o All services in 1 single host ● Demo o Mac mini cluster ● Testing o ÉTS servers (for larger genomes) 7
  • 8. Docker Introduction From now on, we will talk about Docker leaving AdamCloud aside. For simplicity, we chose to use MySQL to demonstrate some examples about learning Docker. 8
  • 9. Docker Introduction - Key Concepts Dockerfile Image Docker Hub Registry Internet Container build push pull run commit Text file Size = ~ KB Installation & config instructions Composed of many read-only layers Typical size = ~ hundred(s) MB Can have multiple versions (akin Git tags) Shares the image’s read-only layers 1 private writeable layer (copy-on-write) Initial size = 0 bytes Can be stopped, started, paused, etc. Free public hosting 9
  • 10. Docker Introduction - How does it work? (diagram) The Docker daemon, the Docker storage backend and the containers (Container 1, Container 2, ...) all run on the host OS kernel, on top of the hardware. The Docker daemon sets up and manages the LXC containers. The storage backend stores the image and container data layers locally.
  • 12. Lesson 0: Playing with Docker
      Install Docker:
        $ sudo sh -c "echo deb https://get.docker.com/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
        $ sudo apt-get update && sudo apt-get install -y --force-yes lxc-docker
      Start a first container:
        $ docker run -ti --rm=true ubuntu bash
        root@e0a1dad9f7fa:/# whoami; hostname
        root
        e0a1dad9f7fa
      This creates a new interactive (-i) container with a tty (-t) from the image ubuntu, starts a bash shell, and automatically removes the container when it exits (--rm=true). You are now "inside" the container with the id e0a1dad9f7fa.
  • 14. Dockerfiles - MySQL Example (1/3)
        $ mkdir mysql-docker/
        $ vi mysql-docker/Dockerfile

        # Contents of file mysql-docker/Dockerfile [1]
        # Pull base image (from Docker Hub)
        FROM ubuntu:14.04
        # Install MySQL
        RUN apt-get update
        RUN apt-get install -y mysql-server
      [1] Source: https://registry.hub.docker.com/u/dockerfile/mysql/dockerfile/
  • 15. Dockerfiles - MySQL Example (2/3)
        # Contents of file mysql-docker/Dockerfile (continued)
        # Configure MySQL: listening interface, log error, etc.
        RUN sed -i 's/^\(bind-address\s.*\)/# \1/' /etc/mysql/my.cnf
        RUN sed -i 's/^\(log_error\s.*\)/# \1/' /etc/mysql/my.cnf
        RUN echo "mysqld_safe &" > /tmp/config
        RUN echo "mysqladmin --silent --wait=30 ping || exit 1" >> /tmp/config
        RUN echo "mysql -e 'GRANT ALL PRIVILEGES ON *.* TO \"root\"@\"%\" WITH GRANT OPTION;'" >> /tmp/config
        RUN bash /tmp/config && rm -f /tmp/config
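The two sed substitutions can be tried outside Docker on a throwaway file. This is a minimal sketch, not part of the original deck: the sample file contents are an assumption modelled on a stock my.cnf, and `sed -i` and `\s` assume GNU sed.

```shell
# Throwaway sample config (assumed contents, not the real /etc/mysql/my.cnf).
cfg=$(mktemp)
printf '%s\n' \
  '[mysqld]' \
  'bind-address = 127.0.0.1' \
  'log_error = /var/log/mysql/error.log' \
  'port = 3306' > "$cfg"

# \( ... \) captures the matching line and \1 writes it back behind "# ",
# so the setting is commented out rather than deleted.
sed -i 's/^\(bind-address\s.*\)/# \1/' "$cfg"
sed -i 's/^\(log_error\s.*\)/# \1/' "$cfg"

result=$(cat "$cfg")
printf '%s\n' "$result"
rm -f "$cfg"
```

Only the bind-address and log_error lines end up prefixed with "# "; the other lines are left alone.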
  • 16. Dockerfiles - MySQL Example (3/3)
        # Contents of file mysql-docker/Dockerfile (continued)
        # Define default command
        CMD ["mysqld_safe"]
        # Expose guest port. Not required, but facilitates management
        # NEVER expose the public port in the Dockerfile
        EXPOSE 3306
  • 17. Dockerfiles - Building MySQL image
        $ docker build -t mysql-image mysql-docker/
        Sending build context to Docker daemon 2.56 kB
        [...]
        debconf: unable to initialize frontend: Dialog
        debconf: (TERM is not set, so the dialog frontend is not usable.)
        debconf: falling back to frontend: Readline
        debconf: unable to initialize frontend: Readline
        debconf: (This frontend requires a controlling tty.)
        debconf: falling back to frontend: Teletype
        [...]
  • 19. Lesson 1: Dialog-less installs
        # Contents of file mysql-docker/Dockerfile (showing differences)
        [...]
        RUN DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server
        [...]
      Rebuild the image. The last line of output is our image ID:
        $ docker build -t mysql-image mysql-docker/
        [...]
        Successfully built d5cb85b206a4
      Start a container in the background. The output is our container ID (we can use a prefix as long as it is unique):
        $ docker run -d mysql-image
        5f3695d8f5e4dfc836156f645dbf6b647e264e58a25b4e2a9724b7522591b9bc
  • 20. Lesson 1: Testing the connectivity
      Finding the IP address of our container:
        $ docker inspect 5f3695d8f5e4 | grep IPAddress | cut -d'"' -f4
        172.17.0.102
      From the host, we can now connect to our MySQL box inside the container using the Docker network bridge:
        $ mysql -uroot -h 172.17.0.102 -e "SHOW DATABASES;"
        +--------------------+
        | Database           |
        +--------------------+
        | information_schema |
        | mysql              |
        | performance_schema |
        +--------------------+
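The grep and cut pipeline works because docker inspect prints JSON: once the matching line is split on double quotes, the address is the fourth field. A small sketch on a hard-coded sample line (the line is an assumption standing in for real docker inspect output):

```shell
# One line as it would appear in the JSON printed by `docker inspect` (sample value).
sample='        "IPAddress": "172.17.0.102",'

# Splitting on '"' gives: 1 = indentation, 2 = IPAddress, 3 = ': ', 4 = the address.
ip=$(printf '%s\n' "$sample" | grep IPAddress | cut -d'"' -f4)
echo "$ip"
```

On Docker versions that support Go templates, `docker inspect --format '{{ .NetworkSettings.IPAddress }}' 5f3695d8f5e4` should return the same value without any text parsing.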
  • 22. Lesson 2: Layers - Docker History
        $ docker history mysql-image
        IMAGE         CREATED            CREATED BY                                     SIZE
        d5cb85b206a4  41 minutes ago     /bin/sh -c #(nop) EXPOSE map[3306/tcp:{}]      0 B
        a3fcf7ad0e46  41 minutes ago     /bin/sh -c #(nop) CMD [mysqld_safe]            0 B
        e495928f5148  41 minutes ago     /bin/sh -c bash /tmp/config && rm -f /tmp/con  5.245 MB
        e81232406a48  41 minutes ago     /bin/sh -c echo "mysql -e 'GRANT ALL PRIVILEG  131 B
        3ed871742259  41 minutes ago     /bin/sh -c echo "mysqladmin --silent --wait=3  59 B
        7383675c6559  41 minutes ago     /bin/sh -c echo "mysqld_safe &" > /tmp/config  14 B
        dfa40ac0f314  45 minutes ago     /bin/sh -c sed -i 's/^\(log_error\s.*\)/# \1/  3.509 kB
        01a7a7904f29  45 minutes ago     /bin/sh -c sed -i 's/^\(bind-address\s.*\)/#   3.507 kB
        2709eaa06d42  About an hour ago  /bin/sh -c DEBIAN_FRONTEND=noninteractive apt  130.2 MB
        6ca9716f2565  About an hour ago  /bin/sh -c apt-get update                      20.8 MB
        86ce37374f40  6 weeks ago        /bin/sh -c #(nop) CMD [/bin/bash]              0 B
        dc07507cef42  6 weeks ago        /bin/sh -c apt-get update && apt-get dist-upg  0 B
        78e82ee876a2  6 weeks ago        /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/  1.895 kB
        3f45ca85fedc  6 weeks ago        /bin/sh -c rm -rf /var/lib/apt/lists/*         0 B
        61cb619d86bc  6 weeks ago        /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic  194.8 kB
        5bc37dc2dfba  6 weeks ago        /bin/sh -c #(nop) ADD file:d11cc4a4310c270539  192.5 MB
        511136ea3c5a  19 months ago                                                     0 B
      17 layers! Every Docker instruction creates a layer.
      ● 200 MB for Ubuntu
      ● 20 MB for apt-get update
      ● 130 MB for installing MySQL
  • 24. Lesson 2: Layers - What are they? ● Think of a layer as a directory of files (or blocks) ● All these “physical” layers are combined into a “logical” file system for each individual container o Union file system o Copy-on-write o Like a stack: higher layers may override lower layers
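The stacking idea can be mimicked with plain directories. The sketch below is only a toy model under stated assumptions: real storage backends do this inside the kernel or at block level, and the layer names, file names and values are made up for illustration.

```shell
# Toy model of a union file system: each layer is a plain directory and a
# lookup walks the stack from the top layer down, returning the first hit.
root=$(mktemp -d)
mkdir -p "$root/base/etc" "$root/mysql/etc" "$root/container/etc"
echo 'ubuntu' > "$root/base/etc/hostname"
echo 'v1'     > "$root/mysql/etc/my.cnf"

# Stack order: the container's writable layer first, then the image layers.
layers="container mysql base"

lookup() {
  for layer in $layers; do
    if [ -e "$root/$layer/$1" ]; then
      cat "$root/$layer/$1"
      return 0
    fi
  done
  return 1
}

before=$(lookup etc/my.cnf)

# Copy-on-write: the change lands in the top layer only. The image layer
# underneath is untouched and could still be shared by other containers.
echo 'v2' > "$root/container/etc/my.cnf"
after=$(lookup etc/my.cnf)
lower=$(cat "$root/mysql/etc/my.cnf")

echo "before=$before after=$after lower=$lower"
rm -rf "$root"
```

The container sees the new value while the lower layer keeps the old one, which is the "higher layers may override lower layers" behaviour in miniature.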
  • 25. Lesson 2: Layers - Purpose (1/4) ● Blazing fast container instantiation o To create a new instance from an image, Docker simply creates a new empty read-write layer Great, but we could achieve this goal with 1 single layer per image + 1 layer per container Why 17 layers ? 25
  • 26. Lesson 2: Layers - Purpose (2/4) ● Faster image modification o Changing/adding a Dockerfile instruction causes only the modified layer(s) and those following it to be rebuilt How often do you plan on changing your Dockerfiles ? 26
  • 27. Lesson 2: Layers - Purpose (3/4) ● Faster distribution o when distributing the image (via docker push) and downloading it (via docker pull, or docker build), only the affected layer(s) are sent. 27
  • 28. Lesson 2: Layers - Purpose (4/4) ● Minimize disk space o All the containers located on the same Docker host and descending from the same image hierarchy share layers. o The Ubuntu Docker image is 200 MB o 1000 containers based on Ubuntu take only 200 MB in total (+ the additional packages they require) Will you have multiple variants (config and/or versions) of MySQL on the same machine? How many MySQL servers will you have on the same machine?
  • 29. Lesson 2: Layers - Layer Genocide $ cp -r mysql-docker/ mysql-docker-grouped $ vi mysql-docker-grouped/Dockerfile In this example, all our MySQL containers will be the same. Therefore, we’ll only be needing 1 single layer. 29
  • 30. Lesson 2: Layers - Combine multiple RUN instructions
        # Contents of file mysql-docker-grouped/Dockerfile
        [...]
        RUN apt-get update && \
            DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server && \
            sed -i 's/^\(bind-address\s.*\)/# \1/' /etc/mysql/my.cnf && \
            sed -i 's/^\(log_error\s.*\)/# \1/' /etc/mysql/my.cnf && \
            echo "mysqld_safe &" > /tmp/config && \
            echo "mysqladmin --silent --wait=30 ping || exit 1" >> /tmp/config && \
            echo "mysql -e 'GRANT ALL PRIVILEGES ON *.* TO \"root\"@\"%\" WITH GRANT OPTION;'" >> /tmp/config && \
            bash /tmp/config && \
            rm -f /tmp/config
        [...]
  • 31. Lesson 2: Layers - Docker History
        $ docker build -t mysql-image-grouped mysql-docker-grouped/
        [...]
        Successfully built 11ccd4cc6c82
        $ docker history mysql-image-grouped
        IMAGE         CREATED            CREATED BY                                     SIZE
        11ccd4cc6c82  About an hour ago  /bin/sh -c #(nop) EXPOSE map[3306/tcp:{}]      0 B
        59c9467d3360  About an hour ago  /bin/sh -c #(nop) CMD [mysqld_safe]            0 B
        0993d316210d  About an hour ago  /bin/sh -c apt-get update && DEBIAN_FRONT      151 MB
        86ce37374f40  6 weeks ago        /bin/sh -c #(nop) CMD [/bin/bash]              0 B
        dc07507cef42  6 weeks ago        /bin/sh -c apt-get update && apt-get dist-upg  0 B
        78e82ee876a2  6 weeks ago        /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/  1.895 kB
        3f45ca85fedc  6 weeks ago        /bin/sh -c rm -rf /var/lib/apt/lists/*         0 B
        61cb619d86bc  6 weeks ago        /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic  194.8 kB
        5bc37dc2dfba  6 weeks ago        /bin/sh -c #(nop) ADD file:d11cc4a4310c270539  192.5 MB
        511136ea3c5a  19 months ago                                                     0 B
      Freed 7 layers! Our Dockerfile now only adds 3 layers on top of the base image: RUN, CMD, EXPOSE
  • 33. Lesson 3: Staying fit - Compacting layers $ cp -r mysql-docker-grouped/ mysql-docker-cleaned $ vi mysql-docker-cleaned/Dockerfile Some commands, like apt-get update, create temporary files that can be safely discarded after use. We can save space and create smaller images by deleting those files.
  • 34. Lesson 3: Staying fit - Removing temporary files
        # Contents of file mysql-docker-cleaned/Dockerfile (partial)
        [...]
        RUN apt-get update && \
            DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server && \
            rm -fr /var/lib/apt/lists/* && \
        [...]

        $ docker build -t mysql-image-cleaned mysql-docker-cleaned/
        [...]
        Successfully built 032798b8e064
      Remember: you’ll need to run apt-get update again the next time you want to install something
  • 35. Lesson 3: Staying fit - Local Docker images
        $ docker images
        REPOSITORY           TAG     IMAGE ID      CREATED      VIRTUAL SIZE
        mysql-image-cleaned  latest  032798b8e064  2 hours ago  322.8 MB
        mysql-image-grouped  latest  11ccd4cc6c82  2 hours ago  343.6 MB
        mysql-image          latest  d5cb85b206a4  3 hours ago  348.9 MB
        ubuntu               14.04   86ce37374f40  6 weeks ago  192.7 MB
      The cleaned image occupies 17% less space than the original mysql-image (it’s a virtual size) [1]. MySQL is small; the impact can be much bigger for other applications.
      [1] ((348-192) - (322-192)) / (348-192) = 17%
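The footnote arithmetic can be reproduced from the listing. This sketch uses the unrounded sizes from the docker images output and compares only what each image adds on top of the shared ubuntu:14.04 base; awk is used because plain shell arithmetic is integer-only.

```shell
# Sizes in MB, copied from the `docker images` listing.
base=192.7      # ubuntu:14.04
original=348.9  # mysql-image
cleaned=322.8   # mysql-image-cleaned

# Relative saving on the part of the image that sits above the base image.
saving=$(awk -v b="$base" -v o="$original" -v c="$cleaned" \
  'BEGIN { printf "%.1f", ((o - b) - (c - b)) / (o - b) * 100 }')
echo "saving: ${saving}%"
```

With the unrounded sizes the result is 16.7%, which the slide rounds to 17%.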
  • 36. Lesson 3: Staying fit - Smallest Docker base images
        Image:Tag             Size
        scratch               0.0 B
        busybox:ubuntu-14.04  5.6 MB
        debian:7              85.0 MB
        ubuntu:14.04          192.7 MB
        centos:7              210.0 MB
        fedora:21             241.3 MB
  • 37. Lesson 3: Staying fit - docker diff ● Show differences between container and the image o Useful to see which files have been modified/created when writing your Dockerfile 37
  • 38. Lesson 4: Fixed as “worksforme” 38
  • 39. Lesson 4: Reproducibility - Package version
      ● Your Dockerfile may build a different image a few months from now than it does today
      Without a version:
        RUN apt-get install -y mysql-server
      Specifying the package version explicitly is better:
        RUN apt-get install -y mysql-server=5.5.40-0ubuntu0.14.04.1
  • 40. Lesson 4: Reproducibility - Dependency version
      The previous solution should be enough. But if you need a stronger guarantee of reproducibility:
      A. Specify the package version for the dependencies as well
      B. And / or use a cache proxy, Maven proxy, etc.
        RUN apt-get install -y \
            libaio1=0.3.109-4 \
            mysql-common=5.5.40-0ubuntu0.14.04.1 \
            libmysqlclient18=5.5.40-0ubuntu0.14.04.1 \
            libwrap0=7.6.q-25 \
            libdbi-perl=1.630-1 \
            libdbd-mysql-perl=4.025-1 \
            libterm-readkey-perl=2.31-1 \
            mysql-client-core-5.5=5.5.40-0ubuntu0.14.04.1 \
            mysql-client-5.5=5.5.40-0ubuntu0.14.04.1 \
            mysql-server-core-5.5=5.5.40-0ubuntu0.14.04.1 \
            psmisc=22.20-1ubuntu2 \
            mysql-server-5.5=5.5.40-0ubuntu0.14.04.1 \
            libhtml-template-perl=2.95-1 \
            mysql-server=5.5.40-0ubuntu0.14.04.1 \
            tcpd=7.6.q-25
  • 41. Lesson 5: Prototry
      "A quick and dirty attempt to develop a working model of software. The original intent is to rewrite the ProtoTry, using lessons learned, but schedules never permit. Also known as legacy code." [1]
      [1] Michael Duell, Ailments of Unsuitable Project-Disoriented Software, http://www.fsfla.org/~lxoliva/fun/prog/resign-patterns
  • 42. Lesson 5: Prototry - Docker Hub Registry ● Before writing your own Dockerfile, try a build from someone else o https://registry.hub.docker.com/ o Official builds o Trusted (automated) builds o Other builds For advanced setups, see these images: ● jenkins ● dockerfile/java
  • 43. Lesson 5: Prototry - Using other people's images
      PROs:
      ● Faster to get started
      ● Better tested
      CONs:
      ● You may end up with a mixed stack to support
        ○ e.g. different versions of Java
        ○ Ubuntu vs Debian vs CentOS
      ● Not all sources use all the best practices described in this presentation
      For medium to large organisations and heavy Docker users: best to fork and write your own Dockerfiles
  • 44. Lesson 5: Prototry - Potential image hierarchy
        myorg-base          FROM ubuntu:14.04            # Organization-wide tools (e.g. vim, etc.)
          myorg-java        FROM myorg-base:1.0          # OpenJDK | OracleJDK
            java-app3       FROM myorg-java:oracle-jdk7  # ...
          myorg-python      FROM myorg-base:1.0          # Install Python 2.7
            python-app1     FROM myorg-python:2.7        # ...
            python-app2     FROM myorg-python:2.7        # ...
  • 45. Lesson 6: Volume Design Patterns 45
  • 46. Lesson 6: Inside Container Pattern ● Nothing to do - that’s the default Docker behavior o Application data is stored along with the infrastructure (container) data ● If the container is restarted, data is still there ● If the container is deleted, data is gone
  • 47. Lesson 6: Host Directory Pattern ● A directory on the host ● To share data across containers on the same host ● For example, put the source code on the host and mount it inside the container with the “-v” flag 47
  • 48. Lesson 6: Data-Only Container Pattern ● Run on a bare-bones image ● VOLUME instruction in the Dockerfile or “-v” flag at run time ● Just use the “--volumes-from” flag to mount all the volumes of another container
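The host directory and data-only container patterns could look like the commands below. This is a dry-run sketch, not taken from the deck: by default it only prints the commands, and the container names (mysql-host, mysql-data, mysql-app) and the host path /srv/mysql-data are hypothetical. mysql-image is the image built earlier, and /var/lib/mysql is assumed to be MySQL's data directory.

```shell
# Dry-run helper: commands are printed, not executed, unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Host directory pattern: bind-mount a host path over the data directory.
host_cmd=$(run docker run -d --name mysql-host -v /srv/mysql-data:/var/lib/mysql mysql-image)

# Data-only container pattern: a bare-bones container that only owns the volume...
data_cmd=$(run docker run --name mysql-data -v /var/lib/mysql busybox true)
# ...and a MySQL container that mounts every volume of that container.
app_cmd=$(run docker run -d --name mysql-app --volumes-from mysql-data mysql-image)

printf '%s\n' "$host_cmd" "$data_cmd" "$app_cmd"
```

With the data-only pattern, the MySQL container can be deleted and recreated while the volume stays attached to mysql-data.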
  • 50. Lesson 7: Storage backend - Overview
      ● Options:
        o VFS
        o AUFS (default, docker < 0.7)
        o DeviceMapper
          - Direct LVM
          - Loop LVM (default in Red Hat)
        o Btrfs (experimental)
        o OverlayFS (experimental)
      Red Hat [1] says the fastest backends are: 1. OverlayFS 2. Direct LVM 3. Btrfs 4. Loop LVM
      Look up your current Docker backend:
        $ docker info | grep Driver
      [1] http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
  • 51. Lesson 7: Storage backend - VFS & AUFS ● Both are very basic (NOT for PROD) ● Both store each layer as a separate directory with regular files ● VFS o No Copy-on-Write (CoW) ● AUFS o Original Docker backend o File-level Copy-on-Write (CoW) VFS & AUFS can be useful to understand how Docker works Do not use in PROD 51
  • 52. Lesson 7: Storage backend - DeviceMapper (1/2)
      ● Already used by the Linux kernel for LVM2 (logical volume management)
        o Block-level Copy-on-Write (CoW)
        o Unused blocks do not use space
      ● Uses thin pool provisioning to implement CoW snapshots
        o Each pool requires 2 block devices: data & metadata
        o By default, uses loopback mounts on sparse regular files (Loop LVM)
        # ls -alhs /var/lib/docker/devicemapper/devicemapper
        506M -rw-------. 1 root root 100G Sep 10 20:15 data
        1.1M -rw-------. 1 root root 2.0G Sep 10 20:15 metadata
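The ls listing shows two sizes per file because the files are sparse: the first column is the space actually allocated, the later column is the size the file claims to have. A minimal sketch of the same effect on a scratch file, assuming GNU coreutils truncate and a file system with sparse-file support:

```shell
# Create a sparse file: 10 MB apparent size, (almost) no blocks allocated.
f=$(mktemp)
truncate -s 10M "$f"

apparent_kb=$(( $(wc -c < "$f") / 1024 ))   # size the file claims to have
allocated_kb=$(du -k "$f" | cut -f1)        # disk space actually used

echo "apparent=${apparent_kb}K allocated=${allocated_kb}K"
rm -f "$f"
```

Blocks are only allocated when data is written, which is why the 100G data file above occupies only 506M on disk.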
  • 53. Lesson 7: Storage backend - DeviceMapper (2/2) ● In production: o Use real block devices! (Direct LVM) o Ideally, data & metadata each on its own spindle o Additional configuration is required Docker does not do that for you 53
  • 54. Lesson 7: Storage backend - Btrfs & OverlayFS
      Btrfs:
      ● Requires /var/lib/docker to be on a Btrfs file system
      ● Block-level Copy-on-Write (CoW) using Btrfs’s snapshotting
      ● Each layer is stored as a Btrfs subvolume
      ● No SELinux
      OverlayFS:
      ● Supports page cache sharing (claims a huge RAM saving)
      ● Lower FS contains the base image (XFS or EXT4)
      ● Upper FS contains the deltas
      ● No SELinux
  • 56. Docker ● Ethernet bridge “docker0” created when Docker boots ● Virtual subnet on the host (default: 172.17.42.1/16) ● Each container has a pair of virtual Ethernet interfaces ● You can remove “docker0” and use your own bridge if you want 56
  • 57. Weave Why Weave? ● Docker's built-in functionality doesn’t provide a solution for connecting containers on multiple hosts ● Weave creates a virtual network that makes a distributed environment possible (common in the real world)
  • 58. Weave How does it work? ● Virtual routers establish TCP connections to each other with a handshake ● These connections are duplex ● Uses “pcap” to capture packets ● Excludes traffic between local containers
  • 59. Weave Weave Container Container 1 Container 2 Container 3 Host A Weave Container Container 1 Container 2 Container 3 Host B 59
  • 60. Weave - images
        Image:Tag                Size
        zettio/weave:0.8.0       11 MB
        zettio/weavedns:0.8.0    9.4 MB
        zettio/weavetools:0.8.0  3.7 MB
  • 61. Weave - getting started
      ● First host: weave-01
        $ sudo weave launch
        $ sudo weave run 10.0.0.1/24 -ti --name ubuntu-01 ubuntu:14.04
      "weave launch" starts the weave router in a container. The address is given in CIDR notation.
      ● Second host: weave-02
        $ sudo weave launch weave-01
        $ sudo weave run 10.0.0.2/24 -ti --name ubuntu-02 ubuntu:14.04
      "weave launch weave-01" starts the weave router in a container and peers it with the first host.
      Note: “weave run” invokes “docker run -d” (running as a daemon)
  • 62. Weave - testing the connectivity (1/2)
      ● First host: weave-01
        $ sudo weave status
        weave router 0.8.0
        Our name is 7a:ab:c1:21:f9:3b
        Sniffing traffic on &{15 65535 ethwe 56:40:66:0b:a4:c6 up|broadcast|multicast}
        MACs:
        56:40:66:0b:a4:c6 -> 7a:ab:c1:21:f9:3b (2015-01-11 22:27:39.23846091 +0000 UTC)
        7a:ab:c1:21:f9:3b -> 7a:ab:c1:21:f9:3b (2015-01-11 22:27:40.142183122 +0000 UTC)
        a2:60:ab:8b:1f:b6 -> 7a:ab:c1:21:f9:3b (2015-01-11 22:27:40.716414595 +0000 UTC)
        7a:5a:98:6e:92:2e -> 7a:5a:98:6e:92:2e (2015-01-11 22:28:53.204010927 +0000 UTC)
        1e:b4:78:1e:dd:23 -> 7a:5a:98:6e:92:2e (2015-01-11 22:28:53.42594994 +0000 UTC)
        Peers:
        Peer 7a:ab:c1:21:f9:3b (v1) (UID 17511927952474106279)
           -> 7a:5a:98:6e:92:2e [192.168.1.30:47638]
        Peer 7a:5a:98:6e:92:2e (v1) (UID 8527109358448991597)
           -> 7a:ab:c1:21:f9:3b [192.168.1.195:6783]
        Routes:
        unicast:
        7a:5a:98:6e:92:2e -> 7a:5a:98:6e:92:2e
        7a:ab:c1:21:f9:3b -> 00:00:00:00:00:00
        broadcast:
        7a:ab:c1:21:f9:3b -> [7a:5a:98:6e:92:2e]
        7a:5a:98:6e:92:2e -> []
        Reconnects:
      Slide annotations: "Sniffing traffic on" shows the virtual interface used by Weave; "MACs" lists the containers and host points; "Peers" lists the connected peers.
  • 63. Weave - testing the connectivity (2/2)
      ● Second host: weave-02
        $ sudo docker attach ubuntu-02
        $ ping -c 4 10.0.0.1
        PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
        64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=4.22 ms
        64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=1.20 ms
        64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=1.73 ms
        64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=2.02 ms
        --- 10.0.0.1 ping statistics ---
        4 packets transmitted, 4 received, 0% packet loss, time 3008ms
        rtt min/avg/max/mdev = 1.206/2.299/4.226/1.150 ms
      It pings!
  • 65. cAdvisor
      ● New tool from Google
      ● Specialized for Docker containers
      PROs:
      ● Great web interface
      ● Docker image available (18 MB) to try it in seconds
      ● Stats can be exported to InfluxDB (data mining to do)
      CONs:
      ● Needs more maturity
      ● Missing metrics
        ○ No data for disk I/O
      ● Only keeps the last 60 metrics locally (not configurable)
  • 66. Monitoring with cAdvisor Web interface → 66
  • 68. AdamCloud - The next steps ● Docker + Weave = success ● Open-source the project and merge it upstream into the AmpLab genomic pipeline. ● Support for Amazon EC2 environments ● Improve administration of Docker containers o Monitoring, orchestration, provisioning 68
  • 69. Docker Conclusion ● 1 Docker container = 1 background daemon ● Container isolation is not like a VM ● Use correct versions of images and keep a trace ● Docker is less interesting for multi-tenant use cases (no SSH in the containers) ● Docker is FAST and VERSATILE ● cAdvisor is an interesting monitoring tool, but limited ● Docker is perfect for short-lived apps (no long-term data persistence) ● Data-intensive apps should review the Docker docs carefully. Start by looking at Direct LVM.
  • 70. References
      ● Jonathan Bergknoff - Building good docker images, http://jonathan.bergknoff.com/journal/building-good-docker-images
      ● Michael Crosby - Dockerfile Best Practices, http://crosbymichael.com/dockerfile-best-practices.html
      ● Michael Crosby - Dockerfile Best Practices - take 2, http://crosbymichael.com/dockerfile-best-practices-take-2.html
      ● Nathan Leclaire - The Dockerfile is not the source of truth for your image, http://nathanleclaire.com/blog/2014/09/29/the-dockerfile-is-not-the-source-of-truth-for-your-image/
      ● Docker Documentation - Understanding Docker, https://docs.docker.com/introduction/understanding-docker/
      ● Docker Documentation - Docker User Guide, https://docs.docker.com/userguide/
      ● Docker Documentation - Dockerfile Reference, https://docs.docker.com/reference/builder/
      ● Docker Documentation - Command Line (CLI) User Guide, https://docs.docker.com/reference/commandline/cli/
      ● Docker Documentation - Advanced networking, http://docs.docker.com/articles/networking/
      ● Project Atomic - Supported Filesystems, http://www.projectatomic.io/docs/filesystems/
      ● Red Hat Developer Blog - Comprehensive Overview of Storage Scalability in Docker, http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
      ● Linux Kernel Documentation - DeviceMapper Thin Provisioning, https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt
      ● weave - the Docker network, http://zettio.github.io/weave/
      ● GitHub - google/cadvisor, https://github.com/google/cadvisor