USING DOCKER FOR DATA SCIENCE
RECAP
WHY DOCKER 
Portable environment 
Isolated between projects 
Stateless 
Fast local file access 
Heterogeneous
GET DOCKER 
https://docs.docker.com/installation/
OS X / Windows: boot2docker .dmg or .exe
Ubuntu: $ sudo apt-get install docker.io ...
RUN SCIPYSERVER 
$ docker run -d -e "PASSWORD=YourPassword?" ipython/scipyserver 
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --name dev_notebook \
    -p 443:8888 \
    ipython/scipyserver
Then browse to https://localhost:443 (Linux) or https://{boot2docker ip}:443 (boot2docker)
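On Linux the published port is on localhost; with boot2docker it is on the VM's address. A minimal sketch that works out which URL to open, assuming `DOCKER_HOST` has been exported the way boot2docker tells you to after `boot2docker up` (e.g. `tcp://192.168.59.103:2376`; that address is only an example). The function names are hypothetical.

```shell
# Print the host part of DOCKER_HOST (e.g. tcp://192.168.59.103:2376),
# or "localhost" when Docker runs natively and DOCKER_HOST is unset.
docker_host_ip() {
  case "${DOCKER_HOST:-}" in
    tcp://*)
      hostport="${DOCKER_HOST#tcp://}"  # drop the scheme
      echo "${hostport%%:*}"            # drop the :port suffix
      ;;
    *)
      echo localhost
      ;;
  esac
}

# URL of the notebook server published with -p 443:8888
# (pass a different host port as $1 if you changed -p).
notebook_url() {
  echo "https://$(docker_host_ip):${1:-443}"
}
```

Parsing `DOCKER_HOST` keeps the function independent of the boot2docker CLI, so the same call works on a native Linux install.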
CREATE DATA-ONLY CONTAINERS 
$ docker run \
    -d \
    -v ~/notebooks:/notebooks \
    --name notebooks_container \
    ubuntu \
    echo notebooks
$ docker run -d -v ~/data:/data --name data_container ubuntu echo
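Because `docker run -d` leaves the exited container behind, running the commands above a second time fails with a name conflict. A minimal sketch of an idempotent version, relying on `docker inspect` exiting non-zero for an unknown container name; `ensure_data_container` is a hypothetical helper name.

```shell
# Create a data-only container called $1 with volume spec $2
# (host_dir:container_dir), unless a container with that name exists.
ensure_data_container() {
  name="$1"
  volume="$2"
  if docker inspect "$name" >/dev/null 2>&1; then
    echo "$name already exists, leaving it alone"
  else
    # Same shape as the commands above: the container echoes and exits,
    # but keeps its volume definition for later --volumes-from use.
    docker run -d -v "$volume" --name "$name" ubuntu echo "$name"
  fi
}
```

Usage, equivalent to the two commands above:
$ ensure_data_container notebooks_container ~/notebooks:/notebooks
$ ensure_data_container data_container ~/data:/data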
MOUNT DATA-ONLY CONTAINERS 
$ docker stop dev_notebook 
$ docker rm dev_notebook 
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --name dev_notebook \
    -p 443:8888 \
    --volumes-from data_container \
    --volumes-from notebooks_container \
    ipython/scipyserver
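The stop / rm / run cycle above is the one you repeat whenever the image or the mounts change. A minimal sketch that wraps it, assuming the container names from these slides and reading the password from a `PASSWORD` variable instead of typing it inline; `restart_notebook` is a hypothetical name and the image defaults to ipython/scipyserver.

```shell
# Replace the dev_notebook container, mounting both data-only containers.
# $1: image to run (defaults to ipython/scipyserver).
restart_notebook() {
  image="${1:-ipython/scipyserver}"
  if [ -z "${PASSWORD:-}" ]; then
    echo "set PASSWORD before calling restart_notebook" >&2
    return 1
  fi
  # Errors are ignored so this also works the first time,
  # when there is no dev_notebook container to stop or remove.
  docker stop dev_notebook >/dev/null 2>&1
  docker rm dev_notebook >/dev/null 2>&1
  docker run -d \
    -e "PASSWORD=$PASSWORD" \
    --name dev_notebook \
    -p 443:8888 \
    --volumes-from data_container \
    --volumes-from notebooks_container \
    "$image"
}
```

Usage:
$ export PASSWORD='YourPassword?'
$ restart_notebook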
CREATE A DOCKERFILE 
FROM ipython/scipyserver 
MAINTAINER Calvin Giles <calvin.giles@gmail.com> 
COPY requirements.txt /requirements.txt 
RUN pip2 install -r /requirements.txt 
RUN pip3 install -r /requirements.txt 
$ docker build -t calvingiles/ds-notebook .
$ docker stop dev_notebook && docker rm dev_notebook
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --name dev_notebook \
    -p 443:8888 \
    --volumes-from data_container \
    --volumes-from notebooks_container \
    calvingiles/ds-notebook
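The slide leaves the contents of requirements.txt to you. A minimal sketch that lays out a build context (the Dockerfile from the slide plus a requirements.txt) in a directory of your choice; the package list is purely a hypothetical example, and `write_build_context` is a hypothetical helper name.

```shell
# Write a Dockerfile and requirements.txt into directory $1,
# ready for: docker build -t calvingiles/ds-notebook "$1"
write_build_context() {
  dir="$1"
  mkdir -p "$dir" || return 1

  # The Dockerfile from the slide: extend the scipyserver image and
  # install the same requirements for both Python 2 and Python 3.
  cat > "$dir/Dockerfile" <<'EOF'
FROM ipython/scipyserver
MAINTAINER Calvin Giles <calvin.giles@gmail.com>
COPY requirements.txt /requirements.txt
RUN pip2 install -r /requirements.txt
RUN pip3 install -r /requirements.txt
EOF

  # Example packages only: list whatever your notebooks import.
  cat > "$dir/requirements.txt" <<'EOF'
pymongo
SQLAlchemy
EOF
}
```

Usage:
$ write_build_context ds-notebook
$ docker build -t calvingiles/ds-notebook ds-notebook

The quoted 'EOF' delimiters stop the shell from expanding anything inside the two files.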
THIS TIME 
Creating and connecting to local database containers 
Tweaking the boot2docker VM memory from 2GB to 8GB (or more)
Automated builds with GitHub linking
Forget everything and use fig
CREATE LOCAL DATABASE CONTAINERS 
$ docker run -d -v /var/lib/postgresql/data --name=pg_data ubuntu 
$ docker run -d --volumes-from pg_data --name=dev_postgres postgres
$ docker run -d --name=dev_mongo mongo 
$ docker stop dev_notebook && docker rm dev_notebook
$ docker run \
    -d \
    -e "PASSWORD=YourPassword?" \
    --link dev_postgres:dev_postgres \
    --link dev_mongo:dev_mongo \
    --name dev_notebook \
    -p 443:8888 \
    --volumes-from data_container \
    --volumes-from notebooks_container \
    calvingiles/ds-notebook
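Inside dev_notebook, each `--link <name>:<alias>` adds an /etc/hosts entry for the alias plus environment variables such as `DEV_POSTGRES_PORT_5432_TCP_ADDR` and `DEV_POSTGRES_PORT_5432_TCP_PORT`. A minimal sketch that turns those into connection URLs, assuming the official images' defaults (Postgres on 5432 with the `postgres` user and database, Mongo on 27017) and falling back to the alias hostname when the variables are absent; the function names are hypothetical. You could run it from a shell in the container, e.g. `docker exec -it dev_notebook bash` (Docker 1.3 or later).

```shell
# Connection URLs for the linked containers. Legacy links export
# <ALIAS>_PORT_<port>_TCP_ADDR / _TCP_PORT (alias upper-cased) and add
# the alias to /etc/hosts, so the alias is the fallback host.
postgres_url() {
  host="${DEV_POSTGRES_PORT_5432_TCP_ADDR:-dev_postgres}"
  port="${DEV_POSTGRES_PORT_5432_TCP_PORT:-5432}"
  # "postgres" is the default superuser and database of the postgres image
  echo "postgresql://postgres@${host}:${port}/postgres"
}

mongo_url() {
  host="${DEV_MONGO_PORT_27017_TCP_ADDR:-dev_mongo}"
  port="${DEV_MONGO_PORT_27017_TCP_PORT:-27017}"
  echo "mongodb://${host}:${port}/"
}
```

The strings use the URL format that SQLAlchemy's `create_engine` and pymongo's `MongoClient` accept, so the same values can be used from a notebook.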
TWEAK THE MEMORY IN YOUR VM ABOVE 2GB
Either: 
$ boot2docker delete 
$ boot2docker init -m 5555 
... lots of output ... 
$ boot2docker info 
{ ... "Memory":5555 ...} 
Or, without losing data stored inside the VM (anything not on a host volume):
$ boot2docker stop
$ VBoxManage modifyvm boot2docker-vm --memory 5555
$ boot2docker start
$ boot2docker info 
{ ... "Memory":5555 ...}
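Both `boot2docker init -m` and `VBoxManage modifyvm --memory` take the size in megabytes, so 8GB is 8192. A minimal sketch of the in-place option as a function, assuming the default VM name boot2docker-vm used above; the function names are hypothetical.

```shell
# boot2docker init -m and VBoxManage --memory both expect megabytes.
gb_to_mb() {
  echo $(( $1 * 1024 ))
}

# Resize the existing boot2docker VM in place, keeping data stored inside it.
# VirtualBox only accepts a memory change while the VM is powered off,
# hence stop -> modifyvm -> start.
resize_boot2docker_memory() {
  mb=$(gb_to_mb "$1") || return 1
  boot2docker stop &&
    VBoxManage modifyvm boot2docker-vm --memory "$mb" &&
    boot2docker start
}
```

Usage: `resize_boot2docker_memory 8`, after which `boot2docker info` should report "Memory":8192.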
AUTOMATED BUILDS WITH GITHUB LINKING 
Commit your Dockerfile, requirements.txt etc. to a GitHub repo
Add an "Automated Build" on Docker Hub
Select the repo and accept the defaults
Watch the "Build Details" page until your repo's build finishes
$ docker run <dockername>/<reponame>
FORGET EVERYTHING AND USE FIG 
http://www.fig.sh/install.html
$ curl -L https://github.com/docker/fig/releases/download/1.0.1/fig-`uname -s`-`uname -m` > ~/bin/fig
$ chmod +x ~/bin/fig
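The download assumes ~/bin already exists and is on your PATH. A minimal sketch of the same install as a function that creates the directory, makes curl fail on an HTTP error instead of saving the error page as the binary (`-f`), and warns when ~/bin is missing from PATH; the function names are hypothetical and 1.0.1 is simply the version from the slide.

```shell
# Release URL for a given fig version on this OS/architecture.
fig_release_url() {
  echo "https://github.com/docker/fig/releases/download/${1:-1.0.1}/fig-$(uname -s)-$(uname -m)"
}

install_fig() {
  mkdir -p "$HOME/bin" &&
    curl -fL "$(fig_release_url "$1")" -o "$HOME/bin/fig" &&
    chmod +x "$HOME/bin/fig" || return 1
  # fig is only found by name if ~/bin is on PATH.
  case ":$PATH:" in
    *":$HOME/bin:"*) ;;
    *) echo "add \$HOME/bin to your PATH to run fig" >&2 ;;
  esac
}
```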
FIG.YML -- DATA 
notebooks:
  command: echo created
  image: busybox
  volumes:
    - "~/Google Drive/notebooks:/notebooks/analysis"
data:
  command: echo created
  image: busybox
  volumes:
    - "~/Google Drive/data:/data/analysis"
...
FIG.YML -- POSTGRES 
...
devpostgresdata:
  command: echo created
  image: busybox
  volumes:
    - /var/lib/postgresql/data
devpostgres:
  environment:
    - POSTGRES_PASSWORD
  image: postgres
  ports:
    - "5432:5432"
  volumes_from:
    - devpostgresdata
...
FIG.YML -- NOTEBOOK SERVER 
...
ds_server:
  environment:
    - PASSWORD
  image: calvingiles/data-science-environment
  links:
    - devpostgres:postgres
  ports:
    - "443:8888"
  volumes_from:
    - notebooks
    - data
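Put together, the three fragments make one fig.yml. A minimal sketch that writes the combined file with a quoted heredoc (so the shell leaves `~` and the quotes alone); it contains only the services shown on these slides, not whatever sits behind the `...`, and `write_fig_yml` is a hypothetical name.

```shell
# Write the services from the three fig.yml slides into ./fig.yml
# (or into the file named by $1).
write_fig_yml() {
  cat > "${1:-fig.yml}" <<'EOF'
notebooks:
  command: echo created
  image: busybox
  volumes:
    - "~/Google Drive/notebooks:/notebooks/analysis"
data:
  command: echo created
  image: busybox
  volumes:
    - "~/Google Drive/data:/data/analysis"
devpostgresdata:
  command: echo created
  image: busybox
  volumes:
    - /var/lib/postgresql/data
devpostgres:
  environment:
    - POSTGRES_PASSWORD
  image: postgres
  ports:
    - "5432:5432"
  volumes_from:
    - devpostgresdata
ds_server:
  environment:
    - PASSWORD
  image: calvingiles/data-science-environment
  links:
    - devpostgres:postgres
  ports:
    - "443:8888"
  volumes_from:
    - notebooks
    - data
EOF
}
```

With the `devpostgres:postgres` link, the link variables inside ds_server are prefixed with the alias (e.g. `POSTGRES_PORT_5432_TCP_ADDR`) and the hostname `postgres` resolves to the database container.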
FIG UP 
In the same directory as fig.yml: 
$ fig rm 
$ PASSWORD=MyPass POSTGRES_PASSWORD=PGPass fig up -d
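fig takes PASSWORD and POSTGRES_PASSWORD from the environment it runs in, so it is worth checking they are actually set before `fig up`. A minimal sketch of a pre-flight check, assuming you export the variables first rather than typing them on the fig command line; `require_env` is a hypothetical helper.

```shell
# Fail (listing what is missing) unless every named variable is exported
# with a non-empty value. printenv only sees exported variables, which
# are also the only ones a child process such as fig can see.
require_env() {
  missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage:
$ export PASSWORD=MyPass POSTGRES_PASSWORD=PGPass
$ require_env PASSWORD POSTGRES_PASSWORD && fig up -d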
HERE'S ONE I MADE EARLIER 
$ curl -L http://goo.gl/rW47v3 > fig.yml
$ PASSWORD=MyPass POSTGRES_PASSWORD=PGPass fig up -d
NEXT TIME 
Linking to private git repositories 
Lessons learnt from using fig 
Resizing boot2docker volume (to fix "no space left on device") 
Fixing "Error response from daemon: client and server don't have same version"
TLS and CA certs to fix "Your connection is not private" 
Whatever other pain I have had to deal with before then 
Whatever pain you feel -- let me know @calvingiles
MORE? 
Docker: 
http://docs.docker.com/userguide/
http://docs.docker.com/reference/commandline/cli/
Fig:
http://www.fig.sh/
ipython docker images:
https://registry.hub.docker.com/repos/ipython/
my docker image:
https://github.com/calvingiles/data-science-environment
https://registry.hub.docker.com/u/calvingiles/data-science-environment/
fig.yml gist:
http://goo.gl/rW47v3
ABOUT ME 
Calvin Giles 
Data Scientist at Adthena 
PyData Meetup Organiser 
untangleconsulting.io 
calvin.giles@gmail.com 
@calvingiles on twitter, github, docker hub (and many more)

Using docker for data science - part 2
