Ferry - Share & Deploy Big Data Applications with Docker
James Horey
• Writing a simple application with Bokeh
• Packaging our application with Docker
• Orchestrating our application with Ferry
Technical material can be found at:
https://github.com/jhorey/pydata
Bokeh
U.S. Census
http://api.census.gov/data/2011/acs5?get=DP03_0062E&for=county:*&in=state:06
(DP03_0062E: median income; county:*: all counties; state:06: California)
Download some data
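The download step can be sketched in Python. This is a minimal sketch, not the code from the talk's repository: the helper names `build_url` and `parse_rows` are hypothetical, the sample payload is made up, and it assumes the API answers with a JSON array whose first row is the header. The endpoint may have changed since the talk and may now require an API key.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://api.census.gov/data/2011/acs5"


def build_url(variable="DP03_0062E", state_fips="06"):
    """Build the query URL for one variable across all counties of a state."""
    # safe=":*" keeps the 'county:*' and 'state:06' predicates unescaped.
    query = urlencode(
        {"get": variable, "for": "county:*", "in": "state:" + state_fips},
        safe=":*",
    )
    return BASE_URL + "?" + query


def parse_rows(payload):
    """Turn a header-plus-rows JSON array into a list of dicts."""
    rows = json.loads(payload)
    header, records = rows[0], rows[1:]
    return [dict(zip(header, record)) for record in records]


# Made-up sample in the assumed header-plus-rows shape, for illustration only.
SAMPLE = '[["DP03_0062E","state","county"],["50000","06","001"],["60000","06","003"]]'

if __name__ == "__main__":
    print(build_url())
    for row in parse_rows(SAMPLE):
        print(row["county"], row["DP03_0062E"])
```

To fetch live data, the URL from `build_url()` could be passed to `urllib.request.urlopen` and the response body handed to `parse_rows`.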
Let’s install Bokeh
$ pip install bokeh
>> Downloading/unpacking bokeh
>> SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel.
$ apt-get install python-dev && pip install bokeh
>> gcc: error trying to exec 'cc1plus': execvp: No such file or directory
$ apt-get install g++
$ pip install bokeh
>> RuntimeError: bokeh sample data directory does not exist, please execute bokeh.sampledata.download()
$ python
>>> import bokeh.sampledata
>>> bokeh.sampledata.download()
A simple application
$ python plot.py Kentucky
Louisville
Let’s share
#!/bin/bash

# Make sure we have 'pip' installed
apt-get install python-pip

# Install packages in the right order
apt-get --yes install g++ python-dev
pip install bokeh

# Now download the data
python geography.py data/
python population economic Kentucky data/

# Start the web server
python webserver data/
• Your script didn’t work
• Oh, I was supposed to run this as sudo?
• Ok, it still didn’t work
• I get this funny error
• Oh yeah, I’m running Redhat
• Ok I’m at my desk, just use my computer
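Two of the failures in this exchange (no root privileges, a system without apt-get) can at least be detected before the script runs. The sketch below is a hypothetical pre-flight check, not part of the talk's material; the function name and messages are assumptions, and it only reports problems without fixing the underlying portability issue.

```python
import os
import shutil


def preflight_problems(euid=None, which=shutil.which):
    """Return reasons the apt-get based install script would likely fail."""
    # os.geteuid() is available on Unix-like systems only.
    euid = os.geteuid() if euid is None else euid
    problems = []
    if euid != 0:
        problems.append("not running as root (try sudo)")
    if which("apt-get") is None:
        problems.append("apt-get not found (not a Debian/Ubuntu system?)")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("warning:", problem)
```

The `euid` and `which` parameters exist only so the check can be exercised without changing users or machines.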
Docker
• Encapsulates applications in isolated containers
• Makes it easy and safe to distribute applications
• Easy to get started
Our Dockerfile
• Start from a clean Precise (Ubuntu 12.04) image
• Install stuff
• Add our files
• Run this when starting
$ docker build -t ferry/pydata .
$ docker push ferry/pydata
Sharing made simple
$ docker pull ferry/pydata
$ docker run -p 8000:8000 --name p1 -d ferry/pydata
$ docker run -p 8001:8000 --name p2 -d ferry/pydata
$ docker run -p 8002:8000 --name p3 -d ferry/pydata
[Diagram: containers p1, p2, and p3 running side by side on one kernel and one set of hardware]
• Containers share basic kernel and hardware capabilities
• No hardware virtualization (no hypervisor or guest OS)
• Containers are isolated
• Access via port forwarding
You can run these commands now!
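The port-forwarding pattern above (each container listens on 8000 internally and gets its own host port) can be generated for any number of containers. This is a small sketch with a hypothetical helper, `docker_run_commands`; it uses `--name` and `-d`, the current spellings of the flags, and only builds the argument lists without running anything.

```python
import shlex


def docker_run_commands(image, count, container_port=8000,
                        first_host_port=8000, prefix="p"):
    """Return one 'docker run' argument list per container."""
    commands = []
    for i in range(count):
        host_port = first_host_port + i  # 8000, 8001, 8002, ...
        commands.append([
            "docker", "run",
            "-p", f"{host_port}:{container_port}",  # host:container mapping
            "--name", f"{prefix}{i + 1}",
            "-d", image,
        ])
    return commands


if __name__ == "__main__":
    for argv in docker_run_commands("ferry/pydata", 3):
        print(shlex.join(argv))
```

Each argument list could be handed to `subprocess.run(argv, check=True)` on a machine where Docker is installed.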
Cassandra
• Highly scalable and fault-tolerant
• Great for storing streaming data (sensors, messages)
CREATE KEYSPACE census WITH REPLICATION = {
    'class' : 'SimpleStrategy',
    'replication_factor' : 1 };

USE census;

CREATE TABLE acs_economic_data (
    state_cd TEXT,
    state_name TEXT,
    county_cd TEXT,
    county_name TEXT,
    median INT,
    mean INT,
    capita INT,
    PRIMARY KEY(county_cd, state_cd)
);
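To show how a row maps onto this table, here is a minimal sketch that mirrors the columns in a Python type and builds a parameterized INSERT. The class and function names are hypothetical, the income figures are made up, and the `%s` placeholder style is an assumption about the driver; the key is taken to be (county_cd, state_cd), matching the declared columns. No database connection is opened.

```python
from typing import NamedTuple


class CountyIncome(NamedTuple):
    """One row of acs_economic_data; field order follows the CREATE TABLE."""
    state_cd: str
    state_name: str
    county_cd: str
    county_name: str
    median: int
    mean: int
    capita: int


def insert_statement(table="acs_economic_data"):
    """Build a parameterized INSERT covering every column of the table."""
    columns = ", ".join(CountyIncome._fields)
    placeholders = ", ".join(["%s"] * len(CountyIncome._fields))
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"


def primary_key(row):
    """Return the (county_cd, state_cd) pair that identifies a row."""
    return (row.county_cd, row.state_cd)


if __name__ == "__main__":
    # Income figures are invented for illustration only.
    row = CountyIncome("21", "Kentucky", "111", "Jefferson", 45000, 60000, 26000)
    print(insert_statement())
    print(tuple(row))
    print(primary_key(row))
```

Because a Cassandra INSERT overwrites any existing row with the same primary key, loading the same county twice leaves a single row.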
Orchestration
Web + DB in one container:
• Simple
• Full control
• More work for you
Web and DB in separate containers:
• Simpler Dockerfile
• More extensible
• How to orchestrate?
Ferry
http://ferry.opencore.io
• Specify the containers that constitute your application in YAML
• Support for Hadoop, Cassandra, GlusterFS, and OpenMPI
• It’s a little bit like pip for your Docker-based runtime environment
Our Application
backend:
   - storage:
        personality: "cassandra"
        instances: 1
connectors:
   - personality: "ferry/pydata-cassandra"
     ports: ["8000:8000"]

+

# The cassandra-client base comes with the various drivers
# pre-installed.
FROM ferry/cassandra-client
NAME ferry/pydata-cassandra

# Place the start scripts in the events directories so they
# are started when the connector is brought up.
ADD ./scripts/startcas.sh /service/runscripts/start/
ADD ./scripts/restartcas.sh /service/runscripts/restart/
RUN chmod a+x /service/runscripts/start/startcas.sh
RUN chmod a+x /service/runscripts/restart/restartcas.sh
Easy to share (again)
$ ferry start cassandra.yml
sa-df8d0aa6
$ ferry ps
UUID         Storage      Compute  Connectors   Status   Base           Time
----         -------      -------  ----------   ------   ----           ----
sa-df8d0aa6  se-54ed4e93           se-a5350a8d  running  cassandra.yml
$ ferry ssh sa-df8d0aa6
root@client-se-a5350a8d:~# ps -eaf | grep python
root 144 1 0 19:49 ? 00:00:00 python /home/ferry/pydata/bokeh/webserver.py /home/ferry/pydata/data
What’s it doing?
$ ferry start cassandra.yml
[Diagram: a Web connector and two Cassandra (C*) containers; labels: "Generate!", "Config"]
root@client-se-a5350a8d:~# env | grep BACK
BACKEND_STORAGE_TYPE=cassandra
BACKEND_STORAGE_IP=10.1.0.12
What’s it doing?
$ ferry start yarn
[Diagram: a Client connector, two YARN (Y) containers, and two GlusterFS (G) containers]
root@client-se-b597cb21:~# env | grep BACK
BACKEND_STORAGE_TYPE=gluster
BACKEND_STORAGE_IP=10.1.0.18
BACKEND_COMPUTE_TYPE=yarn
BACKEND_COMPUTE_IP=10.1.0.15
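An application running inside the connector can discover its backends from these environment variables. The following is a minimal sketch: the function name `backend_config` and the returned dictionary layout are hypothetical, and only the four BACKEND_* variable names shown on the slides are assumed to exist.

```python
import os


def backend_config(environ=None):
    """Collect backend type/IP pairs from BACKEND_* environment variables."""
    environ = os.environ if environ is None else environ
    config = {}
    for role in ("STORAGE", "COMPUTE"):
        kind = environ.get(f"BACKEND_{role}_TYPE")
        address = environ.get(f"BACKEND_{role}_IP")
        # A role is reported only when both of its variables are set.
        if kind and address:
            config[role.lower()] = {"type": kind, "ip": address}
    return config


if __name__ == "__main__":
    example = {
        "BACKEND_STORAGE_TYPE": "gluster",
        "BACKEND_STORAGE_IP": "10.1.0.18",
        "BACKEND_COMPUTE_TYPE": "yarn",
        "BACKEND_COMPUTE_IP": "10.1.0.15",
    }
    print(backend_config(example))
```

In the Cassandra-only application, where no compute variables are set, the same call would return just the storage entry.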
What’s it doing?
$ ferry stop sa-c6cbb572
[Diagram: the Client connector, two YARN (Y) containers, and two GlusterFS (G) containers]
Next steps
$ ferry share sa-df8d0aa6
[Diagram: the application (web connector w, two Cassandra containers c*) shown on three separate sets of hardware]
Next steps
$ ferry deploy sa-df8d0aa6
[Diagram: the application (w, c*, c*) on one set of hardware, then spread across several; labels: VPC, EC2, S3]
• Even simple applications can be complicated to install and run
• Docker helps quite a bit with this
• Ferry helps build out big data applications
Thank you!

James
jlh@opencore.io

Ferry
ferry.opencore.io
@open_core_io

Pydata2014
