Real-Time Data Processing Pipeline &
Visualization with Docker, Spark, Kafka
and Cassandra
Roberto G. Hashioka – 2016-10-04 – TIAD – Paris
Personal Information
• Roberto Gandolfo Hashioka
• @rogaha (GitHub) and @rhashioka (Twitter)
• Finance -> Software Engineer
• Growth & Data Engineer at Docker
Summary
• Background / Motivation
• Project Goals
• How to build it?
• DEMO
Background
• Gather data from multiple sources and process it in “real-time”
• Transform raw data into meaningful, useful information that supports more effective
decision-making
• Provide more visibility into trends in: 1) user behavior, 2) feature engagement, 3) opportunities
for future investment
• Data transparency and standardization
Project Goals
• Create a data processing pipeline that can handle a very large number of events per second
• Automate the development environment: Docker Compose.
• Automate the management of remote machines: Docker for AWS / Docker Machine.
• Reduce time to market and development time, for new hires and new features alike.
Project / Language Stack
How to build it?
• Step 1: Install Docker for Mac/Windows and dockerize all the applications
link: https://www.docker.com/products/docker
Dockerfile example
-----------------------------------------------------------------------------------------------------------
FROM ubuntu:14.04
MAINTAINER Roberto Hashioka (roberto@docker.com)
# Install nginx from the Ubuntu package archive.
RUN apt-get update && apt-get install -y nginx
# Replace the default page with a greeting.
RUN echo "Hello World! #TIAD" > /usr/share/nginx/html/index.html
EXPOSE 80
# Keep nginx in the foreground so the container stays up under `docker run -d`.
CMD ["nginx", "-g", "daemon off;"]
------------------------------------------------------------------------------------------------------------
$ docker build -t rogaha/web_demotiad2016 .
$ docker run -d -p 80:80 --name web_demotiad2016 rogaha/web_demotiad2016
How to build it?
• Step 2: Define your services stack with a docker-compose file
Docker Compose
version: "2"
services:
  web:
    build: .
    command: python app.py
    ports:
      - "5000:5000"
    volumes:
      - .:/code
    links:
      - redis
    environment:
      - PYTHONUNBUFFERED=1
  redis:
    image: redis:latest
    command: redis-server --appendonly yes
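With a compose file such as the one above saved as docker-compose.yml, the stack could be driven roughly as follows. This is a minimal sketch, not taken from the slides: the function names and the COMPOSE override (handy for a dry run with COMPOSE=echo) are illustrative assumptions, while `up`, `ps`, `logs` and `down` are standard docker-compose subcommands.

```shell
# Thin wrapper so the binary can be swapped out (COMPOSE=echo gives a dry run).
compose() {
  "${COMPOSE:-docker-compose}" "$@"
}

# Build images if needed and start web + redis in the background.
stack_up() {
  compose up -d --build
}

# Show container state, then the output of the web service.
stack_status() {
  compose ps
  compose logs web
}

# Stop and remove the containers created by stack_up.
stack_down() {
  compose down
}
```

Run `stack_up` from the directory that holds docker-compose.yml, then `stack_status` to check that both services started.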
How to build it?
• Step 3: Test the applications locally from your laptop using containers
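One way to check a containerized service from the laptop is to poll its published port until it answers. The helper below is a sketch under assumptions: `wait_for_http` and the CURL override are hypothetical names introduced here, and the URL in the usage note assumes the web service publishes port 5000 as in the compose example.

```shell
# Poll a URL until it answers successfully or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body.
    if "${CURL:-curl}" -fs -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up on $url after $attempts attempts" >&2
  return 1
}
```

After starting the containers, `wait_for_http http://localhost:5000/` returns 0 as soon as the service responds, which makes it usable as a gate in a local test script.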
How to build it?
• Step 4: Provision your remote servers and deploy your containers
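The slides do not show the provisioning commands, so the following is only a sketch of how this step might look with Docker Machine and its amazonec2 driver. The function names, the MACHINE and DOCKER overrides, and the host name passed as an argument are assumptions; the image name is the one built in Step 1 and is assumed to have been pushed to a registry the remote host can pull from.

```shell
# Create a Docker host on AWS EC2. The amazonec2 driver picks up AWS
# credentials from the environment or from ~/.aws/credentials.
provision_host() {
  "${MACHINE:-docker-machine}" create --driver amazonec2 \
    --amazonec2-region "${AWS_DEFAULT_REGION:-us-east-1}" "$1"
}

# Run the Step 1 container on that host. The body is a subshell, so the
# DOCKER_* variables exported by `docker-machine env` do not leak into
# the calling shell, which keeps pointing at the local daemon.
deploy_web() (
  env_script=$("${MACHINE:-docker-machine}" env "$1") || exit 1
  eval "$env_script"
  "${DOCKER:-docker}" run -d -p 80:80 --name web_demotiad2016 \
    rogaha/web_demotiad2016
)
```

A typical sequence would be `provision_host demo` followed by `deploy_web demo`, where `demo` is a placeholder machine name.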
How to build it?
• Step 5: Scale your services with Docker swarm
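Assuming "Docker swarm" here means the swarm mode built into Docker 1.12 and later (the slides do not say which variant was used), scaling could be sketched as below. The function names, the service name `web` and the DOCKER override are illustrative assumptions; the image is the one from Step 1.

```shell
# Turn the current engine into a single-node swarm manager (Docker 1.12+).
init_swarm() {
  "${DOCKER:-docker}" swarm init
}

# Start the web image as a replicated service published on port 80.
# Usage: create_web_service [REPLICAS]
create_web_service() {
  "${DOCKER:-docker}" service create --name web --replicas "${1:-1}" \
    --publish 80:80 rogaha/web_demotiad2016
}

# Change the number of running replicas of a service.
# Usage: scale_service NAME REPLICAS
scale_service() {
  "${DOCKER:-docker}" service scale "$1=$2"
}
```

For example, `init_swarm`, then `create_web_service 1`, then `scale_service web 5` would take the service from one replica to five; `docker service ls` shows the resulting replica count.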
DEMO
source code: https://github.com/rogaha/data-processing-pipeline
Open Source Projects Used
• Docker (https://github.com/docker/docker)
• An open platform for distributed applications for developers and sysadmins
• Apache Spark / Spark SQL (https://github.com/apache/spark)
• A fast, in-memory data processing engine. Spark SQL lets you query structured data inside Spark programs, using SQL or the DataFrame API
• Apache Kafka (https://github.com/apache/kafka)
• A fast and scalable pub-sub messaging service
• Apache ZooKeeper (https://github.com/apache/zookeeper)
• A distributed configuration service, synchronization service, and naming registry for large distributed systems
• Apache Cassandra (https://github.com/apache/cassandra)
• A scalable, highly available, distributed wide-column NoSQL database
• D3 (https://github.com/mbostock/d3)
• A JavaScript visualization library for HTML and SVG.
Thanks!
Questions?
@rhashioka
