Microservices vs
Hadoop ecosystem
Marton Elek
February 2017
2 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Microservice definition
“An approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery.”
– https://martinfowler.com/articles/microservices.html
Hadoop cluster
 The definition is almost true for a Hadoop cluster as well
Dockerized Hadoop cluster
 How can we use the tools of microservice architectures in the Hadoop ecosystem?
 A possible approach to installing a cluster (Hadoop, Spark, Kafka, Hive), based on
– separate Docker containers
– smart configuration management (using well-known tooling from microservice architectures)
 Goal: rapid prototyping platform
 Easy switching between
– versions (official HDP, snapshot build, Apache build)
– configurations (HA, Kerberos, metrics, HTrace…)
 Developers/Ops tool
– Easy != easy for any user without knowledge about the tool
 Non-goal:
– replacing the current management platforms (e.g. Ambari)
What are microservices? (Theory)
Collection of patterns/best practices
 II. Dependencies
– Explicitly declare and isolate dependencies
 III. Config
– Store config in the environment
 VI. Processes
– Execute the app as one or more stateless processes
 VIII. Concurrency
– Scale out via the process model
 XII. Admin processes
– Run admin/management tasks as one-off processes
The Twelve-Factor App (http://12factor.net)
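Factor III ("Store config in the environment") can be illustrated with a minimal sketch. The class name `EnvConfig` and the variable name `SERVER_PORT` are hypothetical and only serve the illustration; the environment is passed in as a map so the lookup can be exercised without touching the real process environment.

```java
import java.util.Map;

/** Reads settings from environment variables, falling back to defaults. */
final class EnvConfig {
    private final Map<String, String> env;

    EnvConfig(Map<String, String> env) {
        this.env = env;
    }

    /** Returns the value of the variable, or the default if it is unset or blank. */
    String get(String name, String defaultValue) {
        String value = env.get(name);
        return (value == null || value.trim().isEmpty()) ? defaultValue : value;
    }

    /** Same as get(), but parses the result as an integer. */
    int getInt(String name, int defaultValue) {
        return Integer.parseInt(get(name, Integer.toString(defaultValue)));
    }

    public static void main(String[] args) {
        // System.getenv() returns the real process environment.
        EnvConfig config = new EnvConfig(System.getenv());
        // SERVER_PORT is a hypothetical variable name used for illustration.
        System.out.println("port = " + config.getInt("SERVER_PORT", 8080));
    }
}
```

The same binary can then run unchanged in every environment; only the variables differ.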
What are microservices? (Practice)
 Spring started as a
– Dependency injection framework
 Spring Boot ecosystem
– Easy to use starter projects
– Lego bricks for various problems
• JDBC access
• Database access
• REST
• Health check
 Spring Cloud: elements to build microservices (based on the Netflix stack)
– API gateway
– Service registry
– Configuration server
– Distributed tracing
– Client side load balancing
public class TimeStarter {
    // Injected by Spring's dependency injection container.
    @Autowired
    TimeService timerService;

    public Date now() {
        return timerService.now();
    }
}
Microservices with Spring Cloud
Monolith application
 Example of a monolithic but modular application
[Diagram: a single application containing the auth, timer, upload, and report services, reached by one REST call]
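import java.util.Date;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@EnableAutoConfiguration
@RestController
@ComponentScan
public class TimeStarter {
    @Autowired
    TimeService timerService;

    // Serves the current time at /now.
    @RequestMapping("/now")
    public Date now() {
        return timerService.now();
    }

    public static void main(String[] args) {
        SpringApplication.run(TimeStarter.class, args);
    }
}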
Microservice version
 First problem: how can we find the right backend port from the frontend?
[Diagram: the auth, timer, upload, and report services as separate processes, each reached by its own REST call]
Solution: API Gateway
 First problem: how can we find the right backend port from the frontend?
[Diagram: a single REST call reaches the API gateway, which sits in front of the auth, timer, upload, and report services]
API Gateway
 Goal: hide the available microservices behind a service facade
– Routing, Authorization
– Deployment handling, Canary testing, Blue/Green deployment
– Logging, SLA, Auditing
 Implementation examples:
– Spring Cloud API gateway (based on Netflix Zuul)
– Netflix Zuul based implementation
– Twitter Finagle based implementation
– Amazon API gateway
– Simple Nginx reverse proxy configuration
– Traefik, Kong
 Usage in Hadoop ecosystem
– For prototyping: Only if the scheduler/orchestrator starts the service on a random host
– For security: Apache Knox
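The routing goal can be sketched in a few lines: the gateway keeps a table of public path prefixes and forwards each request to the matching backend. This is a simplified illustration, not how Zuul or any of the listed products is implemented; the class `GatewayRoutes`, the host names, and the auth port are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Maps public path prefixes to backend base URLs (longest prefix wins). */
final class GatewayRoutes {
    private final Map<String, String> routes = new LinkedHashMap<>();

    GatewayRoutes add(String pathPrefix, String backendBaseUrl) {
        routes.put(pathPrefix, backendBaseUrl);
        return this;
    }

    /** Returns the backend URL for a request path, or null if no route matches. */
    String resolve(String requestPath) {
        String bestPrefix = null;
        for (String prefix : routes.keySet()) {
            // Match whole path segments only, so "/timerx" does not hit "/timer".
            boolean matches = requestPath.equals(prefix)
                    || requestPath.startsWith(prefix + "/");
            if (matches && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
            }
        }
        if (bestPrefix == null) {
            return null;
        }
        // Strip the public prefix and forward the remainder to the backend.
        return routes.get(bestPrefix) + requestPath.substring(bestPrefix.length());
    }

    public static void main(String[] args) {
        // Host names and the auth port are hypothetical.
        GatewayRoutes gateway = new GatewayRoutes()
                .add("/timer", "http://timer:6767")
                .add("/auth", "http://auth:6768");
        System.out.println(gateway.resolve("/timer/now"));
    }
}
```

The frontend only needs to know the gateway address; the backend ports stay an internal detail of the routing table.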
Service registry
 Problem: how to configure the API gateway to automatically route to all the services?
[Diagram: the API gateway in front of the auth, timer, upload, and report services, with the routing marked by a question mark]
Service registry
 Solution: use a service registry
– Components should be registered in the service registry automatically
[Diagram: the auth, timer, upload, and report services, the API gateway, and the service registry]
Service registry
 Goal: Store the location and state of the available services
– Health check
– DNS interface
 Implementation examples:
– Spring Cloud: Eureka
– Netflix Eureka based implementation
– Consul.io
– etcd, ZooKeeper
– Simple workaround: DNS or hosts file
 Usage in Hadoop ecosystem
– Most of the components need information about the location of the nameserver(s) and other master components
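What "store the location and state of the available services" means can be shown with a minimal in-memory sketch. It is not how Eureka or Consul work internally; the class `ServiceRegistry` and the addresses are hypothetical, and the health flag is set by hand where a real registry would run periodic checks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** In-memory registry: service name -> known instances and their health state. */
final class ServiceRegistry {

    /** One registered instance: where it runs and whether its last check passed. */
    static final class Instance {
        final String host;
        final int port;
        volatile boolean healthy = true;

        Instance(String host, int port) {
            this.host = host;
            this.port = port;
        }

        String address() {
            return host + ":" + port;
        }
    }

    private final Map<String, List<Instance>> services = new ConcurrentHashMap<>();

    /** Called by (or on behalf of) a component when it starts. */
    Instance register(String serviceName, String host, int port) {
        Instance instance = new Instance(host, port);
        services.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>()).add(instance);
        return instance;
    }

    /** Returns the addresses of all instances whose last health check passed. */
    List<String> healthyAddresses(String serviceName) {
        List<String> result = new ArrayList<>();
        for (Instance instance
                : services.getOrDefault(serviceName, Collections.<Instance>emptyList())) {
            if (instance.healthy) {
                result.add(instance.address());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.register("timer", "10.8.0.5", 6767);
        Instance second = registry.register("timer", "10.8.0.6", 6767);
        second.healthy = false; // simulate a failed health check
        System.out.println(registry.healthyAddresses("timer"));
    }
}
```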
Configuration server
 Problem: how can we configure multiple components?
– “Store config in the environment” (12factor)
[Diagram: the auth, timer, upload, and report services, the service registry, and the API gateway, each with an open question about where its config comes from]
Configuration server
 Problem: how can we configure multiple components?
[Diagram: the same components, with a config server supplying the configuration]
Config server
 Goals: One common place for all of the configuration
– Versioning
– Auditing
– Multiple environment support: use (almost) the same configuration from the DEV to the PROD environment
– Solution for sensitive data
 Solution examples:
– Spring Cloud config service
– ZooKeeper
– Most service registries have a key-value store (Consul, etcd)
– Any persistent datastore (but versioning is an open question)
 For the Hadoop ecosystem:
– Most painful point: the same configuration elements (e.g. core-site.xml) are needed at multiple locations
– Ambari and other management tools try to solve the problem (but without a focus on rapid prototyping)
Config server – configuration management
 Config server structure: [branch]/name-profile.extension
 Merge properties for name=timer and profile(environment)=dev
 URL from the config server
– http://config:8888/timer-dev.properties
• server.port=6767
• aws.secret.key=xxx
• exit.code=-23
 Local file system structure (master branch)
– timer.properties
• server.port=6767
– dev.properties
• aws.secret.key=xxx
– application.properties
• exit.code=-23
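The merge for name=timer and profile=dev can be sketched as a layered map merge. The precedence used here (shared defaults first, then the profile, then the application name) is an assumption; with the three files listed above no key overlaps, so the order does not affect the result. The class name `ConfigMerge` is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Merges layered property sets, as a config server might for name=timer, profile=dev. */
final class ConfigMerge {

    /** Later layers override earlier ones; the result is sorted by key. */
    static Map<String, String> merge(List<Map<String, String>> layers) {
        Map<String, String> merged = new TreeMap<>();
        for (Map<String, String> layer : layers) {
            merged.putAll(layer);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Values taken from the local file listing.
        Map<String, String> application = Collections.singletonMap("exit.code", "-23");
        Map<String, String> dev = Collections.singletonMap("aws.secret.key", "xxx");
        Map<String, String> timer = Collections.singletonMap("server.port", "6767");

        // Assumed precedence: shared defaults, then profile, then application name.
        Map<String, String> merged = merge(Arrays.asList(application, dev, timer));
        merged.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```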
Summary
 Tools used in microservice architecture
 Key components:
– Config server
– Service registry
– API gateway
 Configuration server
– Versioning
– One common place to distribute configuration
– Configuration preprocessing!!!
• Transformation
• The content of the configuration should be defined; the definition can be format independent
• But the final (rendered) configuration should be visible
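One possible preprocessing step, sketched under assumptions: configuration is defined as plain key/value pairs and transformed into Hadoop's XML configuration format (`<configuration>` with `<property>`, `<name>`, `<value>` elements) right before the process starts. The class `HadoopXmlRenderer` and the namenode address are hypothetical; `fs.defaultFS` is a real core-site.xml key.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Renders key/value pairs as a Hadoop-style XML configuration file. */
final class HadoopXmlRenderer {

    static String render(Map<String, String> properties) {
        StringBuilder xml = new StringBuilder("<configuration>\n");
        // Sort by key so the rendered file is stable and easy to diff.
        for (Map.Entry<String, String> entry : new TreeMap<>(properties).entrySet()) {
            xml.append("  <property>")
               .append("<name>").append(escape(entry.getKey())).append("</name>")
               .append("<value>").append(escape(entry.getValue())).append("</value>")
               .append("</property>\n");
        }
        return xml.append("</configuration>\n").toString();
    }

    /** Escapes the characters that are not allowed in XML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // The namenode address is a hypothetical example value.
        System.out.print(render(
                Collections.singletonMap("fs.defaultFS", "hdfs://namenode:9000")));
    }
}
```

Printing or writing out the rendered file keeps the final configuration visible, whatever format the definition was stored in.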
Docker based Hadoop cluster
How to do it with Hadoop?
apache-hadoop-X.X.tar.gz
 bin
– hdfs
– yarn
– mapred
 etc/hadoop
– core-site.xml
– mapred-site.xml
– hdfs-site.xml
 include
 lib
 libexec
 sbin
 share
Microservice architecture elements
1. Configuration server
2. Service registry
3. API gateway
Do it with Hadoop
Microservice architecture elements
1. Configuration server
2. Service registry
3. API gateway
4. +1 Packaging
Packaging: Docker
– Docker Engine:
• a portable,
• lightweight runtime and
• packaging tool
– Docker Hub:
• a cloud service for sharing applications
– Docker Compose:
• Predefined recipes (environment variables, network, …)
 My Docker containers: http://hub.docker.com/elek/
Docker decisions
 One application per container
– More flexible
– Simpler (configuration preprocessing + start)
– One deployable unit
 Microservice-like: prefer many similar small units over a few bigger ones
 Using the host network for clusters
[Diagram: bridge network vs. host network. On the bridge network the containers get their own addresses (172.13.0.x) on the hosts 10.8.0.5 and 10.8.0.6; on the host network they share the host addresses 10.8.0.5 and 10.8.0.6]
Repositories
 elek/bigdata-docker:
– example configuration
– docker-compose files
– ansible scripts
– getting started entry point
 elek/docker-bigdata-base (base image for all the containers)
– Contains all the configuration loading (and some documentation)
– Use CONFIG_TYPE environment variable to select configuration method
• CONFIG_TYPE=simple (configuration from environment variables – for local env)
• CONFIG_TYPE=consul (configuration from consul – for distributed environment)
 elek/docker-…. (hadoop/spark/hive/...)
– Docker images for the components
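The `CONFIG_TYPE` switch can be pictured as follows. This is only a sketch of the selection logic, not the base image's actual start-up code; the enum `ConfigType` is hypothetical, and falling back to `simple` when the variable is unset is an assumption.

```java
import java.util.Locale;
import java.util.Map;

/** The two configuration methods named for the base image. */
enum ConfigType {
    SIMPLE,  // configuration from environment variables (local environment)
    CONSUL;  // configuration from Consul (distributed environment)

    /** Picks the configuration method from the CONFIG_TYPE variable. */
    static ConfigType fromEnvironment(Map<String, String> env) {
        String raw = env.get("CONFIG_TYPE");
        if (raw == null || raw.trim().isEmpty()) {
            return SIMPLE; // assumed default, not confirmed by the source
        }
        // valueOf throws IllegalArgumentException for unknown values.
        return valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(fromEnvironment(System.getenv()));
    }
}
```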
Local demo
 Local run, using host network
– More configuration is needed
– Auto scaling is supported
– https://github.com/elek/bigdata-docker/tree/master/compose
[Diagram: bridge network with containers at 172.13.0.1, 172.13.0.5, and 172.13.0.2]
Do it with Hadoop
Components
1. Packaging
2. Configuration server
3. Service registry
4. API gateway
Service registry/configuration server: Consul
 Service registry
– Health check support
– DNS support
 Key-value store
– Binary data is supported
 Based on agents and servers
 Easy to use REST API
 Raft-based consensus protocol
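Binary values are possible because Consul's key-value HTTP API (`GET /v1/kv/<key>`) returns each value base64-encoded in the `Value` field of a JSON document. The sketch below only decodes that field from a response body; it uses a regular expression instead of a JSON parser to stay dependency-free, and the class `ConsulKvValue` and the key name are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts and decodes the base64 "Value" field of a Consul KV response. */
final class ConsulKvValue {
    private static final Pattern VALUE_FIELD =
            Pattern.compile("\"Value\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the decoded bytes, or null if the response has no string Value. */
    static byte[] decode(String responseJson) {
        Matcher matcher = VALUE_FIELD.matcher(responseJson);
        if (!matcher.find()) {
            return null;
        }
        return Base64.getDecoder().decode(matcher.group(1));
    }

    public static void main(String[] args) {
        // Build a sample response body; the key name is hypothetical.
        String encoded = Base64.getEncoder()
                .encodeToString("6767".getBytes(StandardCharsets.UTF_8));
        String response = "[{\"Key\":\"conf/timer/server.port\",\"Value\":\"" + encoded + "\"}]";

        System.out.println(new String(decode(response), StandardCharsets.UTF_8));
    }
}
```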
Service registry/configuration server
 Git2Consul
– Mirrors git repositories to Consul
 Consul Template
– Advanced template engine
– Renders a template (configuration file) based on the information from Consul
– Runs/restarts a process on change
 Registrator
– Listens on the Docker event stream
– Registers new components in Consul
[Diagram: Configuration (git), git2consul, Consul, consul-template, hdfs-namenode, hdfs-datanode (three datanodes), Registrator, Docker event stream]
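The render-and-restart-on-change idea can be sketched without any Consul dependency. The `{{ key }}` placeholder syntax below is a simplified, hypothetical stand-in, not Consul Template's real template language, and the class `TemplateWatcher` is likewise an illustration: it renders a template from key/value data and reports whether the output changed, which is the point where a managed process would be restarted.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Renders a template from key/value data and reports whether the output changed. */
final class TemplateWatcher {
    // Hypothetical placeholder syntax: {{ some.key }}
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\{\\{\\s*([\\w./-]+)\\s*\\}\\}");

    private final String template;
    private String lastRendered; // null until the first render

    TemplateWatcher(String template) {
        this.template = template;
    }

    /** Replaces every placeholder with its value (empty string if the key is missing). */
    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.getOrDefault(matcher.group(1), "");
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    /** Re-renders; returns true if the output differs, i.e. a restart would be due. */
    boolean update(Map<String, String> values) {
        String rendered = render(template, values);
        boolean changed = !rendered.equals(lastRendered);
        lastRendered = rendered;
        return changed;
    }

    String current() {
        return lastRendered;
    }

    public static void main(String[] args) {
        TemplateWatcher watcher =
                new TemplateWatcher("fs.defaultFS=hdfs://{{ namenode.host }}:9000");
        boolean restart = watcher.update(
                Collections.singletonMap("namenode.host", "10.8.0.5"));
        System.out.println(watcher.current() + " (restart needed: " + restart + ")");
    }
}
```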
Weave Scope
 Agents to monitor
– network connections between components
– cpu
– memory
 Supports Docker, Swarm, Weave network, …
 Easy install
 Transparent
 Pluggable
 Only problem:
– Temporary Docker containers
Distributed demo
 Distributed run with host network
– https://github.com/elek/bigdata-docker/tree/master/consul
– Configuration is hosted in a Consul instance
– Dynamic update
[Diagram: containers sharing the host address 10.8.0.5 on the host network]
TODO
 More profiles and configuration sets
– Ready-to-use Kerberos/HA environments
– On-the-fly keytab/keystore generation (?)
 Scripting/tool improvements
– Auto-restart in case of a service registration change
 Configuration for more orchestration/scheduling tools
– Nomad?
– Docker Swarm?
 Easy image creation for specific builds
 Improve docker images
– Predefined volume/port definition
– Consolidate default values
Thank You
