SMACK Stack 1.1
Elodina is a big data as a service platform built on top
of open source software.
The Elodina platform solves today’s data
analytics needs by providing the tools and
support necessary to utilize open source
technologies.
http://www.elodina.net/
What's SMACK Stack?
SMACK Stack 1.0 has traditionally meant Spark, Mesos, Akka, Cassandra and
Kafka. There is lots written about it (https://dzone.com/articles/smack-stack-guide)
and lots, lots more (https://www.google.com/webhp?q=smack%20stack).
Now we are going to introduce SMACK Stack 1.1 and talk more about dynamic
compute, microservices, orchestration and micro-segmentation, all part of what you
can do now with Streaming, Mesos, Analytics, Cassandra and Kafka.
The free lunch is over!
http://www.gotw.ca/publications/concurrency-ddj.htm
Many industries still don’t get it
XML is everywhere but we have alternatives!
We can support an XML interface but don’t have to take on the burden of the extra
data. You can save A LOT of overhead just by having a pre-processing step that
takes the XML, turns it into Avro, and processes and stores that.
It works: https://github.com/elodina/xml-avro
You can even process the response in Avro but return the result in XML. More on
that later though!
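To give a feel for where the savings come from, here is a minimal sketch of the pre-processing step. It does not use xml-avro; it hand-encodes one record with Avro's binary rules (zigzag varints for longs, length-prefixed UTF-8 for strings). The `order` document and the `ORDER_FIELDS` layout are made up for illustration, and the output is a bare record body, not a full Avro container file.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout: (field name, Avro primitive type), in schema order.
ORDER_FIELDS = [("id", "long"), ("name", "string"), ("qty", "long")]


def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag-encode, then emit 7 bits per byte, low bits first."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


def xml_to_avro(xml_text: str, fields=ORDER_FIELDS) -> bytes:
    """Encode the child elements of the XML root as one Avro record body."""
    root = ET.fromstring(xml_text)
    out = bytearray()
    for name, avro_type in fields:
        text = root.findtext(name)
        if text is None:
            raise ValueError(f"missing element <{name}>")
        if avro_type == "long":
            out += encode_long(int(text))
        elif avro_type == "string":
            out += encode_string(text)
        else:
            raise ValueError(f"unsupported type {avro_type}")
    return bytes(out)


xml_doc = "<order><id>42</id><name>widget</name><qty>3</qty></order>"
payload = xml_to_avro(xml_doc)
print(len(xml_doc.encode("utf-8")), "bytes of XML ->", len(payload), "bytes of Avro")
```

The field names live in the schema rather than in every message, which is where most of the size difference comes from.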
You need to be running Mesos. Lots of options here!
What is most important is that you abstract your “Provider” from your “Grid”.
What is “The Grid”?
It is the PaaS layer you deploy to that runs your software (aka your new
awesome supercomputer).
The grid is your Mesos cluster. You are likely going to have more than one, so plan
accordingly. Think of it as immutable infrastructure, the computer does.
Step 1
“Provider” of compute resources
The Grid … 2.0 ...
https://github.com/elodina/sawfly/blob/master/cloud-deploy-grid.md
Program against your datacenter like it’s a single pool of resources. Apache Mesos abstracts CPU,
memory, storage, and other compute resources away from machines (physical or virtual), enabling
fault-tolerant and elastic distributed systems to easily be built and run effectively. Mesosphere’s Data
Center Operating System (DCOS) is an operating system that spans all of the machines in a datacenter
or cloud and treats them as a single computer, providing a highly elastic and highly scalable way of
deploying applications, services, and big data infrastructure on shared resources. DCOS is based on
Apache Mesos and includes a distributed systems kernel with enterprise-grade security.
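As a rough illustration of the "single pool of resources" idea, the sketch below places tasks onto whichever machine has room, using plain first-fit. This is a simplification, not how Mesos works internally: Mesos hands resource offers to framework schedulers, which decide what to launch. The `Agent`, `Task` and `place` names, the agent sizes and the `big-job` task are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    cpus: float
    mem: int  # MB


@dataclass
class Task:
    name: str
    cpus: float
    mem: int  # MB


def place(tasks, agents):
    """First-fit placement: returns {task name: agent name, or None if nothing fits}."""
    placement = {}
    for task in tasks:
        placement[task.name] = None
        for agent in agents:
            if agent.cpus >= task.cpus and agent.mem >= task.mem:
                # Reserve the resources on this agent and stop looking.
                agent.cpus -= task.cpus
                agent.mem -= task.mem
                placement[task.name] = agent.name
                break
    return placement


agents = [Agent("agent-1", cpus=2.0, mem=2048), Agent("agent-2", cpus=4.0, mem=8192)]
tasks = [
    Task("storm-nimbus", cpus=1.0, mem=1024),
    Task("storm-ui", cpus=0.2, mem=512),
    Task("big-job", cpus=3.0, mem=4096),
]
placement = place(tasks, agents)
print(placement)
```

The caller asks for CPU and memory and never names a machine; which agent ends up running the task is the pool's concern.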
Data Center Optimization!
But there is more!
● Provisioning
● Micro Segmentation
● Orchestration
● Configuration Management
● Service Discovery
● Deployment Isolation and Identification
● Telemetry, Tracing, Ops Stuff, Etc
● Oh My!
It boils back down into stacks! https://github.com/elodina/stack-deploy and how
you are ultimately working with the schedulers in your cluster.
Stack Deploy to the rescue!
In the Grid you need Schedulers!
● Kafka – Producer/Consumer-based message queue management
● Exhibitor – Supervisor for distributed persistence (like ZooKeeper)
● Cassandra/DSE – HA, scalable, distributed NoSQL data storage
● Storm – Topology-based Real-time distributed data streaming
● Monarch – Distributed Remote Procedure Calls, Kafka REST interface and schema repository
● Zipkin – Configure, launch and manage Zipkin distributed trace on Mesos
● HDFS – Configure, launch and manage HDFS on Mesos (coming soon)
● Stockpile – Consumer to “stock pile” data into persistent storage (mesos scheduler only for c* now)
● MirrorMaker – Consumer to make a mirror copy of data to destination
● StatsD – Producer to pump StatsD on Mesos into Kafka for consumption, preserves layers
● SysLog – Producer to pump Syslog on Mesos into Kafka for consumption, preserves layers
https://github.com/elodina/
Virtual Telemetry “Data Center” In the Grid
ZipkinQATeamBuild92
● 1x Exhibitor-Mesos
● 1x Exhibitor
● 1x DSE-Mesos
● 1x Cassandra node
● 1x Kafka-Mesos
● 1x Kafka 0.8 broker
● 1x Zipkin-Mesos
● 1x Zipkin Collector
● 1x Zipkin Query
● 1x Zipkin Web
“cluster”
“zone”
“Stack” - defaultSimpleZipkinFull
“data center”
Stack Deploy In Action
./stack-deploy addlayer --file stacks/cassandra_dc.stack --level datacenter
./stack-deploy addlayer --file stacks/cassandra_cluster.stack --level cluster --parent cassandra_dc
./stack-deploy addlayer --file stacks/cassandra_zone1.stack --level zone --parent cassandra_cluster
./stack-deploy addlayer --file stacks/cassandra_zone2.stack --level zone --parent cassandra_cluster
./stack-deploy add --file stacks/cassandra.stack
./stack-deploy run cassandra --zone cassandra_zone1
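The commands above build a datacenter > cluster > zone tree and then run a stack in one zone. The sketch below is a toy model of that tree, not stack-deploy's implementation. It assumes layer names match the stack file names and that each level must sit directly under the one above it; whether stack-deploy enforces that is an assumption. The `Layers` class and its methods are hypothetical.

```python
LEVELS = ["datacenter", "cluster", "zone"]


class Layers:
    """Toy model of the layer tree the addlayer commands describe."""

    def __init__(self):
        self.layers = {}  # name -> (level, parent name or None)

    def add_layer(self, name, level, parent=None):
        if level not in LEVELS:
            raise ValueError(f"unknown level {level!r}")
        depth = LEVELS.index(level)
        if depth == 0:
            if parent is not None:
                raise ValueError("a datacenter layer has no parent")
        else:
            if parent not in self.layers:
                raise ValueError(f"unknown parent {parent!r}")
            if self.layers[parent][0] != LEVELS[depth - 1]:
                raise ValueError(f"{level} must sit under a {LEVELS[depth - 1]}")
        self.layers[name] = (level, parent)

    def path(self, name):
        """Return the chain of layer names from the datacenter down to `name`."""
        chain = []
        while name is not None:
            chain.append(name)
            name = self.layers[name][1]
        return list(reversed(chain))


grid = Layers()
grid.add_layer("cassandra_dc", "datacenter")
grid.add_layer("cassandra_cluster", "cluster", parent="cassandra_dc")
grid.add_layer("cassandra_zone1", "zone", parent="cassandra_cluster")
grid.add_layer("cassandra_zone2", "zone", parent="cassandra_cluster")
print(grid.path("cassandra_zone1"))
```

Running a stack with `--zone cassandra_zone1` would then resolve against the whole chain returned by `path`, which is one plausible way for settings to flow from datacenter down to zone.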
Full Stack Deployments
Cassandra
Cassandra Multi DC
Cassandra https://github.com/elodina/datastax-enterprise-mesos
Start your nodes!
Apache Kafka
• Apache Kafka
o http://kafka.apache.org
• Apache Kafka Source Code
o https://github.com/apache/kafka
• Documentation
o http://kafka.apache.org/documentation.html
• Wiki
o https://cwiki.apache.org/confluence/display/KAFKA/Index
It often starts with just one data pipeline
Reuse of data pipelines for new producers
Reuse of existing providers for new consumers
Eventually the solution becomes the problem
Kafka decouples data-pipelines
Topics & Partitions
A high-throughput distributed messaging system
rethought as a distributed commit log.
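A small in-memory sketch can make "topics, partitions and a commit log" concrete. It is not Kafka's API: the `Topic` class is hypothetical, and `zlib.crc32` stands in for whatever partitioner a real client uses. What it shows is that each partition is an append-only list, a message is addressed by (partition, offset), and reading never removes data, so each consumer just tracks its own offset.

```python
import zlib


class Topic:
    """A topic as a fixed set of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, value: bytes):
        """Append to the partition chosen by key; return (partition, offset)."""
        p = zlib.crc32(key) % len(self.partitions)
        log = self.partitions[p]
        log.append((key, value))
        return p, len(log) - 1

    def read(self, partition, offset, max_messages=10):
        """Read forward from an offset; reading does not remove anything."""
        return self.partitions[partition][offset:offset + max_messages]


events = Topic("events", num_partitions=3)
p, first = events.append(b"user-1", b"login")
_, second = events.append(b"user-1", b"click")

# Two independent consumers reading the same partition at their own pace.
consumer_a = events.read(p, offset=0)
consumer_b = events.read(p, offset=second)
print(consumer_a, consumer_b)
```

Because messages with the same key land in the same partition, their relative order is preserved for every consumer.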
Intra Cluster Replication
Mesos Kafka http://github.com/mesos/kafka
Streaming & Analytics
● The landscape of streaming is about to get more fragmented and harder to
navigate. This is not all bad news and it is not much different than where we
were with NoSQL 6 years ago or so.
● Different systems are getting really (really (really)) good at different things.
○ DAG-based systems
○ Event-based systems
○ Query & Execution Engines
○ Streaming Engines
○ Etc!
GearPump
Airflow
Spring Cloud Data Flow
Storm (and Storm Topology based systems)
Storm Nimbus
{
  "id": "storm-nimbus",
  "cmd": "cp storm.yaml storm-mesos-0.9.6/conf && cd storm-mesos-0.9.6 && ./bin/storm-mesos nimbus -c mesos.master.url=zk://zookeeper.service:2181/mesos -c storm.zookeeper.servers=\"[\\\"zookeeper.service\\\"]\" -c nimbus.thrift.port=$PORT0 -c topology.mesos.worker.cpu=0.5 -c topology.mesos.worker.mem.mb=615 -c worker.childopts=-Xmx512m -c topology.mesos.executor.cpu=0.1 -c topology.mesos.executor.mem.mb=160 -c supervisor.childopts=-Xmx128m -c mesos.executor.uri=http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz -c storm.log.dir=$(pwd)/logs",
  "cpus": 1.0,
  "mem": 1024,
  "ports": [31056],
  "requirePorts": true,
  "instances": 1,
  "uris": [
    "http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz",
    "http://repo.elodina.s3.amazonaws.com/storm.yaml"
  ]
}
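The long "cmd" string is easy to get wrong by hand, mostly because the quotes around the ZooKeeper server list have to survive both JSON and the shell. One way around that, sketched here, is to build the definition in code and let `json.dumps` do the escaping. The option values come from the slide; the `options` dict, `STORM_DIR` and `REPO` names are illustrative, and only a subset of the `-c` overrides is shown.

```python
import json

STORM_DIR = "storm-mesos-0.9.6"
REPO = "http://repo.elodina.s3.amazonaws.com"

# -c overrides for nimbus, written exactly as the shell should see them.
options = {
    "mesos.master.url": "zk://zookeeper.service:2181/mesos",
    "storm.zookeeper.servers": r'"[\"zookeeper.service\"]"',
    "nimbus.thrift.port": "$PORT0",
    "mesos.executor.uri": f"{REPO}/{STORM_DIR}.tgz",
    "storm.log.dir": "$(pwd)/logs",
}

cmd = (
    f"cp storm.yaml {STORM_DIR}/conf && cd {STORM_DIR} && ./bin/storm-mesos nimbus "
    + " ".join(f"-c {key}={value}" for key, value in options.items())
)

app = {
    "id": "storm-nimbus",
    "cmd": cmd,
    "cpus": 1.0,
    "mem": 1024,
    "ports": [31056],
    "requirePorts": True,
    "instances": 1,
    "uris": [f"{REPO}/{STORM_DIR}.tgz", f"{REPO}/storm.yaml"],
}

app_json = json.dumps(app, indent=2)  # quotes and backslashes are escaped here
print(app_json)
```

The JSON has the shape of a Marathon app definition, so the output would typically be POSTed to Marathon's /v2/apps endpoint; that deployment step is an assumption, since the slide does not show it.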
Storm UI
{
  "id": "storm-ui",
  "cmd": "cp storm.yaml storm-mesos-0.9.6/conf && cd storm-mesos-0.9.6 && ./bin/storm ui -c ui.port=$PORT0 -c nimbus.thrift.port=31056 -c nimbus.host=storm-nimbus.service -c storm.log.dir=$(pwd)/logs",
  "cpus": 0.2,
  "mem": 512,
  "ports": [31067],
  "requirePorts": true,
  "instances": 1,
  "uris": [
    "http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz",
    "http://repo.elodina.s3.amazonaws.com/storm.yaml"
  ],
  "healthChecks": [
    {
      "protocol": "HTTP",
      "portIndex": 0,
      "path": "/",
      "gracePeriodSeconds": 120,
      "intervalSeconds": 20,
      "maxConsecutiveFailures": 3
    }
  ]
}
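The three health check numbers interact, so a small simulation may help. This is a sketch of the semantics as Marathon documents them (failures are ignored during the grace period until the task first reports healthy; after that, a run of consecutive failures gets the task killed), not Marathon's code. It assumes a check fires every `interval` seconds starting at t = 0; the `kill_time` function is hypothetical.

```python
def kill_time(results, grace=120, interval=20, max_failures=3):
    """results[i] is the outcome of the check run at t = i * interval seconds.

    Returns the time (in seconds) at which the task would be killed, or None.
    """
    failures = 0
    ever_healthy = False
    for i, ok in enumerate(results):
        t = i * interval
        if ok:
            ever_healthy = True
            failures = 0  # any success resets the streak
            continue
        if not ever_healthy and t < grace:
            continue  # still starting up: this failure is ignored
        failures += 1
        if failures >= max_failures:
            return t
    return None


# Storm UI takes 100 s to come up, then stays healthy: never killed.
print(kill_time([False] * 5 + [True] * 3))
# Storm UI never comes up: failures only start counting once the grace period ends.
print(kill_time([False] * 10))
```

With the values from the slide, a UI that never answers on "/" gets two minutes of slack and then three more failed checks before it is replaced.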
Storm Kafka - new spouts & bolts for Kafka 0.8, 0.9, ...
Apache Kafka Streams
Go Kafka Client - Fan Out Processing
https://github.com/elodina/go-kafka-client-mesos
● Dynamic Kafka Log workers
● Blue/Green Deploy Support
● Fan Out Processing
● Auditable
● Batches
● Scalable/Auto-Scalable
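Fan-out over batches can be sketched in a few lines. This is not the go-kafka-client API; it only shows the pattern: split consumed messages into batches, hand each batch to a worker, and collect results in the original order. The `handle` function is a placeholder for real per-batch work, and the worker count and batch size are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(messages, size):
    """Yield consecutive slices of at most `size` messages."""
    for i in range(0, len(messages), size):
        yield messages[i:i + size]


def handle(batch):
    """Hypothetical per-batch work; here it just uppercases each message."""
    return [message.upper() for message in batch]


def fan_out(messages, workers=4, batch_size=3):
    """Process batches on a pool of workers and flatten the results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(handle, batches(messages, batch_size))
        return [item for batch in results for item in batch]


processed = fan_out([f"m{i}" for i in range(7)])
print(processed)
```

Scaling up is then a matter of raising `workers`, which is roughly what the dynamic workers and auto-scaling bullets refer to.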
Questions?
http://www.elodina.net
