Episode 3: Kubernetes and Big Data Services

Kubernetes and Big Data
Services
@joerg_schad @gaunetes @dcos

Chris Gaun
PMM at Mesosphere /
Kubernetes Expert /
CNCF Ambassador
● Previously a Gartner
analyst covering public IaaS
● In the Kubernetes
community for 3 years

© 2018 Mesosphere, Inc. All Rights Reserved.
Mesosphere DC/OS at KubeCon EU
● Mesosphere - Platinum
Sponsor
● Many presentations:
container storage, ML,
HDFS
● Demoing smart city
application

Jörg Schad
Technical Community
Lead / Developer
● Core Mesos
developer at
Mesosphere
● Passions are deep
learning, distributed
data systems, and
data analytics

Bootcamp: Building Kubernetes-as-a-Service at
Scale, Anywhere
● Episode 1: Building Kubernetes-as-a-Service
at Scale
● Episode 2: Deploying Kubernetes at Scale
with DC/OS
● Episode 3: Kubernetes and Big Data
Services
● Episode 4: Operating Kubernetes at Scale
with DC/OS
● End-to-end components
and best practices
● Automated management
of Kubernetes
● Connecting Kubernetes
to Big Data services
● Delivering an entire
Kubernetes solution

Star / Clone Github
1. Go to Kubernetes DC/OS
quickstart
2. Search “DC/OS Kubernetes
Quickstart Github” or
https://github.com/mesosphere/dcos-kubernetes-quickstart
3. Live demo
https://github.com/dcos/demos/tree/master/flink-k8s/1.11

Sign Up For Slack
1. Slack URL: https://chat.dcos.io/
2. Join #kubernetes channel
3. OSS support / feedback

MapReduce is
crunching Data
Ancient
Times...

But then business
demanded
FAST DATA
We need to turn faster!
Today...

Fast Data
● Processing styles: Batch, Micro-Batch, Event Processing
● Latency scale: Days, Hours, Minutes, Seconds, Microseconds
● Batch end: reports what has happened using descriptive analytics
● Real-time end: solves problems using predictive and prescriptive analytics
● Example uses: Billing, Chargeback; Product recommendations; Real-time Pricing and Routing; Real-time Advertising; Predictive User Interface
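
The difference between the three processing styles can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and batches are cut by event count rather than by time window to keep the example short.

```python
def process_per_event(events, handle):
    """Event processing: hand every event to the handler as soon as it arrives."""
    return [handle([event]) for event in events]


def process_micro_batches(events, handle, batch_size):
    """Micro-batch: buffer up to batch_size events, then handle them together.

    Assumption: batches are cut by count; real engines usually cut by time window.
    """
    results = []
    for start in range(0, len(events), batch_size):
        results.append(handle(events[start:start + batch_size]))
    return results


def process_batch(events, handle):
    """Batch: wait for the whole data set, then handle it in one pass."""
    return [handle(events)]
```

With `handle=sum` and five events, the per-event version yields five results, the micro-batch version with `batch_size=2` yields three, and the batch version yields one. Fewer, larger batches mean a longer wait before the first result is available.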

The SMACK Stack
● EVENTS: Ubiquitous data streams from connected devices (sensors, devices, clients)
● INGEST: Apache Kafka. Ingest millions of events per second
● STORE: Apache Cassandra. Distributed & highly scalable database
● ANALYZE: Apache Spark. Real-time and batch data processing
● ACT: Akka. Visualize data and build data-driven applications
● All running on Apache Mesos / DC/OS

The SMACK Stack
● EVENTS: Ubiquitous data streams from connected devices (sensors, devices, clients)
● INGEST: Apache Kafka. Ingest millions of events per second
● STORE: Apache Cassandra. Distributed & highly scalable database
● ANALYZE: Apache Flink. Real-time and batch data processing
● ACT: Akka. Visualize data and build data-driven applications
● All running on Apache Mesos / DC/OS

Challenges

Datacenter
Typical Datacenter
siloed, over-provisioned servers,
low utilization
Kubernetes
Jenkins
Kafka
Spark
Cassandra

3 AM
Typical Datacenter
low utilization
Kubernetes
Jenkins
Kafka
Spark
Cassandra

Datacenter
Typical Datacenter
low utilization
Mesos/ DC/OS
automated schedulers, workload multiplexing onto the
same machines
Kubernetes
Jenkins
Kafka
Spark
Cassandra

• Brings “as-a-Service”
automation to any application
technology on any
infrastructure
• Organizations Run All Types of
Container Management as-a-
Service Using Mesos:
"(Netflix) launches up to 500,000
containers and 200,000
clusters/day"
-Netflix OSS, on using Titus container
management on top of Mesos

DC/OS
Datacenter and Cloud as a Single Computing Resource
Powered by Apache Mesos
● Workloads: microservices, containers, & dev tools; data services,
machine learning, & AI (logo rows end in "100+ MORE" and "20+ MORE")
● Platform features: Security & Compliance, Application-Aware
Automation, Multitenancy, Hybrid Cloud Management
● Infrastructure: datacenter, edge, physical infrastructure,
virtual machines, public clouds

Two-level Scheduling
1. Agents advertise resources to Master
2. Master offers resources to Framework
3. Framework rejects / uses resources
4. Agent reports task status to Master
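
The four steps above can be sketched as a toy simulation. This is a minimal sketch and not the Mesos API: the `Master`, `Framework`, and `Offer` classes are hypothetical, only CPUs are tracked, and offers go to frameworks in list order, whereas a real Mesos allocator applies a fairness policy.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    agent: str
    cpus: float


@dataclass
class Framework:
    """Hypothetical framework scheduler needing a fixed CPU amount per task."""
    name: str
    cpus_per_task: float
    pending_tasks: int
    launched: list = field(default_factory=list)

    def on_offer(self, offer):
        """Step 3: use the offer (return CPUs consumed) or reject it (return 0.0)."""
        if self.pending_tasks == 0 or offer.cpus < self.cpus_per_task:
            return 0.0
        self.pending_tasks -= 1
        self.launched.append(offer.agent)
        return self.cpus_per_task


class Master:
    def __init__(self):
        self.free = {}    # agent name -> unallocated CPUs
        self.status = []  # (agent, framework, state) reports

    def advertise(self, agent, cpus):
        """Step 1: an agent advertises its resources to the master."""
        self.free[agent] = cpus

    def offer_round(self, frameworks):
        """Step 2: offer each agent's free resources to the frameworks in turn."""
        for agent in list(self.free):
            for fw in frameworks:
                used = fw.on_offer(Offer(agent, self.free[agent]))
                if used:
                    self.free[agent] -= used
                    # Step 4: the agent reports task status back to the master.
                    self.status.append((agent, fw.name, "TASK_RUNNING"))
```

The point of the design is that the master only decides who gets an offer, while each framework decides for itself whether an offer is good enough. A framework that needs 3 CPUs simply rejects a 2-CPU offer and waits for a better one.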
MESOS ARCHITECTURE
[Diagram: three Mesos Masters; framework schedulers for Marathon, Kafka, and Kubernetes; Mesos Agents running a Cassandra executor and task, a Spark executor and task, a Docker executor and task, and a K8s executor with a Kubelet task]

Distributed Systems are ...
HDFS Scheduler

Plans
dcos hdfs --name=hdfs plan status deploy
deploy (serial strategy) (COMPLETE)
├─ journal (serial strategy) (COMPLETE)
│  ├─ journal-0:[node] (COMPLETE)
│  ├─ journal-1:[node] (COMPLETE)
│  └─ journal-2:[node] (COMPLETE)
├─ name (serial strategy) (COMPLETE)
│  ├─ name-0:[node, zkfc] (COMPLETE)
│  └─ name-1:[node, zkfc] (COMPLETE)
└─ data (serial strategy) (COMPLETE)
   ├─ data-0:[node] (COMPLETE)
   ├─ data-1:[node] (COMPLETE)
   └─ data-2:[node] (COMPLETE)
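
A plan tree like the one above is easy to check from a script. The following is a minimal sketch that assumes the text layout shown on this slide, with one element per line and the status in the last pair of parentheses. The helper names are hypothetical and not part of the DC/OS CLI.

```python
import re

# Skip tree-drawing characters, take the element name up to ':' or whitespace,
# then capture the upper-case status in the final parentheses.
LINE = re.compile(r"^[\s│├└─]*(?P<name>[^\s:]+).*\((?P<status>[A-Z_]+)\)\s*$")


def parse_plan_status(text):
    """Return (element name, status) pairs, one per line of the plan tree."""
    elements = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            elements.append((match["name"], match["status"]))
    return elements


def incomplete(elements):
    """Names of the plan, phases, and steps that are not yet COMPLETE."""
    return [name for name, status in elements if status != "COMPLETE"]
```

Feeding it the output of the command above should give an empty `incomplete` list. Any phase or step still in progress would show up by name.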

One-Click as-a-Service Installation
● Kubernetes
● 20+ more cloud native services
[Diagram: services spread across a pool of five servers]

Zero-Touch as-a-Service Automated Self Healing
● Kubernetes
● 20+ more cloud native services
[Diagram: services spread across a pool of six servers]

Why {Spark, HDFS, ..} on K8s today?
Kelsey Hightower
Kubernetes Thought
Leader
Ranked #1 K8s Influencer
Staff Developer
Advocate
PM & Chief Advocate
Today Big Data on K8s is more DIY
Top
Kubernetes
Advocate

SMACK Stack
Generator Display
1. Financial data created by generator
2. Written to Kafka topics
3. Kafka topics consumed by Spark or Flink
4. Results written back into Kafka stream (another topic)
5. Results displayed

SMACK Stack
Generator Display
1. Financial data created by generator
2. Written to Kafka topics
3. Kafka topics consumed and analyzed by Flink
4. Results written back into Kafka stream (another topic)
5. Results displayed
Kubernetes Cluster (running on top of DC/OS)
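
The data flow of the demo can be sketched without any cluster at all. In this minimal sketch an in-memory `Broker` stands in for Kafka and a plain function stands in for the Flink job. The topic names, the record fields, and the "flag large amounts" rule are all hypothetical and are not taken from the demo code.

```python
from collections import defaultdict, deque


class Broker:
    """In-memory stand-in for Kafka: every topic is a FIFO queue."""

    def __init__(self):
        self.topics = defaultdict(deque)

    def produce(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic):
        queue = self.topics[topic]
        while queue:
            yield queue.popleft()


def generate(broker, transactions):
    """Steps 1-2: the generator writes financial records to a topic."""
    for tx in transactions:
        broker.produce("transactions", tx)


def analyze(broker, threshold):
    """Steps 3-4: consume the input topic, write results to another topic."""
    for tx in broker.consume("transactions"):
        if tx["amount"] > threshold:
            broker.produce("alerts", tx)


def display(broker):
    """Step 5: read the result topic and format it for display."""
    return [f'{tx["src"]} -> {tx["dst"]}: {tx["amount"]}'
            for tx in broker.consume("alerts")]
```

Because the stages only talk to each other through topics, the analysis stage can be swapped (Spark for Flink, or a job on Kubernetes for one on DC/OS) without touching the generator or the display.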

Download Now
https://mesosphere.com/resource/category/ebook/

THANK YOU!
ANY QUESTIONS?
@dcos
users@dcos.io
/groups/8295652
/dcos
/dcos/examples
/dcos/demos
chat.dcos.io
https://github.com/mesosphere/dcos-kubernetes-quickstart
https://mesosphere.com/blog/another-kubernetes-service/
