1
Zeppelin Meetup
Moonsoo Lee / Creator of Zeppelin
moon@zepl.com
@apachezeppelin
2
Agenda
⬢ Demo: Real-time Streaming
⬢ Demo: Zeppelin on Kubernetes
⬢ Zeppelin Roadmap
⬢ Q&A
3
DEMO
Real-time Streaming
4
5
DEMO
Zeppelin on Kubernetes
6
Diagram: Zeppelin on Kubernetes
- Zeppelin server Pod (Zeppelin server, nginx, DNS resolver), exposed through an Ingress and the zeppelin-server Service (http 80, rpc 12320)
- The Zeppelin server asks the Kubernetes ApiServer to create interpreter Pods
- Python Interpreter Pod, Service python-intp (rpc 12321)
- Spark Interpreter Pod, Service spark-intp (rpc 12321, spark-driver 22321, spark-blockmanager 22322, spark-ui 4040); the Spark interpreter creates Spark executor Pods
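Because each interpreter Pod in the diagram sits behind its own Service, the Zeppelin server can address it by a stable in-cluster DNS name instead of a Pod IP. The Python sketch below only illustrates that addressing under stated assumptions: the port map is transcribed from the diagram, while the "default" namespace, the SERVICES table and the service_endpoint helper are hypothetical and not part of Zeppelin. It relies on the standard Kubernetes Service DNS form <service>.<namespace>.svc.cluster.local.

```python
from typing import Dict

# Ports transcribed from the diagram; this table is a hypothetical illustration.
SERVICES: Dict[str, Dict[str, int]] = {
    "zeppelin-server": {"http": 80, "rpc": 12320},
    "python-intp": {"rpc": 12321},
    "spark-intp": {
        "rpc": 12321,
        "spark-driver": 22321,
        "spark-blockmanager": 22322,
        "spark-ui": 4040,
    },
}


def service_endpoint(service: str, port_name: str, namespace: str = "default") -> str:
    """Return "host:port" for a named port of a Service, using Kubernetes cluster DNS."""
    port = SERVICES[service][port_name]
    return f"{service}.{namespace}.svc.cluster.local:{port}"


if __name__ == "__main__":
    for svc, ports in SERVICES.items():
        for name in ports:
            print(f"{svc}/{name} -> {service_endpoint(svc, name)}")
```

Both interpreter Services can reuse rpc port 12321 without clashing, because each one is reached through its own DNS name.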
7
Benefits
MULTI-TENANCY
Each note and/or user has its own container for interpreters
SCALABILITY
A single host no longer runs all interpreters
SECURITY
Each container is isolated (filesystem, processes, etc.)
8
Usage
$ kubectl apply -f ${ZEPPELIN_HOME}/k8s/zeppelin-server.yaml
* Until 0.9.0 is released, you need to build your own Zeppelin and Spark Docker images:
1. Build the Zeppelin distribution package: mvn package -Pbuild-distr …
2. Build the Zeppelin Docker image: cd scripts/docker/zeppelin/bin; docker build -t …
3. Build the Spark Docker image: <spark-distribution>/bin/docker-image-tool.sh -m -t 2.4.0 build
Available in 0.9.0-SNAPSHOT
http://zeppelin.apache.org/docs/0.9.0-SNAPSHOT/quickstart/kubernetes.html
Run
9
Zeppelin Roadmap
- Zeppelin on Kubernetes
  - Apply network policy to isolate Interpreter Pods
  - Schedule notes to run in the background as Jobs in Kubernetes
  - Run extra applications such as a terminal or TensorBoard, the same way the Spark UI works
- Modernize front-end stack
  - Currently AngularJS
  - Dark theme?
- Visualization
  - Real-time data visualization
  - Pivot on the backend instead of in the front-end, which requires transferring all data to the front-end
- Sidebar
  - Sidebar with widgets, such as a ToC (Table of Contents), list of data, etc.
  - Online widget registry (Helium)
- Collaboration
  - Multi-cursor editing
  - Comment!
10
Zeppelin Roadmap
Modernize front-end stack
• Currently AngularJS
• Dark theme
Zeppelin on Kubernetes
• Apply network policy to isolate Interpreter Pods
• Schedule notes to run in the background as Jobs in Kubernetes
• Run extra applications such as a terminal or TensorBoard, the same way the Spark UI works
Collaboration
• Multi-cursor editing
• Comment!
Sidebar
• Sidebar with widgets, such as a ToC (Table of Contents), list of data, etc.
• Online widget registry (Helium)
Visualization
• Real-time data visualization
• Pivot on the backend instead of in the front-end, which requires transferring all data to the front-end
11
Mailing list (questions, suggestions, discussions, votes!)
- Users: users@zeppelin.apache.org
- Dev: dev@zeppelin.apache.org
JIRA (bug reports, tracking development/release progress)
- https://issues.apache.org/jira/projects/ZEPPELIN
GitHub (fixes, improvements, new features)
- https://github.com/apache/zeppelin
Join the Apache Zeppelin community.
12
www.zepl.com
Q&A
https://zeppelin.apache.org/
Moonsoo Lee / Creator of Zeppelin
moon@zepl.com
@issuefreaks
For an Apache Zeppelin Slack invite, send your email address to Mei Long: mlong@zepl.com
@meitrappist1
@ApacheZeppelin
13
Backup slides
14
Visualization
15
Transformation on browser (current)
Diagram: Interpreter -> (thrift) -> Zeppelin Server -> (http) -> Browser. The full result table travels inside the note; the browser transforms (pivots) it and then renders it.
Note JSON held by the Zeppelin Server (excerpt, truncated on the slide):
{
  title: ….,
  text: "select job, count(1) from data",
  paragraphs: [
    {
      results: {
        code: SUCCESS,
        msg: [
          type: TABLE,
          data: (the result table below)
Result table (a copy is held at the interpreter, the server and the browser):
age   balance   job
21    1030      Student
34    20331     Engineer
50    30193     Engineer
33    12019     Teacher
23    23211     Engineer
29    92327     Student
...   ...       ...
Browser: Transform (pivot), then Render:
job        count
Student    2
Engineer   3
Teacher    1
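The transform in this picture is a plain group-and-count on the job column. A minimal Python sketch of that pivot over the slide's sample rows follows; Zeppelin's front-end does this in JavaScript, and pivot_count is a hypothetical name used only for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

COLUMNS = ("age", "balance", "job")
# Sample rows from the slide.
ROWS = [
    (21, 1030, "Student"),
    (34, 20331, "Engineer"),
    (50, 30193, "Engineer"),
    (33, 12019, "Teacher"),
    (23, 23211, "Engineer"),
    (29, 92327, "Student"),
]


def pivot_count(rows: Sequence[tuple], columns: Sequence[str], key: str) -> List[Tuple[object, int]]:
    """Group rows by one column and count each group, keeping first-seen order."""
    idx = list(columns).index(key)
    # Counter keeps insertion order, so groups appear in the order they were first seen.
    return list(Counter(row[idx] for row in rows).items())


print(pivot_count(ROWS, COLUMNS, "job"))  # [('Student', 2), ('Engineer', 3), ('Teacher', 1)]
```

In the browser-side design this loop runs over every row the browser received, which is why the whole dataset has to be shipped first.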
16
Problem
- The entire result dataset must be transferred to the browser, even though not all of it is rendered.
- Browser CPU and memory limit how much data can be transformed and rendered.
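To make the first problem concrete, this Python sketch serializes a synthetic 10,000-row table and its job/count pivot as JSON and compares the sizes. The data, the row count and the JSON envelope are assumptions for illustration, not Zeppelin's wire format and not a measurement; the point is that the full payload grows with the number of rows, while the pivot grows only with the number of distinct jobs.

```python
import json
import random

JOBS = ["Student", "Engineer", "Teacher", "Marketing"]


def make_rows(n: int, seed: int = 0) -> list:
    """Generate n synthetic [age, balance, job] rows (illustrative data only)."""
    rng = random.Random(seed)
    return [[rng.randint(18, 70), rng.randint(0, 100_000), rng.choice(JOBS)] for _ in range(n)]


def payload_bytes(obj) -> int:
    """Size of obj once serialized as UTF-8 JSON."""
    return len(json.dumps(obj).encode("utf-8"))


rows = make_rows(10_000)
counts: dict = {}
for _age, _balance, job in rows:
    counts[job] = counts.get(job, 0) + 1

full = payload_bytes({"type": "TABLE", "data": rows})
pivoted = payload_bytes({"type": "TABLE", "data": sorted(counts.items())})
print(f"full result: {full:,} bytes; pivoted result: {pivoted:,} bytes")
```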
17
Transformation on Server
Diagram: the browser sends a Transform request (pivot) to the Zeppelin Server. The server fetches the result dataset from the Interpreter over thrift (Result dataset fetch), runs the Transform (pivot) itself, and sends only the pivoted table to the browser in a Note update; the browser just renders it.
The note JSON and the raw age/balance/job table are the same as on slide 15; the browser receives only:
job        count
Student    2
Engineer   3
Teacher    1
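A minimal Python sketch of the server-side flow on this slide. Every class and method name here is hypothetical, and the thrift call is replaced by a plain method call on a stand-in interpreter; it only shows the order of operations: fetch the full result once, pivot on the server, send the small table onward.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Table = Tuple[Sequence[str], List[tuple]]  # (columns, rows)


@dataclass
class TransformRequest:
    """Hypothetical pivot request sent by the browser."""
    paragraph_id: str
    key: str


class FakeInterpreter:
    """Stand-in for the interpreter process: it only hands back raw results."""

    def __init__(self, results: Dict[str, Table]):
        self._results = results

    def fetch_result(self, paragraph_id: str) -> Table:
        return self._results[paragraph_id]


class ServerSideTransform:
    """Server fetches the full result once, pivots it, and sends only the aggregate."""

    def __init__(self, interpreter: FakeInterpreter):
        self.interpreter = interpreter

    def handle_transform(self, req: TransformRequest) -> dict:
        columns, rows = self.interpreter.fetch_result(req.paragraph_id)  # result dataset fetch
        idx = list(columns).index(req.key)
        counts = Counter(row[idx] for row in rows)  # transform (pivot) on the server
        # Payload of the note update sent to the browser: just the small table.
        return {"columns": [req.key, "count"], "rows": [[k, n] for k, n in counts.items()]}


rows = [(21, 1030, "Student"), (34, 20331, "Engineer"), (50, 30193, "Engineer"),
        (33, 12019, "Teacher"), (23, 23211, "Engineer"), (29, 92327, "Student")]
server = ServerSideTransform(FakeInterpreter({"p1": (("age", "balance", "job"), rows)}))
update = server.handle_transform(TransformRequest(paragraph_id="p1", key="job"))
print(update)
```

The browser no longer needs the raw rows, but the server still pulls the whole dataset across the interpreter link.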
18
Transformation on Interpreter
Diagram: the browser sends a Transform request (pivot) to the Zeppelin Server, which forwards the Transform request (pivot) to the Interpreter over thrift. The Interpreter runs the Transform (pivot) next to the raw age/balance/job table, so the Result dataset fetch returns only the pivoted table, which reaches the browser in a Note update and is rendered.
The note JSON and the raw table are the same as on slide 15; the pivoted result is:
job        count
Student    2
Engineer   3
Teacher    1
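In this variant only the pivoted table crosses the interpreter-to-server hop. The Python sketch below is hypothetical throughout (class names, methods and the row counter are illustrations, not Zeppelin's interpreter API): the interpreter keeps each paragraph's result in memory and answers pivot requests, and the server just forwards the request and counts how many rows it received.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


class InterpreterSideTransform:
    """Hypothetical interpreter that keeps each paragraph's result and pivots on request."""

    def __init__(self) -> None:
        self._results: Dict[str, Tuple[Tuple[str, ...], List[tuple]]] = {}

    def run(self, paragraph_id: str, columns: Sequence[str], rows: Sequence[tuple]) -> None:
        # The full result stays inside the interpreter process.
        self._results[paragraph_id] = (tuple(columns), list(rows))

    def transform(self, paragraph_id: str, key: str) -> List[list]:
        columns, rows = self._results[paragraph_id]
        idx = columns.index(key)
        return [[k, n] for k, n in Counter(row[idx] for row in rows).items()]


class ForwardingServer:
    """Hypothetical server that forwards the pivot request instead of pulling raw rows."""

    def __init__(self, interpreter: InterpreterSideTransform) -> None:
        self.interpreter = interpreter
        self.rows_from_interpreter = 0  # rows that crossed the interpreter -> server hop

    def handle_transform(self, paragraph_id: str, key: str) -> dict:
        table = self.interpreter.transform(paragraph_id, key)
        self.rows_from_interpreter += len(table)
        return {"columns": [key, "count"], "rows": table}  # sent on as a note update


intp = InterpreterSideTransform()
intp.run("p1", ["age", "balance", "job"],
         [(21, 1030, "Student"), (34, 20331, "Engineer"), (50, 30193, "Engineer"),
          (33, 12019, "Teacher"), (23, 23211, "Engineer"), (29, 92327, "Student")])
server = ForwardingServer(intp)
update = server.handle_transform("p1", "job")
print(update, server.rows_from_interpreter)
```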
19
Transformation where the data is
Diagram: as with transformation on the interpreter, the Transform request (pivot) travels from the browser through the Zeppelin Server to the Interpreter, but the Interpreter pushes the transform down to where the data lives (Transform pushdown). Only the pivoted table comes back through the Result dataset fetch and the Note update to the browser, which renders it.
The note JSON and the raw table are the same as on slide 15; the pivoted result is:
job        count
Student    2
Engineer   3
Teacher    1
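Pushing the transform down means asking the data source itself to aggregate. In this Python sketch an in-memory SQLite database stands in for "where the data is" (an assumption; the slides do not name a data source), and the pushed-down query adds the GROUP BY that a per-job count needs. Both paths give the same counts, but only the pushed-down one avoids reading every row into the client.

```python
import sqlite3

ROWS = [(21, 1030, "Student"), (34, 20331, "Engineer"), (50, 30193, "Engineer"),
        (33, 12019, "Teacher"), (23, 23211, "Engineer"), (29, 92327, "Student")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (age INTEGER, balance INTEGER, job TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ROWS)

# Pushdown: the database groups and counts, returning one row per job.
pushed = conn.execute(
    "SELECT job, count(1) FROM data GROUP BY job ORDER BY job").fetchall()

# No pushdown: every row is fetched and counted on the client side.
client = {}
for (job,) in conn.execute("SELECT job FROM data"):
    client[job] = client.get(job, 0) + 1

print(pushed)  # [('Engineer', 3), ('Student', 2), ('Teacher', 1)]
print(sorted(client.items()) == pushed)  # True
```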
20
Related work
- Streaming data updates (without refreshing the notebook)
- Transfer the result dataset and the note to the browser separately
- Partial data fetch for table display
- Extending the TableData API
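Partial data fetch for table display means the browser asks only for the rows a table view currently shows. A minimal offset/limit paging sketch in Python; fetch_page and iter_pages are hypothetical names and are not the TableData API mentioned above.

```python
from typing import Iterator, List, Sequence


def fetch_page(rows: Sequence, offset: int, limit: int) -> List:
    """Return one page of a result set (hypothetical partial-fetch call)."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    return list(rows[offset:offset + limit])


def iter_pages(rows: Sequence, limit: int) -> Iterator[List]:
    """Yield successive pages, as a table view might request them while scrolling."""
    for offset in range(0, len(rows), limit):
        yield fetch_page(rows, offset, limit)


print(fetch_page(list(range(10)), 8, 4))  # [8, 9]
```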
21
Sidebar and plugin widget
22
Screenshot: the ">" button shows the sidebar.
23
Screenshot: the sidebar holds widgets (Sidebar widget #1, Sidebar widget #2), which can be grouped (Group1, Group2); the "<" button hides the sidebar.
24
TOC (table of contents) widget
Screenshot of a generated table of contents:
Contents
1. This is notebook
   a. First
   b. Second
2. Next
   a. Next
One of the most popular features in Jupyter. Google Colab also supports it.
Zeppelin has a SPELL for it: see https://www.npmjs.com/package/zeppelin-toc-spell
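A table of contents like the one in the screenshot can be derived from a note's headings. The Python sketch below assumes headings are markdown "#" and "##" lines inside %md paragraphs; the slides do not describe how zeppelin-toc-spell actually works, and extract_headings and render_toc are hypothetical names. Deeper headings are ignored, and the letter numbering only covers up to 26 sub-headings.

```python
import re
from typing import List, Tuple

# A markdown ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def extract_headings(paragraphs: List[str]) -> List[Tuple[int, str]]:
    """Collect (level, title) for every markdown heading in the note's paragraphs."""
    headings = []
    for text in paragraphs:
        for line in text.splitlines():
            m = HEADING.match(line)
            if m:
                headings.append((len(m.group(1)), m.group(2)))
    return headings


def render_toc(headings: List[Tuple[int, str]]) -> List[str]:
    """Number level-1 headings 1., 2., ... and level-2 headings a., b., ... as on the slide."""
    lines, top, sub = [], 0, 0
    for level, title in headings:
        if level == 1:
            top += 1
            sub = 0
            lines.append(f"{top}. {title}")
        elif level == 2:
            sub += 1
            lines.append(f"   {chr(ord('a') + sub - 1)}. {title}")
    return lines


note = ["%md\n# This is notebook\n## First\n## Second", "%md\n# Next\n## Next"]
print("\n".join(render_toc(extract_headings(note))))
```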
25
Table data widget
Displays the list of tables, the schema of each table, and a preview of the data recognized by the interpreter.
Screenshot:
Tables
Name     Temporary
table1   no
bank     yes
Schema
Column   Type
age      INT
job      TEXT
Preview
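The three panels map onto ordinary catalog queries. In this Python sketch SQLite stands in for whatever the interpreter is connected to (an assumption), and list_tables, table_schema and table_preview are hypothetical helper names. SQLite keeps temporary tables in a separate catalog (sqlite_temp_master), which gives the "Temporary" column directly.

```python
import sqlite3
from typing import List, Tuple


def list_tables(conn: sqlite3.Connection) -> List[Tuple[str, bool]]:
    """(name, is_temporary) for every table, like the widget's Tables panel."""
    query = "SELECT name FROM {} WHERE type = 'table' ORDER BY name"
    persistent = conn.execute(query.format("sqlite_master")).fetchall()
    temporary = conn.execute(query.format("sqlite_temp_master")).fetchall()
    return [(n, False) for (n,) in persistent] + [(n, True) for (n,) in temporary]


def table_schema(conn: sqlite3.Connection, table: str) -> List[Tuple[str, str]]:
    """(column, declared type) pairs; PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)."""
    # Table names come from the catalog here; do not interpolate untrusted input like this.
    return [(row[1], row[2]) for row in conn.execute(f'PRAGMA table_info("{table}")')]


def table_preview(conn: sqlite3.Connection, table: str, limit: int = 5) -> List[tuple]:
    """First few rows of a table for the Preview panel."""
    return conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (age INT, job TEXT)")
conn.execute("CREATE TEMP TABLE bank (age INT, job TEXT)")
conn.executemany("INSERT INTO bank VALUES (?, ?)", [(21, "Student"), (34, "Engineer")])
print(list_tables(conn))           # [('table1', False), ('bank', True)]
print(table_schema(conn, "bank"))  # [('age', 'INT'), ('job', 'TEXT')]
print(table_preview(conn, "bank", 1))
```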
26
Clipboard
Drag and drop a paragraph onto the clipboard, then drag and drop it from the clipboard into the same notebook or another one.
Screenshot: a Clipboard panel with a "Drop paragraph here" area holding Paragraph a and Paragraph b.
27
Widgets on the Helium registry
28
Thank you!
Please contact Mei Long (mlong@zepl.com) with your email address for an invite to the Apache Zeppelin Slack workspace.

Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
