Apache Apex as YARN Application
Chinmay Kolhatkar (chinmay@apache.org)
Mar 22, 2016
Apache Apex Meetup
Agenda
• Directed Acyclic Graph
• Apex as a YARN Application
• Application Components of Apex
• Lifecycle of Apex as a YARN Application
Directed Acyclic Graph (DAG)
• Defines the compute stages of a streaming application
• Defines tuple flow across Operators via Streams
[Figure: DAG of four compute stages, Compute 1 through Compute 4]
DAG Components
• Tuple
● Atomic data that flows over a stream
• Operator
● Basic compute unit per tuple
• Stream
● Connector abstraction between operators
● Tuples flow over this
[Figure: Operator 1 and Operator 2 connected by a Stream carrying tuple 1, tuple 2, and tuple 3]
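The three components can be sketched in a few lines of Python. This is an illustrative model only, not the Apex API: the `Operator` and `Stream` classes and their method names are hypothetical, and a stream is reduced to an in-memory hand-off between two operators.

```python
from typing import Any, Callable, List, Optional


class Stream:
    """Hypothetical connector: delivers each tuple to the downstream operator."""

    def __init__(self, sink: "Operator") -> None:
        self.sink = sink

    def emit(self, tup: Any) -> None:
        self.sink.process(tup)


class Operator:
    """Hypothetical compute unit: applies a function to one tuple at a time."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn
        self.output: Optional[Stream] = None  # set when a stream is attached
        self.results: List[Any] = []          # used only when there is no output stream

    def process(self, tup: Any) -> None:
        out = self.fn(tup)
        if self.output is not None:
            self.output.emit(out)
        else:
            self.results.append(out)


# Operator 1 doubles each tuple; Operator 2 adds one (arbitrary example logic).
op2 = Operator(lambda t: t + 1)
op1 = Operator(lambda t: t * 2)
op1.output = Stream(op2)

for tup in (1, 2, 3):  # tuple 1, tuple 2, tuple 3
    op1.process(tup)

print(op2.results)
```

Each tuple is processed by Operator 1 and handed over the stream to Operator 2 before the next tuple enters, which mirrors the per-tuple compute model described above.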
DAG Types
• Logical Plan
● Logical representation of computation
● Defines operators, streams and dataflow
• Physical Plan
● Deployable plan on cluster
● Contains partition information of operators
● Has ready-to-deploy serialized operator instances
[Figure: Logical DAG with operators O1, O2, O3, O4, O5]
[Figure: Physical DAG with partitions P1, P2, P3 of O1 and of O2, a node labeled U, and operators O3, O4, O5]
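The relationship between the two plans can be sketched as a simple expansion step. The function below is a hypothetical illustration, not how Apex builds its physical plan: the `to_physical` name and the `O1.P1` naming scheme are assumptions, the partition counts are taken from the figure (three partitions each for O1 and O2), and streams and the node labeled U are left out.

```python
from typing import Dict, List


def to_physical(operators: List[str], partitions: Dict[str, int]) -> List[str]:
    """Expand each logical operator into one physical instance per partition.

    Operators missing from `partitions` are assumed to have a single instance.
    """
    physical: List[str] = []
    for name in operators:
        count = partitions.get(name, 1)
        if count == 1:
            physical.append(name)
        else:
            physical.extend(f"{name}.P{i}" for i in range(1, count + 1))
    return physical


logical = ["O1", "O2", "O3", "O4", "O5"]
plan = to_physical(logical, {"O1": 3, "O2": 3})
print(plan)
```

Five logical operators become nine physical instances here, which is the kind of partition information the physical plan has to carry.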
Apex as YARN application
[Figure: a generic YARN application (YarnClient, AppMaster, YarnContainers) shown alongside its Apex counterpart: DTCLI/StrAMClient as the YarnClient, StrAM as the AppMaster, and StrAMChild processes hosting operators O1, O2, and O3 inside YarnContainers. Each Node runs a NodeManager (NM); the ResourceManager comprises AsM + Scheduler. Links are labeled ClientRM Protocol, AMRM Protocol, and ContainerManager Protocol.]
Application Components of Apex - StrAMClient
• Part of dtcli client interface
• Invoked by “launch” command of dtcli
• Tasks:
● Copy the required application package files into HDFS
● Validate the logical plan
● Serialize the logical plan to HDFS
● Launch the Application Master, i.e. StrAM
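The slides do not say what validating the logical plan involves. One check that follows directly from the definition of a DAG is that the plan has no cycle. The sketch below shows that check with Kahn's algorithm; the `is_acyclic` function, the dictionary layout, and the edges of the example plans are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List


def is_acyclic(streams: Dict[str, List[str]]) -> bool:
    """Kahn's algorithm: the graph is acyclic if and only if every node
    can be removed in topological order."""
    nodes = set(streams)
    for targets in streams.values():
        nodes.update(targets)

    indegree = {n: 0 for n in nodes}
    for targets in streams.values():
        for t in targets:
            indegree[t] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for t in streams.get(node, []):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    return removed == len(nodes)


# Hypothetical plans: operator name -> downstream operators.
valid_plan = {"O1": ["O2"], "O2": ["O3", "O4"], "O3": ["O5"], "O4": ["O5"]}
cyclic_plan = {"O1": ["O2"], "O2": ["O1"]}
print(is_acyclic(valid_plan), is_acyclic(cyclic_plan))
```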
Application Components of Apex - StrAM
• Streaming Application Master
• Started by StrAMClient on a YarnContainer
• Tasks:
● Convert logical plan to physical plan
● Serialize operators to HDFS
● Request resources from the ResourceManager
● Start StrAMChild in YarnContainer(s)
● Monitor StrAMChild using ContainerManager protocol
● Generate Application statistics
● Host results on WebService (dtManage)
● Fault Tolerance
● Checkpointing/Committing Application States
● Support Security
● Shutdown Application
Application Components of Apex - StrAMChild
• Deployed on YarnContainer
• Started by NodeManager as instructed by StrAM
• Instance of StreamingContainer
• Contains Operators (compute-related)
• Contains BufferServer (stream-related)
• Tasks:
● Regularly send heartbeats to StrAM
● Execute commands from StrAM
● Shut down or kill itself if instructed
● Manage lifecycle of an Operator
● Network communication using BufferServer
Lifecycle of Apex/YARN Application - Start
[Figure: YARN cluster diagram showing the ResourceManager (AsM + Scheduler), a NodeManager (NM) on each Node, DTCLI/StrAMClient (YarnClient), HDFS, StrAM (AppMaster), and three YarnContainers running StrAMChild with operators O1 and O2, O3, and O4. Links are labeled ClientRMProtocol, AMRMProtocol, and ContainerManager Protocol. The numbered steps in the figure are listed below.]
1) Access cluster information
2) Copies files from HDFS
3) Submit Application to RM
4) StrAM registers with RM
5) StrAM sends heartbeats regularly
6) StrAM requests containers with specifications
7) StrAMChild reads serialized operator from HDFS
8) StrAMChild starts operator lifecycle
Lifecycle of Apex/YARN Application - Running
[Figure: YARN cluster diagram showing the ResourceManager (AsM + Scheduler), a NodeManager (NM) on each Node, DTCLI/StrAMClient (YarnClient), HDFS, StrAM (AppMaster), and three YarnContainers running StrAMChild with operators O1 and O2, O3, and O4. Links are labeled ClientRMProtocol, AMRMProtocol, and ContainerManager Protocol. The numbered steps in the figure are listed below.]
1) StrAMChild sends heartbeats
2) StrAMChild sends operator data
3) StrAM sends regular heartbeats to RM
4) Query status of application
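Heartbeats are useful because the receiver can notice when they stop. The sketch below shows one way a receiver could track them. It is not taken from Apex: the `HeartbeatMonitor` class, the 30-second timeout, and the container names are hypothetical, and the slides do not describe what StrAM does when a heartbeat is missed.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Hypothetical tracker for the last heartbeat seen from each container."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, container_id: str, now: float) -> None:
        """Record that a heartbeat arrived from `container_id` at time `now`."""
        self.last_seen[container_id] = now

    def stale(self, now: float) -> List[str]:
        """Return containers whose last heartbeat is older than the timeout."""
        return sorted(
            cid for cid, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=30.0)
monitor.heartbeat("container-1", now=0.0)
monitor.heartbeat("container-2", now=0.0)
monitor.heartbeat("container-1", now=25.0)  # container-2 goes silent

print(monitor.stale(now=40.0))
```

Timestamps are passed in explicitly rather than read from a clock so that the example is deterministic.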
Lifecycle of Apex/YARN Application - Shutdown
[Figure: YARN cluster diagram showing the ResourceManager (AsM + Scheduler), a NodeManager (NM) on each Node, DTCLI/StrAMClient (YarnClient), HDFS, StrAM (AppMaster), and three YarnContainers running StrAMChild with operators O1 and O2, O3, and O4. Links are labeled ClientRMProtocol, AMRMProtocol, and ContainerManager Protocol. The numbered steps in the figure are listed below.]
1) Connect to the WebService REST API
2) Send command to StrAM
3) Send shutdown signal to StrAMChild
4) StrAMChild finishes operator lifecycle
5) Check if all containers are freed
6) StrAM unregisters itself
7) StrAM exits
8) Check if application has shut down
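The ordering of steps 3 to 7 matters: StrAM only unregisters and exits after every child has finished. A toy replay of that ordering is sketched below. The `Child` class, the `graceful_shutdown` function, and the log messages are hypothetical stand-ins and do not correspond to Apex classes.

```python
from typing import List


class Child:
    """Hypothetical stand-in for a StrAMChild container."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True

    def shutdown(self, log: List[str]) -> None:
        log.append(f"{self.name}: finish operator lifecycle")
        self.running = False


def graceful_shutdown(children: List[Child]) -> List[str]:
    """Replay steps 3 to 7 of the shutdown sequence and return the event log."""
    log: List[str] = []
    for child in children:                       # step 3: signal each child
        log.append(f"StrAM: shutdown signal to {child.name}")
        child.shutdown(log)                      # step 4: child finishes
    if all(not c.running for c in children):     # step 5: all containers freed?
        log.append("StrAM: all containers freed")
        log.append("StrAM: unregister from RM")  # step 6
        log.append("StrAM: exit")                # step 7
    return log


events = graceful_shutdown([Child("child-1"), Child("child-2")])
print("\n".join(events))
```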
Lifecycle of Apex/YARN Application - Kill
[Figure: YARN cluster diagram showing the ResourceManager (AsM + Scheduler), a NodeManager (NM) on each Node, DTCLI/StrAMClient (YarnClient), HDFS, StrAM (AppMaster), and three YarnContainers running StrAMChild with operators O1 and O2, O3, and O4. Links are labeled ClientRMProtocol, AMRMProtocol, and ContainerManager Protocol. The numbered steps in the figure are listed below.]
1) Send kill-app command to YARN
2) RM kills all containers
Summary – Apex platform
• Enables YARN to be used for Streaming Applications
• Takes care of YARN-specific work
• Users can focus on business logic defined in Operators
Resources
• Apache Apex website - http://apex.incubator.apache.org/
• Subscribe - http://apex.incubator.apache.org/community.html
• Download - http://apex.incubator.apache.org/downloads.html
• Twitter - @ApacheApex; Follow - https://twitter.com/apacheapex
• Facebook - https://www.facebook.com/ApacheApex/
• Meetup - http://www.meetup.com/topics/apache-apex
• Startup Program – Free Enterprise License for startups, Universities, Non-Profits
Upcoming events...
• March 24th 9am PST - Fault Tolerance and Processing Semantics with Apache Apex
• March 28th 6pm PST - Low-latency ingestion and analytics with Apache Kafka and Apache Apex (Hadoop)
• ...
