Apache Apex Meetup
Introduction to Apache Apex
Real-time streaming... Really!!!
Chinmay Kolhatkar
chinmay@apache.org
February 13, 2016
Agenda
➔ Project History
➔ What is Apache Apex?
➔ Directed Acyclic Graph (DAG)
➔ Components of DAG
➔ Windowing
➔ Operator Lifecycle
➔ Apache Apex Architecture
➔ Other features
Project History
➔ Started development at DataTorrent in 2012
➔ Open-sourced under ASF in 2015
➔ Currently has 50+ committers
➔ Free-to-use streaming application platform
What is Apache Apex?
➔ The Apex project is under the Apache Software Foundation
➔ Apex is a streaming application platform
➔ YARN-native application
➔ Implemented entirely in Java
➔ Consists of two primary components
◆ Apex Core - engine that enables real-time processing
◆ Apex Malhar - out-of-the-box operators that can be used with Apex Core
Directed Acyclic Graph (DAG)
➔ Defines the compute stages
➔ Defines how tuples flow between compute stages over streams
[Figure: four operators connected in a chain by streams (output stream, filtered stream, enriched stream), with tuples flowing along each stream]
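To make the DAG idea concrete, here is a minimal sketch in plain Java with no Apex dependencies. The class `DagSketch` and its methods are hypothetical and are not the Apex API: operators are just names, streams are directed edges, and Kahn's topological sort both orders the compute stages and rejects a graph that contains a cycle. The operator names (`filter`, `enricher`) are assumptions based on the filtered and enriched streams in the figure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DagSketch {
    // Hypothetical DAG model: each operator maps to the operators downstream of it.
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();

    public void addOperator(String name) {
        downstream.putIfAbsent(name, new ArrayList<>());
    }

    // A stream connects one upstream operator to one downstream operator.
    public void addStream(String from, String to) {
        addOperator(from);
        addOperator(to);
        downstream.get(from).add(to);
    }

    // Kahn's algorithm: returns operators in topological order,
    // or throws if the graph has a cycle (i.e. it is not a DAG).
    public List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String op : downstream.keySet()) inDegree.putIfAbsent(op, 0);
        for (List<String> targets : downstream.values()) {
            for (String t : targets) inDegree.merge(t, 1, Integer::sum);
        }

        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String op = ready.poll();
            order.add(op);
            for (String t : downstream.get(op)) {
                // An operator becomes ready once all of its upstream operators are placed.
                if (inDegree.merge(t, -1, Integer::sum) == 0) ready.add(t);
            }
        }
        if (order.size() != downstream.size()) {
            throw new IllegalStateException("cycle detected: not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        DagSketch dag = new DagSketch();
        dag.addStream("input", "filter");
        dag.addStream("filter", "enricher");
        dag.addStream("enricher", "output");
        System.out.println(dag.topologicalOrder()); // [input, filter, enricher, output]
    }
}
```

The acyclicity check is what the "A" in DAG buys: with no cycles, every tuple has a finite path from an input operator to an output operator.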
Components of DAG - Tuple
➔ Smallest atomic unit of data that flows over a stream
➔ Emitted by an operator after processing
➔ Received by the next operator for processing
➔ Serializable Java objects
➔ Types:
◆ Data Tuple
◆ Control Tuple
Components of DAG - Operator
➔ Logical compute unit
➔ Java code that processes tuples
➔ Runs inside a JVM
➔ Types
◆ Input Adapter
◆ Generic Operator
◆ Output Adapter
Components of DAG - Stream
➔ Connects operators
➔ Channel that carries tuples from one operator to another
Components of DAG - Ports
➔ Ends of a stream
➔ Part of an operator
➔ Types of ports
◆ Input Port
◆ Output Port
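The tuple, operator, stream, and port concepts fit together roughly as in the following self-contained sketch. It is only loosely inspired by Apex's `DefaultInputPort`/`DefaultOutputPort` classes; `InputPort`, `OutputPort`, `FilterOperator`, and `addStream` here are hypothetical stand-ins, not the real API, and delivery is a direct synchronous call rather than a buffered, possibly remote stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PortsSketch {
    // Input end of a stream: process() is called once per incoming tuple.
    interface InputPort<T> {
        void process(T tuple);
    }

    // Output end of a stream: operator code calls emit() to send tuples downstream.
    static class OutputPort<T> {
        private final List<InputPort<? super T>> sinks = new ArrayList<>();

        void connect(InputPort<? super T> sink) {
            sinks.add(sink);
        }

        void emit(T tuple) {
            for (InputPort<? super T> sink : sinks) sink.process(tuple);
        }
    }

    // A generic operator with one input port and one output port.
    static class FilterOperator<T> {
        final OutputPort<T> output = new OutputPort<>();
        final InputPort<T> input;

        FilterOperator(Predicate<T> keep) {
            // Tuples that pass the predicate are forwarded on the output port.
            input = tuple -> {
                if (keep.test(tuple)) output.emit(tuple);
            };
        }
    }

    // A stream wires an output port to an input port.
    static <T> void addStream(OutputPort<T> from, InputPort<? super T> to) {
        from.connect(to);
    }

    public static void main(String[] args) {
        FilterOperator<Integer> evens = new FilterOperator<>(x -> x % 2 == 0);
        List<Integer> collected = new ArrayList<>();
        addStream(evens.output, collected::add); // stand-in for an output adapter
        for (int i = 1; i <= 6; i++) evens.input.process(i);
        System.out.println(collected); // [2, 4, 6]
    }
}
```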
Windowing
➔ Tuples are divided into time slices called windows
➔ Each window is given an ID (type: long)
➔ Also called a streaming window
● Default window width: 500 ms
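As a rough illustration of time-sliced windows, the sketch below assigns each timestamp to a 500 ms slice and gives consecutive slices consecutive `long` IDs. The scheme (`windowIdFor`, counting from an application start time) is a hypothetical simplification for teaching purposes and is not claimed to be how Apex actually encodes its window IDs.

```java
public class WindowIdSketch {
    static final long WINDOW_WIDTH_MILLIS = 500; // default streaming window width

    // Hypothetical scheme: the N-th 500 ms slice after startMillis gets ID N.
    static long windowIdFor(long startMillis, long tupleMillis) {
        if (tupleMillis < startMillis) {
            throw new IllegalArgumentException("tuple is older than the application start");
        }
        return (tupleMillis - startMillis) / WINDOW_WIDTH_MILLIS;
    }

    public static void main(String[] args) {
        long start = 1_455_321_600_000L; // arbitrary start time in epoch millis
        System.out.println(windowIdFor(start, start));         // 0
        System.out.println(windowIdFor(start, start + 499));   // 0
        System.out.println(windowIdFor(start, start + 500));   // 1
        System.out.println(windowIdFor(start, start + 1_250)); // 2
    }
}
```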
Windowing (contd…)
➔ The input operator inserts control tuples into the stream
➔ Control tuples mark window boundaries
➔ Different operators may be processing different windows at the same time
➔ All data management activities happen at window boundaries
[Figure: Input Adapter → Generic Operator → Output Adapter; the data tuples of window n and window n+1 are each enclosed by a BeginWindow control tuple and an EndWindow control tuple]
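The boundary-marking role of control tuples can be pictured with the sketch below: an input adapter stand-in wraps each batch of data tuples in begin/end control tuples, and a downstream stand-in relies only on those markers to know when a window is complete. The `Kind`, `Tuple`, `emitWindows`, and `countPerWindow` names are hypothetical and do not correspond to Apex classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ControlTupleSketch {
    enum Kind { BEGIN_WINDOW, DATA, END_WINDOW }

    // Hypothetical tuple model: control tuples carry only a window ID, data tuples a payload.
    static final class Tuple {
        final Kind kind;
        final long windowId;
        final String payload;

        Tuple(Kind kind, long windowId, String payload) {
            this.kind = kind;
            this.windowId = windowId;
            this.payload = payload;
        }
    }

    // Input adapter stand-in: encloses each batch in BEGIN_WINDOW / END_WINDOW markers.
    static List<Tuple> emitWindows(List<List<String>> batches, long firstWindowId) {
        List<Tuple> stream = new ArrayList<>();
        long windowId = firstWindowId;
        for (List<String> batch : batches) {
            stream.add(new Tuple(Kind.BEGIN_WINDOW, windowId, null));
            for (String data : batch) stream.add(new Tuple(Kind.DATA, windowId, data));
            stream.add(new Tuple(Kind.END_WINDOW, windowId, null));
            windowId++;
        }
        return stream;
    }

    // Downstream stand-in: counts data tuples per window, using only the control tuples
    // to detect where each window starts and ends.
    static Map<Long, Integer> countPerWindow(List<Tuple> stream) {
        Map<Long, Integer> counts = new LinkedHashMap<>();
        long current = -1;
        int count = 0;
        for (Tuple t : stream) {
            if (t.kind == Kind.BEGIN_WINDOW) {
                current = t.windowId;
                count = 0;
            } else if (t.kind == Kind.DATA) {
                count++;
            } else {
                counts.put(current, count); // END_WINDOW: the window is complete
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Tuple> stream = emitWindows(List.of(List.of("a", "b"), List.of("c")), 7);
        System.out.println(countPerWindow(stream)); // {7=2, 8=1}
    }
}
```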
Operator Lifecycle
➔ Lifecycle methods are called by the Apex platform
➔ Simple, unit-test-like lifecycle
➔ Governed by control tuples
➔ All operators in the DAG go through this lifecycle
Operator Lifecycle (contd...)
➔ setup
◆ Start of the operator lifecycle
◆ Do any initialization here
➔ beginWindow
◆ Marks the start of a window
➔ endWindow
◆ Marks the end of a window
➔ teardown
◆ Do any finalization here
◆ End of the operator lifecycle
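The order of these callbacks can be checked with a small tracing sketch. The `Operator` interface and the `run` driver below are hypothetical stand-ins for the platform (in Apex, `setup` also receives a context argument, which is omitted here); the point is only the shape: `setup` once, a `beginWindow`/`endWindow` pair per window, then `teardown` once.

```java
import java.util.ArrayList;
import java.util.List;

public class LifecycleSketch {
    // Hypothetical operator interface mirroring the callbacks named on the slide.
    interface Operator {
        void setup();
        void beginWindow(long windowId);
        void endWindow();
        void teardown();
    }

    // Records every callback so the call order can be inspected.
    static class TracingOperator implements Operator {
        final List<String> calls = new ArrayList<>();

        public void setup() { calls.add("setup"); }
        public void beginWindow(long windowId) { calls.add("beginWindow(" + windowId + ")"); }
        public void endWindow() { calls.add("endWindow"); }
        public void teardown() { calls.add("teardown"); }
    }

    // Platform stand-in: setup once, one begin/end pair per window, teardown once.
    static void run(Operator op, long firstWindowId, int windows) {
        op.setup();
        for (long w = firstWindowId; w < firstWindowId + windows; w++) {
            op.beginWindow(w);
            // tuples belonging to window w would be processed here
            op.endWindow();
        }
        op.teardown();
    }

    public static void main(String[] args) {
        TracingOperator op = new TracingOperator();
        run(op, 0, 2);
        System.out.println(op.calls);
        // [setup, beginWindow(0), endWindow, beginWindow(1), endWindow, teardown]
    }
}
```

This is why the lifecycle is described as unit-test-like: `setup`/`teardown` play the role of per-suite fixtures and `beginWindow`/`endWindow` the role of per-test fixtures.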
Operator Lifecycle (contd...)
➔ emitTuples
◆ Called for input adapters
◆ Called repeatedly by the platform in an infinite loop
➔ process
◆ Called for generic operators and output adapters
◆ Associated with a port
◆ Called for every incoming tuple
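The contrast between `emitTuples` (pulled repeatedly by the platform) and `process` (pushed once per tuple) can be sketched as follows. `QueueInputAdapter`, `UpperCaseWriter`, `drive`, and the `maxPerCall` limit are hypothetical; the real platform loops without end, whereas `drive` stops once its in-memory source is empty so the example terminates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

public class EmitProcessSketch {
    // Hypothetical input adapter: each emitTuples() call emits at most maxPerCall
    // tuples from its source and then returns control to the caller.
    static class QueueInputAdapter {
        private final Deque<String> source;
        private final Consumer<String> output;
        private final int maxPerCall;

        QueueInputAdapter(Deque<String> source, Consumer<String> output, int maxPerCall) {
            this.source = source;
            this.output = output;
            this.maxPerCall = maxPerCall;
        }

        void emitTuples() {
            for (int i = 0; i < maxPerCall && !source.isEmpty(); i++) {
                output.accept(source.poll());
            }
        }
    }

    // Hypothetical output adapter: process() is called once for every incoming tuple.
    static class UpperCaseWriter {
        final List<String> written = new ArrayList<>();

        void process(String tuple) {
            written.add(tuple.toUpperCase());
        }
    }

    // Platform stand-in: keeps calling emitTuples(); stops when the source is drained.
    static int drive(QueueInputAdapter input, Deque<String> source) {
        int calls = 0;
        while (!source.isEmpty()) {
            input.emitTuples();
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        Deque<String> source = new ArrayDeque<>(List.of("a", "b", "c", "d", "e"));
        UpperCaseWriter writer = new UpperCaseWriter();
        QueueInputAdapter input = new QueueInputAdapter(source, writer::process, 2);
        int calls = drive(input, source);
        System.out.println(calls + " " + writer.written); // 3 [A, B, C, D, E]
    }
}
```

Bounding the work done per `emitTuples` call is a design choice of this sketch: returning control regularly gives the caller a chance to do other work, such as closing a window, between calls.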
Operator Lifecycle (contd...)
➔ OutputPort::emit
◆ Special method, not part of the operator lifecycle
◆ Called by operator code
◆ Emits tuples to the next operator
◆ Bound by the current window
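One way to picture "bound by the window" is an output port that tags each emitted tuple with the window that is currently open. In the hypothetical sketch below, `OutputPort` tracks the open window itself and rejects an `emit` made outside any window; that exception is a choice of this sketch, not documented Apex behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowBoundEmitSketch {
    // Hypothetical output port that records "windowId:tuple" for every emit.
    static class OutputPort<T> {
        private Long currentWindow = null; // null means no window is open
        final List<String> delivered = new ArrayList<>();

        void beginWindow(long windowId) { currentWindow = windowId; }
        void endWindow() { currentWindow = null; }

        void emit(T tuple) {
            if (currentWindow == null) {
                throw new IllegalStateException("emit outside a window");
            }
            delivered.add(currentWindow + ":" + tuple);
        }
    }

    public static void main(String[] args) {
        OutputPort<String> port = new OutputPort<>();
        port.beginWindow(5);
        port.emit("x");
        port.emit("y");
        port.endWindow();
        port.beginWindow(6);
        port.emit("z");
        port.endWindow();
        System.out.println(port.delivered); // [5:x, 5:y, 6:z]
        try {
            port.emit("late");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: emit outside a window
        }
    }
}
```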
Apache Apex Architecture
[Figure: Apex architecture stack: machine nodes (physical or virtual); Hadoop (YARN); a distributed file system (e.g. HDFS); Apache Apex Core (streaming engine); Apache Apex Malhar (reusable operators, connectors) and custom operators; streaming applications; a REST API; external data sources]
Other features of the platform
➔ Ease of Use
➔ Locality
➔ Fault Tolerance
➔ Scalability
◆ Partitioning
◆ Auto-scaling
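Partitioning means running several copies of an operator and splitting the stream among them. A common way to split is hash partitioning, sketched below as a generic technique: `partitionFor` is a hypothetical helper, not Apex's partitioner (Apex's partitioning is configurable). Routing by key hash keeps all tuples with the same key on the same partition, so per-key state does not have to be shared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitioningSketch {
    // Generic hash partitioning: the same key always maps to the same partition.
    // floorMod keeps the result in [0, partitions) even for negative hash codes.
    static int partitionFor(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    public static void main(String[] args) {
        int partitions = 3;
        Map<Integer, List<String>> assignment = new TreeMap<>();
        for (String key : List.of("alpha", "beta", "gamma", "alpha", "delta")) {
            assignment.computeIfAbsent(partitionFor(key, partitions), p -> new ArrayList<>())
                      .add(key);
        }
        // Both "alpha" tuples land in the same partition's list.
        System.out.println(assignment);
    }
}
```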
Resources
● Apache Apex Page
○ http://apex.incubator.apache.org
● Mailing Lists
○ dev@apex.incubator.apache.org
○ users@apex.incubator.apache.org
● Repository
○ https://github.com/apache/incubator-apex-core
○ https://github.com/apache/incubator-apex-malhar
● Issue Tracking
○ https://issues.apache.org/jira/browse/APEXCORE
○ https://issues.apache.org/jira/browse/APEXMALHAR
● @ApacheApex
● /groups/7020520
Apex in Distributed Environment
[Figure: Apex on a Hadoop cluster. A Hadoop edge node hosts the CLI and dtGateway (REST API), which serves dtManage (Web UI) to a web browser; dtGateway and dtManage are part of DataTorrent RTS. On the Hadoop nodes, one YARN container runs the App Master, and other YARN containers run worker containers in which operators (Op1, Op2, Op3) run on threads (Thread1 to Thread-N).]
Processing Modes
➔ AT_LEAST_ONCE (default)
◆ Windows are processed at least once
➔ AT_MOST_ONCE
◆ Windows are processed at most once
➔ EXACTLY_ONCE
◆ Windows are processed exactly once
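The difference between the first two modes shows up after a failure. The sketch below is an idealized model, an assumption rather than Apex's actual recovery algorithm: windows are processed in order, the last checkpoint was taken after window 4, and the operator fails partway through window 8. Replaying from the checkpoint processes some windows twice (at least once); skipping ahead past the failure loses the interrupted window (at most once). EXACTLY_ONCE is not modeled here.

```java
import java.util.Map;
import java.util.TreeMap;

public class ProcessingModeSketch {
    enum Mode { AT_LEAST_ONCE, AT_MOST_ONCE }

    // Idealized scenario: windows 1..last, last checkpoint after window `checkpoint`,
    // failure partway through `failedWindow`. Returns how many times each window
    // was processed to completion.
    static Map<Long, Integer> simulate(Mode mode, long last, long checkpoint, long failedWindow) {
        Map<Long, Integer> completed = new TreeMap<>();
        for (long w = 1; w < failedWindow; w++) {
            completed.merge(w, 1, Integer::sum);
        }
        // failedWindow was interrupted, so it is not counted before recovery.
        long resumeFrom = (mode == Mode.AT_LEAST_ONCE)
                ? checkpoint + 1     // replay every window after the checkpoint
                : failedWindow + 1;  // skip ahead; the interrupted window is lost
        for (long w = resumeFrom; w <= last; w++) {
            completed.merge(w, 1, Integer::sum);
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println(simulate(Mode.AT_LEAST_ONCE, 10, 4, 8));
        // {1=1, 2=1, 3=1, 4=1, 5=2, 6=2, 7=2, 8=1, 9=1, 10=1}
        System.out.println(simulate(Mode.AT_MOST_ONCE, 10, 4, 8));
        // {1=1, 2=1, 3=1, 4=1, 5=1, 6=1, 7=1, 9=1, 10=1}
    }
}
```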
Checkpointing
➔ Saves operator state to HDFS
➔ Every operator is checkpointed
➔ Done by the platform
➔ Happens every 60 streaming windows by default, i.e. every 30 seconds with the default 500 ms window
➔ Each checkpoint is identified by the windowId at which it is taken
➔ When all operators have been checkpointed at the same window, that checkpointed state becomes the "committed" state of the application
➔ The committed state is used for recovery in case of failure
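The interval arithmetic and the idea of a committed window can be sketched as below. `committedWindow` is a hypothetical helper that takes the set of checkpointed window IDs per operator and returns the latest window at which every operator has a checkpoint; the window IDs 60, 120, 180 are illustrative counter values, not real Apex IDs.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.Set;

public class CheckpointSketch {
    static final long WINDOW_WIDTH_MILLIS = 500;   // default streaming window width
    static final int CHECKPOINT_WINDOW_COUNT = 60; // default checkpoint interval, in windows

    static long checkpointIntervalMillis() {
        return WINDOW_WIDTH_MILLIS * CHECKPOINT_WINDOW_COUNT; // 60 x 500 ms = 30,000 ms
    }

    // Latest window ID at which every operator has a checkpoint, if any.
    static OptionalLong committedWindow(Map<String, Set<Long>> checkpointsByOperator) {
        Set<Long> common = null;
        for (Set<Long> ids : checkpointsByOperator.values()) {
            if (common == null) {
                common = new HashSet<>(ids);
            } else {
                common.retainAll(ids); // keep only windows checkpointed by all operators so far
            }
        }
        if (common == null || common.isEmpty()) return OptionalLong.empty();
        return OptionalLong.of(Collections.max(common));
    }

    public static void main(String[] args) {
        System.out.println(checkpointIntervalMillis()); // 30000

        Map<String, Set<Long>> checkpoints = new LinkedHashMap<>();
        checkpoints.put("input", Set.of(60L, 120L, 180L));
        checkpoints.put("filter", Set.of(60L, 120L)); // has not reached window 180 yet
        checkpoints.put("output", Set.of(60L, 120L, 180L));
        System.out.println(committedWindow(checkpoints).getAsLong()); // 120
    }
}
```

Because the slowest operator determines the committed window, a recovery restarts every operator from a state that is consistent across the whole DAG.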
