© 2015 DataTorrent
Munagala V. Ramanath (“Ram”) <ram@datatorrent.com>
Dec 15th, 2015
Building Your First Apache Apex Application
Outline
Main concepts of an Apex Application.
Brief description of the "Sorted Word Count" application.
Hands-on demonstration of cloning the source repository and building the Apex source code.
Hands-on demonstration of creating a new application.
Running the application.
Code walk-through.
Questions
Main Concepts
Applications are built from operators, which implement the Operator interface. Each operator has input and output ports; streams connect these ports to form a directed acyclic graph (DAG).
The BaseOperator class supplies empty implementations of all the required methods.
Within an operator, define the necessary input and output ports, typically using the DefaultInputPort and DefaultOutputPort classes.
The application class implements the StreamingApplication interface; it need only implement the populateDAG() method, which wires the operators together.
Applications process data within time-based windows, typically 0.5s.
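The port-and-stream idea above can be illustrated with a small plain-Java model. This is only a sketch and does not use the Apex API: SketchOutputPort, UpperCaseOperator, CollectorOperator and DagSketch are hypothetical names invented for this example, and windowing is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

// Hypothetical stand-in for an output port (not an Apex class): it forwards
// each emitted tuple to the input port that a stream has connected it to.
class SketchOutputPort<T> {
    private Consumer<T> sink = tuple -> { };   // an unconnected port drops tuples

    void connect(Consumer<T> inputPort) { this.sink = inputPort; }

    void emit(T tuple) { sink.accept(tuple); }
}

// Operator with one input port and one output port: upper-cases each tuple.
class UpperCaseOperator {
    final SketchOutputPort<String> output = new SketchOutputPort<>();
    final Consumer<String> input =
            tuple -> output.emit(tuple.toUpperCase(Locale.ROOT));
}

// Operator with only an input port: collects every tuple it receives.
class CollectorOperator {
    final List<String> received = new ArrayList<>();
    final Consumer<String> input = received::add;
}

class DagSketch {
    // Plays the role that populateDAG() plays in the slides:
    // create the operators, then connect their ports with streams.
    static List<String> run(List<String> tuples) {
        SketchOutputPort<String> source = new SketchOutputPort<>();
        UpperCaseOperator upper = new UpperCaseOperator();
        CollectorOperator sink = new CollectorOperator();

        source.connect(upper.input);        // stream: source -> upper
        upper.output.connect(sink.input);   // stream: upper -> sink

        tuples.forEach(source::emit);       // push tuples through the graph
        return sink.received;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("apex", "dag")));   // prints [APEX, DAG]
    }
}
```

In this toy model the wiring happens in one method and each operator only knows its own ports, which mirrors the separation the slide describes between operator code and the application class.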
The Sorted Word Count Application
The following operators are involved:
LineReader: reads files dropped into the input directory and emits their lines on its output port.
WordReader: splits each line into words using a regex.
WindowWordCount: computes and emits word frequencies for all words in the lines processed in the current window.
FileWordCount: accumulates all word counts for the current file and emits the final sorted list when EOF is reached.
WordCountWriter: writes the list to an output file in the output directory.
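The data transformations performed by the middle three operators can be sketched in plain Java, without the Apex API. Several details here are assumptions, since the slides do not state them: the regex `\W+` as the word separator, sorting by descending count with ties broken alphabetically, and all names (SortedWordCountSketch and its methods), which are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortedWordCountSketch {
    // Assumed separator: any run of non-word characters.
    static final String NON_WORD = "\\W+";

    // WordReader step: split one line into words, skipping empty fragments.
    static List<String> splitWords(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.split(NON_WORD)) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // WindowWordCount step: frequencies of the words seen in one window.
    static Map<String, Integer> countWindow(List<String> linesInWindow) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : linesInWindow) {
            for (String w : splitWords(line)) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    // FileWordCount step: fold one window's counts into the per-file totals.
    static void accumulate(Map<String, Integer> fileTotals, Map<String, Integer> window) {
        window.forEach((word, count) -> fileTotals.merge(word, count, Integer::sum));
    }

    // At EOF: sort by descending count, then alphabetically (assumed order).
    static List<String> sorted(Map<String, Integer> fileTotals) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(fileTotals.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            result.add(e.getKey() + ": " + e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = new HashMap<>();
        accumulate(totals, countWindow(List.of("the cat and the hat")));  // window 1
        accumulate(totals, countWindow(List.of("the end")));              // window 2
        System.out.println(sorted(totals));
        // prints [the: 3, and: 1, cat: 1, end: 1, hat: 1]
    }
}
```

Keeping the per-window counts separate from the per-file totals reflects the split between WindowWordCount and FileWordCount: the first can emit results at every window boundary, while the second holds state until the end of the file.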
The DAG
Resources
Apache Apex Community Page
Apache Apex LinkedIn Group
End

