Building YARN Applications

© 2015 DataTorrent
Akshay Gore, Bhupesh Chawda
DataTorrent
Apex Hands-on Lab - Into the code!
Getting started with your first Apex Application!

© 2015 DataTorrent
Operators
• Input Adaptor Vs
Generic Operators ?
• What are streams?
• What are ports?

© 2015 DataTorrent
Apex Operator Lifecycle

© 2015 DataTorrent
Apex Streaming Application
public class Application implements StreamingApplication
{
populateDAG(DAG dag, Configuration conf)
{
// Add Operators to dag - dag.addOperator(args)
// Add Streams between operators - dag.addStream(args)
// Additional config + Hints to YARN - Optional
}
}

© 2015 DataTorrent
Apex Application - FilterWords
Apex Application DAG
• Problem statement - Filter words in the file
ᵒ Read a file located on HDFS
ᵒ Split each line into words, check if it is not one of the forbidden words and write it
down to HDFS
HDFS
Lines Filtered Words
HDFS

© 2015 DataTorrent
FilterWords Application DAG
Reader Tokenize Processor Writter
Input
Operator
(Adapter)
Output
Operator
(Adapter)
Generic
Operators
HDFS HDFS
Lines Words
Filtered
Words

© 2015 DataTorrent
Prerequisites
• JAVA 1.7 or above
• Maven 3.0 or above
• Apache Apex projects:
ᵒ Apache Apex Core: core platform, engine
ᵒ Apache Apex Malhar: operators library
• Hadoop cluster in running state
• Your favourite IDE - Eclipse / vi

© 2015 DataTorrent
Demo time!
• Apex application structure
• Application code walk through
• How to execute the application
• Assignment

© 2015 DataTorrent
Assignment - WordCount
Apex Application DAG
• Problem statement - Count occurrences of words in a file
ᵒ Read a file located on HDFS
ᵒ Emit count at the end of the every window and writes into HDFS
HDFS
Lines <Word, Count>
HDFS

© 2015 DataTorrent
Assignment - Word Count Application DAG
Reader Tokenize
Counter
Output
HDFS HDFS
Lines Words
<Word,
count>

© 2015 DataTorrent
Assignment - What you need to do
Reader Tokenizer Processor Writer
String String String
Line Words Words’
Counter Writer
Map
{Word: Count}
Assignment

© 2015 DataTorrent
Assignment - Hints
• Create copy of Processor.java. Name it Counter.java
• Modify Counter.java as follows:
ᵒ Define a data structure which can hold counts for words
ᵒ Process method of input port must count the occurrences
ᵒ Clear the counts in beginWindow() call
ᵒ Emit the counts in endWindow() call

© 2015 DataTorrent
Solution - Changes to Counter.java
• Need to define a data structure which can hold counts for words
private HashMap<String, Integer> counts = new HashMap<>();
• Process method of input port must count the occurrences
if(counts.containsKey(refinedWord)) {
counts.put(refinedWord, counts.get(refinedWord) + 1);
} else {
counts.put(refinedWord, 1);
}
● Clear the counts in beginWindow call
counts.clear();
● Emit the counts in endWindow call
output.emit(counts.toString());
● Run Application Test

© 2015 DataTorrent
Assignment - Are we done yet?
• Change the DAG
ᵒ Replace Processor operator with the newly created operator - Counter

© 2015 DataTorrent
Assignment - Slight change
• We are emitting a Map. However it is still a string.
ᵒ Change type of output port of Counter to type Map
ᵒ Change type of input port of Writer to Map
ᵒ Make appropriate changes to Writer to read a Map and write in a format such that
each line belongs to a single word.

© 2015 DataTorrent
Assignment - Final change
• Change the code such that each count is the overall count, not just for each
window?

© 2015 DataTorrent
Summary - Recap
• Writing Apache Apex operators
• Chaining the operators into an Apache Apex application
• Executing the application on the Apache Apex platform

© 2015 DataTorrent
Where to go from here?
Apache Apex Documentation - http://apex.incubator.apache.org/docs.html
Apache Apex Core Git - https://github.com/apache/incubator-apex-core
Apache Apex Malhar Git - https://github.com/apache/incubator-apex-malhar
Join Users Mailing List - users-subscribe@apex.incubator.apache.org
Join Dev Mailing List - dev-subscribe@apex.incubator.apache.org
Send queries to Users Mailing List - users@apex.incubator.apache.org
Send queries to Dev Mailing List - dev@apex.incubator.apache.org

Building YARN Applications

More Related Content

What's hot (18)

Similar to Building YARN Applications (11)

More from Apache Apex (20)

Recently uploaded (20)

Building YARN Applications

Editor's Notes