Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
What is Oozie
Oozie is a workflow scheduler system for managing
Apache Hadoop jobs.
It is a system for running workflows of dependent jobs.
Oozie workflow jobs are Directed Acyclic Graphs
(DAGs) of actions.
Oozie is integrated with the rest of the Hadoop stack
and supports several types of Hadoop jobs out of the
box (such as Java MapReduce, Streaming MapReduce,
Pig, Hive, Sqoop and DistCp) as well as system-specific
jobs (such as Java programs and shell scripts).
Oozie Features
Designed to scale
Can manage the timely execution of thousands of
workflows in a Hadoop cluster
Makes rerunning failed workflows more tractable
Runs as a service in the cluster
Clients can submit workflow definitions for
immediate or later execution
Oozie in Hadoop Eco-System
Oozie Components
Composed of 2 parts:
• Workflow engine
  - Stores and runs workflows composed of different types of
    Hadoop jobs
• Coordinator engine
  - Runs workflow jobs based on predefined schedules and data
    availability
Workflow
A workflow is a DAG (Directed Acyclic Graph) of
action nodes and control-flow nodes.
Action node
• Performs a workflow task, such as moving files in HDFS or
running a MapReduce, Streaming, Pig, or Hive job
Control-flow node
• Governs the workflow execution between actions
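The "acyclic" requirement can be illustrated with a small Python sketch. This is not Oozie code: the `is_acyclic` helper and the sample node names are hypothetical, and the check uses Kahn's topological-sort algorithm as one possible way to detect a cycle in the transitions between nodes.

```python
from collections import deque


def is_acyclic(transitions):
    """Return True if the node graph contains no cycle (Kahn's algorithm)."""
    # Collect every node that appears as a source or as a target.
    nodes = set(transitions)
    for targets in transitions.values():
        nodes.update(targets)

    # Count incoming transitions for each node.
    indegree = {node: 0 for node in nodes}
    for targets in transitions.values():
        for target in targets:
            indegree[target] += 1

    # Repeatedly remove nodes that have no remaining incoming transitions.
    ready = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for target in transitions.get(node, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    # If some nodes were never removed, they sit on a cycle.
    return visited == len(nodes)


# Hypothetical workflow: start -> pig-node -> end, with an error path to fail.
workflow = {
    "start": ["pig-node"],
    "pig-node": ["end", "fail"],
}
print(is_acyclic(workflow))
```

A graph in which two nodes transition to each other would make `is_acyclic` return False, which is the situation a workflow definition must avoid.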
A Scheduler
Oozie executes workflows based on:
• Time Dependency (Frequency)
• Data Dependency
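The two kinds of dependency can be sketched in Python as follows. This is only an illustration under assumptions: in Oozie the schedule is declared in a coordinator definition, not in code, and the function names, the dataset labels and the "end is exclusive" choice below are all hypothetical.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency_minutes):
    """Yield each scheduled time from start up to, but excluding, end."""
    current = start
    step = timedelta(minutes=frequency_minutes)
    while current < end:
        yield current
        current += step


def ready_to_run(nominal_time, now, available_datasets, required_dataset):
    """A run is ready once its time has come AND its input data exists."""
    time_dependency_met = nominal_time <= now
    data_dependency_met = required_dataset in available_datasets
    return time_dependency_met and data_dependency_met


# Hypothetical hourly schedule over a three-hour window.
schedule = list(nominal_times(datetime(2013, 1, 1, 0, 0),
                              datetime(2013, 1, 1, 3, 0),
                              frequency_minutes=60))
print(len(schedule))
```

Here a run is started only when both conditions hold, so a run whose time has come still waits until its input dataset is available.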
Oozie Server Setup
Oozie is distributed as two separate packages: a
client package (oozie-client) and a server package
(oozie).
We will install the Oozie server package, which also
installs oozie-client.
$ yum -y install oozie
When you install Oozie from an RPM, the Oozie server
creates all configuration, documentation and
runtime files in the standard Unix directories, as
follows:
Type of File       Where Installed
Binaries           /usr/lib/oozie/
Configuration      /etc/oozie/conf/
Documentation      /usr/share/doc/oozie/
Examples           /usr/share/doc/oozie/
Sharelib TAR.GZ    /usr/lib/oozie/
Data               /var/lib/oozie/
Logs               /var/log/oozie/
Configuring Oozie to Use MySQL
Oozie needs a database to store all the workflow
job information.
We will configure it to use MySQL as the database.
Step 1: Install and start MySQL 5.x
$ yum -y install mysql-server
Step 2: Create the Oozie database and Oozie
MySQL user
Step 3: Configure Oozie to use MySQL
• Edit properties in the oozie-site.xml file as follows:
Step 4: Add the MySQL JDBC driver JAR to Oozie
• $ ln -s /usr/share/java/mysql-connector-java.jar \
      /var/lib/oozie/mysql-connector-java.jar
Step 5: Create the Oozie database schema
After configuring the Oozie database information and
creating the corresponding database, create the
Oozie database schema. Oozie provides a database
tool for this purpose.
• $ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run
You should see output such as the following:
Enabling the Oozie Web Console
By default, the Oozie web console is not enabled.
Follow these steps to enable it:
Step 1: Download the ExtJS library
• $ wget http://dev.sencha.com/deploy/ext-2.2.zip
Step 2: Install the library
• $ unzip ext-2.2.zip
• $ cp -r ext-2.2 /var/lib/oozie/
Installing the Oozie ShareLib in HDFS
The Oozie installation bundles the Oozie ShareLib, which
contains all of the JARs needed to enable workflow
jobs to run streaming, DistCp, Pig, Hive, and Sqoop
actions.
The ShareLib must be copied to the home directory of
the oozie user in HDFS:
• $ sudo -u hdfs hadoop fs -mkdir /user/oozie
• $ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
• $ mkdir /tmp/ooziesharelib
• $ cd /tmp/ooziesharelib
• $ tar -xzf /usr/lib/oozie/oozie-sharelib.tar.gz
• $ sudo -u oozie hadoop fs -put share /user/oozie/share
Starting, Stopping, and Accessing the Oozie Server
Starting the Oozie Server
• $ service oozie start
Stopping the Oozie Server
• $ service oozie stop
Accessing the Oozie Server with the Oozie Client
• The Oozie client is a command-line utility that interacts with the Oozie server
via the Oozie web-services API
• Use the /usr/bin/oozie script to run the Oozie client
• For example, to invoke the client on the same machine where the
Oozie server is running:
• $ oozie admin -oozie http://localhost:11000/oozie -status
  System mode: NORMAL
Accessing the Oozie Server with a Browser
• If you have enabled the Oozie web console by adding the ExtJS library, you can
connect to the console at
• http://localhost:11000/oozie
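Because the client talks to the server over the web-services API, the same status check can be sketched in Python. This is a minimal sketch under assumptions: it assumes the server exposes a `v1/admin/status` resource under the Oozie URL and answers with a JSON object containing a `systemMode` field. The helper names are hypothetical, and `fetch_system_mode` needs a running server, so it is defined here but not called.

```python
import json
from urllib.request import urlopen


def status_url(oozie_url):
    """Build the (assumed) admin status URL from the base Oozie URL."""
    return oozie_url.rstrip("/") + "/v1/admin/status"


def parse_system_mode(body):
    """Extract the system mode from the (assumed) JSON response body."""
    return json.loads(body)["systemMode"]


def fetch_system_mode(oozie_url):
    """Query a running Oozie server; requires network access to the server."""
    with urlopen(status_url(oozie_url)) as response:
        return parse_system_mode(response.read().decode("utf-8"))


print(status_url("http://localhost:11000/oozie"))
print(parse_system_mode('{"systemMode": "NORMAL"}'))
```

Against a healthy local server, `fetch_system_mode("http://localhost:11000/oozie")` would be expected to report the same mode that `oozie admin -status` prints.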
Defining an Oozie Workflow
Workflow definitions are written in XML using hPDL, the
Hadoop Process Definition Language.
A workflow consists of 2 kinds of nodes:
• Control Node
  - Start
  - End
  - Decision
  - Fork
  - Join
  - Kill
• Action Node
  - Map-reduce
  - Pig, etc.
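To make the split between control nodes and action nodes concrete, the sketch below parses a small workflow definition with Python's standard XML library and sorts its top-level elements into the two groups. The `classify_nodes` helper and the sample workflow (`sample-wf`, `first-action`) are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Control-flow element names, as listed above.
CONTROL_NODES = {"start", "end", "decision", "fork", "join", "kill"}


def classify_nodes(workflow_xml):
    """Group the top-level workflow elements into control and action nodes."""
    root = ET.fromstring(workflow_xml)
    result = {"control": [], "action": []}
    for child in root:
        tag = child.tag.split("}")[-1]  # drop the "{namespace}" prefix
        if tag in CONTROL_NODES:
            result["control"].append(tag)
        elif tag == "action":
            result["action"].append(child.get("name"))
    return result


# Hypothetical minimal workflow with one fs action.
SAMPLE = """
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    <start to="first-action"/>
    <action name="first-action">
        <fs><mkdir path="/tmp/sample"/></fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail"><message>failed</message></kill>
    <end name="end"/>
</workflow-app>
"""
print(classify_nodes(SAMPLE))
```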
Control Flow Nodes
Start Control Node
• The start node is the entry point for a workflow job
• It indicates the first workflow node the workflow job must
transition to
• When a workflow is started, it automatically transitions to the
node specified in the start node
• A workflow definition must have one start node
Syntax
    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
      ...
      <start to="[NODE-NAME]"/>
      ...
    </workflow-app>
• [NODE-NAME] is the name of the node (action) from which the workflow should start
End Control Node
• The end node is the end of a workflow job
• It indicates that the workflow job has completed successfully
• When a workflow job reaches the end node it finishes successfully
• If one or more actions started by the workflow job are executing
when the end node is reached, the actions will be killed
• A workflow definition must have one end node
Syntax
    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
      ...
      <end name="[NODE-NAME]"/>
      ...
    </workflow-app>
• [NODE-NAME] is the name of the node on which the workflow should end
Kill Control Node
• The kill node allows a workflow job to kill itself
• When a workflow job reaches the kill node it finishes in error
• If one or more actions started by the workflow job are executing
when the kill node is reached, the actions will be killed
• A workflow definition may have zero or more kill nodes
Syntax
    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
      ...
      <kill name="[NODE-NAME]">
        <message>[MESSAGE-TO-LOG]</message>
      </kill>
      ...
    </workflow-app>
• If the workflow execution reaches this node, the workflow will be killed
Decision Control Node
• Enables a workflow to make a selection on the execution path to
follow
• The behavior of a decision node can be seen as a switch-case
statement
• Predicates are evaluated in order of appearance until one of them
evaluates to true and the corresponding transition is taken
• If none of the predicates evaluates to true, the default transition is
taken
Syntax
    <decision name="[NODE-NAME]">
      <switch>
        <case to="[NODE_NAME]">[PREDICATE]</case>
        ...
        <case to="[NODE_NAME]">[PREDICATE]</case>
        <default to="[NODE_NAME]"/>
      </switch>
    </decision>
• The switch-case decides which node is executed next
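The evaluation order of a decision node can be mimicked in a few lines of Python. This is a sketch of the switch-case behavior described above, not Oozie's implementation; `choose_transition` and the node names are hypothetical.

```python
def choose_transition(cases, default):
    """Return the target of the first case whose predicate is true.

    cases is an ordered list of (predicate, target) pairs, mirroring the
    order of the <case> elements; default mirrors the <default> element.
    """
    for predicate, target in cases:
        if predicate():
            return target
    return default


# Hypothetical predicates: the second and third are both true,
# but only the first true one (in order of appearance) is taken.
cases = [
    (lambda: False, "node-a"),
    (lambda: True, "node-b"),
    (lambda: True, "node-c"),
]
print(choose_transition(cases, default="end"))
```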
Fork and Join Control Nodes
• A fork node splits one path of execution into multiple concurrent
paths of execution
• A join node waits until every concurrent execution path of a
previous fork node arrives at it
• The fork and join nodes must be used in pairs
• The actions started by a fork run in parallel
Syntax
    <fork name="[FORK-NODE-NAME]">
      <path start="[NODE-NAME]"/>
      ...
      <path start="[NODE-NAME]"/>
    </fork>
    <join name="[JOIN-NODE-NAME]" to="[NODE-NAME]"/>
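The fork/join pairing behaves much like submitting tasks to a thread pool and waiting for all of them. The sketch below uses Python's `concurrent.futures` to illustrate the idea; it is an analogy under assumptions rather than how Oozie runs actions, and `run_fork_join` is a hypothetical helper that expects at least one path.

```python
from concurrent.futures import ThreadPoolExecutor


def run_fork_join(paths, after_join):
    """Run every path concurrently (fork), wait for all (join), then continue."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        futures = [pool.submit(path) for path in paths]      # fork
        results = [future.result() for future in futures]    # join: wait for all
    return after_join(results)                               # node named in "to"


# Two hypothetical concurrent paths, each returning a record count.
def pig_path():
    return 10


def streaming_path():
    return 32


print(run_fork_join([pig_path, streaming_path], after_join=sum))
```

The node after the join only runs once every forked path has finished, which is why the results list is complete before `after_join` is called.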
Workflow Action Nodes
Action Basics
• Action computation/processing is always remote
• Actions are asynchronous
• Actions have two transitions, ok and error
• Action recovery
  - Oozie provides recovery capabilities when starting or ending
    actions
  - The recovery strategy depends on the nature of the failure
  - For non-transient failures, the action is suspended
  - For transient failures, Oozie retries after a fixed
    time interval
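The two recovery strategies can be sketched in Python as follows. Everything here is illustrative: the exception classes, `run_with_recovery`, the default retry count and the `RETRIES_EXHAUSTED` label are hypothetical, and what Oozie does once retries run out is not covered by the slide, so that branch is an assumption of this sketch.

```python
import time


class TransientError(Exception):
    """A failure that may go away on its own, e.g. a network glitch."""


class NonTransientError(Exception):
    """A failure that needs intervention, e.g. a bad configuration."""


def run_with_recovery(action, max_retries=3, retry_interval=0.0):
    """Retry transient failures at a fixed interval; suspend on others."""
    attempts = 0
    while True:
        try:
            return ("OK", action())
        except TransientError:
            attempts += 1
            if attempts > max_retries:
                # Assumption: the slide does not say what happens here.
                return ("RETRIES_EXHAUSTED", None)
            time.sleep(retry_interval)  # fixed interval between retries
        except NonTransientError:
            return ("SUSPENDED", None)
```

An action that fails twice with `TransientError` and then succeeds would return `("OK", ...)` on the third attempt, while one raising `NonTransientError` is reported as suspended immediately.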
Fs (HDFS) Action
• The fs action lets a workflow application manipulate files and directories
in HDFS
• The supported commands are move, delete and mkdir
• The FS commands are executed synchronously from within the FS action
• Syntax
    <action name="[NODE-NAME]">
      <fs>
        <delete path='[PATH]'/>
        ...
        <mkdir path='[PATH]'/>
        ...
        <move source='[SOURCE-PATH]' target='[TARGET-PATH]'/>
      </fs>
      <ok to="[NODE-NAME]"/>
      <error to="[NODE-NAME]"/>
    </action>
Pig Action
• The pig action starts a Pig job
• The workflow job will wait until the pig job completes
before continuing to the next action
• The pig action has to be configured with the job-tracker,
name-node, pig script and the necessary parameters and
configuration to run the Pig job.
• The configuration properties are loaded in the following
order, job-xml and configuration , and later values override
earlier values.
• Hadoop mapred.job.tracker and fs.default.name properties
must not be present in the job-xml and inline configuration
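The loading order and the restriction on the two Hadoop properties can be sketched in Python. This is an illustration only: `effective_configuration` is a hypothetical helper, plain dictionaries stand in for the job-xml file and the inline configuration element, and raising an error for a forbidden property is a choice made for this sketch.

```python
# Properties that, per the rule above, must come from <job-tracker>
# and <name-node> rather than from job-xml or inline configuration.
FORBIDDEN = ("mapred.job.tracker", "fs.default.name")


def effective_configuration(job_xml_props, inline_props):
    """Merge job-xml first, then inline configuration; later values win."""
    for source in (job_xml_props, inline_props):
        for name in FORBIDDEN:
            if name in source:
                raise ValueError(
                    name + " must not be set in job-xml or inline configuration"
                )
    merged = dict(job_xml_props)   # loaded first
    merged.update(inline_props)    # loaded second, overrides earlier values
    return merged


# Hypothetical values: the inline queue name overrides the job-xml one.
job_xml = {"mapred.job.queue.name": "batch", "mapred.map.tasks": "4"}
inline = {"mapred.job.queue.name": "default"}
print(effective_configuration(job_xml, inline))
```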
• Syntax
    <pig>
      <job-tracker>[JOB-TRACKER]</job-tracker>
      <name-node>[NAME-NODE]</name-node>
      <prepare>
        <delete path="[PATH]"/>
        ...
        <mkdir path="[PATH]"/>
        ...
      </prepare>
      <job-xml>[JOB-XML-FILE]</job-xml>
      <configuration>
        <property>
          <name>[PROPERTY-NAME]</name>
          <value>[PROPERTY-VALUE]</value>
        </property>
        ...
      </configuration>
      <script>[PIG-SCRIPT]</script>
      <param>[PARAM-VALUE]</param>
      ...
      <param>[PARAM-VALUE]</param>
      <argument>[ARGUMENT-VALUE]</argument>
      ...
      <argument>[ARGUMENT-VALUE]</argument>
      <file>[FILE-PATH]</file>
      ...
      <archive>[FILE-PATH]</archive>
      ...
    </pig>
• <job-tracker>, <name-node>, <prepare> and <job-xml>: necessary configuration
• <configuration>: cluster-wide configuration
• <script>, <param> and <argument>: Pig script, its parameters and arguments
Oozie Job States
A workflow job can be in any of the following states:
• PREP: When a workflow job is first created it is in the PREP state. The
workflow job is defined but it is not running.
• RUNNING: When a PREP workflow job is started it goes
into the RUNNING state. It remains in RUNNING until it
reaches its end state, ends in error, or is suspended.
• SUSPENDED: A RUNNING workflow job can be suspended. It
remains in the SUSPENDED state until the workflow job is resumed or
killed.
• SUCCEEDED: When a RUNNING workflow job reaches the end node it
ends in the SUCCEEDED final state.
• KILLED: When a PREP, RUNNING or SUSPENDED workflow job is
killed by an administrator or the owner via a request to Oozie, the
workflow job ends in the KILLED final state.
• FAILED: When a RUNNING workflow job fails due to an unexpected
error it ends in the FAILED final state.
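The state descriptions above amount to a small state machine, sketched here in Python. The transition table is read off the list above, treating PREP as the state of a newly created job; `can_transition` and `is_final` are hypothetical helper names.

```python
# Allowed transitions, taken from the state descriptions above.
TRANSITIONS = {
    "PREP": {"RUNNING", "KILLED"},
    "RUNNING": {"SUSPENDED", "SUCCEEDED", "KILLED", "FAILED"},
    "SUSPENDED": {"RUNNING", "KILLED"},
    "SUCCEEDED": set(),
    "KILLED": set(),
    "FAILED": set(),
}


def can_transition(current, target):
    """Return True if a job in state current may move to state target."""
    return target in TRANSITIONS[current]


def is_final(state):
    """Final states have no outgoing transitions."""
    return not TRANSITIONS[state]


print(can_transition("PREP", "RUNNING"))
print(sorted(state for state in TRANSITIONS if is_final(state)))
```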
Example
$ cp /usr/share/doc/oozie-3.3.2+49/oozie-examples.tar.gz .
$ tar -xvf oozie-examples.tar.gz
$ hadoop fs -put examples/ .
$ cd examples/apps/pig/
$ oozie job -oozie http://localhost:11000/oozie \
    -config job.properties -run
$ oozie job -oozie http://localhost:11000/oozie \
    -info <job_id>
Understand the Example
Pig Script
• $ cat id.pig
    A = load '$INPUT' using PigStorage(':');
    B = foreach A generate $0 as id;
    store B into '$OUTPUT' USING PigStorage();
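For readers unfamiliar with Pig, the script's logic can be restated in Python: split each input line on ':' and keep the first field. This is only an equivalent sketch of what id.pig computes, with a hypothetical `extract_ids` helper and made-up sample lines; it does not reproduce how Pig reads from or writes to HDFS.

```python
def extract_ids(lines, delimiter=":"):
    """Keep the first delimiter-separated field of each line ($0 in Pig)."""
    return [line.rstrip("\n").split(delimiter)[0] for line in lines]


# Hypothetical colon-separated input, similar in shape to /etc/passwd.
sample_input = [
    "root:x:0:0\n",
    "daemon:x:1:1\n",
]
print(extract_ids(sample_input))
```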
Workflow XML
• $ cat workflow.xml
    <workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
      <start to="pig-node"/>
      <action name="pig-node">
        <pig>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
          </prepare>
          <configuration>
            <property>
              <name>mapred.job.queue.name</name>
              <value>${queueName}</value>
            </property>
            <property>
              <name>mapred.compress.map.output</name>
              <value>true</value>
            </property>
          </configuration>
          <script>id.pig</script>
          <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/text</param>
          <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/pig</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>
$ cat job.properties
    nameNode=hdfs://localhost:8020
    jobTracker=localhost:8021
    queueName=default
    examplesRoot=examples
    oozie.use.system.libpath=true
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/pig
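The `${...}` references in job.properties are replaced by the values of the named properties. The Python sketch below shows one way such substitution could work; it is not Oozie's resolver. The `resolve` helper is hypothetical, the user name `alice` is made up, and references that are not simple property names (such as `${wf:user()}`) are deliberately left untouched.

```python
import re

REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(properties, extra=None):
    """Expand ${name} references using the properties and extra values."""
    values = dict(extra or {})
    values.update(properties)

    def expand(text, depth=0):
        if depth > 10:
            raise ValueError("too many nested references")

        def replace(match):
            name = match.group(1)
            if name not in values:
                return match.group(0)  # leave unknown references as they are
            return expand(values[name], depth + 1)

        return REFERENCE.sub(replace, text)

    return {key: expand(value) for key, value in properties.items()}


props = {
    "nameNode": "hdfs://localhost:8020",
    "examplesRoot": "examples",
    "oozie.wf.application.path":
        "${nameNode}/user/${user.name}/${examplesRoot}/apps/pig",
}
# user.name is supplied from outside the file; "alice" is a made-up value.
print(resolve(props, extra={"user.name": "alice"})["oozie.wf.application.path"])
```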
A Workflow Job
$ cd /root/examples/apps/demo
$ cat workflow.xml
    <workflow-app xmlns="uri:oozie:workflow:0.2" name="demo-wf">
      <start to="cleanup-node"/>
      <action name="cleanup-node">
        <fs>
          <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/demo"/>
        </fs>
        <ok to="fork-node"/>
        <error to="fail"/>
      </action>
      <fork name="fork-node">
        <path start="pig-node"/>
        <path start="streaming-node"/>
      </fork>
      <action name="pig-node">
        <pig>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/demo/pig-node"/>
          </prepare>
          <configuration>
            <property>
              <name>mapred.job.queue.name</name>
              <value>${queueName}</value>
            </property>
            <property>
              <name>mapred.map.output.compress</name>
              <value>false</value>
            </property>
          </configuration>
          <script>id.pig</script>
          <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/text</param>
          <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/demo/pig-node</param>
        </pig>
        <ok to="join-node"/>
        <error to="fail"/>
      </action>
      <action name="streaming-node">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/demo/streaming-node"/>
          </prepare>
          <streaming>
            <mapper>/bin/cat</mapper>
            <reducer>/usr/bin/wc</reducer>
          </streaming>
          <configuration>
            <property>
              <name>mapred.job.queue.name</name>
              <value>${queueName}</value>
            </property>
            <property>
              <name>mapred.input.dir</name>
              <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>
            </property>
            <property>
              <name>mapred.output.dir</name>
              <value>/user/${wf:user()}/${examplesRoot}/output-data/demo/streaming-node</value>
            </property>
          </configuration>
        </map-reduce>
        <ok to="join-node"/>
        <error to="fail"/>
      </action>
      <join name="join-node" to="mr-node"/>
      <action name="mr-node">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/demo/mr-node"/>
          </prepare>
          <configuration>
            <property>
              <name>mapred.job.queue.name</name>
              <value>${queueName}</value>
            </property>
            <property>
              <name>mapred.mapper.class</name>
              <value>org.apache.oozie.example.DemoMapper</value>
            </property>
            <property>
              <name>mapred.mapoutput.key.class</name>
              <value>org.apache.hadoop.io.Text</value>
            </property>
            <property>
              <name>mapred.mapoutput.value.class</name>
              <value>org.apache.hadoop.io.IntWritable</value>
            </property>
            <property>
              <name>mapred.reducer.class</name>
              <value>org.apache.oozie.example.DemoReducer</value>
            </property>
            <property>
              <name>mapred.map.tasks</name>
              <value>1</value>
            </property>
            <property>
              <name>mapred.input.dir</name>
              <value>/user/${wf:user()}/${examplesRoot}/output-data/demo/pig-node,/user/${wf:user()}/${examplesRoot}/output-data/demo/streaming-node</value>
            </property>
            <property>
              <name>mapred.output.dir</name>
              <value>/user/${wf:user()}/${examplesRoot}/output-data/demo/mr-node</value>
            </property>
          </configuration>
        </map-reduce>
        <ok to="decision-node"/>
        <error to="fail"/>
      </action>
      <decision name="decision-node">
        <switch>
          <case to="hdfs-node">${fs:exists(concat(concat(concat(concat(concat(nameNode, '/user/'), wf:user()), '/'), examplesRoot), '/output-data/demo/mr-node')) == "true"}</case>
          <default to="end"/>
        </switch>
      </decision>
      <action name="hdfs-node">
        <fs>
          <move source="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/demo/mr-node"
                target="/user/${wf:user()}/${examplesRoot}/output-data/demo/final-data"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Demo workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>
• When the job completes, you will see
something like this:

More Related Content

PPTX
Apache Oozie
PDF
HDFS Architecture
PDF
Apache Flume
PPTX
MapReduce Programming Model
PPTX
Introduction to ML with Apache Spark MLlib
PDF
Hadoop ecosystem
PPTX
Apache Spark.
PDF
Getting The Best Performance With PySpark
Apache Oozie
HDFS Architecture
Apache Flume
MapReduce Programming Model
Introduction to ML with Apache Spark MLlib
Hadoop ecosystem
Apache Spark.
Getting The Best Performance With PySpark

What's hot (20)

PPTX
Introduction to YARN and MapReduce 2
PPTX
Docker 101 : Introduction to Docker and Containers
PPTX
Introduction to kubernetes
PPTX
Docker.pptx
PDF
Docker Introduction
PDF
Kubernetes architecture
PPTX
Containers and Docker
PDF
Présentation docker et kubernetes
PPTX
Need for Time series Database
PDF
BigData_TP2: Design Patterns dans Hadoop
PDF
BigData_Chp2: Hadoop & Map-Reduce
PDF
Introduction à Docker et utilisation en production /Digital apéro Besançon [1...
PDF
Traitement distribue en BIg Data - KAFKA Broker and Kafka Streams
PDF
Docker Registry V2
PDF
Docker 101: Introduction to Docker
PPTX
DevOps 101 - an Introduction to DevOps
PDF
Microservices avec Spring Cloud
PPT
presentation on Docker
PDF
Docker in real life
Introduction to YARN and MapReduce 2
Docker 101 : Introduction to Docker and Containers
Introduction to kubernetes
Docker.pptx
Docker Introduction
Kubernetes architecture
Containers and Docker
Présentation docker et kubernetes
Need for Time series Database
BigData_TP2: Design Patterns dans Hadoop
BigData_Chp2: Hadoop & Map-Reduce
Introduction à Docker et utilisation en production /Digital apéro Besançon [1...
Traitement distribue en BIg Data - KAFKA Broker and Kafka Streams
Docker Registry V2
Docker 101: Introduction to Docker
DevOps 101 - an Introduction to DevOps
Microservices avec Spring Cloud
presentation on Docker
Docker in real life
Ad

Similar to Hadoop Oozie (20)

PDF
Oozie @ Riot Games
PDF
October 2013 HUG: Oozie 4.x
PDF
Oozie Summit 2011
ODT
Language Resource Processing Configuration and Run
PPTX
Apache Oozie Workflow Scheduler - Module 10
PDF
Oozie sweet
PDF
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
PPTX
Everything you wanted to know, but were afraid to ask about Oozie
PPTX
July 2012 HUG: Overview of Oozie Qualification Process
PPTX
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
PDF
AI&BigData Lab. Александр Конопко "Celos: оркестрирование и тестирование зада...
PPT
Workflow on Hadoop Using Oozie__HadoopSummit2010
PDF
Oozie Hug May 2011
PPTX
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop
PDF
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
PDF
Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
PPTX
August 2016 HUG: Recent development in Apache Oozie
PDF
Apache Oozie The Workflow Scheduler for Hadoop 1st Edition Mohammad Kamrul Islam
PPTX
Building and managing complex dependencies pipeline using Apache Oozie
Oozie @ Riot Games
October 2013 HUG: Oozie 4.x
Oozie Summit 2011
Language Resource Processing Configuration and Run
Apache Oozie Workflow Scheduler - Module 10
Oozie sweet
Introduction to Oozie | Big Data Hadoop Spark Tutorial | CloudxLab
Everything you wanted to know, but were afraid to ask about Oozie
July 2012 HUG: Overview of Oozie Qualification Process
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
AI&BigData Lab. Александр Конопко "Celos: оркестрирование и тестирование зада...
Workflow on Hadoop Using Oozie__HadoopSummit2010
Oozie Hug May 2011
May 2012 HUG: Oozie: Towards a scalable Workflow Management System for Hadoop
Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
August 2016 HUG: Recent development in Apache Oozie
Apache Oozie The Workflow Scheduler for Hadoop 1st Edition Mohammad Kamrul Islam
Building and managing complex dependencies pipeline using Apache Oozie
Ad

Recently uploaded (20)

PPTX
tack Data Structure with Array and Linked List Implementation, Push and Pop O...
PPTX
Current and future trends in Computer Vision.pptx
PDF
null (2) bgfbg bfgb bfgb fbfg bfbgf b.pdf
PPTX
Amdahl’s law is explained in the above power point presentations
PPTX
"Array and Linked List in Data Structures with Types, Operations, Implementat...
PPTX
CURRICULAM DESIGN engineering FOR CSE 2025.pptx
PDF
Influence of Green Infrastructure on Residents’ Endorsement of the New Ecolog...
PDF
Human-AI Collaboration: Balancing Agentic AI and Autonomy in Hybrid Systems
PDF
SMART SIGNAL TIMING FOR URBAN INTERSECTIONS USING REAL-TIME VEHICLE DETECTI...
PPTX
Chemical Technological Processes, Feasibility Study and Chemical Process Indu...
PDF
August -2025_Top10 Read_Articles_ijait.pdf
PDF
Visual Aids for Exploratory Data Analysis.pdf
PPTX
Sorting and Hashing in Data Structures with Algorithms, Techniques, Implement...
PDF
UNIT no 1 INTRODUCTION TO DBMS NOTES.pdf
PDF
Accra-Kumasi Expressway - Prefeasibility Report Volume 1 of 7.11.2018.pdf
PPTX
communication and presentation skills 01
PDF
Soil Improvement Techniques Note - Rabbi
PDF
Exploratory_Data_Analysis_Fundamentals.pdf
PDF
ChapteR012372321DFGDSFGDFGDFSGDFGDFGDFGSDFGDFGFD
PDF
737-MAX_SRG.pdf student reference guides
tack Data Structure with Array and Linked List Implementation, Push and Pop O...
Current and future trends in Computer Vision.pptx
null (2) bgfbg bfgb bfgb fbfg bfbgf b.pdf
Amdahl’s law is explained in the above power point presentations
"Array and Linked List in Data Structures with Types, Operations, Implementat...
CURRICULAM DESIGN engineering FOR CSE 2025.pptx
Influence of Green Infrastructure on Residents’ Endorsement of the New Ecolog...
Human-AI Collaboration: Balancing Agentic AI and Autonomy in Hybrid Systems
SMART SIGNAL TIMING FOR URBAN INTERSECTIONS USING REAL-TIME VEHICLE DETECTI...
Chemical Technological Processes, Feasibility Study and Chemical Process Indu...
August -2025_Top10 Read_Articles_ijait.pdf
Visual Aids for Exploratory Data Analysis.pdf
Sorting and Hashing in Data Structures with Algorithms, Techniques, Implement...
UNIT no 1 INTRODUCTION TO DBMS NOTES.pdf
Accra-Kumasi Expressway - Prefeasibility Report Volume 1 of 7.11.2018.pdf
communication and presentation skills 01
Soil Improvement Techniques Note - Rabbi
Exploratory_Data_Analysis_Fundamentals.pdf
ChapteR012372321DFGDSFGDFGDFSGDFGDFGDFGSDFGDFGFD
737-MAX_SRG.pdf student reference guides

Hadoop Oozie

  • 1. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 What is Oozie Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Its a system for running workflows of dependent jobs Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions. Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
  • 2. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Oozie Features Designed to scale Can manage the timely execution of thousands of workflows in a Hadoop cluster Makes rerunning failed workflows more tractable Runs as a service in the cluster Clients can submit workflow definitions for immediate or later execution
  • 3. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Oozie in Hadoop Eco-System
  • 4. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Oozie Components Composed of 2 parts: • Workflow engine  Stores and runs workflows composed of different types of Hadoop jobs • Coordinator engine  Runs workflow jobs based on predefined schedules and data availability
  • 5. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Workflow Workflow is a DAG(Directed Acyclic Graph) of action nodes and control-flow nodes. Action node • performs a workflow task, such as moving files in HDFS, running a MapReduce, Streaming, Pig, or Hive job Control-flow node • governs the workflow execution between actions
  • 6. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 A Schedular Oozie executes workflow based on: • Time Dependency (Frequency) • Data Dependency
  • 7. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Oozie Server Setup Oozie is distributed as two separate packages, a client package (oozie-client) and a server package (oozie). We will install oozie server which also installs oozie-client. $ yum –y install oozie When you install Oozie from an RPM, Oozie server creates all configuration, documentation and runtime files in the standard Unix directories, as follows:
  • 8. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Oozie Server Setup Type of File Where installed Binaries /usr/lib/oozie/ Configuration /etc/oozie/conf/ Documentation /user/share/doc/oozie/ Examples /user/share/doc/oozie/ Sharelib TAR.GZ /usr/lib/oozie/ Data /var/lib/oozie/ Logs /var/log/oozie/
  • 9. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Configuring Oozie to Use MySQL Oozie needs a database to store all the workflow job information We will be configuring it to use Mysql as database Step 1: Install and start MySQL 5.x $ yum –y install mysql-server
  • 10. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Configuring Oozie to Use MySQL Step 2: Create the Oozie database and Oozie MySQL user
  • 11. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Configuring Oozie to Use MySQL Step 3: Configure Oozie to use MySQL • Edit properties in the oozie-site.xml file as follows:
  • 12. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Configuring Oozie to Use MySQL Step 4: Add the MySQL JDBC driver JAR to Oozie • $ ln -s /usr/share/java/mysql-connector-java.jar /var/lib/oozie/mysql-connector-java.jar Step 5:Creating the Oozie Database Schema After configuring Oozie database information and creating the corresponding database, create the Oozie database schema. Oozie provides a database tool for this purpose. • $ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create – run
  • 13. Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482 Configuring Oozie to Use MySQL You should see output such as the following:
  • 14. Enabling the Oozie Web Console
    The Oozie web console is not enabled by default. Follow these steps to enable it:
    Step 1: Download the library
    • $ wget http://dev.sencha.com/deploy/ext-2.2.zip
    Step 2: Install the library
    • $ unzip ext-2.2.zip
    • $ cp -r ext-2.2 /var/lib/oozie/
  • 15. Installing the Oozie ShareLib in HDFS
    The Oozie installation bundles the Oozie ShareLib, which contains all of the JARs that workflow jobs need to run streaming, DistCp, Pig, Hive, and Sqoop actions.
    The ShareLib must be copied to the home directory of the oozie user in HDFS:
    • $ sudo -u hdfs hadoop fs -mkdir /user/oozie
    • $ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
    • $ mkdir /tmp/ooziesharelib
    • $ cd /tmp/ooziesharelib
    • $ tar -xzf /usr/lib/oozie/oozie-sharelib.tar.gz
    • $ sudo -u oozie hadoop fs -put share /user/oozie/share
  • 16. Starting, Stopping, and Accessing the Oozie Server
    Starting the Oozie Server
    • $ service oozie start
    Stopping the Oozie Server
    • $ service oozie stop
    Accessing the Oozie Server with the Oozie Client
    • The Oozie client is a command-line utility that interacts with the Oozie server via the Oozie web-services API.
    • Use the /usr/bin/oozie script to run the Oozie client.
    • For example, to invoke the client on the same machine where the Oozie server is running:
    • $ oozie admin -oozie http://localhost:11000/oozie -status
      System mode: NORMAL
    Accessing the Oozie Server with a Browser
    • If you have enabled the Oozie web console by adding the ExtJS library, you can connect to the console at
    • http://localhost:11000/oozie
  • 17. Defining an Oozie Workflow
    Workflow definitions are written in XML using the Hadoop Process Definition Language.
    A workflow consists of two kinds of nodes:
    • Control node
      - Start
      - End
      - Decision
      - Fork
      - Join
      - Kill
    • Action node
      - Map-reduce
      - Pig, etc.
  • 18. Control Flow Nodes
    Start Control Node
    • The start node is the entry point for a workflow job
    • It indicates the first workflow node the workflow job must transition to
    • When a workflow is started, it automatically transitions to the node specified in the start node
    • A workflow definition must have one start node
    Syntax
    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
      ...
      <start to="[NODE-NAME]"/>
      ...
    </workflow-app>
    [NODE-NAME] is the node (action) from which the workflow should start.
  • 19. Control Flow Nodes
    End Control Node
    • The end node is the end of a workflow job
    • It indicates that the workflow job has completed successfully
    • When a workflow job reaches the end node, it finishes successfully
    • If one or more actions started by the workflow job are still executing when the end node is reached, those actions are killed
    • A workflow definition must have one end node
    Syntax
    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
      ...
      <end name="[NODE-NAME]"/>
      ...
    </workflow-app>
    [NODE-NAME] is the node at which the workflow should end.
  • 20. Control Flow Nodes
    Kill Control Node
    • The kill node allows a workflow job to kill itself
    • When a workflow job reaches the kill node, it finishes in error
    • If one or more actions started by the workflow job are still executing when the kill node is reached, those actions are killed
    • A workflow definition may have zero or more kill nodes
    Syntax
    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
      ...
      <kill name="[NODE-NAME]">
        <message>[MESSAGE-TO-LOG]</message>
      </kill>
      ...
    </workflow-app>
    If the workflow execution reaches this node, the workflow will be killed.
  • 21. Control Flow Nodes
    Decision Control Node
    • Enables a workflow to select which execution path to follow
    • The behavior of a decision node can be seen as a switch-case statement
    • Predicates are evaluated in order of appearance until one of them evaluates to true, and the corresponding transition is taken
    • If none of the predicates evaluates to true, the default transition is taken
    Syntax
    <decision name="[NODE-NAME]">
      <switch>
        <case to="[NODE_NAME]">[PREDICATE]</case>
        ...
        <case to="[NODE_NAME]">[PREDICATE]</case>
        <default to="[NODE_NAME]"/>
      </switch>
    </decision>
    The switch-case decides which node is executed next.
  • 22. Control Flow Nodes
    Fork and Join Control Nodes
    • A fork node splits one path of execution into multiple concurrent paths of execution
    • A join node waits until every concurrent execution path of a previous fork node arrives at it
    • The fork and join nodes must be used in pairs
    • The actions started by a fork run in parallel
    Syntax
    <fork name="[FORK-NODE-NAME]">
      <path start="[NODE-NAME]"/>
      ...
      <path start="[NODE-NAME]"/>
    </fork>
    <join name="[JOIN-NODE-NAME]" to="[NODE-NAME]"/>
  • 23. Workflow Action Nodes
    Action Basis
    • Action computation/processing is always remote
    • Actions are asynchronous
    • Actions have two transitions, ok and error
    • Action recovery
      - Oozie provides recovery capabilities when starting or ending actions
      - Recovery strategies depend on the nature of the failure
      - For non-transient failures, the action is suspended
      - For transient failures, Oozie retries after a fixed time interval
  • 24. Workflow Action Nodes
    Fs (HDFS) Action
    • The fs action lets a workflow application manipulate files and directories in HDFS
    • The supported commands are move, delete and mkdir
    • The FS commands are executed synchronously from within the FS action
    • Syntax
    <action name="[NODE-NAME]">
      <fs>
        <delete path='[PATH]'/>
        ...
        <mkdir path='[PATH]'/>
        ...
        <move source='[SOURCE-PATH]' target='[TARGET-PATH]'/>
      </fs>
      <ok to="[NODE-NAME]"/>
      <error to="[NODE-NAME]"/>
    </action>
  • 25. Workflow Action Nodes
    Pig Action
    • The pig action starts a Pig job
    • The workflow job waits until the Pig job completes before continuing to the next action
    • The pig action has to be configured with the job-tracker, the name-node, the Pig script, and the parameters and configuration needed to run the Pig job
    • Configuration properties are loaded first from job-xml and then from the inline configuration; later values override earlier ones
    • The Hadoop mapred.job.tracker and fs.default.name properties must not be present in the job-xml or the inline configuration
  • 26. Workflow Action Nodes
    Pig Action
    • Syntax
    <pig>
      <job-tracker>[JOB-TRACKER]</job-tracker>
      <name-node>[NAME-NODE]</name-node>
      <prepare>
        <delete path="[PATH]"/>
        ...
        <mkdir path="[PATH]"/>
        ...
      </prepare>
      <job-xml>[JOB-XML-FILE]</job-xml>
    Slide callout: necessary configuration
  • 27. Workflow Action Nodes
      <configuration>
        <property>
          <name>[PROPERTY-NAME]</name>
          <value>[PROPERTY-VALUE]</value>
        </property>
        ...
      </configuration>
      <script>[PIG-SCRIPT]</script>
      <param>[PARAM-VALUE]</param>
      ...
      <param>[PARAM-VALUE]</param>
      <argument>[ARGUMENT-VALUE]</argument>
      ...
      <argument>[ARGUMENT-VALUE]</argument>
      <file>[FILE-PATH]</file>
      ...
      <archive>[FILE-PATH]</archive>
      ...
    </pig>
    Slide callouts: cluster-wide configuration; Pig script, its parameters and arguments
  • 28. Oozie Job States
    A workflow job can be in any of the following states:
    • PREP: When a workflow job is first created it is in PREP state. The workflow job is defined but it is not running.
    • RUNNING: When a PREP workflow job is started it goes into RUNNING state. It remains in RUNNING state until it reaches its end state, ends in error, or is suspended.
    • SUSPENDED: A RUNNING workflow job can be suspended. It remains in SUSPENDED state until the workflow job is resumed or killed.
    • SUCCEEDED: When a RUNNING workflow job reaches the end node, it ends in the SUCCEEDED final state.
    • KILLED: When a PREP, RUNNING or SUSPENDED workflow job is killed by an administrator or the owner via a request to Oozie, the workflow job ends in the KILLED final state.
    • FAILED: When a RUNNING workflow job fails due to an unexpected error, it ends in the FAILED final state.
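The distinction between final and non-final states matters when a script waits for a job to finish. The following is a minimal sketch of that check; the helper name `is_final_state` is hypothetical, and the polling loop in the comments assumes that the `-info` output contains a line of the form `Status : RUNNING`.

```shell
#!/bin/sh
# Classify a workflow job state using the six states listed above.
# Returns 0 for a final state, 1 for a non-final state, 2 if unknown.
is_final_state() {
    case "$1" in
        SUCCEEDED|KILLED|FAILED) return 0 ;;
        PREP|RUNNING|SUSPENDED)  return 1 ;;
        *) echo "unknown state: $1" >&2; return 2 ;;
    esac
}

# Possible use in a polling loop (not run here; needs a live Oozie server):
#   while :; do
#       state=$(oozie job -oozie http://localhost:11000/oozie -info "$job_id" \
#               | awk '/^Status/ { print $3; exit }')
#       is_final_state "$state" && break
#       sleep 10
#   done
```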
  • 29. Example
    $ cp /usr/share/doc/oozie-3.3.2+49/oozie-examples.tar.gz .
    $ tar -xvf oozie-examples.tar.gz
    $ hadoop fs -put examples/ .
    $ cd examples/apps/pig/
    $ oozie job -oozie http://localhost:11000/oozie -config job.properties -run
    $ oozie job -oozie http://localhost:11000/oozie -info <job_id>
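The last two commands can be chained by capturing the job ID that the submit command prints. This sketch assumes the submit output is a single line of the form `job: <job_id>`; the helper name `extract_job_id` and the variable `OOZIE_SERVER` are hypothetical.

```shell
#!/bin/sh
# Read the output of "oozie job ... -run" on stdin and print only the job ID.
# Assumes the client prints a line such as:
#   job: 0000000-130101000000000-oozie-oozi-W
extract_job_id() {
    sed -n 's/^job: *//p'
}

# Possible use (not run here; needs a live Oozie server):
#   OOZIE_SERVER=http://localhost:11000/oozie
#   job_id=$(oozie job -oozie "$OOZIE_SERVER" -config job.properties -run | extract_job_id)
#   oozie job -oozie "$OOZIE_SERVER" -info "$job_id"
```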
  • 30. Understand the Example
    Pig Script
    • $ cat id.pig
    A = load '$INPUT' using PigStorage(':');
    B = foreach A generate $0 as id;
    store B into '$OUTPUT' USING PigStorage();
  • 31. Understand the Example
    Workflow XML
    • $ cat workflow.xml
    <workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
      <start to="pig-node"/>
      <action name="pig-node">
        <pig>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
          </prepare>
          <configuration>
            <property>
              <name>mapred.job.queue.name</name>
              <value>${queueName}</value>
            </property>
            <property>
              <name>mapred.compress.map.output</name>
              <value>true</value>
            </property>
          </configuration>
  • 32. Understand the Example
          <script>id.pig</script>
          <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/text</param>
          <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/pig</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>
  • 33. Understand the Example
    $ cat job.properties
    nameNode=hdfs://localhost:8020
    jobTracker=localhost:8021
    queueName=default
    examplesRoot=examples
    oozie.use.system.libpath=true
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/pig
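Before submitting, it can help to check which values a job.properties file actually carries. The sketch below is one way to do that from the shell; the helper name `get_prop` is hypothetical and it only handles simple `key=value` lines. It prints values verbatim, so the `${...}` references stay unexpanded here.

```shell
#!/bin/sh
# Print the value of one key from a simple key=value properties file.
# $1 = file, $2 = key. Only the first matching line is used.
get_prop() {
    awk -F= -v key="$2" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# Demo on a scratch copy of the properties shown above.
props="$(mktemp)"
cat > "$props" <<'EOF'
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/pig
EOF

get_prop "$props" nameNode
get_prop "$props" oozie.wf.application.path
rm -f "$props"
```

The quoted `'EOF'` delimiter keeps the shell from trying to expand `${nameNode}` and the other references while writing the file.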
  • 34. A Workflow Job
  • 35. A Workflow Job
    $ cd /root/examples/apps/demo
    $ cat workflow.xml
    <workflow-app xmlns="uri:oozie:workflow:0.2" name="demo-wf">
      <start to="cleanup-node"/>
      <action name="cleanup-node">
        <fs>
          <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/demo"/>
        </fs>
        <ok to="fork-node"/>
        <error to="fail"/>
      </action>
      <fork name="fork-node">
        <path start="pig-node"/>
        <path start="streaming-node"/>
      </fork>
  • 36. A Workflow Job
      <action name="pig-node">
        <pig>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/demo/pig-node"/>
          </prepare>
          <configuration>
            <property>
              <name>mapred.job.queue.name</name>
              <value>${queueName}</value>
            </property>
  • 37. A Workflow Job
            <property>
              <name>mapred.map.output.compress</name>
              <value>false</value>
            </property>
          </configuration>
          <script>id.pig</script>
          <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/text</param>
          <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/demo/pig-node</param>
        </pig>
        <ok to="join-node"/>
        <error to="fail"/>
      </action>
  • 38. A Workflow Job
      <action name="streaming-node">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/demo/streaming-node"/>
          </prepare>
          <streaming>
            <mapper>/bin/cat</mapper>
            <reducer>/usr/bin/wc</reducer>
          </streaming>
          <configuration>
            <property>
              <name>mapred.job.queue.name</name>
              <value>${queueName}</value>
            </property>
  • 39. A Workflow Job
            <property>
              <name>mapred.input.dir</name>
              <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>
            </property>
            <property>
              <name>mapred.output.dir</name>
              <value>/user/${wf:user()}/${examplesRoot}/output-data/demo/streaming-node</value>
            </property>
          </configuration>
        </map-reduce>
        <ok to="join-node"/>
        <error to="fail"/>
      </action>
  • 40. A Workflow Job
      <join name="join-node" to="mr-node"/>
      <action name="mr-node">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <prepare>
            <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/demo/mr-node"/>
          </prepare>
          <configuration>
            <property>
              <name>mapred.job.queue.name</name>
              <value>${queueName}</value>
            </property>
  • 41. A Workflow Job
            <property>
              <name>mapred.mapper.class</name>
              <value>org.apache.oozie.example.DemoMapper</value>
            </property>
            <property>
              <name>mapred.mapoutput.key.class</name>
              <value>org.apache.hadoop.io.Text</value>
            </property>
            <property>
              <name>mapred.mapoutput.value.class</name>
              <value>org.apache.hadoop.io.IntWritable</value>
            </property>
  • 42. A Workflow Job
            <property>
              <name>mapred.reducer.class</name>
              <value>org.apache.oozie.example.DemoReducer</value>
            </property>
            <property>
              <name>mapred.map.tasks</name>
              <value>1</value>
            </property>
            <property>
              <name>mapred.input.dir</name>
              <value>/user/${wf:user()}/${examplesRoot}/output-data/demo/pig-node,/user/${wf:user()}/${examplesRoot}/output-data/demo/streaming-node</value>
            </property>
            <property>
              <name>mapred.output.dir</name>
              <value>/user/${wf:user()}/${examplesRoot}/output-data/demo/mr-node</value>
            </property>
          </configuration>
        </map-reduce>
        <ok to="decision-node"/>
        <error to="fail"/>
      </action>
  • 43. A Workflow Job
      <decision name="decision-node">
        <switch>
          <case to="hdfs-node">${fs:exists(concat(concat(concat(concat(concat(nameNode, '/user/'), wf:user()), '/'), examplesRoot), '/output-data/demo/mr-node')) == "true"}</case>
          <default to="end"/>
        </switch>
      </decision>
  • 44. A Workflow Job
      <action name="hdfs-node">
        <fs>
          <move source="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/demo/mr-node"
                target="/user/${wf:user()}/${examplesRoot}/output-data/demo/final-data"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Demo workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>
  • 45. A Workflow Job
    • When the job completes, you will see something like this: