SlideShare a Scribd company logo
9/24/2018 Automating Hadoop Jobs Using Rundeck |
https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 1/12
Automating Hadoop Jobs Using Rundeck
kiran • April 6, 2017  4  738
All Categories Big Data Hadoop & Spark - Advanced
Rundeck is an open source software that helps in automating a set of
procedures. It provides features to automate a certain set of things. Rundeck
is developed on GitHub as a project called Rundeck SimplifyOps by the
Rundeck community.
Following are some of its exciting features:
100% Free Course On Big
Data Essentials
Subscribe to our blog and get access to this course
ABSOLUTELY FREE.
Name
Email
Phone
Submit
9/24/2018 Automating Hadoop Jobs Using Rundeck |
https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 2/12
Web API
Distributed command execution
Pluggable execution system (SSH by default)
Multistep workflows
Job execution with on-demand or scheduled runs
Graphical web console for command and job execution
Role-based access control policy with support for LDAP/ActiveDirectory
History and auditing logs
Open integration with external host inventory tools
Command line interface tools
In our previous blog, we have shown how to schedule a Hadoop job in
Rundeck. In this blog, we will give you a demo of how to automate a
Hadoop/Hive/Pig job using Rundeck. This will allow your job to run on a
daily or even on a monthly basis.
We recommend our users to go through our previous blogs on Rundeck for
steps on installation and on how to schedule a Hadoop job.
Let us start with project creation. We will create a list of Hive queries in a file,
after which we will configure the job for it to run automatically every day.
To create a new project, Click On the New project, and provide the
necessary details, like project_name, description as shown in the screenshot
below.
Also, check the option Require File Exists in the Resource Model Source and
click on Save.
9/24/2018 Automating Hadoop Jobs Using Rundeck |
https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 3/12
Now, scroll toward the end of the page and click on Create. Your Rundeck
project will get created and you will be able to see the project screen as
shown below.
Now click on Create Job at the Right corner and click on the New job. Fill
necessary details like Job name, description as shown in the screenshot
below:
9/24/2018 Automating Hadoop Jobs Using Rundeck |
https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 4/12
For scheduling, come to the Workflow section and select options of your
choice. We have selected the following options:
If a step fails: Stop at the failed step
Strategy: Sequential
To provide a job or a query, go the Add step section, and select the option
Command.
Here, you need to provide the Hive query file containing a set of Hive
queries. Below is our hive query.
We will get our employee details inside the file emp.csv on a daily basis in
our HDFS. So, we are creating an hql file with the following content. We have
named it as hive_query.hql.
CREATE DATABASE IF IT DOES NOT EXIST employee;
use employee;
9/24/2018 Automating Hadoop Jobs Using Rundeck |
https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 5/12
CREATE EXTERNAL TABLE IF IT DOES NOT EXIST employee_test (
id STRING,
first_name STRING,
last_name STRING,
email STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
load data inpath '/emp.csv' into table employee_test;
The command used to run this script in the command line is shown below:
hive -f hive_query.hql
 
After entering the command, click on Save, and if you want to run another
query after this, you can do that by adding another step.
After loading the data, I wish to count the number of employees present. We
can do this by using the following Hive query:
9/24/2018 Automating Hadoop Jobs Using Rundeck |
https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 6/12
select count(*) from employee.employee_test
We will save this query in a file with the name hive_emp.hql. Towards the
end, we have added: >emp_cnt.txt. So, the above query will write the output
into the file: emp_cnt.txt. We will enter this query as the next step in our
workflow as shown in the screen shot below:
hive -f hive_emp.hql>emp_cnt.txt
For automating this job, select the following option:
Schedule or Run Repeated: Yes
You will get two kinds of automation: one is simple and the other is using
the Unix crontab. After selecting the necessary option, scroll to the last and
click on Create.
After clicking on Create, you will be redirected to the Job page. Beside your
job, you can see the countdown left to run it.
You can also see your job definition in the Definition tab as shown below:
9/24/2018 Automating Hadoop Jobs Using Rundeck |
https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 7/12
In this demo, we have changed the time and we have only 2 min left to run
the job. After 2 min, this job will automatically run.
You can track the job status in the Activity for this job section below. Here,
we have four options: running, recent, failed, and by you.
After 2 min, in the running tab, you can see that your job is running.
Once your job gets deployed, you will get the deployment or execution
number, using which you can track the job running status and its complete
console output. In the screen shot displayed below, you can see that our job
is running.
9/24/2018 Automating Hadoop Jobs Using Rundeck |
https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 8/12
And the deployment number is 21. In the recent tab, we can see the list of all
the succeeded and failed jobs. Now, we will check for the execution number
21 and then find the console output.
We can see that our job has run successfully. We can check for the output in
Log Output tab. Here, you can see the console output for both the jobs.
9/24/2018 Automating Hadoop Jobs Using Rundeck |
https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 9/12
Now, we will check for the output in the file emp_cnt.txt.
In the above screen shot, you can see that there are 6000 employees in that
company till date. As scheduled, the same job will run automatically the next
day, and the count will be saved.
Once the job gets completed successfully, you can see the next deployment
countdown as shown in the screen shot below:
We hope this blog helped you in automating your Hadoop jobs using
Rundeck. Keep visiting our website, www.acadgild.com, for more updates on
Big data Training and other technologies.
Related
9/24/2018 Automating Hadoop Jobs Using Rundeck |
https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 10/12
Scheduling Hadoop Jobs
using RUNDECK
December 26, 2016
In "All Categories"
Scheduling Hadoop Jobs
Using Jenkins
January 10, 2017
In "Big Data Hadoop &
Spark - Advanced"
Running A Map Reduce
Program Using Oozie
January 20, 2016
In "All Categories"
4 Comments
9/24/2018 Automating Hadoop Jobs Using Rundeck |
https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 11/12
This site uses Akismet to reduce spam. Learn how your comment data is processed.
Reply
Reply
Reply
Reply
drasticdsemulatorinfo
April 16, 2017 at 1:42 PM
Do you mind if I quote a couple of your articles as long as I provide
credit
and sources back to your blog? My website is in the exact
same area of interest as yours and my users would truly benefit from
some
of the information you present here. Please let me know if this alright
with
you. Many thanks!
AcadGild
April 17, 2017 at 10:43 AM
Pls go ahead!
restorative justice in schools
April 18, 2017 at 9:13 AM
Hey, I think your blog might be having browser compatibility
issues. When I look at your blog site in Chrome,
it looks fine but when opening in Internet Explorer,
it has some overlapping. I just wanted to give you a quick heads up!
Other then that, terrific blog!
best golf simulators for home
April 18, 2017 at 7:18 PM
I don’t even know how I ended up right here, but I thought this submit
was
once great. I don’t recognise who you’re however definitely you
are going to a famous blogger should you are not already.
Cheers!
9/24/2018 Automating Hadoop Jobs Using Rundeck |
https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 12/12

More Related Content

PDF
Clearing Airflow Obstructions
PDF
Crafting APIs
PDF
Scaling machine learning to millions of users with Apache Beam
PDF
H2O 3 REST API Overview
PDF
From an idea to production: building a recommender for BBC Sounds
PDF
Introduction to Apache Airflow - Data Day Seattle 2016
PDF
Lean Dependency Management with graphs
PDF
Workflow Engines + Luigi
Clearing Airflow Obstructions
Crafting APIs
Scaling machine learning to millions of users with Apache Beam
H2O 3 REST API Overview
From an idea to production: building a recommender for BBC Sounds
Introduction to Apache Airflow - Data Day Seattle 2016
Lean Dependency Management with graphs
Workflow Engines + Luigi

Similar to Automating hadoop jobs using rundeck (20)

ODT
Big-Data Hadoop Training Institutes in Pune | CloudEra Certification courses ...
PDF
Hadoop and Mapreduce Certification
PPTX
De-Bugging Hive with Hadoop-in-the-Cloud
PPTX
Debugging Hive with Hadoop-in-the-Cloud
PDF
Hdp developer apache spark using python (lab guide) by hortonworks university...
PPTX
Big data week presentation
PDF
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
DOC
Robin_Hadoop
PPTX
Hadoop Adminstration with Latest Release (2.0)
DOCX
PRAFUL_HADOOP
PPT
Introduction to Apache Hadoop
PPTX
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
PPTX
Big Data and Hadoop Training in Bangalore by myTectra
DOCX
PRAFUL_HADOOP
PPTX
Bigdata workshop february 2015
PPSX
Hadoop-Quick introduction
PDF
OC Big Data Monthly Meetup #5 - Session 1 - Altiscale
PPTX
Testing Big Data: Automated Testing of Hadoop with QuerySurge
PDF
Hadoop training kit from lcc infotech
PPTX
Big Data Summer training presentation
Big-Data Hadoop Training Institutes in Pune | CloudEra Certification courses ...
Hadoop and Mapreduce Certification
De-Bugging Hive with Hadoop-in-the-Cloud
Debugging Hive with Hadoop-in-the-Cloud
Hdp developer apache spark using python (lab guide) by hortonworks university...
Big data week presentation
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Robin_Hadoop
Hadoop Adminstration with Latest Release (2.0)
PRAFUL_HADOOP
Introduction to Apache Hadoop
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Big Data and Hadoop Training in Bangalore by myTectra
PRAFUL_HADOOP
Bigdata workshop february 2015
Hadoop-Quick introduction
OC Big Data Monthly Meetup #5 - Session 1 - Altiscale
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Hadoop training kit from lcc infotech
Big Data Summer training presentation
Ad

Recently uploaded (20)

PPTX
introduction to high performance computing
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPT
INTRODUCTION -Data Warehousing and Mining-M.Tech- VTU.ppt
PDF
Level 2 – IBM Data and AI Fundamentals (1)_v1.1.PDF
PPTX
Fundamentals of safety and accident prevention -final (1).pptx
PDF
Unit I ESSENTIAL OF DIGITAL MARKETING.pdf
PDF
BIO-INSPIRED ARCHITECTURE FOR PARSIMONIOUS CONVERSATIONAL INTELLIGENCE : THE ...
PDF
SMART SIGNAL TIMING FOR URBAN INTERSECTIONS USING REAL-TIME VEHICLE DETECTI...
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PDF
BIO-INSPIRED HORMONAL MODULATION AND ADAPTIVE ORCHESTRATION IN S-AI-GPT
PDF
Categorization of Factors Affecting Classification Algorithms Selection
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PDF
737-MAX_SRG.pdf student reference guides
PPTX
Safety Seminar civil to be ensured for safe working.
PDF
Abrasive, erosive and cavitation wear.pdf
PPTX
UNIT - 3 Total quality Management .pptx
PDF
Exploratory_Data_Analysis_Fundamentals.pdf
PDF
III.4.1.2_The_Space_Environment.p pdffdf
PPTX
6ME3A-Unit-II-Sensors and Actuators_Handouts.pptx
introduction to high performance computing
Automation-in-Manufacturing-Chapter-Introduction.pdf
INTRODUCTION -Data Warehousing and Mining-M.Tech- VTU.ppt
Level 2 – IBM Data and AI Fundamentals (1)_v1.1.PDF
Fundamentals of safety and accident prevention -final (1).pptx
Unit I ESSENTIAL OF DIGITAL MARKETING.pdf
BIO-INSPIRED ARCHITECTURE FOR PARSIMONIOUS CONVERSATIONAL INTELLIGENCE : THE ...
SMART SIGNAL TIMING FOR URBAN INTERSECTIONS USING REAL-TIME VEHICLE DETECTI...
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
BIO-INSPIRED HORMONAL MODULATION AND ADAPTIVE ORCHESTRATION IN S-AI-GPT
Categorization of Factors Affecting Classification Algorithms Selection
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
737-MAX_SRG.pdf student reference guides
Safety Seminar civil to be ensured for safe working.
Abrasive, erosive and cavitation wear.pdf
UNIT - 3 Total quality Management .pptx
Exploratory_Data_Analysis_Fundamentals.pdf
III.4.1.2_The_Space_Environment.p pdffdf
6ME3A-Unit-II-Sensors and Actuators_Handouts.pptx
Ad

Automating hadoop jobs using rundeck

  • 1. 9/24/2018 Automating Hadoop Jobs Using Rundeck | https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 1/12 Automating Hadoop Jobs Using Rundeck kiran • April 6, 2017  4  738 All Categories Big Data Hadoop & Spark - Advanced Rundeck is an open source software that helps in automating a set of procedures. It provides features to automate a certain set of things. Rundeck is developed on GitHub as a project called Rundeck SimplifyOps by the Rundeck community. Following are some of its exciting features: 100% Free Course On Big Data Essentials Subscribe to our blog and get access to this course ABSOLUTELY FREE. Name Email Phone Submit
  • 2. 9/24/2018 Automating Hadoop Jobs Using Rundeck | https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 2/12 Web API Distributed command execution Pluggable execution system (SSH by default) Multistep workflows Job execution with on-demand or scheduled runs Graphical web console for command and job execution Role-based access control policy with support for LDAP/ActiveDirectory History and auditing logs Open integration with external host inventory tools Command line interface tools In our previous blog, we have shown how to schedule a Hadoop job in Rundeck. In this blog, we will give you a demo of how to automate a Hadoop/Hive/Pig job using Rundeck. This will allow your job to run on a daily or even on a monthly basis. We recommend our users to go through our previous blogs on Rundeck for steps on installation and on how to schedule a Hadoop job. Let us start with project creation. We will create a list of Hive queries in a file, after which we will configure the job for it to run automatically every day. To create a new project, Click On the New project, and provide the necessary details, like project_name, description as shown in the screenshot below. Also, check the option Require File Exists in the Resource Model Source and click on Save.
  • 3. 9/24/2018 Automating Hadoop Jobs Using Rundeck | https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 3/12 Now, scroll toward the end of the page and click on Create. Your Rundeck project will get created and you will be able to see the project screen as shown below. Now click on Create Job at the Right corner and click on the New job. Fill necessary details like Job name, description as shown in the screenshot below:
  • 4. 9/24/2018 Automating Hadoop Jobs Using Rundeck | https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 4/12 For scheduling, come to the Workflow section and select options of your choice. We have selected the following options: If a step fails: Stop at the failed step Strategy: Sequential To provide a job or a query, go the Add step section, and select the option Command. Here, you need to provide the Hive query file containing a set of Hive queries. Below is our hive query. We will get our employee details inside the file emp.csv on a daily basis in our HDFS. So, we are creating an hql file with the following content. We have named it as hive_query.hql. CREATE DATABASE IF IT DOES NOT EXIST employee; use employee;
  • 5. 9/24/2018 Automating Hadoop Jobs Using Rundeck | https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 5/12 CREATE EXTERNAL TABLE IF IT DOES NOT EXIST employee_test ( id STRING, first_name STRING, last_name STRING, email STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; load data inpath '/emp.csv' into table employee_test; The command used to run this script in the command line is shown below: hive -f hive_query.hql   After entering the command, click on Save, and if you want to run another query after this, you can do that by adding another step. After loading the data, I wish to count the number of employees present. We can do this by using the following Hive query:
  • 6. 9/24/2018 Automating Hadoop Jobs Using Rundeck | https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 6/12 select count(*) from employee.employee_test We will save this query in a file with the name hive_emp.hql. Towards the end, we have added: >emp_cnt.txt. So, the above query will write the output into the file: emp_cnt.txt. We will enter this query as the next step in our workflow as shown in the screen shot below: hive -f hive_emp.hql>emp_cnt.txt For automating this job, select the following option: Schedule or Run Repeated: Yes You will get two kinds of automation: one is simple and the other is using the Unix crontab. After selecting the necessary option, scroll to the last and click on Create. After clicking on Create, you will be redirected to the Job page. Beside your job, you can see the countdown left to run it. You can also see your job definition in the Definition tab as shown below:
  • 7. 9/24/2018 Automating Hadoop Jobs Using Rundeck | https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 7/12 In this demo, we have changed the time and we have only 2 min left to run the job. After 2 min, this job will automatically run. You can track the job status in the Activity for this job section below. Here, we have four options: running, recent, failed, and by you. After 2 min, in the running tab, you can see that your job is running. Once your job gets deployed, you will get the deployment or execution number, using which you can track the job running status and its complete console output. In the screen shot displayed below, you can see that our job is running.
  • 8. 9/24/2018 Automating Hadoop Jobs Using Rundeck | https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 8/12 And the deployment number is 21. In the recent tab, we can see the list of all the succeeded and failed jobs. Now, we will check for the execution number 21 and then find the console output. We can see that our job has run successfully. We can check for the output in Log Output tab. Here, you can see the console output for both the jobs.
  • 9. 9/24/2018 Automating Hadoop Jobs Using Rundeck | https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 9/12 Now, we will check for the output in the file emp_cnt.txt. In the above screen shot, you can see that there are 6000 employees in that company till date. As scheduled, the same job will run automatically the next day, and the count will be saved. Once the job gets completed successfully, you can see the next deployment countdown as shown in the screen shot below: We hope this blog helped you in automating your Hadoop jobs using Rundeck. Keep visiting our website, www.acadgild.com, for more updates on Big data Training and other technologies. Related
  • 10. 9/24/2018 Automating Hadoop Jobs Using Rundeck | https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 10/12 Scheduling Hadoop Jobs using RUNDECK December 26, 2016 In "All Categories" Scheduling Hadoop Jobs Using Jenkins January 10, 2017 In "Big Data Hadoop & Spark - Advanced" Running A Map Reduce Program Using Oozie January 20, 2016 In "All Categories" 4 Comments
  • 11. 9/24/2018 Automating Hadoop Jobs Using Rundeck | https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 11/12 This site uses Akismet to reduce spam. Learn how your comment data is processed. Reply Reply Reply Reply drasticdsemulatorinfo April 16, 2017 at 1:42 PM Do you mind if I quote a couple of your articles as long as I provide credit and sources back to your blog? My website is in the exact same area of interest as yours and my users would truly benefit from some of the information you present here. Please let me know if this alright with you. Many thanks! AcadGild April 17, 2017 at 10:43 AM Pls go ahead! restorative justice in schools April 18, 2017 at 9:13 AM Hey, I think your blog might be having browser compatibility issues. When I look at your blog site in Chrome, it looks fine but when opening in Internet Explorer, it has some overlapping. I just wanted to give you a quick heads up! Other then that, terrific blog! best golf simulators for home April 18, 2017 at 7:18 PM I don’t even know how I ended up right here, but I thought this submit was once great. I don’t recognise who you’re however definitely you are going to a famous blogger should you are not already. Cheers!
  • 12. 9/24/2018 Automating Hadoop Jobs Using Rundeck | https://acadgild.com/blog/automating-hadoop-jobs-using-rundeck 12/12