Other Functions and Passing Parameters
Using the command line or a file
Other Functions
• AVG: computes the average of a numeric field within each group.
grunt> grouped = GROUP dataTransaction BY CustomerName;
grunt> average = FOREACH grouped GENERATE group, AVG(dataTransaction.TransAmt1);
• COUNT: counts the tuples in a bag but skips any tuple whose first field is NULL.
grunt> cnt = FOREACH grouped GENERATE group, COUNT(dataTransaction);
grunt> DUMP cnt;
• COUNT_STAR: counts every tuple in a bag, including those with NULL values.
grunt> cntStar = FOREACH grouped GENERATE group, COUNT_STAR($1);
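The difference between COUNT and COUNT_STAR can be sketched outside Pig. The following is a small Python model, not Pig code: None stands in for a Pig NULL, and the function names and sample rows are made up for illustration.

```python
# Python model of Pig's COUNT and COUNT_STAR applied to one grouped bag.
# None stands in for a Pig NULL; the sample rows are hypothetical.

def pig_count(bag):
    """Like COUNT: skip tuples whose first field is NULL."""
    return sum(1 for row in bag if row[0] is not None)

def pig_count_star(bag):
    """Like COUNT_STAR: count every tuple, NULLs included."""
    return len(bag)

# One bag with three tuples, one of which has a NULL first field.
bag = [(120.0,), (None,), (75.5,)]

count_result = pig_count(bag)
count_star_result = pig_count_star(bag)
print(count_result, count_star_result)  # prints: 2 3
```

The two functions agree whenever the bag has no NULLs, so the choice only matters for data with missing values.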
Rupak Roy
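• CONCAT: joins two fields into one value.
grunt> c = FOREACH concat.csv GENERATE CONCAT($0, $1);
• CONCAT with more than two arguments, here with a '-' separator:
grunt> c = FOREACH concat.csv GENERATE CONCAT($0, '-', Transaction_ID);
(Here concat.csv stands for the relation loaded from that file; a real alias cannot contain a dot.)
• IsEmpty: checks whether a bag or map is empty.
grunt> F = FILTER dataTransaction BY IsEmpty($1);
or, to keep only the non-empty ones:
grunt> F = FILTER dataTransaction BY NOT IsEmpty($1);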
• MAX/MIN
grunt> g = GROUP dataTransaction BY CustomerName;
grunt> m = FOREACH g GENERATE group, MIN(dataTransaction.TransAmt1);
or
grunt> m = FOREACH g GENERATE dataTransaction.CustomerName, MIN(dataTransaction.TransAmt1);
grunt> m = FOREACH g GENERATE dataTransaction.CustomerName, MAX(dataTransaction.TransAmt1);
Note: group returns the key once per group, while dataTransaction.CustomerName returns a bag with the name repeated for every row in the group.
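• SIZE: returns the number of elements in a value, where what counts as an element depends on the Pig data type: characters for a chararray, bytes for a bytearray, fields for a tuple, tuples for a bag, and key/value pairs for a map. Numeric types return 1.
grunt> S = FOREACH dataTransaction GENERATE SIZE($0), SIZE(CustomerName), SIZE($2);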
• SUM
grunt> g = GROUP dataTransaction BY CustomerName;
grunt> s = FOREACH g GENERATE dataTransaction.CustomerName, SUM(dataTransaction.TransAmt1);
Note: SUM, MAX/MIN, COUNT, COUNT_STAR and AVG operate on bags, so they require a GROUP statement before we apply them.
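Why the GROUP step comes first can be seen in a rough Python model: the rows are first collected into one bag per key, and only then is each bag reduced to a single number. This is a sketch, not Pig code; the sample rows are made up, with the two columns standing in for CustomerName and TransAmt1.

```python
from collections import defaultdict

# Hypothetical (CustomerName, TransAmt1) rows.
rows = [("Carl", 10.0), ("Ann", 4.0), ("Carl", 6.0)]

# Step 1, like GROUP ... BY CustomerName: collect one bag of amounts per key.
groups = defaultdict(list)
for name, amount in rows:
    groups[name].append(amount)

# Step 2, like FOREACH ... GENERATE: reduce each bag to single values.
summary = {
    name: {
        "sum": sum(amounts),
        "min": min(amounts),
        "max": max(amounts),
        "avg": sum(amounts) / len(amounts),
    }
    for name, amounts in groups.items()
}

print(summary["Carl"])
```

Without step 1 there is no bag to reduce, which is the reason the aggregate functions are applied to a grouped relation.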
Flatten Operator
• FLATTEN changes the structure of tuples and bags by un-nesting them.
• For example, consider a tuple with the structure (a,(b,c)). GENERATE $0, FLATTEN($1) turns it into (a,b,c).
• Likewise, for a tuple of the form (a,{(b,c),(d,e)}), which is what the GROUP operator produces, GENERATE $0, FLATTEN($1) gives (a,b,c) and (a,d,e).
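The two FLATTEN cases can be modelled in a few lines of Python. This is only an illustration of the reshaping, not Pig code: a Python tuple stands in for a Pig tuple, a list of tuples stands in for a bag, and the function names are invented for this sketch.

```python
def flatten_tuple(row):
    """(a, (b, c)) -> (a, b, c): un-nest an inner tuple into the outer one."""
    key, inner = row
    return (key, *inner)

def flatten_bag(row):
    """(a, [(b, c), (d, e)]) -> one output row per tuple in the bag."""
    key, bag = row
    return [(key, *inner) for inner in bag]

tuple_result = flatten_tuple(("a", ("b", "c")))
bag_result = flatten_bag(("a", [("b", "c"), ("d", "e")]))

print(tuple_result)  # ('a', 'b', 'c')
print(bag_result)    # [('a', 'b', 'c'), ('a', 'd', 'e')]
```

Flattening a tuple keeps the row count the same, while flattening a bag can turn one row into many.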
Run Pig Scripts directly from a file
First create a file with a .pig extension:
Type vi output.pig in the terminal.
Then write the Pig script (the field list for store.csv goes inside the parentheses after AS):
A = LOAD 'home/hduser/datasets/store.csv' USING PigStorage(',') AS ( );
B = FOREACH A GENERATE $0, $2;
Save the file as output.pig (or under any other name with a .pig extension) and execute it from any terminal:
[bob@localhost ~]$ pig -x local /home/hduser/output.pig
Note: to run against HDFS (the default mode), type just 'pig'; for local mode, type 'pig -x local'.
Pig gives you two options for passing parameters:
1. Using a file: -param_file, followed by the path to the parameter file.
2. Using the command line: -p (or -param), followed by a key-value pair of the form param=val.
Passing Parameters
USING COMMAND LINE:
Create a new file:
vi output1.pig
A = LOAD 'home/hduser/datasets/store.csv' USING PigStorage(',') AS ( );
B = FILTER A BY Place == '$Place';
DUMP B;
Save the file as output1.pig (or under any other name) and execute it from the terminal:
[bob@localhost ~]$ pig -x local -p Place='Alberta' output1.pig
To pass multiple parameters, repeat the -p flag:
pig -x local -p Place='Alberta' -p Age='29' -p Product='electronics' output1.pig
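Parameter substitution amounts to replacing each $name in the script text with the value supplied for it before the script runs. The Python sketch below models that idea only; the substitute function is invented for illustration and is not how Pig is implemented.

```python
import re

def substitute(script, params):
    """Replace each $name in the script text with its value from params.

    A missing parameter raises KeyError in this model. Positional
    references such as $0 are left alone because they start with a digit.
    """
    return re.sub(r"\$([A-Za-z_]\w*)", lambda m: params[m.group(1)], script)

script_line = "B = FILTER A BY Place == '$Place';"

# The shell removes the quotes around 'Alberta', so the value is plain text.
resolved = substitute(script_line, {"Place": "Alberta"})
print(resolved)  # B = FILTER A BY Place == 'Alberta';
```

Because the replacement is textual, the quotes around '$Place' in the script are what make the substituted value a string literal.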
USING FILE:
Create a parameter file using: vi pfile
Type i to enter insert mode, then type one parameter per line:
CustomerName = 'Carl Jackson'
threshold = 5
To exit insert mode, press Esc, then type :wq! to save the contents and quit.
Then run the script with the parameter file:
pig -param_file pfile home/hduser/displayoutput.pig
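A parameter file is a list of name = value lines. The Python sketch below shows one way such a file could be read into a dictionary. It is an illustration only: parse_param_file is a made-up helper, it keeps each value exactly as written, and it makes no claim about how Pig itself treats the surrounding quotes.

```python
def parse_param_file(text):
    """Read 'name = value' lines into a dict, skipping blanks and # comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params

sample = "# parameters for displayoutput.pig\nCustomerName = 'Carl Jackson'\nthreshold = 5\n"
params = parse_param_file(sample)
print(params)
```

Keeping parameters in a file is convenient when the same set of values is reused across several runs.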
Next
• Flume, a distributed, reliable tool for collecting large amounts of streaming data.