PIG
Introduction to PIG components
Apache Pig
* Pig is a tool for analyzing massive amounts of data.
* Pig is a high-level language with a compiler that turns user input into a series of MapReduce programs, so people can focus on analyzing data rather than on writing MapReduce programs.
* Internally, it builds a Java .jar file from the user's script or input and runs it as a MapReduce job.
Rupak Roy
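To give a feel for the work Pig takes off the user's hands, here is a minimal Python sketch of the map, shuffle, and reduce phases of a word count. It is an in-memory illustration only, not Pig or Hadoop code, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases into one word-count run."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["pig latin", "Pig on hadoop"]))
```

Written by hand against the MapReduce API, each of these phases needs its own class and job configuration. In Pig Latin the same logic is usually a handful of statements.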
* Pig can read and write data not only in HDFS but also in other sources, such as local storage.
* Pig has 2 components:
1) Pig Latin
2) Execution environments
1. Pig Latin is the language of the platform. Pig turns scripts written in it into a series of MapReduce jobs.
2. Execution environments. There are 2:
* Local mode/execution.
* Distributed mode/execution on Hadoop clusters.
Execution environments:
* Local mode/execution: runs Pig on your local machine rather than on a cluster. All read/write operations use the local file system, not HDFS. It is mainly used for prototyping and debugging.
To run Pig in local mode, use the command:
pig -x local
* Cluster mode/execution: runs Pig on a Hadoop cluster.
To run Pig in cluster mode, use the command:
pig -x mapreduce
or
pig
Cluster mode is the default, and in it all read/write operations use HDFS.
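The two launch commands can also be assembled from a script, for example when the same Pig script is first tested locally and then run on the cluster. The sketch below is only an illustration: `build_pig_command` and `run_pig` are hypothetical helper names, `wordcount.pig` is a made-up script file, and the `pig` launcher is assumed to be on the PATH. Only the two modes covered above are accepted.

```python
import subprocess

# The two execution modes described above; mapreduce is Pig's default.
VALID_MODES = ("local", "mapreduce")


def build_pig_command(script_path, mode="mapreduce"):
    """Return the argument list that runs a Pig script in the given mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    return ["pig", "-x", mode, script_path]


def run_pig(script_path, mode="mapreduce"):
    """Run the script; assumes the `pig` launcher is installed and on PATH."""
    return subprocess.run(build_pig_command(script_path, mode), check=True)


if __name__ == "__main__":
    # Only print the command here, so the example works without Pig installed.
    print(" ".join(build_pig_command("wordcount.pig", mode="local")))
```

Keeping the command construction separate from the call to `subprocess.run` makes the mode handling easy to check without a Pig installation.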
Pig Architecture
* Grunt shell: an interactive shell mainly used to write Pig Latin scripts.
* Parser: checks the structure (syntax) of the Pig script. Its output is a DAG (directed acyclic graph) that represents the Pig Latin statements and logical operators.
* Optimizer: receives the DAG from the parser and optimizes the logical operators.
* Compiler: receives the optimized DAG and generates a series of MapReduce jobs from it.
* Finally, the MapReduce jobs are executed and the results are stored in HDFS, or locally when running in local mode.
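The parser, optimizer, and compiler stages can be pictured with a toy pipeline in Python. This is a sketch under strong simplifying assumptions, not Pig's actual implementation: the mini-language (`alias = OP source` and `STORE alias`) is invented for the example, the only optimization is dropping aliases that nothing stores, and the rule "start a new job after each GROUP" is an assumed stand-in for how a grouping step forces a separate MapReduce job.

```python
def parse(script):
    """Check each line's structure and build a DAG: alias -> (operator, source)."""
    dag, sinks = {}, []
    for line in script.strip().splitlines():
        tokens = line.split()
        if len(tokens) == 2 and tokens[0] == "STORE":
            sinks.append(tokens[1])
        elif len(tokens) == 4 and tokens[1] == "=":
            alias, _, op, source = tokens
            dag[alias] = (op, source)
        else:
            raise SyntaxError(f"cannot parse line: {line!r}")
    return dag, sinks


def optimize(dag, sinks):
    """Drop aliases that no STORE depends on (simple dead-code pruning)."""
    needed, stack = set(), list(sinks)
    while stack:
        alias = stack.pop()
        if alias in needed:
            continue
        needed.add(alias)
        op, source = dag[alias]
        if op != "LOAD":  # a LOAD source is a path, not another alias
            stack.append(source)
    return {alias: node for alias, node in dag.items() if alias in needed}


def compile_jobs(dag, sink):
    """Walk from the sink back to LOAD, then cut the chain after each GROUP."""
    chain, alias = [], sink
    while True:
        op, source = dag[alias]
        chain.append(op)
        if op == "LOAD":
            break
        alias = source
    chain.reverse()
    jobs, current = [], []
    for op in chain:
        current.append(op)
        if op == "GROUP":  # assumed rule: a grouping step closes the job
            jobs.append(current)
            current = []
    if current:
        jobs.append(current)
    return jobs


if __name__ == "__main__":
    script = """
    raw = LOAD data
    clean = FILTER raw
    unused = FILTER raw
    by_user = GROUP clean
    active = FILTER by_user
    by_region = GROUP active
    STORE by_region
    """
    dag, sinks = parse(script)
    plan = optimize(dag, sinks)
    print(compile_jobs(plan, sinks[0]))
```

The point of the sketch is the hand-off: each stage consumes the previous stage's output, and only the final list of jobs would be submitted for execution.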
Next
* We will learn the Pig Latin data model with the Load and Store functions.
