Pig on Tez
Daniel Dai
@daijy
Rohini Palaniswamy
@rohini_pswamy
Hadoop Summit 2014, San Jose
Agenda
 Team Introduction
 Apache Pig
 Why Pig on Tez?
 Pig on Tez
- Design
- Tez features in Pig
- Performance
- Current status
- Future Plan
Apache Pig on Tez Team
 Daniel Dai - Pig PMC, Hortonworks
 Rohini Palaniswamy - Pig PMC, Yahoo!
 Olga Natkovich - Pig PMC, Yahoo!
 Cheolsoo Park - VP Pig, Pig PMC, Netflix
 Mark Wagner - Pig Committer, LinkedIn
 Alex Bain - Pig Contributor, LinkedIn
Pig Latin
 Procedural scripting language
 Closer to relational algebra
 Heavily used for ETL
 Schema / No schema data, Pig eats everything
 More than SQL and feature rich (a short example script follows the feature list below)
- Multiquery, Nested Foreach, Illustrate
- Algebraic and Accumulator Java UDFs
- Non-Java UDFs (Jython, Python, JavaScript, Groovy, JRuby)
- Script Embedding, Scalars, Macros
- Distributed Order By, Skewed Join
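To give a flavor of Pig Latin, here is a minimal, hypothetical ETL-style script (file and field names are illustrative and not from the talk):

-- load raw logs, drop rows without a user, and count page views per user
logs    = LOAD 'weblogs' USING PigStorage('\t')
          AS (user:chararray, url:chararray, time:long);
clean   = FILTER logs BY user IS NOT NULL;
by_user = GROUP clean BY user;
counts  = FOREACH by_user GENERATE group AS user, COUNT(clean) AS views;
STORE counts INTO 'user_view_counts';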
Pig users
 Heavily used for ETL at Web Scale by Major Internet Companies
 At Yahoo!
- 60% of total hadoop jobs run daily
- 12 million monthly pig jobs
 Other heavy users
- Twitter
- Netflix
- LinkedIn
- eBay
- Salesforce
 Standard data science tool, covered in university textbooks
Why Pig on Tez?
 DAG execution framework
 Low level DAG framework
- Build DAG by defining vertices and edges
- Customize scheduling of DAG and routing of data
 Highly customizable with pluggable implementations
 Resource efficient
 Performance
- Without having to increase memory
 Natively built on top of YARN
- Multi-tenancy, resource allocation come for free
 Scale
 Security
 Excellent support from Tez community
- Bikas Saha, Siddharth Seth, Hitesh Shah
PIG on TEZ
Design
(Compilation pipeline diagram) A Pig script compiles to a Logical Plan, which
LogToPhyTranslationVisitor translates into a Physical Plan. From the Physical Plan,
TezCompiler produces a Tez Plan that runs on the Tez Execution Engine, while MRCompiler
produces an MR Plan that runs on the MR Execution Engine.
DAG Plan – Split Group by + Join
f = LOAD 'foo' AS (x, y, z);
g1 = GROUP f BY y;
g2 = GROUP f BY z;
j = JOIN g1 BY group, g2 BY group;

(Plan diagram) In the MR plan, the split of f is multiplexed: one map loads foo and feeds
both group-by pipelines, the reducer de-multiplexes the two grouped outputs and writes them
to HDFS, and a second job loads g1 and g2 again to do the join. In the Tez DAG, the "Load
foo" vertex sends multiple outputs directly to the two group-by vertices, and the join
vertex follows them as a reduce after reduce, with no intermediate HDFS writes.
DAG Execution - Visualization
(DAG diagram) Vertex 1 (Load), reading via MRInput, feeds Vertex 2 (Group) and Vertex 3
(Group), which both feed Vertex 4 (Join), writing via MROutput.
DAG Plan – Distributed Orderby

A = LOAD 'foo' AS (x, y);
B = FILTER A BY $0 IS NOT NULL;
C = ORDER B BY x;

(Plan diagram) In the MR plan, order-by takes multiple jobs: the data is loaded and filtered,
a sampling pass aggregates a partition map that is staged on the distributed cache, and a
final map/reduce pass re-reads the data from HDFS to partition and sort it. In the Tez DAG,
a single Load/Filter & Sample vertex feeds the sample Aggregate vertex, whose partition map
is broadcast to and cached by the downstream tasks, and a 1-1 unsorted edge carries the data
to the Partition vertex, which shuffles to the Sort vertex, with no intermediate HDFS writes
or distributed cache staging.
Session Reuse
 Feature
- Session reuse
 Submit more than one DAG to same AM
 Usage
- Each Pig script uses a single session
- Grunt shell uses one session for all commands till timeout (see the sketch below)
- More than one DAG submitted for merge join, ‘exec’
 Benefits
- In MR, a Pig script with 5 MR jobs launches 5 AM containers; with Tez, a single AM per Pig
script saves capacity.
- Eliminates the queue and resource contention that every new MR job in the pipeline of a
multi-stage Pig script faces in MR.
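A rough sketch of the Grunt case (relation and file names are hypothetical): each DUMP below triggers a DAG, and both DAGs are submitted to the same Tez session and AM until the session times out.

grunt> A = LOAD 'events' AS (user:chararray, n:int);
grunt> B = GROUP A BY user;
grunt> DUMP B;                 -- first DAG goes to the session's Tez AM
grunt> C = FILTER A BY n > 10;
grunt> DUMP C;                 -- second DAG reuses the same session and AM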
Container Reuse
 Features
- Container reuse
 Run new tasks on already launched containers (JVMs)
 Usage
- Turned on by default for all pig scripts and grunt shell
 Benefits
- Reduced launch overhead
 Container request and release overhead
 Resource localization overhead
 JVM launch time overhead
- Reduced network IO
 1-1 edge tasks are launched on same node
- Object caching
 User impact
- Have to review/profile and fix custom LoadFunc/StoreFunc/UDFs for static variables
and memory leaks due to JVM reuse.
Custom Vertex Input/Output/Processor/Manager
 Features
- Custom Vertex Processor
- Custom Input and Output between vertices
- Custom Vertex Manager
 Usage
- PigProcessor instead of MapProcessor and ReduceProcessor
- Unsorted input/output
 with Partitioner – Union
 without Partitioner – Broadcast Edge (Replicate join, Orderby and Skewed join), 1-1
Edge (Order by, Skewed join and Multiquery off)
- Custom Vertex Manager – Automatic Parallelism Estimation
 Benefits
- No framework restrictions like MR
- More efficient processing and algorithms
Broadcast Edge and Object Caching
 Feature
- Broadcast Edge
 Broadcast same data to all tasks in successor vertices
- Object Caching
 Ability to cache objects in memory for scope of Vertex, DAG and Session
- Optional input fetch (fetching can be skipped when the data is already cached)
 Usage
- Replicated join small table (see the example below)
- Orderby and Skewed join partitioning samples
 Benefits
- Replaces use of the distributed cache and avoids the NodeManager localization bottleneck
- Avoids input fetching when the data is already in the cache on container reuse
- Performance gains of up to 3x in tests for replicated join on smaller clusters with
higher container reuse
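For reference, a replicated join is expressed in ordinary Pig Latin as below (relation and file names are hypothetical); on Tez the small relation is broadcast to the join tasks and cached, rather than shipped through the distributed cache:

big   = LOAD 'clicks' AS (user:chararray, url:chararray);
small = LOAD 'users' AS (user:chararray, country:chararray);
-- 'replicated' asks Pig to load the small relation into memory on every join task
J = JOIN big BY user, small BY user USING 'replicated';
STORE J INTO 'clicks_with_country';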
Vertex Groups
 Feature
- Vertex Grouping
 Ability to group multiple vertices into one vertex group and produce a combined output
 Usage
- Union operator
 Benefits
- Better performance due to elimination
of an additional vertex
- Performance gains of 1.2x to 2x over MR
A = LOAD 'a';
B = LOAD 'b';
C = UNION A, B;
D = GROUP C by $0;

(DAG diagram) The Load A and Load B vertices form a vertex group whose combined output feeds
the GROUP vertex directly, with no separate union vertex in between.
Dynamic Parallelism
 Determining parallelism beforehand is hard
 Dynamically adjust parallelism at runtime (see the sketch below)
 Tez VertexManagerPlugin
- Custom policy to determine parallelism at runtime
- Library of common policies: ShuffleVertexManager
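A sketch of the user-facing side (values are hypothetical): parallelism can still be requested explicitly with default_parallel or a PARALLEL clause, while the vertex managers described here let Tez adjust the actual reduce-task count at runtime based on observed input size; how explicit requests and runtime estimates interact depends on configuration.

SET default_parallel 100;                  -- requested default reduce parallelism
A = LOAD 'events' AS (user:chararray, bytes:long);
B = GROUP A BY user PARALLEL 200;          -- per-operator request for this group-by
C = FOREACH B GENERATE group AS user, SUM(A.bytes) AS total_bytes;
STORE C INTO 'bytes_per_user';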
Dynamic Parallelism - ShuffleVertexManager
 Stock VertexManagerPlugin from Tez
 Used by Group, Hash Join, etc.
 Dynamically reduces the parallelism of a vertex based on estimated input size
(DAG diagram) Example: a Join vertex fed by Load A and Load B is planned with 4 tasks and is
scaled down to 2 at runtime.
Dynamic Parallelism – PartitionerDefinedVertexManager
 Custom VertexManagerPlugin used by Order by / Skewed Join (see the example below)
 Dynamically increases / decreases parallelism based on input size
(DAG diagram) The Load/Filter & Sample vertex feeds a Sample Aggregate vertex that calculates
the parallelism, which is then applied to the Partition and Sort vertices.
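Both operators are driven from plain Pig Latin; a hypothetical sketch of the two cases whose partition and sort vertices pick up their parallelism from the runtime sample:

logs  = LOAD 'weblogs' AS (user:chararray, url:chararray, time:long);
users = LOAD 'users' AS (user:chararray, country:chararray);
-- order-by: sort parallelism is derived from the sampled data size
sorted = ORDER logs BY time;
-- skewed join: heavily skewed keys found in the sample are split across multiple reducers
J = JOIN logs BY user, users BY user USING 'skewed';
STORE sorted INTO 'logs_sorted';
STORE J INTO 'logs_joined';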
PERFORMANCE
Performance numbers –
(Bar chart: time in mins, MR vs Tez)
- Prod script 1: 1.5x, 1 MR job, 3172 vs 3172 tasks, 28 vs 18 min
- Prod script 2: 2.1x, 12 MR jobs, 966 vs 941 tasks, 11 vs 5 min
- Prod script 3: 1.5x, 4 MR jobs on 8.4 TB input, 21397 vs 21382 tasks, 50 vs 35 min
- Prod script 4: 2%, 4 MR jobs on 25.2 TB input, 101864 vs 101856 tasks, 74 vs 72 min
Performance numbers –
(Bar chart: time in mins, MR vs Tez)
- Prod script 1: 2.52x, 5 MR jobs
- Prod script 2: 2.02x, 5 MR jobs
- Prod script 3: 2.22x, 12 MR jobs
- Prod script 4: 1.75x, 15 MR jobs
- Runtimes (MR vs Tez): 25 vs 10 min, 34 vs 16 min, 2h 22m vs 1h 21m, 1h 46m vs 48m
Lipstick from Netflix (screenshot)
Performance Numbers – Interactive Query
(Chart: TPC-H Q10, time in secs by input size, MR vs Tez)
- Tez speedup over MR: 2.49x at 10G, 3.41x at 5G, 4.89x at 1G, 6x at 500M input
 When the input data is small, latency dominates
 Tez significantly reduces latency through session/container reuse
Performance Numbers – Iterative Algorithm
 Pig can be used to implement iterative algorithms using embedding
 Iterative algorithms are ideal for container reuse
 Example: k-means algorithm
- Each iteration takes an average of 1.48s after the first iteration (vs 27s for MR)
(Chart: k-means, time in secs by number of iterations, MR vs Tez)
- Tez speedup over MR: 5.37x at 10 iterations, 13.12x at 50, 14.84x at 100
* Source code can be downloaded at http://hortonworks.com/blog/new-apache-pig-features-part-2-embedding
Performance is proportional to …
 Number of stages in the DAG
- The more stages in the DAG, the better Tez performs relative to MR, due to the
elimination of map read stages.
 Size of intermediate output
- The larger the intermediate output, the better Tez performs relative to MR, due to
reduced HDFS usage.
 Cluster/queue capacity
- The more congested a queue is, the better Tez performs relative to MR, due to
container reuse.
 Size of data in the job
- For smaller data and more stages, Tez performs better relative to MR, because launch
overhead is a larger percentage of total time for smaller jobs.
CURRENT & FUTURE
Where are we?
 90% feature parity with Pig on MR
- No Local mode (TEZ-235)
- Rarely used operators not implemented
 MAPREDUCE (native mapreduce jobs)
 Collected CoGroup
 98% of ~1300 e2e tests pass.
 35% of ~2850 unit tests pass; porting the rest is pending on Tez local mode.
 Tez branch merged into trunk and will be part of Pig 0.14 release
 Netflix has Lipstick working with Pig on Tez
- Credits: Jacob Perkins, Cheolsoo Park
User Impact
 Tez
- Zero pain deployment
- Tez library installation on local disk and copy to HDFS
 Pig
- No pain migration from Pig on MR to Pig on Tez
 Existing scripts work as is without any modification
 Only two additional steps to execute in Tez mode
– export TEZ_HOME=/tez-install-location
– pig -x tez myscript.pig
- Users should review/profile and fix custom LoadFunc/StoreFunc/UDFs for static
variables and memory leaks due to JVM reuse.
What next?
 Support for Tez Local mode
 All unit tests ported
 Improve
- Stability
- Usability
- Debuggability
 Apache Release
- Pig 0.14 with Tez released by Sep 2014
 Deployment
- In research at Yahoo! by early Q3
- In production at Yahoo! and Netflix by Q3/Q4
 Performance
- From 1.2-3x to 1.5x-5x by Q4
Tez Features - WIP
 Tez UI
- Application Master UI and job history UI are in the works, integrating via the
Application Timeline Server.
- Currently only AM logs are easily viewable. Task logs are available, but one has to grep
the AM log to find their URLs.
 Tez Local mode
 Tez AM Recovery
- Tez checkpointing and resuming on AM failure is functional but needs more
work. With single DAG execution of whole script, AM retries can be very costly.
 Input fetch optimizations
- Custom ShuffleHandler on NodeManager
- Local input fetch on container reuse
What next - Performance?
 Shared Edges
- Same output to multiple downstream vertices
 Multiple Vertex Caching
 Unsorted shuffle for skewed join and order by
 Custom edge manager and data routing for skewed join
 Group by and join using hashing to avoid sorting
 Better memory management
 Dynamic reconfiguration of DAG
- Automatically determine type of join - replicate, skewed or hash join
We are hiring!!!
Hortonworks
Stop by Kiosk D5
Yahoo!
Stop by Kiosk P9
or reach out to us at
bigdata@yahoo-inc.com.
Thank You