Tuning Up With Apache Tez
Gal Vinograd @ Crosswise - 2016/03/09
Agenda
The Pipeline
The Problem
Why we chose Tez
Lessons Learned
Demo
The Batch
Internet
Labels
Data
~200 Scripts
250 c3.2xlarge X 30 hours
10TB per Batch
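
For a sense of scale, the batch figures can be turned into aggregate numbers. This is only a back-of-the-envelope sketch: it assumes that "250 c3.2xlarge X 30 hours" means all 250 instances run for the full 30 hours, and it treats 10TB as 10,000 GB. The function and variable names are illustrative and do not come from the talk.

```python
# Back-of-the-envelope numbers for one batch, using the slide's figures.
# Assumption: all 250 instances run for the whole 30 hours.
INSTANCES = 250
HOURS_PER_BATCH = 30
DATA_TB_PER_BATCH = 10


def batch_footprint(instances, hours, data_tb):
    """Return (instance-hours, GB per instance, GB per instance-hour)."""
    data_gb = data_tb * 1000  # decimal terabytes to gigabytes
    instance_hours = instances * hours
    return instance_hours, data_gb / instances, data_gb / instance_hours


instance_hours, gb_per_instance, gb_per_instance_hour = batch_footprint(
    INSTANCES, HOURS_PER_BATCH, DATA_TB_PER_BATCH
)
print(f"{instance_hours} instance-hours per batch")
print(f"{gb_per_instance:.0f} GB of input per instance")
print(f"{gb_per_instance_hour:.2f} GB of input per instance-hour")
```

Under these assumptions each instance-hour accounts for only a little over one gigabyte of input, which suggests the batch is dominated by per-job overhead and intermediate work rather than by raw input size.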
“Tez aims to be a general purpose execution
runtime that enhances various scenarios
that are not well served by classic
Map-Reduce. In the short term the major
focus is to support Hive and Pig ...”
Tez Design v1.1
Hortonworks
The Batch
Internet
Labels
Data
~200
Scripts
Tez Atomic Components
Tokenizer
Aggregator
Edge
Vertex
Vertex
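
The components on this slide can be sketched as a tiny graph model: vertices are processing steps (here Tokenizer and Aggregator, as in a word count) and an edge carries data from one to the other. The classes below are a hypothetical, simplified illustration in Python. They are not the Tez Java API, and the method names are assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    """A processing step, e.g. Tokenizer or Aggregator."""
    name: str


@dataclass(frozen=True)
class Edge:
    """A data connection from a producer vertex to a consumer vertex."""
    source: Vertex
    target: Vertex


@dataclass
class Dag:
    vertices: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_vertex(self, vertex):
        self.vertices.append(vertex)
        return self

    def add_edge(self, edge):
        if edge.source not in self.vertices or edge.target not in self.vertices:
            raise ValueError("both ends of an edge must be added as vertices first")
        self.edges.append(edge)
        return self

    def execution_order(self):
        """Return vertex names so that every producer precedes its consumers."""
        indegree = {v.name: 0 for v in self.vertices}
        for e in self.edges:
            indegree[e.target.name] += 1
        ready = deque(name for name, d in indegree.items() if d == 0)
        order = []
        while ready:
            name = ready.popleft()
            order.append(name)
            for e in self.edges:
                if e.source.name == name:
                    indegree[e.target.name] -= 1
                    if indegree[e.target.name] == 0:
                        ready.append(e.target.name)
        if len(order) != len(self.vertices):
            raise ValueError("graph contains a cycle")
        return order


tokenizer = Vertex("Tokenizer")
aggregator = Vertex("Aggregator")
# Vertices are added in the "wrong" order on purpose; the edge decides the order.
dag = Dag().add_vertex(aggregator).add_vertex(tokenizer).add_edge(Edge(tokenizer, aggregator))
order = dag.execution_order()
print(order)
```

The point of the model is that the edge, not the order in which vertices are declared, determines what runs first.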
Logical and Physical Graphs
Physical
Logical
Hortonworks
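
One way to picture the difference between the two graphs: the logical graph has one node per vertex, while the physical graph has one node per parallel task of that vertex. The sketch below expands a logical graph into task-level connections. It is a hypothetical Python model, not Tez code; the function name, the task naming scheme, and the parallelism values are assumptions, and the two movement names only loosely mirror Tez's scatter-gather and one-to-one data movement types.

```python
def expand_to_physical(parallelism, edges):
    """Expand a logical graph into tasks and task-to-task connections.

    parallelism: dict mapping vertex name -> number of parallel tasks
    edges: list of (source, target, movement) tuples, where movement is
           "scatter_gather" (every source task feeds every target task)
           or "one_to_one" (task i feeds task i)
    """
    tasks = {
        name: [f"{name}[{i}]" for i in range(count)]
        for name, count in parallelism.items()
    }
    connections = []
    for source, target, movement in edges:
        if movement == "scatter_gather":
            connections += [(s, t) for s in tasks[source] for t in tasks[target]]
        elif movement == "one_to_one":
            if len(tasks[source]) != len(tasks[target]):
                raise ValueError("one_to_one needs equal parallelism on both sides")
            connections += list(zip(tasks[source], tasks[target]))
        else:
            raise ValueError(f"unknown movement type: {movement}")
    return tasks, connections


# Logical graph: 2 vertices, 1 edge. Parallelism values are made up.
tasks, connections = expand_to_physical(
    {"Tokenizer": 3, "Aggregator": 2},
    [("Tokenizer", "Aggregator", "scatter_gather")],
)
task_count = sum(len(t) for t in tasks.values())
print(task_count, "tasks,", len(connections), "connections")
```

A two-vertex logical graph therefore becomes a physical graph whose size depends on the parallelism chosen for each vertex.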
Optimizations
No “NOP” Map
Project
Distinct
GroupBy
NOP
Project
Distinct
GroupBy
Tez MR
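
The "NOP" map can be illustrated with a toy planner. It assumes a simplified rule: every MapReduce job needs a map phase followed by at most one shuffle-based operator, so a shuffle operator that directly follows another one gets an identity ("NOP") map in front of it. The operator classification and both planner functions are hypothetical simplifications for illustration, not how Pig actually compiles scripts.

```python
# Assumption for this sketch: these operators need a shuffle (reduce side),
# everything else can run on the map side.
SHUFFLE_OPS = {"Distinct", "GroupBy"}


def plan_mapreduce(ops):
    """Pack an operator chain into MR jobs, each a (map_ops, reduce_op) pair."""
    jobs = []
    pending_map_ops = []
    for op in ops:
        if op in SHUFFLE_OPS:
            # A job must have a map phase; with nothing to do it is a NOP.
            jobs.append((pending_map_ops or ["NOP"], op))
            pending_map_ops = []
        else:
            pending_map_ops.append(op)
    if pending_map_ops:
        jobs.append((pending_map_ops, None))  # trailing map-only job
    return jobs


def plan_tez(ops):
    """In this model a Tez DAG needs one vertex per operator and no filler."""
    return list(ops)


chain = ["Project", "Distinct", "GroupBy"]
mr_jobs = plan_mapreduce(chain)
tez_vertices = plan_tez(chain)
print("MR jobs:", mr_jobs)
print("Tez vertices:", tez_vertices)
```

In this model the three-operator chain costs two MapReduce jobs with four stages in total, one of which does no work, while the DAG form keeps exactly the three stages the script asked for.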
Optimizations
No Barrier Between Jobs
Project
GroupBy
Project
Project
Distinct
Project
Distinct
GroupBy
Tez MR
Optimizations
No Redundant Resource Allocation
Project
Project
Distinct
GroupBy
Project
Project
Distinct
GroupBy
Pig
Process
Pig
Process
Tez MR
Optimizations
Sessions
Allocate
Submit 2
Submit 1
Cleanup
Client
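
The session idea on this slide is that the client allocates resources once, submits several DAGs against them, and cleans up once at the end. The sketch below models that lifecycle with a Python context manager. The `Session` class and its methods are hypothetical stand-ins for illustration only; they are not the Tez client API.

```python
class Session:
    """Toy model of a session: allocate once, submit many times, clean up once."""

    def __init__(self):
        self.events = []
        self._open = False

    def __enter__(self):
        self.events.append("allocate")
        self._open = True
        return self

    def submit(self, dag_name):
        if not self._open:
            raise RuntimeError("cannot submit outside an open session")
        # Resources from the allocation step are reused for every submission.
        self.events.append(f"submit {dag_name}")

    def __exit__(self, exc_type, exc, tb):
        self.events.append("cleanup")
        self._open = False
        return False  # do not swallow exceptions


with Session() as session:
    session.submit("dag-1")
    session.submit("dag-2")

events = session.events
print(events)
```

Without a session, each submission would pay for its own allocate and cleanup steps; with many short scripts, that overhead adds up.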
Lessons Learned
Some Pig Tasks Did Not Compile & Occasionally Froze
No DistributedCache Support For S3
Poor Amazon Support
No Pre-Built Releases
Additional Deployment for Tez UI
What is it good for?
Early Adopters
Pig & Hive
Bounded
Thanks for Listening!

Editor's Notes

  • #19: -Dpig.tez.opt.union=false
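
The flag in this note is a Java system property passed on the Pig command line. A minimal sketch of assembling such a command follows. It assumes the script runs with Pig's Tez execution mode (`-x tez`), places the `-D` property before the other options, and uses a made-up script name (`batch_step.pig`); the helper function is hypothetical.

```python
import shlex


def pig_on_tez_command(script, union_optimization=False):
    """Build an argv list that runs a Pig script on Tez with the union flag set."""
    flag = "true" if union_optimization else "false"
    return ["pig", f"-Dpig.tez.opt.union={flag}", "-x", "tez", script]


# "batch_step.pig" is a placeholder name, not a script from the talk.
command = pig_on_tez_command("batch_step.pig")
print(shlex.join(command))
```

Keeping the flag in one helper makes it easy to switch the property back on for individual scripts once they are known to run cleanly with it.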