Apache Tez - Next Generation of Execution Engine on Hadoop
Jeff Zhang (@zjffdu)
© Hortonworks Inc. 2015

Who’s this guy
• Started using Pig in 2009; became a Pig committer in Nov 2009
• Joined Hortonworks in 2014
• Tez committer since Oct 2014

Agenda
• Tez Introduction
• Tez Feature Deep Dive
• Tez Status & Roadmap

Example of MR versus Tez
• Hive - MR: four jobs, with I/O synchronization barriers between them
  – Job 1 (Join a & b)
  – Job 2 (Group by of a Join b)
  – Job 3 (Group by of c)
  – Job 4 (Join of S & R)
• Hive - Tez: a single job
  – Join a & b
  – Group by of a Join b
  – Group by of c
  – Join of S & R

Tez – Introduction
• Distributed execution framework targeted towards data-processing applications.
• Based on expressing a computation as a dataflow graph (DAG).
• Highly customizable to meet a broad spectrum of use cases.
• Built on top of YARN – the resource management framework for Hadoop.
• Open source Apache project and Apache licensed.

What is a DAG & Why DAG
• Directed Acyclic Graph
• Any complicated DAG can be composed of the following 3 basic paradigms
  – Sequential (Projection, Filter, GroupBy, …)
  – Merge (Join, Union, Intersect, …)
  – Divide (Split, …)
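
The three paradigms can be illustrated with a toy graph. This is a minimal sketch in plain Python, not the Tez API: the vertex names and the `topological_order` helper are hypothetical and only show that a graph mixing sequential, merge, and divide steps can still be ordered so that every vertex runs after its inputs.

```python
from collections import deque

# Toy DAG: vertex name -> list of downstream vertices (all names are made up).
dag = {
    "load_a": ["filter_a"],             # sequential: one producer, one consumer
    "load_b": ["join"],
    "filter_a": ["join"],               # merge: "join" has two upstream vertices
    "join": ["group_by", "store_raw"],  # divide: "join" feeds two consumers
    "group_by": [],
    "store_raw": [],
}


def topological_order(graph):
    """Return the vertices so that each one appears after all of its inputs
    (Kahn's algorithm). Raises ValueError if the graph contains a cycle."""
    indegree = {vertex: 0 for vertex in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1

    ready = deque(sorted(v for v, degree in indegree.items() if degree == 0))
    order = []
    while ready:
        vertex = ready.popleft()
        order.append(vertex)
        for target in graph[vertex]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    if len(order) != len(graph):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order


order = topological_order(dag)
```

The acyclic property is what makes such an ordering possible; a graph with a cycle is rejected by the helper.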

Expressing DAG in Tez API
• DAG API (Logic View)
  – Allows the user to build a DAG
  – Topological structure of the data computation flow
• Runtime API (Runtime View)
  – Application logic of each computation unit (vertex)
  – How to move/read/write data between vertices

DAG API (Logic View)
• Vertex (Processor, Parallelism, Resource, etc.)
• Edge (EdgeProperty)
  – DataMovement
    – Scatter Gather (Join, GroupBy, …)
    – Broadcast (Pig Replicated Join / Hive Broadcast Join)
    – One-to-One (Pig Order by)
    – Custom
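
The difference between the data movement types can be sketched as a routing rule: given a producer task, which consumer tasks receive its output? The snippet below is an illustration in plain Python under stated assumptions, not Tez code. The function name `destination_tasks` and the CRC-based partitioning are hypothetical choices; real edges can plug in their own partitioner.

```python
import zlib


def destination_tasks(movement, src_task, num_dst_tasks, key=None):
    """Return the consumer task indexes that receive one record
    emitted by producer task `src_task`."""
    if movement == "ONE_TO_ONE":
        # Producer task i feeds only consumer task i.
        return [src_task]
    if movement == "BROADCAST":
        # Every consumer task receives the full output of every producer task.
        return list(range(num_dst_tasks))
    if movement == "SCATTER_GATHER":
        # Records are partitioned by key, so equal keys from any producer
        # task meet in the same consumer task (what a join or group-by needs).
        partition = zlib.crc32(str(key).encode("utf-8")) % num_dst_tasks
        return [partition]
    raise ValueError(f"unsupported data movement: {movement}")
```

For example, `destination_tasks("BROADCAST", 0, 3)` lists all three consumer tasks, while a scatter-gather edge sends the key `"user42"` to the same single task regardless of which producer emitted it.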

Runtime API (Runtime View)
[Figure: Input → Processor → Output]
• Input
  – Through which the processor receives data on an edge
  – One vertex can have multiple inputs
• Processor
  – Application logic (one processor per vertex)
  – Consumes the inputs and produces the outputs
• Output
  – Through which the processor writes data to an edge
  – One vertex can have multiple outputs
• Examples of Input/Output/Processor
  – MRInput & MROutput (InputFormat/OutputFormat)
  – OrderedGroupedKVInput & OrderedPartitionedKVOutput (Scatter Gather)
  – UnorderedKVInput & UnorderedKVOutput (Broadcast & One-to-One)
  – PigProcessor/HiveProcessor
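
The Input/Processor/Output split can be mimicked with a few small classes. This is a toy model in plain Python, not the Tez runtime interfaces: `ListInput`, `ListOutput`, and `SplitByThresholdProcessor` are hypothetical names chosen only to show one processor consuming several named inputs and writing to several named outputs.

```python
class ListInput:
    """Hypothetical input: hands the processor the records of one incoming edge."""

    def __init__(self, records):
        self._records = list(records)

    def read(self):
        return iter(self._records)


class ListOutput:
    """Hypothetical output: collects the records written to one outgoing edge."""

    def __init__(self):
        self.records = []

    def write(self, record):
        self.records.append(record)


class SplitByThresholdProcessor:
    """Application logic of one vertex: reads every input and routes each
    value to the "small" or "large" output."""

    def __init__(self, threshold):
        self.threshold = threshold

    def run(self, inputs, outputs):
        for edge_input in inputs.values():
            for value in edge_input.read():
                target = "large" if value >= self.threshold else "small"
                outputs[target].write(value)


# One vertex with two inputs and two outputs.
inputs = {"a": ListInput([1, 8, 3]), "b": ListInput([10, 2])}
outputs = {"small": ListOutput(), "large": ListOutput()}
SplitByThresholdProcessor(threshold=5).run(inputs, outputs)
```

Because the processor only sees `read()` and `write()`, the same logic would work whether the edge behind an input or output is a file, a shuffle, or a broadcast, which is the point of separating the three roles.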

Benefits of DAG
• Easier to express computation as a DAG
• No intermediate data written to HDFS
• Less pressure on the NameNode
• No resource queuing effort & less resource contention
• More optimization opportunities with more global context

Agenda
• Tez Introduction
• Tez Feature Deep Dive
• Tez Improvement & Debuggability
• Tez Status & Roadmap

Container-Reuse
• Reuse the same container across DAGs/Vertices/Tasks
• Benefits of Container-Reuse
  – Fewer resources consumed
  – Reduced overhead of launching JVMs
  – Reduced overhead of negotiating with the ResourceManager
  – Reduced overhead of resource localization
  – Reduced network IO
  – Object Caching (Object Sharing)
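
The launch-overhead benefit can be made concrete with a toy pool. This is a simplified sketch in plain Python, not how Tez schedules containers: `ContainerPool` and `run_tasks` are hypothetical, and tasks run strictly one after another to keep the counting obvious.

```python
class ContainerPool:
    """Hands out containers; with reuse enabled, released containers
    are kept idle and given to the next task instead of being discarded."""

    def __init__(self, reuse):
        self.reuse = reuse
        self.idle = []
        self.launched = 0  # number of container (JVM) launches so far

    def acquire(self):
        if self.reuse and self.idle:
            return self.idle.pop()
        self.launched += 1
        return f"container_{self.launched}"

    def release(self, container):
        if self.reuse:
            self.idle.append(container)


def run_tasks(num_tasks, reuse):
    """Run tasks sequentially and return how many containers were launched."""
    pool = ContainerPool(reuse)
    for _ in range(num_tasks):
        container = pool.acquire()
        # ... the task would execute inside `container` here ...
        pool.release(container)
    return pool.launched
```

In this model five sequential tasks cost five launches without reuse and one launch with reuse; each avoided launch also avoids a round of resource negotiation and localization.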

Tez Session
• Multiple Jobs/DAGs in one AM
• Container-reuse across Jobs/DAGs
• Data sharing between Jobs/DAGs

Dynamic Parallelism Estimation
• VertexManager
  – Listens to the status of other vertices
  – Coordinates and schedules its tasks
  – Communication between vertices
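
One plausible way a vertex manager could use upstream status is to extrapolate the total output size from the tasks that have finished and size the downstream vertex accordingly. The sketch below is an assumption for illustration, not the policy of any specific Tez VertexManager; `estimate_parallelism` and all of its parameters are hypothetical.

```python
import math


def estimate_parallelism(completed_output_bytes, total_upstream_tasks,
                         desired_bytes_per_task, configured_parallelism):
    """Estimate how many downstream tasks are needed.

    completed_output_bytes: output sizes reported by finished upstream tasks
    total_upstream_tasks:   number of upstream tasks, finished or not
    desired_bytes_per_task: how much input one downstream task should handle
    configured_parallelism: the statically configured upper bound
    """
    completed = len(completed_output_bytes)
    if completed == 0:
        # Nothing observed yet: keep the configured value.
        return configured_parallelism

    # Extrapolate the total output from the finished fraction.
    expected_total = sum(completed_output_bytes) * total_upstream_tasks / completed
    needed = math.ceil(expected_total / desired_bytes_per_task)

    # Never go below one task or above the configured parallelism.
    return max(1, min(configured_parallelism, needed))
```

With two of four upstream tasks finished at 100 bytes each and a target of 100 bytes per downstream task, the estimate is 4 tasks instead of a configured 10.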

ATS Integration
• Tez is fully integrated with YARN ATS (Application Timeline Service)
  – DAG Status, DAG Metrics, Task Status, Task Metrics are captured
• Diagnostics & Performance analysis
  – Data source for monitoring & diagnostics
  – Data source for performance analysis

Recovery
• The AM can crash in corner cases
  – OOM
  – Node failure
  – …
• Execution continues from the last checkpoint
• Transparent to end users
[Figure: AM crash]
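
Continuing from a checkpoint can be sketched as a journal of finished work that a new attempt replays before running anything. This is a toy model in plain Python, not the Tez recovery mechanism: `run_dag`, the in-memory `journal` list, and the `crash_after` switch are all hypothetical.

```python
def run_dag(tasks, journal, crash_after=None):
    """Run `tasks` in order, recording each finished task in `journal`.

    Tasks already present in the journal are skipped, which is how a new
    attempt resumes after a crash. `crash_after` simulates an AM crash once
    that many tasks have run in this attempt. Returns the tasks executed
    by this attempt."""
    done = set(journal)
    executed = []
    for task in tasks:
        if task in done:
            continue  # finished before the crash, do not rerun
        if crash_after is not None and len(executed) == crash_after:
            raise RuntimeError("simulated AM crash")
        executed.append(task)
        journal.append(task)  # checkpoint the progress
    return executed


tasks = ["t1", "t2", "t3", "t4"]
journal = []

try:
    run_dag(tasks, journal, crash_after=2)  # first attempt dies after t1, t2
except RuntimeError:
    pass

resumed = run_dag(tasks, journal)  # second attempt runs only t3 and t4
```

The caller submits the same task list both times, which mirrors the "transparent to end users" point: only the journal decides what still has to run.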

Order By of Pig
f = Load 'foo' as (x, y);
o = Order f by x;
[Figure: two execution plans for the script above. First plan: Load, Sample (Calculate Histogram), HDFS, Partition, Sort. Second plan: Load, Sample (Calculate Histogram), Partition, Sort, with edges labeled Scatter Gather, Broadcast, and One-to-One]
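
The sample, partition, and sort stages named in the figure can be sketched on a single machine. This is an illustration in plain Python, not Pig's implementation: `range_boundaries` and `order_by` are hypothetical helpers, and the sample is assumed to be non-empty.

```python
import bisect


def range_boundaries(sample_keys, num_partitions):
    """Pick num_partitions - 1 split points from a sample of the keys,
    a stand-in for the histogram computed by the Sample stage."""
    ordered = sorted(sample_keys)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def order_by(rows, key, num_partitions, sample):
    """Range-partition rows using boundaries derived from `sample`,
    then sort each partition independently."""
    bounds = range_boundaries([key(row) for row in sample], num_partitions)
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Every key in partition i is <= every key in partition i + 1.
        partitions[bisect.bisect_right(bounds, key(row))].append(row)
    return [sorted(partition, key=key) for partition in partitions]


rows = [(5, "e"), (1, "a"), (9, "i"), (3, "c"), (7, "g"), (2, "b")]
parts = order_by(rows, key=lambda r: r[0], num_partitions=3, sample=rows)
flattened = [row for part in parts for row in part]
```

Because the partitions cover disjoint, increasing key ranges, concatenating the individually sorted partitions yields a globally sorted result without a single-task final sort.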

Tez UI
[Screenshots of the Tez UI]
• Download data from ATS

Roadmap
• Shared output edges
  – Same output to multiple vertices
• Local mode stabilization
• Optimizing (include/exclude) vertex at runtime
• Partial completion VertexManager
• Co-Scheduling
• Framework stats for better runtime decisions

Tez – Adoption
• Apache Hive
  – Supported starting from Hive 0.13
  – set hive.execution.engine=tez;
• Apache Pig
  – Supported starting from Pig 0.14
  – pig -x tez
• Cascading
• Flink

Tez Community
• Useful Links
  – http://tez.apache.org/
  – JIRA: https://issues.apache.org/jira/browse/TEZ
  – Code Repository: https://git-wip-us.apache.org/repos/asf/tez.git
  – Mailing Lists
    – Dev List: dev@tez.apache.org
    – User List: user@tez.apache.org
    – Issues List: issues@tez.apache.org
• Tez Meetup
  – http://www.meetup.com/Apache-Tez-User-Group

Thank You!
Questions & Answers

3. Apache Tez Introduction - Apache Kylin Meetup @Shanghai
