Pivotal HAWQ 
A.Grishchenko 
HadoopKitchen @ Mail.ru 
27 Sep 2014 
Pivotal Confidential––Internal Use Only 1
SQL-on-Hadoop Solutions 
Hive (2008) 
• Developed by Facebook 
– Used for data analysis in Facebook's data warehouse 
– The DWH is ~300 PB at the moment, with ~600 TB of data loaded daily; data is compressed with ORCFile at a ratio of ~8x 
• HiveQL is not compatible with ANSI SQL-92 
• Has many limitations on subqueries 
• The cost-based optimizer (Optiq) is only in technical preview at the moment 
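To illustrate the subquery limitations: at the time HiveQL allowed subqueries only in the FROM clause (uncorrelated IN/EXISTS in WHERE arrived in Hive 0.13), so a correlated scalar subquery like the following would be rejected (table and column names are made up for illustration):

```sql
-- Rejected by HiveQL of that era: correlated scalar subquery in WHERE
SELECT o.id, o.amount
FROM orders o
WHERE o.amount > (SELECT AVG(i.amount)
                  FROM orders i
                  WHERE i.region = o.region);
```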
SQL-on-Hadoop Solutions 
Impala (10.2012) 
• Developed by Cloudera 
– Open-source solution 
– Cloudera sells it to enterprise shops 
– Was in beta until May 2013 
• Supports HiveQL, moving toward complete ANSI SQL-92 support 
• Written in C++; does not use MapReduce for running queries 
• Requires a lot of memory; joining big tables usually causes OOM errors 
SQL-on-Hadoop Solutions 
Stinger (02.2013) 
• Hortonworks initiative 
– Consists of a number of steps to make Hive run 100x faster 
• Tez – Hive queries are translated into Tez jobs, which are similar to MapReduce jobs but may have an arbitrary topology 
• Optiq – cost-based query optimizer for Hive (in technical preview at the moment) 
• ORCFile – columnar storage format with adaptive compression and inline indexes 
• HIVE-5317 – ACID and UPDATE/DELETE support (release at ~11.2014) 
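ORCFile is enabled through ordinary Hive DDL; a minimal sketch (table and column names are illustrative):

```sql
-- Hive 0.11+: store the table in the ORC columnar format
CREATE TABLE web_logs (ts TIMESTAMP, url STRING, bytes BIGINT)
STORED AS ORC;
```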
SQL-on-Hadoop Solutions 
HAWQ (02.2013) 
• Pivotal product 
– Greenplum MPP DBMS, ported to store data in HDFS 
– Written in C; the query optimizer was rewritten for this solution (ORCA) 
• Supports ANSI SQL-92 and the analytic extensions from SQL-2003 
• Supports complex queries with correlated subqueries, window functions and different join types 
• Data is spilled to disk only if a process does not have enough memory 
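The SQL-2003 analytic support covers statements such as this one (a sketch; the table and column names are made up):

```sql
-- Window function plus a correlated subquery in one statement
SELECT bar, beer, price,
       RANK() OVER (PARTITION BY bar ORDER BY price DESC) AS price_rank
FROM sells s
WHERE price > (SELECT AVG(price) FROM sells i WHERE i.bar = s.bar);
```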
SQL-on-Hadoop Solutions 
Vertica and BigSQL (2014) 
• HP Vertica 
– Supports only the MapR distribution, as it requires updatable storage 
– Supports ANSI SQL-92 and SQL-2003 
– Supports UPDATE/DELETE 
– Officially announced as available in July 2014; no known deployments yet 
• IBM BigSQL v3 
– IBM DB2 ported to store data in HDFS 
– Federated queries, a good query optimizer, etc. 
• Both solutions are similar to Pivotal HAWQ in their general idea 
Pivotal HAWQ Components 
[Cluster layout diagram] 
• Server 1: Master; Server 2: Standby Master 
• Servers 3..M: worker nodes, each running several segments (Server 3: Segments 1..K; Server 4: Segments K+1..2*K; …; Server M: …Segment N) 
Pivotal HAWQ Components 
[Deployment diagram] 
• Server 1: HAWQ Master; Server 2: HAWQ Standby Master; Server 3: NameNode; Server 4: Secondary NameNode 
• ZooKeeper and QJM instances run on three of the management servers 
• Servers 5..M: each runs an HDFS Datanode and a HAWQ Segment 
Pivotal HAWQ Components 
[Master architecture diagram] 
• HAWQ Master: Query Parser, Query Optimizer, Query Executor, Transaction Manager, Metadata Catalog, Process Manager 
• HAWQ Standby Master: the same components, kept in sync via WAL replication 
Pivotal HAWQ Components 
• Metadata is stored only on the master servers 
• Metadata is kept in a modified Postgres instance and replicated to the standby master via WAL 
• Metadata contains: 
– Table information – schema, names, files 
– Statistics – number of unique values, value ranges, sample values, etc. 
– Information about users, groups, priorities, etc. 
• A master server shutdown causes a switch to the standby, with the loss of all running sessions 
Pivotal HAWQ Components 
[Segment architecture diagram] 
• HAWQ Segment: Query Executor, libhdfs3, PXF 
• HDFS Datanode: Segment Data Directory 
• Local Filesystem (xfs): Spill Data Directory 
Pivotal HAWQ Components 
• Both masters and segments are modified Postgres instances (to be precise, modified Greenplum instances) 
• Opening a connection to the master server forks a postmaster process that serves your session 
• When query execution starts, the master connects to the segment instances, and each of them also forks a process to execute the query 
• The query execution plan is split into independent blocks (slices); each slice is executed as a separate OS process on the segment server, with data moving between slices over UDP 
Pivotal HAWQ Components 
• Tables can be stored as: 
– Row-oriented (quicklz, zlib compression) 
– Column-oriented (quicklz, zlib, RLE compression) 
– Parquet tables 
• Each segment has a separate directory on HDFS where it stores its data shard 
• With columnar storage, each column is represented as a separate file 
• Parquet stores the table by columns without loading the NameNode with many files and block-location requests 
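These storage options map to table DDL roughly as follows (a hedged sketch in Greenplum/HAWQ syntax; exact option names and values may differ between versions, and the table definitions are made up):

```sql
-- Row-oriented append-only table with quicklz compression
CREATE TABLE sells_row (bar TEXT, beer TEXT, price NUMERIC)
WITH (appendonly=true, orientation=row, compresstype=quicklz)
DISTRIBUTED BY (bar);

-- Column-oriented table, zlib compression: one file per column
CREATE TABLE sells_col (LIKE sells_row)
WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=5)
DISTRIBUTED BY (bar);

-- Parquet table: columnar layout inside larger files
CREATE TABLE sells_parquet (LIKE sells_row)
WITH (appendonly=true, orientation=parquet)
DISTRIBUTED BY (bar);
```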
Query Execution in Pivotal HAWQ 
[Animated diagram, slides 14–34: the same picture built up step by step] 
• HAWQ Master: Parser, Query Optimizer, Metadata, Transaction Mgr., Process Mgr., Query Executor; the HDFS NameNode sits alongside 
• Each HAWQ Segment: a Backend process spawning QE processes for the slices S1, S2, S3; an HDFS Datanode with the Segment Directory; a Local Spill Directory 
• Example plan walked through in the animation: 
MotionGather 
  Project s.beer, s.price 
    HashJoin b.name = s.bar 
      MotionRedist(b.name) 
        Filter b.city = 'San Francisco' 
          Scan Bars (b) 
      Scan Sells (s) 
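The example plan corresponds to a query along these lines (reconstructed from the operator tree; Bars and Sells are the slides' example tables):

```sql
SELECT s.beer, s.price
FROM sells s
JOIN bars b ON b.name = s.bar
WHERE b.city = 'San Francisco';
```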
PXF Framework 
• Lets you read different data formats from HDFS 
– Text files, both compressed and uncompressed 
– SequenceFiles 
– Avro files 
• Able to read data from external data sources 
– HBase 
– Cassandra 
– Redis 
• Extensible API 
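PXF sources are exposed to HAWQ as external tables; a hedged sketch (host, port, profile name and columns are illustrative, and the exact URI format depends on the PXF version):

```sql
CREATE EXTERNAL TABLE ext_sells (bar TEXT, beer TEXT, price NUMERIC)
LOCATION ('pxf://namenode:51200/data/sells?Profile=HdfsTextSimple')
FORMAT 'TEXT' (DELIMITER ',');
```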
PXF Framework 
[Animated diagram, slides 36–43: reading external data step by step] 
• HAWQ Master: PXF Fragmenter, Process Mgr.; the HDFS NameNode alongside 
• Each HAWQ Segment: Query Executor, PXF Accessor, PXF Fragmenter; HDFS Datanode with the Segment Directory; a Local Spill Directory 
Further Steps 
 Master server scaling – pool of master servers 
 New native data storage formats and new native 
compression algorithms 
 YARN as resource manager for HAWQ 
 Dynamic segment allocation / decommission 
Questions? 
BUILT FOR THE SPEED OF BUSINESS

More Related Content

PPTX
Pivotal HD as a Cloud Foundry Service
PDF
How to manage Hortonworks HDB Resources with YARN
PPTX
Architecting Applications with Hadoop
PDF
Application architectures with Hadoop – Big Data TechCon 2014
PDF
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
PDF
Application Architectures with Hadoop - Big Data TechCon SF 2014
PDF
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
PPTX
Introduction to Apache Drill
Pivotal HD as a Cloud Foundry Service
How to manage Hortonworks HDB Resources with YARN
Architecting Applications with Hadoop
Application architectures with Hadoop – Big Data TechCon 2014
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Application Architectures with Hadoop - Big Data TechCon SF 2014
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Introduction to Apache Drill

What's hot (20)

PPTX
Hadoop Summit 2015: Hive at Yahoo: Letters from the Trenches
PPTX
Apache Drill
PDF
Applications on Hadoop
PPTX
Modern Data Architecture
PDF
An introduction to apache drill presentation
PDF
Introduction to Impala
PDF
a Secure Public Cache for YARN Application Resources
PDF
Cloudera Impala
PDF
NYC HUG - Application Architectures with Apache Hadoop
PPTX
Incredible Impala
PPTX
February 2014 HUG : Pig On Tez
PDF
SQL Engines for Hadoop - The case for Impala
PDF
Cloudera Impala, updated for v1.0
PPTX
Taming the Elephant: Efficient and Effective Apache Hadoop Management
PDF
Hadoop 3.0 - Revolution or evolution?
PDF
Big data processing meets non-volatile memory: opportunities and challenges
KEY
Building a Business on Hadoop, HBase, and Open Source Distributed Computing
PPTX
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
PPTX
Using Apache Drill
PDF
Hadoop User Group - Status Apache Drill
Hadoop Summit 2015: Hive at Yahoo: Letters from the Trenches
Apache Drill
Applications on Hadoop
Modern Data Architecture
An introduction to apache drill presentation
Introduction to Impala
a Secure Public Cache for YARN Application Resources
Cloudera Impala
NYC HUG - Application Architectures with Apache Hadoop
Incredible Impala
February 2014 HUG : Pig On Tez
SQL Engines for Hadoop - The case for Impala
Cloudera Impala, updated for v1.0
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Hadoop 3.0 - Revolution or evolution?
Big data processing meets non-volatile memory: opportunities and challenges
Building a Business on Hadoop, HBase, and Open Source Distributed Computing
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Using Apache Drill
Hadoop User Group - Status Apache Drill
Ad

Viewers also liked (20)

PPTX
Apache HAWQ Architecture
PDF
MPP vs Hadoop
PPTX
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
PPTX
Apache Spark Architecture
PPTX
Архитектура Apache HAWQ Highload++ 2015
PDF
HAWQ: a massively parallel processing SQL engine in hadoop
PDF
Greenplum Architecture
PPTX
Build & test Apache Hawq
PDF
Managing Apache HAWQ with Apache AMBARI
PDF
Webinar turbo charging_data_science_hawq_on_hdp_final
PPTX
How to Use Apache Zeppelin with HWX HDB
PDF
SQL and Machine Learning on Hadoop using HAWQ
PDF
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
PPTX
Introduction to Impala ~Hadoop用のSQLエンジン~ #hcj13w
PDF
Pivotal Big Data Suite: A Technical Overview
PDF
Introduction to Greenplum
PPT
Hadoop distributions - ecosystem
PDF
[D22] Pivotal HD 2.0 -業界最高レベルSQL on Hadoop技術「HAWQ」解説- by Masayuki Matsushita
PDF
gsoc_mentor for Shivram Mani
PPTX
PXF BDAM 2016
Apache HAWQ Architecture
MPP vs Hadoop
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Apache Spark Architecture
Архитектура Apache HAWQ Highload++ 2015
HAWQ: a massively parallel processing SQL engine in hadoop
Greenplum Architecture
Build & test Apache Hawq
Managing Apache HAWQ with Apache AMBARI
Webinar turbo charging_data_science_hawq_on_hdp_final
How to Use Apache Zeppelin with HWX HDB
SQL and Machine Learning on Hadoop using HAWQ
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Introduction to Impala ~Hadoop用のSQLエンジン~ #hcj13w
Pivotal Big Data Suite: A Technical Overview
Introduction to Greenplum
Hadoop distributions - ecosystem
[D22] Pivotal HD 2.0 -業界最高レベルSQL on Hadoop技術「HAWQ」解説- by Masayuki Matsushita
gsoc_mentor for Shivram Mani
PXF BDAM 2016
Ad

Similar to Pivotal hawq internals (20)

PDF
Pivotal HAWQ 소개
PPTX
SQL on Hadoop: Defining the New Generation of Analytics Databases
PDF
Big data Hadoop Analytic and Data warehouse comparison guide
PDF
Big data hadooop analytic and data warehouse comparison guide
PPTX
5. pivotal hd 2013
PDF
SQL et in-memory sur Hadoop avec Pivotal et HAWQ
PPTX
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
PDF
Hoodie - DataEngConf 2017
PPTX
Learn Hadoop Administration
PDF
HAWQ Meets Hive - Querying Unmanaged Data
PPT
Eric Baldeschwieler Keynote from Storage Developers Conference
PPTX
Hawq meets Hive - DataWorks San Jose 2017
PDF
Key trends in Big Data and new reference architecture from Hewlett Packard En...
PDF
Yarn by default (Spark on YARN)
PDF
HUG Meetup 2013: HCatalog / Hive Data Out
PDF
May 2013 HUG: HCatalog/Hive Data Out
PDF
Savanna - Elastic Hadoop on OpenStack
PPTX
Learn to setup a Hadoop Multi Node Cluster
PPTX
Hadoop_arunam_ppt
PPTX
Big Data Analytics with Hadoop, MongoDB and SQL Server
Pivotal HAWQ 소개
SQL on Hadoop: Defining the New Generation of Analytics Databases
Big data Hadoop Analytic and Data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guide
5. pivotal hd 2013
SQL et in-memory sur Hadoop avec Pivotal et HAWQ
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Hoodie - DataEngConf 2017
Learn Hadoop Administration
HAWQ Meets Hive - Querying Unmanaged Data
Eric Baldeschwieler Keynote from Storage Developers Conference
Hawq meets Hive - DataWorks San Jose 2017
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Yarn by default (Spark on YARN)
HUG Meetup 2013: HCatalog / Hive Data Out
May 2013 HUG: HCatalog/Hive Data Out
Savanna - Elastic Hadoop on OpenStack
Learn to setup a Hadoop Multi Node Cluster
Hadoop_arunam_ppt
Big Data Analytics with Hadoop, MongoDB and SQL Server

Recently uploaded (20)

PPTX
Custom Software Development Services.pptx.pptx
PPTX
chapter 5 systemdesign2008.pptx for cimputer science students
PPTX
Computer Software and OS of computer science of grade 11.pptx
PPTX
GSA Content Generator Crack (2025 Latest)
PPTX
AMADEUS TRAVEL AGENT SOFTWARE | AMADEUS TICKETING SYSTEM
PDF
Salesforce Agentforce AI Implementation.pdf
PDF
Designing Intelligence for the Shop Floor.pdf
DOCX
Greta — No-Code AI for Building Full-Stack Web & Mobile Apps
PDF
Website Design Services for Small Businesses.pdf
PDF
Wondershare Recoverit Full Crack New Version (Latest 2025)
PDF
Cost to Outsource Software Development in 2025
PDF
DNT Brochure 2025 – ISV Solutions @ D365
PDF
EaseUS PDF Editor Pro 6.2.0.2 Crack with License Key 2025
PDF
How AI/LLM recommend to you ? GDG meetup 16 Aug by Fariman Guliev
PDF
Product Update: Alluxio AI 3.7 Now with Sub-Millisecond Latency
PPTX
Trending Python Topics for Data Visualization in 2025
PDF
AI/ML Infra Meetup | Beyond S3's Basics: Architecting for AI-Native Data Access
PDF
iTop VPN Crack Latest Version Full Key 2025
DOCX
How to Use SharePoint as an ISO-Compliant Document Management System
PPTX
Cybersecurity: Protecting the Digital World
Custom Software Development Services.pptx.pptx
chapter 5 systemdesign2008.pptx for cimputer science students
Computer Software and OS of computer science of grade 11.pptx
GSA Content Generator Crack (2025 Latest)
AMADEUS TRAVEL AGENT SOFTWARE | AMADEUS TICKETING SYSTEM
Salesforce Agentforce AI Implementation.pdf
Designing Intelligence for the Shop Floor.pdf
Greta — No-Code AI for Building Full-Stack Web & Mobile Apps
Website Design Services for Small Businesses.pdf
Wondershare Recoverit Full Crack New Version (Latest 2025)
Cost to Outsource Software Development in 2025
DNT Brochure 2025 – ISV Solutions @ D365
EaseUS PDF Editor Pro 6.2.0.2 Crack with License Key 2025
How AI/LLM recommend to you ? GDG meetup 16 Aug by Fariman Guliev
Product Update: Alluxio AI 3.7 Now with Sub-Millisecond Latency
Trending Python Topics for Data Visualization in 2025
AI/ML Infra Meetup | Beyond S3's Basics: Architecting for AI-Native Data Access
iTop VPN Crack Latest Version Full Key 2025
How to Use SharePoint as an ISO-Compliant Document Management System
Cybersecurity: Protecting the Digital World

Pivotal hawq internals

  • 1. Pivotal HAWQ A.Grishchenko HadoopKitchen @ Mail.ru 27 Sep 2014 Pivotal Confidential––Internal Use Only 1
  • 2. SQL-on-Hadoop Solutions 2008 Hive  Developed by Facebook – Hive is used for data analysis in their data warehouse – DWH size is ~300PB at the moment, ~600TB of data is loaded daily. Data is compressed using ORCFiles, compression ratio is ~8x  HiveQL language is not compatible with ANSI SQL-92  Has many limitations on subqueries  Cost-based optimizer (Optiq) is only in technical preview now Pivotal Confidential–Internal Use Only 2
  • 3. SQL-on-Hadoop Solutions 2008 Hive  Developed by Cloudera 10.2012 Impala – Open-source solution – Cloudera sells this solution to enterprise shops – Was in beta until the May’2013  Supports HiveQL, moving forward complete ANSI SQL-92 support  Written in C++, does not use Map-Reduce for running queries  Requires much memory, big tables join usually causes OOM error Pivotal Confidential–Internal Use Only 3
  • 4. SQL-on-Hadoop Solutions 2008 Hive  Hortonworks initiative 10.2012 Impala 02.2013 Stinger – Consists of a number of steps to make Hive run 100x faster  Tez – solution to make Hive queries be translated to Tez jobs, which are similar to Map-Reduce but may have arbitrary topology  Optiq – cost-based query optimizer for Hive (technical preview ATM)  ORCFile – columnar storage format with adaptive compression and inline indexes  Hive-5317 – ACID and Update/Delete support (release at ~ 11.2014) Pivotal Confidential–Internal Use Only 4
  • 5. SQL-on-Hadoop Solutions 2008 Hive  Pivotal product 10.2012 Impala 02.2013 Stinger 02.2013 HAWQ – Greenplum MPP DBMS, ported to store data in HDFS – Written in C, query optimizer is rewritten for this solution (ORCA)  Supports ANSI SQL-92 and analytic extensions from SQL-2003  Supports complex queries with correlated subqueries, window functions and different joins  Data is put on disk only if the process does not have enough memory Pivotal Confidential–Internal Use Only 5
  • 6. SQL-on-Hadoop Solutions 2008 Hive  HP Vertica 10.2012 Impala 02.2013 Stinger 02.2013 HAWQ – Supports only MapR distribution as requires updatable storage – Supports ANSI SQL-92, SQL-2003 – Supports UPDATE/DELETE – Officially announced as available in July’2014, no implementations yet  IBM BigSQL v3 – IBM DB2 ported to store data in HDFS – Federated queries, good query optimizer, etc.  Both solutions are similar to Pivotal HAWQ in general idea 2014 Vertica, BigSQL Pivotal Confidential–Internal Use Only 6
  • 7. Pivotal HAWQ Components Master Server 1 Server 3 Segment 1 Segment 2 … Segment K Standby Master Server 2 Server 4 Segment K+1 Segment K+2 … Segment 2*K Server M … Segment N … Pivotal Confidential–Internal Use Only 7
  • 8. Pivotal HAWQ Components Server 1 HAWQ Master Server 2 ZK QJM ZK QJM ZK QJM HAWQ SBMstr Server 5 Datanode HAWQ Segm. Server 3 NameNode … Server 4 SNameNode Server 6 Datanode HAWQ Segm. Server M Datanode HAWQ Segm. Pivotal Confidential–Internal Use Only 8
  • 9. Pivotal HAWQ Components HAWQ Master Query Parser Query Optimizer Query Executor Transaction Manager Metadata Catalog Process Manager HAWQ Standby Master Query Parser Query Optimizer Query Executor Transaction Manager Metadata Catalog Process Manager WAL replic. Pivotal Confidential–Internal Use Only 9
  • 10. Pivotal HAWQ Components  Metadata is stored only on master-servers  Metadata is stored in modified Postgres instance, replicated to standby master with WAL  Metadata contains – Table information – schema, names, files – Statistics – number of unique values, value ranges, sample values, etc. – Information about users, groups, priorities, etc.  Master server shutdown causes the switch to standby with the loss of running sessions Pivotal Confidential–Internal Use Only 10
  • 11. Pivotal HAWQ Components HAWQ Segment Query Executor libhdfs3 PXF HDFS Datanode Segment Data Directory Local Filesystem (xfs) Spill Data Directory Pivotal Confidential–Internal Use Only 11
  • 12. Pivotal HAWQ Components  Both masters and segments are modified postgres instances (to be clear, modified Greenplum instances)  Opening connection to the master server you fork postmaster process that starts to work with your session  Starting the query execution you connect to the segment instances and they also fork a process to execute query  Query execution plan is split into independent blocks (slices), each of them is executed as a separate OS process on the segment server, moving the data through UDP Pivotal Confidential–Internal Use Only 12
  • 13. Pivotal HAWQ Components  Tables can be stored as: – Row-oriented (quicklz, zlib compression) – Column-oriented (quicklz, zlib, rle compression) – Parquet tables  Each segment has separate directory on HDFS where it stores its data shard  Within columnar storage each column is represented as a separate file  Parquet allows to store the table by columns and does not load NameNode with many files / block location requests Pivotal Confidential–Internal Use Only 13
  • 14–34. Query Execution in Pivotal HAWQ (diagram, animated step by step across slides 14–34) HAWQ Master: Parser, Query Optimizer, Metadata, Transaction Mgr., Process Mgr., Query Executor; NameNode; two HAWQ Segments, each running a Backend with query-executor (QE) processes for slices S1, S2, S3 over an HDFS Datanode, a Segment Directory and a Local Spill Directory. Example plan: MotionGather ← Project s.beer, s.price ← HashJoin b.name = s.bar ← [MotionRedist(b.name) ← Filter b.city = 'San Francisco' ← Scan Bars | Scan Sells] Pivotal Confidential–Internal Use Only 14–34
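The plan shown in the diagrams above (Scan Bars → Filter on city → Redistribute by name → HashJoin with Scan Sells → Gather on the master) is the kind of plan HAWQ would build for a join query of roughly this shape (table definitions assumed):

```sql
SELECT s.beer, s.price
FROM Bars b
JOIN Sells s ON b.name = s.bar
WHERE b.city = 'San Francisco';
```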
  • 35. PXF Framework  Lets you read different data formats from HDFS – Text files, both compressed and uncompressed – Sequence files – Avro files  Can also read data from external data sources – HBase – Cassandra – Redis  Extensible API Pivotal Confidential–Internal Use Only 35
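PXF sources are exposed to queries as external tables. A hypothetical sketch (host, port, profile name and path are illustrative and version-dependent; check the PXF documentation for your release):

```sql
-- External table over a comma-delimited text file in HDFS, read through PXF
CREATE EXTERNAL TABLE ext_sells (bar text, beer text, price numeric)
LOCATION ('pxf://namenode:50070/data/sells.csv?Profile=HdfsTextSimple')
FORMAT 'TEXT' (DELIMITER ',');
```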
  • 36–43. PXF Framework (diagram, animated step by step across slides 36–43) HAWQ Master: PXF Fragmenter, Process Mgr.; NameNode; each HAWQ Segment runs a Query Executor, PXF Accessor and PXF Fragmenter over an HDFS Datanode, a Segment Directory and a Local Spill Directory Pivotal Confidential–Internal Use Only 36–43
  • 44. Further Steps  Master server scaling – a pool of master servers  New native data storage formats and new native compression algorithms  YARN as the resource manager for HAWQ  Dynamic segment allocation / decommissioning Pivotal Confidential–Internal Use Only 44
  • 46. BUILT FOR THE SPEED OF BUSINESS