Data Modeling and Netezza
Biju Nair
NENUG Talk, 30-Apr-2014

Motivation and Goal
• Motivation
  – Performance degradation with data volume
• Goal
  – Highlight considerations while modeling for NZ

Data Modeling
• Logical Data Modeling
  – Business domain data representation
  – Independent of DBMS technology
  – NZ is an appliance for analytics
    • Set-based processing vs. row-based processing
    • De-normalization
    • Snowflake/star schema
• Physical Data Modeling
  – Takes into account the DBMS features and constraints
  – Requires understanding the DBMS architecture

NZ Architecture
[Diagram: a host connected to several snippet processors, each with its own data]
• Host
  – Parse, optimize and compile the query
  – Schedule snippets
  – Distribute data
• Snippet processors
  – Execute snippets
• Shared-nothing MPP over a custom IP backbone
• Appliance efficiency is maximized when
  – Snippet processors can run independently, i.e. data independence
  – All snippet processors are utilized uniformly
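
The host/snippet-processor split above can be pictured as a scatter-gather aggregation. This is a minimal Python model, not Netezza code: each inner list stands for the rows one snippet processor owns (shared nothing), `spu_partial_sum` is the snippet every processor runs on its local rows, and `host_total` combines the partial results. All names are hypothetical. `busiest_share` hints at why uniform utilization matters: the query is done only when the most heavily loaded processor is done.

```python
from typing import Dict, List

Row = Dict[str, int]


def spu_partial_sum(local_rows: List[Row], column: str) -> int:
    """Snippet run on one processor: touches only the rows stored locally."""
    return sum(row[column] for row in local_rows)


def host_total(spu_slices: List[List[Row]], column: str) -> int:
    """Host: run the same snippet on every processor, then combine the partials.

    The list comprehension runs the snippets one after another; on the
    appliance they would run in parallel, one per snippet processor.
    """
    partials = [spu_partial_sum(rows, column) for rows in spu_slices]
    return sum(partials)


def busiest_share(spu_slices: List[List[Row]]) -> float:
    """Fraction of all rows held by the most heavily loaded processor."""
    total = sum(len(rows) for rows in spu_slices)
    return max(len(rows) for rows in spu_slices) / total


if __name__ == "__main__":
    slices = [[{"amt": 10}, {"amt": 5}], [{"amt": 7}], [{"amt": 3}]]
    print(host_total(slices, "amt"))  # 25
    print(busiest_share(slices))      # 0.5
```

With perfectly even slices `busiest_share` approaches 1 / (number of processors); anything higher means one processor is doing more than its share.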

Snippet Processors
[Diagram: Data → Accelerator (FPGA) → CPU (compute) → Host]
• Reads compressed data, un-compresses it, removes columns, restricts rows (WHERE), performs computation and sends data to the host
• Disk reads are very slow relative to the other components, especially seek time
• While CPU overhead is reduced, the volume of data read still impacts performance
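
The order of stages on this slide can be modeled as a chain of Python generators. This is a hypothetical sketch: zlib stands in for Netezza's compression, rows are comma-separated strings, and the function names are made up for illustration. It shows why removing columns and restricting rows early matters: every later stage sees only what survives, although the compressed blocks themselves still have to be read from disk in full.

```python
import zlib
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Fields = Tuple[str, ...]


def read_and_uncompress(blocks: Iterable[bytes]) -> Iterator[str]:
    """Read compressed blocks and un-compress them (zlib stands in for NZ compression)."""
    for block in blocks:
        yield from zlib.decompress(block).decode("utf-8").splitlines()


def remove_columns(lines: Iterable[str], keep: List[int]) -> Iterator[Fields]:
    """Projection: keep only the column positions the query needs."""
    for line in lines:
        fields = line.split(",")
        yield tuple(fields[i] for i in keep)


def restrict_rows(rows: Iterable[Fields], where: Callable[[Fields], bool]) -> Iterator[Fields]:
    """Restriction: apply the WHERE clause before any heavier computation."""
    return (row for row in rows if where(row))


def count_by(rows: Iterable[Fields], key: int) -> Dict[str, int]:
    """Computation step; this small result is what would be sent to the host."""
    counts: Dict[str, int] = {}
    for row in rows:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return counts


if __name__ == "__main__":
    blocks = [zlib.compress(b"1,MA,1212\n2,CA,0113"), zlib.compress(b"3,MA,0414\n4,CA,0414")]
    rows = remove_columns(read_and_uncompress(blocks), keep=[1, 2])  # (state, mo-yr)
    current = restrict_rows(rows, lambda r: r[1] == "0414")
    print(count_by(current, key=0))  # {'MA': 1, 'CA': 1}
```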

Data Storage
[Diagram: host and snippet processors; each snippet processor's data is stored in extents made up of pages, described by metadata]
• Metadata identifies which extents/pages to read or skip

Modeling Priorities
• Utilization of all snippet processors
  – Need to utilize them uniformly
• Maximize the MPP capability of snippet processors
  – Ideally, snippet processors should be independent
• Minimize data read from disk
  – Minimize data stored
• Improve computation in the snippet processor
  – Compounded over the data volume, this helps performance

Snippet Processor Utilization: Data Distribution
[Diagram: rows 1,MA,1212,… / 2,CA,0113,… / 3,MA,0414,… distributed by state; both MA rows land on one snippet processor, the CA row on another, and the third receives nothing: data skew]
• If no distribution key is specified in the table definition, NZ picks one of the columns to distribute on
  – The first column in the table
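
The skew on this slide can be reproduced with a few lines of Python. A minimal sketch, assuming hash-modulo placement (zlib.crc32 is a stand-in for Netezza's own hash, and the helper names are hypothetical): a low-cardinality key such as state can occupy at most as many snippet processors as it has distinct values, however large the table grows, while a high-cardinality key such as cid spreads rows out. In Netezza DDL the key is set explicitly with `DISTRIBUTE ON (column)` instead of relying on the default.

```python
import zlib
from collections import Counter
from typing import List, Sequence


def spu_for(key: str, num_spus: int) -> int:
    """Map a distribution-key value to a snippet processor.

    zlib.crc32 is a stand-in; NZ's real hash function is not modeled here.
    The only property that matters is that equal keys land on the same SPU.
    """
    return zlib.crc32(key.encode("utf-8")) % num_spus


def rows_per_spu(keys: Sequence[str], num_spus: int) -> List[int]:
    """Row count on each SPU when rows are hashed on the given key values."""
    counts = Counter(spu_for(k, num_spus) for k in keys)
    return [counts.get(i, 0) for i in range(num_spus)]


def skew(counts: List[int]) -> float:
    """Max rows on any SPU divided by the average; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


if __name__ == "__main__":
    states = ["MA", "CA"] * 500            # 1,000 rows, only two distinct states
    by_state = rows_per_spu(states, num_spus=8)
    by_cid = rows_per_spu([str(cid) for cid in range(1000)], num_spus=8)
    print(by_state, skew(by_state))        # at most 2 of the 8 SPUs hold any rows
    print(by_cid, skew(by_cid))            # rows spread across the SPUs
```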

Snippet Processor Utilization: Data Distribution
[Diagram: the same rows distributed by mo-yr (1212, 0113, 0414); each snippet processor receives one row]
• Snippet processors are utilized uniformly
• But what if most queries are for the current month?
  – Processing skew: the snippet processor holding the current month does most of the work
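
Why an evenly distributed key can still cause processing skew: in this hypothetical sketch (a stand-in crc32 hash, made-up month values and helper names), a year of data is spread over the snippet processors by mo-yr, but a query restricted to the current month finds every matching row on one processor while the others sit idle.

```python
import zlib
from collections import Counter
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]


def spu_for(key: str, num_spus: int) -> int:
    """Stand-in hash placement (zlib.crc32, not NZ's real hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_spus


def work_per_spu(rows: Sequence[Row], dist_col: str,
                 where: Callable[[Row], bool], num_spus: int) -> List[int]:
    """Rows each SPU must process for a query, given hash distribution on dist_col."""
    counts = Counter(spu_for(r[dist_col], num_spus) for r in rows if where(r))
    return [counts.get(i, 0) for i in range(num_spus)]


if __name__ == "__main__":
    months = ["0513", "0613", "0713", "0813", "0913", "1013",
              "1113", "1213", "0114", "0214", "0314", "0414"]
    rows = [{"cid": str(i), "mo_yr": months[i % 12]} for i in range(1200)]
    current = work_per_spu(rows, "mo_yr", lambda r: r["mo_yr"] == "0414", num_spus=4)
    print(current)  # all 100 current-month rows sit on a single SPU
```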

Snippet Processor Utilization: Data Distribution
[Diagram: rows 1,MA,1212,… / 2,CA,0113,… / 3,MA,0414,… / 4,CA,0414,… distributed randomly across the snippet processors]
• Distribute random
  – Snippet processors are utilized uniformly
  – Helps prevent processing skew
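
Random distribution can be modeled the same way. This sketch assumes rows are assigned to snippet processors with Python's `random` module; it illustrates the idea only and does not describe how Netezza's `DISTRIBUTE ON RANDOM` places rows internally. Function names are hypothetical. The current-month rows that a mo-yr hash would put on a single processor now end up spread across all of them.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]


def distribute_random(rows: Sequence[Row], num_spus: int, seed: int = 42) -> List[int]:
    """Assign each row to a random SPU (seeded only to make the sketch reproducible)."""
    rng = random.Random(seed)
    return [rng.randrange(num_spus) for _ in rows]


def work_per_spu(rows: Sequence[Row], placement: Sequence[int],
                 where: Callable[[Row], bool], num_spus: int) -> List[int]:
    """Rows each SPU must process for a query, given an explicit placement."""
    counts = Counter(spu for row, spu in zip(rows, placement) if where(row))
    return [counts.get(i, 0) for i in range(num_spus)]


if __name__ == "__main__":
    months = ["0114", "0214", "0314", "0414"]
    rows = [{"cid": str(i), "mo_yr": months[i % 4]} for i in range(400)]
    placement = distribute_random(rows, num_spus=4)
    print(work_per_spu(rows, placement, lambda r: r["mo_yr"] == "0414", num_spus=4))
```

The trade-off shows up on the following slides: random placement balances work but guarantees nothing about where join partners live.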

Snippet Processor Utilization: Data Distribution and Table Joins
[Diagram: two joined tables (rows such as 1,MA,1212,… and 1,ORD1,ITEM1,…), both distributed randomly, so rows with matching join values sit on different snippet processors]
• Distribute random: data from both tables must be redistributed for the join
  – Snippet processors are utilized uniformly
  – Makes snippet processors dependent on each other, impacting MPP maximization

Snippet Processor Utilization: Data Distribution and Table Joins
[Diagram: one table distributed on the join column cid, the other still distributed randomly]
• Distribute on the join column (cid): data from only one table must be redistributed
  – Snippet processors are utilized uniformly
  – Still makes snippet processors dependent on each other, impacting MPP maximization
  – Better than the previous scenario

Snippet Processor Utilization: Data Distribution and Table Joins
[Diagram: both tables distributed on the join column cid, so rows with the same cid share a snippet processor]
• Distribute both tables on the join column (cid)
  – Snippet processors are utilized uniformly
  – Makes snippet processors independent, maximizing MPP
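
The three join scenarios can be compared with a small simulation. A sketch under stated assumptions: "customers" and "orders" are hypothetical tables joined on cid, zlib.crc32 stands in for Netezza's hash, and redistribution is simplified to "any table not already distributed on the join column must be re-hashed on it". Only when both tables are hashed on cid does every order row find its customer on the same snippet processor, which is what lets each processor join independently. In Netezza DDL that layout means `DISTRIBUTE ON (cid)` on both tables.

```python
import random
import zlib
from typing import Dict, List

NUM_SPUS = 4


def hash_spu(cid: int) -> int:
    """Stand-in for NZ's hash: equal cid values always map to the same SPU."""
    return zlib.crc32(str(cid).encode("utf-8")) % NUM_SPUS


def place(cids: List[int], strategy: str, rng: random.Random) -> List[int]:
    """SPU for each row under 'cid' (hash on the join column) or 'random' distribution."""
    if strategy == "cid":
        return [hash_spu(c) for c in cids]
    return [rng.randrange(NUM_SPUS) for _ in cids]


def orders_not_colocated(customer_cids: List[int], order_cids: List[int],
                         cust_strategy: str, order_strategy: str, seed: int = 7) -> int:
    """Order rows whose matching customer row lives on a different SPU."""
    rng = random.Random(seed)
    cust_spu: Dict[int, int] = dict(zip(customer_cids, place(customer_cids, cust_strategy, rng)))
    order_spu = place(order_cids, order_strategy, rng)
    return sum(1 for cid, spu in zip(order_cids, order_spu) if cust_spu[cid] != spu)


def tables_to_redistribute(cust_strategy: str, order_strategy: str) -> int:
    """Simplification: a table is re-hashed on cid unless it is already distributed on cid."""
    return sum(1 for s in (cust_strategy, order_strategy) if s != "cid")


if __name__ == "__main__":
    customers = list(range(1, 1001))
    orders = [c for c in customers for _ in range(3)]   # three orders per customer
    for scenario in [("random", "random"), ("cid", "random"), ("cid", "cid")]:
        print(scenario, tables_to_redistribute(*scenario),
              orders_not_colocated(customers, orders, *scenario))
```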

Snippet Processor Utilization: Data Distribution
• Identify keys that distribute data uniformly
  – Avoid data and processing skew
  – Try using join columns as the distribution keys
  – Choose the same data types for join columns
• If a table is small, random distribution is fine
  – If one of the joined tables is small, NZ will broadcast it
• Redistribution may not be expensive for small data
  – E.g., when selecting a small number of columns
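
A broadcast join, which NZ uses when one of the joined tables is small, can be sketched as follows. This is a simplified model with hypothetical names and sample state/city rows: a full copy of the small table is shipped to every snippet processor, and each processor joins it against its local slice of the large table, so the large table never moves. The network cost is the small table's row count times the number of processors.

```python
from typing import Dict, List, Tuple

StateRow = Tuple[int, str]   # (state id, state name): the small table
CityRow = Tuple[int, str]    # (state id, city): the large, distributed table


def broadcast_join(large_slices: List[List[CityRow]],
                   small: List[StateRow]) -> Tuple[List[Tuple[int, str, str]], int]:
    """Join a large distributed table with a small table by broadcasting the small one.

    Returns the joined rows and the number of small-table rows shipped.
    """
    shipped = len(small) * len(large_slices)  # one full copy per SPU
    joined: List[Tuple[int, str, str]] = []
    for local_rows in large_slices:           # each SPU works on its own slice only
        local_copy: Dict[int, str] = dict(small)
        joined.extend((sid, city, local_copy[sid])
                      for sid, city in local_rows if sid in local_copy)
    return joined, shipped


if __name__ == "__main__":
    states = [(1, "Alabama"), (5, "California"), (10, "Florida"), (22, "Mass")]
    cities = [[(22, "Boston"), (5, "LA")], [(10, "Tampa"), (22, "Salem")], [(5, "SF")]]
    rows, shipped = broadcast_join(cities, states)
    print(len(rows), shipped)  # 5 12
```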

Snippet Processor Utilization: Distribution and Query Time
[Chart: Query Time for Different Distributions; bars for Random, 1 Correct Distribution and 2 Correct Distributions; y-axis Time (min)]

Snippet Processor Utilization: Join Column Type and Query Time
[Chart: Join query time, same vs. different data types; bars for Incorrect Data Types and Correct Data Types; y-axis Time (min)]

Minimize Data Read From Disk: Zone Maps
• Data types that support zone maps
  – All integer data types
    • int1
    • int2
    • int4
    • int8
  – Date
  – Timestamp
Refer to the product manual for your version of NZ for the complete list of zone-mappable data types.

Minimize Data Read From Disk
[Diagram: table column cid is numeric(10,0); data on each snippet processor is stored in extents 1, 2, 3, … made up of pages, with zone maps kept as metadata]
• There is no zone map for cid, so a query may end up reading all data from disk

Minimize Data Read From Disk
[Diagram: table column cid is bigint; the zone-map metadata now has entries for cid]
• Zone maps can be used to minimize the data read from disk
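
The zone-map pruning on these slides can be sketched in Python. Assumptions: extents hold a fixed number of rows, a zone map keeps only the per-extent min/max of a column, and the `zone_mappable` flag stands in for the column's data type (bigint yes, numeric(10,0) no, as on the slides). All names are hypothetical. Pruning works best when values arrive roughly in order, as they do here; if cid values were scattered across every extent, each min/max range would cover the predicate and nothing could be skipped.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Extent:
    values: List[int]                  # cid values stored in this extent
    zone: Optional[Tuple[int, int]]    # (min, max) of cid, or None when not zone-mapped


def build_extents(values: List[int], extent_size: int, zone_mappable: bool) -> List[Extent]:
    """Pack values into fixed-size extents, recording min/max only if the type allows it."""
    extents = []
    for start in range(0, len(values), extent_size):
        chunk = values[start:start + extent_size]
        zone = (min(chunk), max(chunk)) if zone_mappable else None
        extents.append(Extent(chunk, zone))
    return extents


def extents_to_read(extents: List[Extent], lo: int, hi: int) -> List[int]:
    """Extents a predicate `lo <= cid <= hi` has to scan; the rest are skipped."""
    return [i for i, e in enumerate(extents)
            if e.zone is None or not (e.zone[1] < lo or e.zone[0] > hi)]


if __name__ == "__main__":
    cids = list(range(1, 301))   # loaded in cid order
    print(extents_to_read(build_extents(cids, 100, zone_mappable=True), 150, 160))   # [1]
    print(extents_to_read(build_extents(cids, 100, zone_mappable=False), 150, 160))  # [0, 1, 2]
```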

Minimize Data Read From Disk: Zone Maps and Query Time
[Chart: Query time with and without zone map; bars for Incorrect Data Type and Correct Data Type; y-axis Time (min)]

Minimize Data Read From Disk: Clustered Base Tables
• NZ stores rows with the same organizing-key values close together
• Additional data types become zone-mappable
  – char
  – varchar
  – nchar
  – nvarchar
  – float
  – double
  – bool
  – time
  – interval
• Helps improve the performance of multi-table joins

Minimize Data Read From Disk: Clustered Base Tables
[Diagram: table distributed on cid, with rows for MA (Boston, Salem, Lowell), CA (LA, SF, Pasadena) and FL (Tampa, Orlando, Miami) spread over three extents]
[Diagram: table distributed on cid and organized on state; each extent holds one state: MA (Boston, Salem, Lowell), FL (Tampa, Orlando, Miami), CA (LA, SF, Pasadena)]
[Diagram: State table extent: 1,AL,Alabama,… / 5,CA,California,… / 10,FL,Florida,… / 22,MA,Mass,…]
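
The effect of organizing on state can be checked with the rows from this slide. A sketch under simplifying assumptions: the rows were loaded interleaved by state, each extent holds three rows, the state column gets a min/max zone map (which clustered base tables make possible for char/varchar), and a plain sort stands in for the clustering Netezza applies to a table declared with `ORGANIZE ON (state)`. Names are hypothetical. Before organizing, every extent spans CA through MA, so a query for 'CA' must read all three; afterwards it reads one.

```python
from typing import List, Tuple

Row = Tuple[int, str, str]  # (cid, state, city)

LOADED: List[Row] = [(1, "MA", "Boston"), (5, "CA", "LA"), (3, "FL", "Tampa"),
                     (1, "MA", "Salem"), (5, "CA", "SF"), (3, "FL", "Orlando"),
                     (1, "MA", "Lowell"), (5, "CA", "Pasadena"), (3, "FL", "Miami")]


def pack(rows: List[Row], rows_per_extent: int) -> List[List[Row]]:
    """Split rows into consecutive extents of a fixed size."""
    return [rows[i:i + rows_per_extent] for i in range(0, len(rows), rows_per_extent)]


def extents_for_state(extents: List[List[Row]], state: str) -> List[int]:
    """Use a per-extent min/max of the state column (a zone map) to pick extents to read."""
    hits = []
    for i, extent in enumerate(extents):
        states = [row[1] for row in extent]
        if min(states) <= state <= max(states):
            hits.append(i)
    return hits


if __name__ == "__main__":
    unorganized = pack(LOADED, 3)
    organized = pack(sorted(LOADED, key=lambda r: r[1]), 3)  # sort stands in for ORGANIZE ON (state)
    print(extents_for_state(unorganized, "CA"))  # [0, 1, 2]
    print(extents_for_state(organized, "CA"))    # [0]
```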

Minimize Data Read From Disk: Clustered Base Table
[Chart: Query time with and without organization; bars for No Org + Correct Dist and Org + Correct Dist; y-axis Time (min)]

Minimize Data Read From Disk: Materialized Views
• A view over frequently used columns of the base table
  – Unlike a view, a materialized view stores data
  – Reduces data read from disk
  – Requires additional storage
  – Needs to be refreshed if base table data changes
• Can be used as an index against the base table
  – Stores the location of the base table data in a column

Minimize Data Read From Disk: Materialized Views
[Diagram: T_EMP distributed on cid, rows stored across three extents: 1,MA,Mike / 5,CA,Fally / 3,FL,Chris / 4,MA,Robert / 7,CA,Mary / 2,FL,Justin / 6,MA,Harini / 8,CA,Mike / 9,FL,Martha]
[Diagram: MV on T_EMP ordered by state, name, in one extent: 5,CA,Fally / 7,CA,Mary / 8,CA,Mike / 3,FL,Chris / 2,FL,Justin / 9,FL,Martha / 6,MA,Harini / 1,MA,Mike / 4,MA,Robert]
• SELECT ID, NAME FROM T_EMP;
• SELECT * FROM T_EMP WHERE STATE = 'CA';
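
The materialized-view-as-index idea can be sketched with the T_EMP rows from this slide. This is a model, not Netezza internals: the MV is a sorted list of (state, name, id, location) tuples, where "location" is simply the base row's list index standing in for the base-table location column Netezza keeps, and `bisect` stands in for jumping to the 'CA' range. Function names are hypothetical. In Netezza the view would be declared with something like `CREATE MATERIALIZED VIEW mv_emp AS SELECT ID, STATE, NAME FROM T_EMP ORDER BY STATE, NAME` (mv_emp is a made-up name).

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

EmpRow = Tuple[int, str, str]          # (id, state, name) in T_EMP
MvRow = Tuple[str, str, int, int]      # (state, name, id, base-row location)

T_EMP: List[EmpRow] = [
    (1, "MA", "Mike"), (5, "CA", "Fally"), (3, "FL", "Chris"),
    (4, "MA", "Robert"), (7, "CA", "Mary"), (2, "FL", "Justin"),
    (6, "MA", "Harini"), (8, "CA", "Mike"), (9, "FL", "Martha"),
]


def build_mv(base: List[EmpRow]) -> List[MvRow]:
    """Narrow copy of the base table, sorted like ORDER BY state, name,
    with each entry remembering where its base row lives."""
    return sorted((state, name, emp_id, loc) for loc, (emp_id, state, name) in enumerate(base))


def select_star_where_state(base: List[EmpRow], mv: List[MvRow], state: str) -> List[EmpRow]:
    """SELECT * FROM T_EMP WHERE STATE = state, using the sorted MV as an index."""
    keys = [entry[0] for entry in mv]
    lo, hi = bisect_left(keys, state), bisect_right(keys, state)
    return [base[entry[3]] for entry in mv[lo:hi]]


if __name__ == "__main__":
    mv = build_mv(T_EMP)
    print([(e[2], e[1]) for e in mv])                # (id, name) answered from the MV alone
    print(select_star_where_state(T_EMP, mv, "CA"))  # [(5, 'CA', 'Fally'), (7, 'CA', 'Mary'), (8, 'CA', 'Mike')]
```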

Minimize Data Read From Disk: Materialized Views
[Chart: Query time on a single table, MV vs. no MV; bars for No MV and With MV; y-axis Time (min)]

Minimize Data Read From Disk: Materialized Views
[Chart: Join query time with and without MV; bars for No MV and With MV; y-axis Time (min)]

Minimize Data Read From Disk: Minimize Data Stored
• Choose storage-efficient data types
  – bigint takes 4 bytes more than int (8 bytes vs. 4 bytes)
• Use char instead of varchar if the data length is fixed
  – varchar has a 2-byte overhead
• Define columns as "not null" where possible
• Store only the required data in table columns
• Encode duplicate data stored in rows
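
The savings on this slide are easy to quantify. A back-of-the-envelope sketch, assuming uncompressed sizes (Netezza compresses data on disk, so real savings will differ) and a hypothetical one-billion-row table: `struct.calcsize` confirms the 4- and 8-byte integer widths, and the 2-byte varchar overhead is the figure quoted above. Function names are hypothetical.

```python
import struct
from typing import Tuple

INT_BYTES = struct.calcsize("<i")      # 4-byte integer (int / int4)
BIGINT_BYTES = struct.calcsize("<q")   # 8-byte integer (bigint / int8)
VARCHAR_OVERHEAD = 2                   # per-value overhead quoted on the slide


def bigint_vs_int_bytes(rows: int) -> int:
    """Extra raw bytes spent by storing a column as bigint instead of int."""
    return rows * (BIGINT_BYTES - INT_BYTES)


def char_vs_varchar_bytes(rows: int, length: int) -> Tuple[int, int]:
    """Raw bytes for a fixed-length value stored as char(length) vs. varchar(length)."""
    return rows * length, rows * (length + VARCHAR_OVERHEAD)


if __name__ == "__main__":
    rows = 1_000_000_000
    print(bigint_vs_int_bytes(rows))        # 4000000000, about 4 GB
    print(char_vs_varchar_bytes(rows, 2))   # (2000000000, 4000000000) for a 2-letter state code
```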

Improve Computation in Snippet Processor: NZ Object Definitions
• Define columns as "not null" where possible
  – Removes the logic to check for nulls
• Define table keys and relationships
  – Helps the NZ query optimizer generate efficient code
bnair@asquareb.com 
blog.asquareb.com 
https://github.com/bijugs 
@gsbiju
