Vectorwise
Implementation best practices


Mark Van de Wiel
Director Product Management, Vectorwise

Thursday, November 01, 2012



Confidential © 2012 Actian Corporation
Agenda

 • Hardware
 • Operating system
 • Database configuration
 • Database design
 • Data loading
 • High availability
 • Monitoring




100x (+) Performance Difference – 2003
Custom C versus Relational Database

[Bar chart: TPC-H 1 GB query 1, runtime in seconds]
  MySQL        26.2
  DBMS 'X'     28.1
  C program     0.2
  Vectorwise    0.6
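
As a rough check on the "100x (+)" headline, the four runtimes plotted on this slide can be turned into speedup ratios. This is only arithmetic on the plotted values; the dictionary and function names are illustrative and do not come from the deck.

```python
# Runtimes in seconds as plotted on the slide (TPC-H 1 GB, query 1).
RUNTIMES_S = {
    "MySQL": 26.2,
    "DBMS 'X'": 28.1,
    "C program": 0.2,
    "Vectorwise": 0.6,
}


def speedup(slow: str, fast: str) -> float:
    """Return how many times faster `fast` ran than `slow`."""
    return RUNTIMES_S[slow] / RUNTIMES_S[fast]


if __name__ == "__main__":
    for slow in ("MySQL", "DBMS 'X'"):
        for fast in ("C program", "Vectorwise"):
            print(f"{fast} vs {slow}: {speedup(slow, fast):.1f}x")
```

On these numbers the hand-written C program is roughly 131x to 140x faster than the two relational systems, and Vectorwise roughly 44x to 47x faster.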



Some Numbers

 • Traditional RDBMS: <200 MB/s per core
    Even these use MPP to address I/O challenges

 • Vectorwise (lab environment): >1.5 GB/s per core
    Maximum throughput requirement is extremely high
    Realistically (cost-effectively), only RAM can serve data quickly enough
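
A back-of-the-envelope sketch of why only RAM is realistic: multiply the per-core figure by the core count, then divide by what a single drive can stream. The 150 MB/s per-drive rate is an assumption for illustration (it is not from the deck), as are the function names and the decimal conversion 1 GB = 1000 MB.

```python
import math


def required_bandwidth_gb_s(cores: int, per_core_gb_s: float = 1.5) -> float:
    """Aggregate scan bandwidth needed to keep every core busy."""
    return cores * per_core_gb_s


def drives_needed(cores: int,
                  per_core_gb_s: float = 1.5,
                  per_drive_mb_s: float = 150.0) -> int:
    """Drives required if each streams `per_drive_mb_s` (assumed value)."""
    total_mb_s = required_bandwidth_gb_s(cores, per_core_gb_s) * 1000
    return math.ceil(total_mb_s / per_drive_mb_s)


if __name__ == "__main__":
    print(required_bandwidth_gb_s(16), "GB/s needed for 16 cores")
    print(drives_needed(16), "drives at the assumed 150 MB/s each")
```

Under these assumptions a 16-core server would need 24 GB/s, or about 160 drives streaming in parallel, which is the cost argument the slide is making.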




What Hardware to Use

 • CPU
 • Memory
 • Storage I/O and capacity

Requirements versus budget


Hardware Considerations – MEMORY

 • Ideally, frequently accessed data should fit in memory
    May be all data
    May be a small portion of the data
    Note: data is compressed in the memory buffer
      • 3x – 5x compression ratios are common

 • Query execution should take place entirely in memory
    Operations against larger data sets require more memory
    Consider query concurrency
    “Spill to disk” is supported but should be a last resort
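
A minimal sizing sketch for the first point, assuming the quoted 3x to 5x compression ratios hold for your data (actual ratios vary and should be measured). The function names and example sizes are hypothetical.

```python
def compressed_size_gb(raw_gb: float, compression_ratio: float) -> float:
    """Size of the data once compressed at the given ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_gb / compression_ratio


def fits_in_buffer(raw_hot_gb: float,
                   buffer_gb: float,
                   compression_ratio: float = 3.0) -> bool:
    """True if the frequently accessed data fits in the memory buffer.

    Defaults to the conservative end (3x) of the range on the slide.
    """
    return compressed_size_gb(raw_hot_gb, compression_ratio) <= buffer_gb


if __name__ == "__main__":
    # 300 GB of raw hot data against a 64 GB buffer.
    print(fits_in_buffer(300, 64, compression_ratio=3.0))
    print(fits_in_buffer(300, 64, compression_ratio=5.0))
```

Checking both ends of the range shows how sensitive the answer is to the ratio: the same 300 GB fits in 64 GB at 5x but not at 3x.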




Hardware Recommendation

 • CPUs
    Use CPUs with a higher clock rate for better raw throughput
    Use more cores for higher throughput
    Higher-power CPUs are faster
 • Memory
    At least 8 GB per core (more is always better)
 • Storage
    Use as many drives as possible
    Ensure sufficient capacity
    Use the fastest drives available
      • SAS over SATA, ideally 15k RPM
      • SSDs are often not cost-effective relative to more memory
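
The memory guideline above can be checked against a candidate server with a few lines. This is a sketch; the constant and function names are illustrative, and the example core and RAM counts are made up.

```python
MIN_GB_PER_CORE = 8  # guideline from the slide: at least 8 GB per core


def min_memory_gb(cores: int) -> int:
    """Minimum RAM suggested by the per-core guideline."""
    return cores * MIN_GB_PER_CORE


def meets_guideline(cores: int, ram_gb: float) -> bool:
    """True if a server with `cores` and `ram_gb` satisfies the guideline."""
    return ram_gb >= min_memory_gb(cores)


if __name__ == "__main__":
    for cores, ram_gb in ((12, 96), (16, 96)):
        print(cores, "cores,", ram_gb, "GB:", meets_guideline(cores, ram_gb))
```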




Examples

Small configuration (1 TB)
  Dell R620
  Lenovo RD430
Medium configuration (single digit TBs)
  Dell R720
  HP DL380
  IBM x3650
  Lenovo RD630
High-end configuration
  Dell R910
  HP DL580 or DL980
  IBM x3750




Operating System Considerations

 • 64-bit
 • Linux: Red Hat, SuSE, Ubuntu
    File systems: xfs, ext3, ext4
 • Windows: Windows 7 (or higher), Windows 2008 (or higher)


Database Configuration

Installation defaults are generally good
 May want to adjust column buffer size (default 25% of RAM)
 May want to adjust processing memory (default 50% of RAM)
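
To make the defaults concrete, the sketch below works out what the two percentages mean for a given amount of RAM. It only reproduces the arithmetic on this slide; the function and key names are hypothetical and are not Vectorwise configuration parameters.

```python
def default_memory_split_gb(ram_gb: float) -> dict:
    """Split RAM according to the installation defaults quoted on the slide."""
    column_buffer = 0.25 * ram_gb   # column buffer: default 25% of RAM
    processing = 0.50 * ram_gb      # processing memory: default 50% of RAM
    return {
        "column_buffer_gb": column_buffer,
        "processing_memory_gb": processing,
        "remaining_gb": ram_gb - column_buffer - processing,
    }


if __name__ == "__main__":
    print(default_memory_split_gb(128))
```

On a 128 GB server the defaults give 32 GB of column buffer and 64 GB of processing memory, leaving 32 GB for the operating system and everything else.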




Database Design

 • Schema – no particular preference
    Single denormalized table, star schema, snowflake schema, 3rd normal form

 • Constraints
    Only on empty tables today… (to be addressed in Vectorwise 3.0)
    Consider data loading order and impact

 • Indexes
    Note: clustered index only today (“index-organized table”)
    One per table
    Consider incremental load




Data Loading

Initial load
  File-based bulk load through vwload or copy
   Conversion into UTF-8

  Use tools
   Pentaho
   Informatica
   Talend
   HVR
   Attunity
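
If the source files are not already UTF-8, they can be re-encoded before the bulk load. The sketch below assumes the input is Latin-1; that encoding, the function name, and the file names are all assumptions, so substitute whatever your export actually uses.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "latin-1") -> int:
    """Re-encode a text file to UTF-8 line by line; return the line count.

    newline="" keeps the original line endings untouched on both sides.
    """
    lines = 0
    with open(src, "r", encoding=source_encoding, newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)
            lines += 1
    return lines


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    src = Path("orders_latin1.csv")
    if src.exists():
        print(convert_to_utf8(src, Path("orders_utf8.csv")), "lines converted")
```

Streaming line by line keeps memory use flat, which matters for multi-gigabyte load files.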




Data Loading

Incremental load
 INSERT, UPDATE and/or DELETE
 Append if possible
 Batch if possible
 Use COMBINE
 Positional Delta Trees
  Memory considerations
  Propagation to disk

 Use tools
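
"Batch if possible" can be sketched as grouping incoming rows into fixed-size chunks, so that each chunk is applied in one statement or transaction instead of row by row. The helper below is generic Python; the name and batch size are illustrative, and the actual database call is omitted because the deck does not specify a driver.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    for batch in in_batches(range(7), 3):
        # In a real loader, each batch would be sent to the database here.
        print(batch)
```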




Moving Window of Data

Considerations
 COMBINE on a large table can be expensive
  Mostly relevant for updates and deletes

 Alternative: manual partitioning
  One table per period
  Single view across all tables
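
The manual partitioning alternative can be sketched as two small helpers: one that decides which period tables stay in the window, and one that generates a standard SQL UNION ALL view over them. Table and view names are hypothetical, and identifiers are interpolated without quoting, so only trusted names should be passed in.

```python
from typing import List, Sequence, Tuple


def slide_window(tables: Sequence[str], new_table: str,
                 max_periods: int) -> Tuple[List[str], List[str]]:
    """Add the newest period table; return (tables to keep, tables to drop)."""
    if max_periods < 1:
        raise ValueError("max_periods must be at least 1")
    window = list(tables) + [new_table]
    dropped = window[:-max_periods] if len(window) > max_periods else []
    return window[-max_periods:], dropped


def union_view_sql(view_name: str, tables: Sequence[str]) -> str:
    """Build a CREATE VIEW statement spanning all period tables."""
    if not tables:
        raise ValueError("at least one table is required")
    selects = "\nUNION ALL\n".join(f"SELECT * FROM {t}" for t in tables)
    return f"CREATE VIEW {view_name} AS\n{selects}"


if __name__ == "__main__":
    keep, drop = slide_window(["sales_2012_09", "sales_2012_10"],
                              "sales_2012_11", max_periods=2)
    print("drop:", drop)
    print(union_view_sql("sales_all", keep))
```

Dropping a whole period table avoids the large deletes that make COMBINE expensive; the view then has to be recreated so that it matches the current set of tables.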




High Availability

 • Hardware and OS best practices
    UPS, RAID

 • Vectorwise backup
    Only read-only, full backup
    Consider periodic full backups and archiving the incremental loads

 • Disaster recovery
    Dual load
    Active/active possibility




Monitoring

 • OS monitoring
    CPU, memory utilization, I/O statistics

 • vwinfo data
 • Actian Director
 • DBA tools
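
For the OS monitoring item, a very small snapshot can be taken with the Python standard library alone. This sketch covers only CPU count, load average, and disk capacity; memory utilization and I/O statistics need OS tools or a third-party library and are not shown. The function name and dictionary keys are illustrative.

```python
import os
import shutil


def os_snapshot(path: str = ".") -> dict:
    """Collect a few basic OS-level figures for the host."""
    usage = shutil.disk_usage(path)
    snapshot = {
        "cpu_count": os.cpu_count(),
        "disk_total_gb": usage.total / 1e9,
        "disk_free_gb": usage.free / 1e9,
    }
    try:
        # 1-minute load average; not available on every platform.
        snapshot["load_avg_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass
    return snapshot


if __name__ == "__main__":
    for key, value in os_snapshot().items():
        print(f"{key}: {value}")
```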




Agenda

 • Hardware
 • Operating system
 • Database configuration
 • Database design
 • Data loading
 • High availability
 • Monitoring



More information in the Vectorwise Developer Guide:
 http://www.actian.com/images/white_papers/vw_developers_v2.5.pdf


