Administrator's Guide
HP Vertica Analytic Database
Software Version: 7.0.x
Document Release Date: 2/20/2015
Legal Notices
Warranty
The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be
construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
The information contained herein is subject to change without notice.
Restricted Rights Legend
Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer
Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial
license.
Copyright Notice
© Copyright 2006 - 2014 Hewlett-Packard Development Company, L.P.
Trademark Notices
Adobe® is a trademark of Adobe Systems Incorporated.
Microsoft® and Windows® are U.S. registered trademarks of Microsoft Corporation.
UNIX® is a registered trademark of The Open Group.
HP Vertica Analytic Database (7.0.x) Page 2 of 997
Contents
Contents 3
Administration Overview 51
Managing Licenses 52
Copying Enterprise, Evaluation, and Flex Zone License Files 52
Obtaining a License Key File 52
Understanding HP Vertica Licenses 52
License Types 53
Installing or Upgrading a License Key 54
New HP Vertica License Installations 54
HP Vertica License Renewals or Upgrades 54
Uploading or Upgrading a License Key Using Administration Tools 55
Uploading or Upgrading a License Key Using Management Console 55
Flex Table License Installations 56
Installing a Flex Table license using vsql 56
Installing a Flex Table license using Management Console 56
Viewing Your License Status 56
Examining Your License Key 57
Viewing Your License Status 57
Viewing Your License Status Through MC 58
Calculating the Database Size 58
How HP Vertica Estimates Raw Data Size 58
Excluding Data From Raw Data Size Estimate 58
Evaluating Data Type Footprint Size 59
Using AUDIT to Estimate Database Size 59
Monitoring Database Size for License Compliance 60
Viewing Your License Compliance Status 60
Manually Auditing Columnar Data Usage 60
Manually Auditing Flex Table Data Usage 61
Targeted Auditing 61
Using Management Console to Monitor License Compliance 62
Managing License Warnings and Limits 62
Term License Warnings and Expiration 62
Data Size License Warnings and Remedies 62
If Your HP Vertica Enterprise Edition Database Size Exceeds Your Licensed Limits 63
If Your HP Vertica Community Edition Database Size Exceeds Your Licensed Limits 63
Configuring the Database 65
Configuration Procedure 66
IMPORTANT NOTES 66
Prepare Disk Storage Locations 67
Specifying Disk Storage Location During Installation 67
To Specify the Disk Storage Location When You Install: 68
Notes 68
Specifying Disk Storage Location During Database Creation 68
Notes 69
Specifying Disk Storage Location on MC 69
Configuring Disk Usage to Optimize Performance 69
Using Shared Storage With HP Vertica 70
Viewing Database Storage Information 70
Disk Space Requirements for HP Vertica 70
Disk Space Requirements for Management Console 70
Prepare the Logical Schema Script 70
Prepare Data Files 71
How to Name Data Files 71
Prepare Load Scripts 71
Create an Optional Sample Query Script 72
Create an Empty Database 73
Creating a Database Name and Password 73
Database Passwords 73
Create an Empty Database Using MC 75
How to Create an Empty Database on an MC-managed Cluster 75
Notes 76
Create a Database Using Administration Tools 77
Create the Logical Schema 78
Perform a Partial Data Load 79
Test the Database 79
Optimize Query Performance 80
Complete the Data Load 80
Test the Optimized Database 80
Set Up Incremental (Trickle) Loads 81
Implement Locales for International Data Sets 83
ICU Locale Support 83
Changing DB Locale for a Session 83
Specify the Default Locale for the Database 84
Override the Default Locale for a Session 85
Best Practices for Working with Locales 85
Server Locale 86
vsql Client 86
ODBC Clients 86
JDBC and ADO.NET Clients 87
Notes and Restrictions 87
Change Transaction Isolation Levels 89
Notes 90
Configuration Parameters 91
Configuring HP Vertica Settings Using MC 91
Configuring HP Vertica At the Command Line 93
General Parameters 93
Tuple Mover Parameters 96
Epoch Management Parameters 98
Monitoring Parameters 99
Profiling Parameters 101
Security Parameters 102
Database Designer Parameters 102
Internationalization Parameters 102
Data Collector Parameters 103
Kerberos Authentication Parameters 104
HCatalog Connector Parameters 105
Designing a Logical Schema 107
Using Multiple Schemas 108
Multiple Schema Examples 108
Using Multiple Private Schemas 108
Using Combinations of Private and Shared Schemas 110
Creating Schemas 110
Specifying Objects in Multiple Schemas 111
Setting Search Paths 111
Creating Objects That Span Multiple Schemas 113
Tables in Schemas 114
About Base Tables 114
Automatic Projection Creation 114
About Temporary Tables 115
Local Temporary Tables 116
Automatic Projection Creation and Characteristics 116
Implementing Views 118
Creating Views 118
Using Views 118
Notes 120
Creating a Database Design 121
What Is a Design? 121
How Database Designer Creates a Design 122
Who Can Run Database Designer 123
Granting and Enabling the DBDUSER Role 123
Allowing the DBDUSER to Run Database Designer Using Management Console 124
Allowing the DBDUSER to Run Database Designer Programmatically 125
DBDUSER Capabilities and Limitations 126
Workflow for Running Database Designer 127
Specifying Parameters for Database Designer 129
Design Name 129
Design Types 129
Comprehensive Design 129
Incremental Design 130
Optimization Objectives 130
Design Tables with Sample Data 130
Design Queries 131
Query Repository 131
K-Safety for Design 131
Replicated and Unsegmented Projections 132
Replicated Projections 132
Unsegmented Projections 133
Statistics Analysis 133
Building a Design 133
Resetting a Design 134
Deploying a Design 136
Deploying Designs Using Database Designer 136
Deploying Designs Manually 137
How to Create a Design 137
Using Management Console to Create a Design 138
Using the Wizard to Create a Design 139
Creating a Design Manually 141
Using Administration Tools to Create a Design 144
Creating Custom Designs 146
The Design Process 146
Planning Your Design 147
Design Requirements 147
Determining the Number of Projections to Use 147
Designing for K-Safety 148
Requirements for a K-Safe Physical Schema Design 148
Requirements for a Physical Schema Design with No K-Safety 149
Designing Replicated Projections for K-Safety 149
Designing Segmented Projections for K-Safety 150
Segmenting Projections 150
Creating Buddy Projections 151
Designing for Segmentation 151
Design Fundamentals 152
Writing and Deploying Custom Projections 152
Anatomy of a Projection 152
Column List and Encoding 153
Base Query 153
Sort Order 153
Segmentation 154
Designing Superprojections 154
Minimizing Storage Requirements 154
Maximizing Query Performance 155
Projection Design for Merge Operations 155
Maximizing Projection Performance 157
Choosing Sort Order: Best Practices 157
Combine RLE and Sort Order 157
Maximize the Advantages of RLE 158
Put Lower Cardinality Column First for Functional Dependencies 158
Sort for Merge Joins 159
Sort on Columns in Important Queries 160
Sort Columns of Equal Cardinality By Size 160
Sort Foreign Key Columns First, From Low to High Distinct Cardinality 160
Prioritizing Column Access Speed 160
Projection Examples 162
New K-Safe=2 Database 162
Creating Segmented Projections Example 162
Creating Unsegmented Projections Example 164
Adding Node to a Database 164
Creating Segmented Projections Example 165
Creating Unsegmented Projections Example 166
Implementing Security 168
Client Authentication 168
Connection Encryption 168
Client Authorization 169
Implementing Client Authentication 170
Supported Client Authentication Types 170
If You Want Communication Layer Authentication 171
Password Authentication 172
About Password Creation and Modification 172
Default Password Authentication 172
Profiles 172
How You Create and Modify Profiles 173
Password Expiration 174
Account Locking 174
How to Unlock a Locked Account 174
Password Guidelines 175
What to Use 175
What to Avoid 176
About External Authentication 177
Setting up Your Environment to Create Authentication Records 177
About Local Password Authentication 178
How to Create Authentication Records 178
How to Create Authentication Records 178
If You Do Not Specify a Client Authentication Method 179
Authentication Record Format and Rules 179
Formatting Rules 182
Configuring LDAP Authentication 183
What You Need to Know to Configure LDAP Authentication 183
Prerequisites for LDAP Authentication 184
Terminology for LDAP Authentication 184
DBADMIN Authentication Access and LDAP 185
Bind vs. Bind and Search 185
LDAP Anonymous Binding 186
Using LDAP over SSL and TLS 186
LDAP Configuration Considerations 186
Workflow for Configuring LDAP Bind 187
Workflow for Configuring LDAP Bind and Search 188
Configuring Multiple LDAP Servers 189
Configuring Ident Authentication 190
Client Authentication Records for Ident Authentication 190
Installing and Configuring an Ident Server 191
Example Authentication Records 192
Using an IP Range and Trust Authentication Method 192
Using Multiple Authentication Records 192
Record Order 193
How to Modify Authentication Records 193
Using the Administration Tools 193
Using the ClientAuthentication Configuration Parameter 193
Examples 194
Implementing Kerberos Authentication 194
Kerberos Prerequisites 195
Configure HP Vertica for Kerberos Authentication 196
Point Machines at the KDC and Configure Realms 200
Configure Clients for Kerberos Authentication 201
Configure ODBC and vsql Clients on Linux, HP-UX, AIX, Mac OS X, and Solaris 202
Configure ADO.NET, ODBC, and vsql Clients on Windows 204
Windows KDC on Active Directory with Windows Built-in Kerberos Client and HP Vertica 205
Linux KDC with Windows Built-in Kerberos Client and HP Vertica 205
Configuring Windows Clients for Kerberos Authentication 205
Authenticate and Connect Clients 205
Configure JDBC Clients on All Platforms 206
Determining the Client Authentication Method 209
Troubleshooting Kerberos Authentication 209
Server's principal name doesn't match the host name 209
JDBC client authentication 211
Working Domain Name Service (DNS) 211
Clock synchronization 212
Encryption algorithm choices 212
Kerberos passwords 213
Using the ODBC Data Source Configuration utility 213
Implementing SSL 214
Certificate Authority 214
Public/private Keys 214
SSL Prerequisites 215
Prerequisites for SSL Server Authentication and SSL Encryption 215
Optional Prerequisites for SSL Server and Client Mutual Authentication 216
Generating SSL Certificates and Keys 216
Create a CA Private Key and Public Certificate 217
Creating the Server Private Key and Certificate 218
Create the Client Private Key and Certificate 219
Summary Illustration (Generating Certificates and Keys) 220
Set Server and Client Key and Certificate Permissions 220
JDBC Certificates 221
Summary Illustration (JDBC Certificates) 222
Generating Certificates and Keys for MC 222
Signed Certificates 223
Self-Signed Certificates 223
Importing a New Certificate to MC 224
To Import a New Certificate 224
Distributing Certificates and Keys 225
Configuring SSL 225
To Enable SSL: 225
Configuring SSL for ODBC Clients 226
SSLMode Parameter 226
SSLKeyFile Parameter 227
SSLCertFile Parameter 227
Configuring SSL for JDBC Clients 227
Setting Required Properties 227
Troubleshooting 227
Requiring SSL for Client Connections 228
Managing Users and Privileges 229
About Database Users 230
Types of Database Users 231
DBADMIN User 231
Object Owner 231
PUBLIC User 232
Creating a Database User 232
Notes 232
Example 232
Locking/Unlocking a User's Database Access 233
Changing a User's Password 234
Changing a User's MC Password 234
About MC Users 235
Permission Group Types 235
MC User Types 235
Creating Users and Choosing an Authentication Method 236
Default MC Users 236
Creating an MC User 236
Prerequisites 237
Create a New MC-authenticated User 237
Create a New LDAP-authenticated User 238
How MC Validates New Users 239
Managing MC Users 239
Who Manages Users 239
What Kind of User Information You Can Manage 240
About User Names 240
About Database Privileges 241
Default Privileges for All Users 241
Default Privileges for MC Users 242
Privileges Required for Common Database Operations 242
Schemas 242
Tables 242
Views 244
Projections 245
External Procedures 245
Libraries 245
User-Defined Functions 246
Sequences 246
Resource Pools 247
Users/Profiles/Roles 248
Object Visibility 248
I/O Operations 249
Comments 251
Transactions 251
Sessions 252
Tuning Operations 252
Privileges That Can Be Granted on Objects 253
Database Privileges 253
Schema Privileges 254
Schema Privileges and the Search Path 254
Table Privileges 255
Projection Privileges 256
Explicit Projection Creation and Privileges 256
Implicit Projection Creation and Privileges 257
Selecting From Projections 257
Dropping Projections 257
View Privileges 257
Sequence Privileges 258
External Procedure Privileges 259
User-Defined Function Privileges 259
Library Privileges 260
Resource Pool Privileges 260
Storage Location Privileges 260
Role, Profile, and User Privileges 261
Metadata Privileges 262
I/O Privileges 263
Comment Privileges 264
Transaction Privileges 264
Session Privileges 265
Tuning Privileges 265
Granting and Revoking Privileges 265
About Superuser Privileges 265
About Schema Owner Privileges 266
About Object Owner Privileges 266
How to Grant Privileges 267
How to Revoke Privileges 267
Privilege Ownership Chains 269
Modifying Privileges 271
Changing a Table Owner 271
Notes 271
Example 271
Table Reassignment with Sequences 273
Changing a Sequence Owner 274
Example 274
Viewing Privileges Granted on Objects 275
About Database Roles 278
Role Hierarchies 278
Creating and Using a Role 278
Roles on Management Console 279
Types of Database Roles 280
DBADMIN Role 280
View a List of Database Superusers 281
DBDUSER Role 281
PSEUDOSUPERUSER Role 282
PUBLIC Role 282
Example 283
Default Roles for Database Users 283
Notes 284
Using Database Roles 284
Role Hierarchy 285
Example 285
Creating Database Roles 287
Deleting Database Roles 287
Granting Privileges to Roles 287
Example 287
Revoking Privileges From Roles 288
Granting Access to Database Roles 288
Example 289
Revoking Access From Database Roles 290
Example 290
Granting Administrative Access to a Role 291
Example 291
Revoking Administrative Access From a Role 292
Example 292
Enabling Roles 292
Disabling Roles 293
Viewing Enabled and Available Roles 293
Viewing Named Roles 294
Viewing a User's Role 294
How to View a User's Role 294
Users 295
Roles 296
Grants 296
Viewing User Roles on Management Console 296
About MC Privileges and Roles 297
MC Permission Groups 297
MC's Configuration Privileges and Database Access 297
MC Configuration Privileges 298
MC Configuration Privileges By User Role 299
SUPER Role (mc) 300
ADMIN Role (mc) 301
About the MC Database Administrator Role 302
IT Role (mc) 303
About the MC IT (database) Role 303
NONE Role (mc) 303
MC Database Privileges 304
MC Database Privileges By Role 305
ADMIN Role (db) 306
About the ADMIN (MC configuration) Role 307
IT Role (db) 307
About the IT (MC configuration) Role 307
USER Role (db) 308
Granting Database Access to MC Users 308
Prerequisites 308
Grant a Database-Level Role to an MC User: 309
How MC Validates New Users 309
Mapping an MC User to a Database User's Privileges 310
How to Map an MC User to a Database User 310
What If You Map the Wrong Permissions 313
Adding Multiple MC Users to a Database 313
How to Find Out an MC User's Database Role 314
Adding Multiple Users to MC-managed Databases 315
Before You Start 315
How to Add Multiple Users to a Database 316
MC Mapping Matrix 316
Using the Administration Tools 319
Running the Administration Tools 319
First Time Only 320
Between Dialogs 320
Using the Administration Tools Interface 321
Enter [Return] 321
OK - Cancel - Help 321
Menu Dialogs 322
List Dialogs 322
Form Dialogs 322
Help Buttons 323
K-Safety Support in Administration Tools 323
Notes for Remote Terminal Users 324
Using the Administration Tools Help 325
In a Menu Dialog 325
In a Dialog Box 326
Scrolling 326
Password Authentication 326
Distributing Changes Made to the Administration Tools Metadata 327
Administration Tools and Management Console 327
Administration Tools Reference 330
Viewing Database Cluster State 330
Connecting to the Database 331
Starting the Database 332
Starting the Database Using MC 332
Starting the Database Using the Administration Tools 332
Starting the Database At the Command Line 333
Stopping a Database 333
Error 333
Description 334
Resolution 334
Controlling Sessions 336
Notes 337
Restarting HP Vertica on Host 338
Configuration Menu Item 339
Creating a Database 339
Dropping a Database 341
Notes 341
Viewing a Database 342
Setting the Restart Policy 342
Best Practice for Restoring Failed Hardware 343
Installing External Procedure Executable Files 344
Advanced Menu Options 345
Rolling Back Database to the Last Good Epoch 345
Important note: 345
Stopping HP Vertica on Host 346
Killing the HP Vertica Process on Host 347
Upgrading an Enterprise or Evaluation License Key 348
Managing Clusters 349
Using Cluster Management 349
Using the Administration Tools 349
Administration Tools Metadata 349
Writing Administration Tools Scripts 350
Syntax 350
Parameters 350
Tools 351
Using Management Console 361
Connecting to MC 361
Managing Client Connections on MC 362
Managing Database Clusters on MC 363
Create an Empty Database Using MC 364
How to Create an Empty Database on an MC-managed Cluster 364
Notes 365
Import an Existing Database Into MC 366
How to Import an Existing Database on the Cluster 366
Using MC on an AWS Cluster 367
Managing MC Settings 367
Modifying Database-Specific Settings 367
Changing MC or Agent Ports 368
If You Need to Change the MC Default Ports 368
How to Change the Agent Port 368
Change the Agent Port in config.py 368
Change the Agent Port on MC 369
How to Change the MC Port 369
Backing Up MC 369
Troubleshooting Management Console 371
What You Can Diagnose: 371
Viewing the MC Log 371
Exporting the User Audit Log 372
To Manually Export MC User Activity 372
Restarting MC 373
How to Restart MC Through the MC Interface (Using Your Browser) 373
How to Restart MC At the Command Line 373
Starting Over 374
Resetting MC to Pre-Configured State 374
Avoiding MC Self-Signed Certificate Expiration 374
Operating the Database 375
Starting and Stopping the Database 375
Starting the Database 375
Starting the Database Using MC 375
Starting the Database Using the Administration Tools 375
Starting the Database At the Command Line 376
Stopping the Database 376
Stopping a Running Database Using MC 377
Stopping a Running Database Using the Administration Tools 377
Stopping a Running Database At the Command Line 377
Working with the HP Vertica Index Tool 379
Syntax 379
Parameters 380
Permissions 380
Controlling Expression Analysis 380
Performance and CRC 380
Running the Reindex Option 381
Running the CheckCRC Option 382
Handling CheckCRC Errors 383
Running the Checksort Option 383
Viewing Details of Index Tool Results 384
Working with Tables 387
Creating Base Tables 387
Creating Tables Using the /*+direct*/ Clause 387
Automatic Projection Creation 388
Characteristics of Default Automatic Projections 389
Creating a Table Like Another 390
Epochs and Node Recovery 391
Storage Location and Policies for New Tables 391
Simple Example 391
Using CREATE TABLE LIKE 391
Creating Temporary Tables 393
Global Temporary Tables 393
Local Temporary Tables 393
Creating a Temp Table Using the /*+direct*/ Clause 394
Characteristics of Default Automatic Projections 395
Preserving GLOBAL Temporary Table Data for a Transaction or Session 396
Specifying Column Encoding 396
Creating External Tables 397
Required Permissions for External Tables 397
COPY Statement Definition 397
Developing User-Defined Load (UDL) Functions for External Tables 398
Examples 398
Validating External Tables 398
Limiting the Maximum Number of Exceptions 399
Working with External Tables 399
Managing Resources for External Tables 399
Backing Up and Restoring External Tables 399
Using Sequences and Identity Columns in External Tables 399
Viewing External Table Definitions 400
External Table DML Support 400
Using External Table Values 400
Using External Tables 402
Using CREATE EXTERNAL TABLE AS COPY Statement 402
Storing HP Vertica Data in External Tables 403
Using External Tables with User-Defined Load (UDL) Functions 403
Organizing External Table Data 403
Altering Table Definitions 403
External Table Restrictions 404
Exclusive ALTER TABLE Clauses 404
Using Consecutive ALTER TABLE Commands 405
Adding Table Columns 405
Updating Associated Table Views 405
Specifying Default Expressions 406
About Using Volatile Functions 406
Updating Associated Table Views 406
Altering Table Columns 406
Adding Columns with a Default Derived Expression 407
Add a Default Column Value Derived From Another Column 407
Add a Default Column Value Derived From a UDSF 409
Changing a Column's Data Type 410
Examples 410
How to Perform an Illegitimate Column Conversion 411
Adding Constraints on Columns 413
Adding and Removing NOT NULL Constraints 413
Examples 413
Dropping a Table Column 414
Restrictions 414
Using CASCADE to Force a Drop 414
Examples 415
Moving a Table to Another Schema 416
Changing a Table Owner 416
Notes 417
Example 417
Table Reassignment with Sequences 419
Changing a Sequence Owner 419
Example 420
Renaming Tables 420
Using Rename to Swap Tables Within a Schema 421
Using Named Sequences 422
Types of Incrementing Value Objects 422
Using a Sequence with an Auto_Increment or Identity Column 423
Named Sequence Functions 423
Using DDL Commands and Functions With Named Sequences 424
Creating Sequences 424
Altering Sequences 426
Examples 426
Distributed Sequences 427
Loading Sequences 437
Creating and Instantiating a Sequence 437
Using a Sequence in an INSERT Command 437
Dropping Sequences 438
Example 438
Synchronizing Table Data with MERGE 438
Optimized Versus Non-Optimized MERGE 439
Troubleshooting the MERGE Statement 441
Dropping and Truncating Tables 442
Dropping Tables 442
Truncating Tables 442
About Constraints 445
Adding Constraints 446
Adding Column Constraints with CREATE TABLE 446
Adding Two Constraints on a Column 447
Adding a Foreign Key Constraint on a Column 447
Adding Multicolumn Constraints 448
Adding Constraints on Tables with Existing Data 449
Adding and Changing Constraints on Columns Using ALTER TABLE 449
Adding and Dropping NOT NULL Column Constraints 450
Enforcing Constraints 450
Primary Key Constraints 451
Foreign Key Constraints 451
Examples 452
Unique Constraints 453
Not NULL Constraints 454
Dropping Constraints 456
Notes 456
Enforcing Primary Key and Foreign Key Constraints 458
Enforcing Primary Key Constraints 458
Enforcing Foreign Key Constraints 458
Detecting Constraint Violations Before You Commit Data 458
Detecting Constraint Violations 459
Fixing Constraint Violations 464
Reenabling Error Reporting 467
Working with Table Partitions 468
Differences Between Partitioning and Segmentation 468
Partition Operations 468
Defining Partitions 469
Table 3: Partitioning Expression and Results 470
Partitioning By Year and Month 470
Restrictions on Partitioning Expressions 471
Best Practices for Partitioning 471
Dropping Partitions 471
Examples 472
Partitioning and Segmenting Data 473
Partitioning and Data Storage 475
Partitions and ROS Containers 475
Partition Pruning 475
Managing Partitions 475
Notes 477
Partitioning, Repartitioning, and Reorganizing Tables 477
Reorganizing Data After Partitioning 478
Monitoring Reorganization 478
Auto Partitioning 479
Examples 479
Eliminating Partitions 481
Making Past Partitions Eligible for Elimination 482
Verifying the ROS Merge 483
Examples 483
Moving Partitions 484
Archiving Steps 485
Preparing and Moving Partitions 485
Creating a Snapshot of the Intermediate Table 485
Copying the Config File to the Storage Location 486
Drop the Intermediate Table 486
Restoring Archived Partitions 486
Bulk Loading Data 489
Checking Data Format Before or After Loading 490
Converting Files Before Loading Data 491
Checking UTF-8 Compliance After Loading Data 491
Performing the Initial Database Load 491
Extracting Data From an Existing Database 492
Checking for Delimiter Characters in Load Data 492
Moving Data From an Existing Database to HP Vertica Nodes 493
Loading From a Local Hard Disk 493
Loading Over the Network 493
Loading From Windows 494
Using Load Scripts 494
Using Absolute Paths in a Load Script 494
Running a Load Script 494
Using COPY and COPY LOCAL 495
Copying Data From an HP Vertica Client 496
Transforming Data During Loads 496
Understanding Transformation Requirements 496
Loading FLOAT Values 497
Using Expressions in COPY Statements 497
Handling Expression Errors 497
Transformation Example 498
Deriving Table Columns From Data File Columns 498
Specifying COPY FROM Options 499
Loading From STDIN 500
Loading From a Specific Path 500
Loading BZIP and GZIP Files 500
Loading with Wildcards (glob) ON ANY NODE 500
Loading From a Local Client 501
Choosing a Load Method 501
Loading Directly into WOS (AUTO) 501
Loading Directly to ROS (DIRECT) 502
Loading Data Incrementally (TRICKLE) 502
Loading Data Without Committing Results (NO COMMIT) 502
Using NO COMMIT to Detect Constraint Violations 503
Using COPY Interactively 503
Canceling a COPY Statement 503
Specifying a COPY Parser 503
Specifying Load Metadata 504
Interpreting Last Column End of Row Values 505
Using a Single End of Row Definition 506
Using a Delimiter and Record Terminator End of Row Definition 506
Loading UTF-8 Format Data 507
Loading Special Characters As Literals 507
Using a Custom Column Separator (DELIMITER) 508
Using a Custom Column Option DELIMITER 508
Defining a Null Value (NULL) 509
Loading NULL Values 509
Filling Columns with Trailing Nulls (TRAILING NULLCOLS) 510
Attempting to Fill a NOT NULL Column with TRAILING NULLCOLS 511
Changing the Default Escape Character (ESCAPE AS) 512
Eliminating Escape Character Handling 512
Delimiting Characters (ENCLOSED BY) 512
Using ENCLOSED BY for a Single Column 513
Specifying a Custom End of Record String (RECORD TERMINATOR) 514
Examples 514
Loading Native Varchar Data 515
Loading Binary (Native) Data 515
Loading Hexadecimal, Octal, and Bitstring Data 516
Hexadecimal Data 517
Octal Data 517
BitString Data 518
Examples 518
Loading Fixed-Width Format Data 519
Supported Options for Fixed-Width Data Loads 519
Using Nulls in Fixed-Width Data 519
Defining a Null Character (Statement Level) 520
Defining a Custom Record Terminator 520
Copying Fixed-Width Data 521
Skipping Content in Fixed-Width Data 521
Trimming Characters in Fixed-Width Data Loads 522
Using Padding in Fixed-Width Data Loads 523
Ignoring Columns and Fields in the Load File 524
Using the FILLER Parameter 524
FILLER Parameter Examples 524
Loading Data into Pre-Join Projections 525
Foreign and Primary Key Constraints 525
Concurrent Loads into Pre-Join Projections 526
Using Parallel Load Streams 528
Monitoring COPY Loads and Metrics 528
Using HP Vertica Functions 528
Using the CURRENT_LOAD_SOURCE() Function 529
Using the LOAD_STREAMS System Table 529
Using the STREAM NAME Parameter 529
Other LOAD_STREAMS Columns for COPY Metrics 530
Capturing Load Rejections and Exceptions 531
Using COPY Parameters To Handle Rejections and Exceptions 531
Enforcing Truncating or Rejecting Rows (ENFORCELENGTH) 532
Specifying Maximum Rejections Before a Load Fails (REJECTMAX) 533
Aborting Data Loads for Any Error (ABORT ON ERROR) 533
Understanding Row Rejections and Rollback Errors 533
Saving Load Exceptions (EXCEPTIONS) 535
Saving Load Rejections (REJECTED DATA) 536
Saving Rejected Data to a Table 537
Rejection Records for Table Files 538
Querying a Rejection Records Table 538
Exporting the Rejected Records Table 540
COPY Rejected Data and Exception Files 541
Specifying Rejected Data and Exceptions Files 542
Saving Rejected Data and Exceptions Files to a Single Server 542
Using VSQL Variables for Rejected Data and Exceptions Files 543
COPY LOCAL Rejection and Exception Files 543
Specifying Rejected Data and Exceptions Files 544
Referential Integrity Load Violation 544
Trickle Loading Data 547
Using INSERT, UPDATE, and DELETE 547
WOS Overflow 547
Copying and Exporting Data 549
Moving Data Directly Between Databases 549
Creating SQL Scripts to Export Data 549
Exporting Data 550
Exporting Identity Columns 551
Examples of Exporting Data 551
Copying Data 552
Importing Identity Columns 553
Examples 553
Using Public and Private IP Networks 555
Identify the Public Network to HP Vertica 555
Identify the Database or Nodes Used for Import/Export 556
Using EXPORT Functions 557
Saving Scripts for Export Functions 557
Exporting the Catalog 558
Function Summary 558
Exporting All Catalog Objects 558
Projection Considerations 559
Exporting Database Designer Schema and Designs 559
Exporting Table Objects 559
Exporting Tables 560
Function Syntax 561
Exporting All Tables and Related Objects 561
Exporting a List of Tables 561
Exporting a Single Table or Object 562
Exporting Objects 562
Function Syntax 563
Exporting All Objects 563
Exporting a List of Objects 564
Exporting a Single Object 565
Bulk Deleting and Purging Data 567
Choosing the Right Technique for Deleting Data 568
Best Practices for DELETE and UPDATE 569
Performance Considerations for DELETE and UPDATE Queries 569
Optimizing DELETEs and UPDATEs for Performance 570
Projection Column Requirements for Optimized Deletes 570
Optimized Deletes in Subqueries 570
Projection Sort Order for Optimizing Deletes 571
Purging Deleted Data 573
Setting a Purge Policy 573
Specifying the Time for Which Deleted Data Is Saved 574
Specifying the Number of Epochs That Are Saved 574
Disabling Purge 575
Manually Purging Data 575
Managing the Database 577
Connection Load Balancing 577
Native Connection Load Balancing Overview 577
IPVS Overview 578
Choosing Whether to Use Native Connection Load Balancing or IPVS 578
About Native Connection Load Balancing 579
Load Balancing Schemes 580
Enabling and Disabling Native Connection Load Balancing 580
Resetting the Load Balancing State 581
Monitoring Native Connection Load Balancing 581
Determining to which Node a Client Has Connected 582
Connection Load Balancing Using IPVS 583
Configuring HP Vertica Nodes 585
Notes 585
Set Up the Loopback Interface 586
Disable Address Resolution Protocol (ARP) 587
Configuring the Directors 589
Install the HP Vertica IPVS Load Balancer Package 589
Before You Begin 589
If You Are Using Red Hat Enterprise Linux 5.x: 589
If You Are Using Red Hat Enterprise Linux 6.x: 590
Configure the HP Vertica IPVS Load Balancer 590
Public and Private IPs 591
Set up the HP Vertica IPVS Load Balancer Configuration File 592
Connecting to the Virtual IP (VIP) 593
Monitoring Shared Node Connections 594
Determining Where Connections Are Going 595
Virtual IP Connection Problems 597
Issue 597
Resolution 597
Expected E-mail Messages From the Keepalived Daemon 597
Troubleshooting Keepalived Issues 598
Managing Nodes 600
Stop HP Vertica on a Node 600
Restart HP Vertica on a Node 601
Restarting HP Vertica on a node 601
Fault Groups 601
About the Fault Group Script 602
Creating Fault Groups 604
Modifying Fault Groups 605
How to modify a fault group 606
Dropping Fault Groups 607
How to drop a fault group 607
How to remove all fault groups 607
How to add nodes back to a fault group 608
Monitoring Fault Groups 608
Monitoring fault groups through system tables 608
Monitoring fault groups through Management Console 608
Large Cluster 609
Control nodes on large clusters 610
Control nodes on small clusters 610
Planning a Large Cluster 610
Installing a Large Cluster 611
If you want to install a new large cluster 611
Sample rack-based cluster hosts topology 612
If you want to expand an existing cluster 614
Defining and Realigning Control Nodes on an Existing Cluster 614
Rebalancing Large Clusters 615
How to rebalance the cluster 616
How long will rebalance take? 616
Expanding the Database to a Large Cluster 616
Monitoring Large Clusters 617
Large Cluster Best Practices 617
Planning the number of control nodes 618
Allocate standby nodes 619
Plan for cluster growth 619
Write custom fault groups 619
Use segmented projections 619
Use the Database Designer 619
Elastic Cluster 620
The Elastic Cluster Scaling Factor 620
Enabling and Disabling Elastic Cluster 621
Scaling Factor Defaults 621
Viewing Scaling Factor Settings 621
Setting the Scaling Factor 622
Local Data Segmentation 622
Enabling and Disabling Local Segmentation 623
Elastic Cluster Best Practices 623
When to Enable Local Data Segmentation 624
Upgraded Database Consideration 624
Monitoring Elastic Cluster Rebalancing 624
Historical Rebalance Information 625
Adding Nodes 626
Adding Hosts to a Cluster 627
Prerequisites and Restrictions 627
Procedure to Add Hosts 627
Examples: 628
Adding Nodes to a Database 629
To Add Nodes to a Database Using MC 629
To Add Nodes to a Database Using the Administration Tools: 629
Removing Nodes 631
Lowering the K-Safety Level to Allow for Node Removal 631
Removing Nodes From a Database 631
Prerequisites 632
Remove Unused Hosts From the Database Using MC 632
Remove Unused Hosts From the Database Using the Administration Tools 632
Removing Hosts From a Cluster 633
Prerequisites 633
Procedure to Remove Hosts 633
Replacing Nodes 635
Prerequisites 635
Best Practice for Restoring Failed Hardware 635
Replacing a Node Using the Same Name and IP Address 636
Replacing a Failed Node Using a Node with a Different IP Address 637
Replacing a Functioning Node Using a Different Name and IP Address 638
Using the Administration Tools to Replace Nodes 638
Replace the Original Host with a New Host Using the Administration Tools 638
Using the Management Console to Replace Nodes 639
Rebalancing Data Across Nodes 641
K-safety and Rebalancing 641
Rebalancing Failure and Projections 641
Permissions 642
Rebalancing Data Using the Administration Tools UI 642
Rebalancing Data Using Management Console 643
Rebalancing Data Using SQL Functions 643
Redistributing Configuration Files to Nodes 643
Stopping and Starting Nodes on MC 644
Managing Disk Space 645
Monitoring Disk Space Usage 645
Adding Disk Space to a Node 645
Replacing Failed Disks 647
Catalog and Data Files 647
Understanding the Catalog Directory 648
Reclaiming Disk Space From Deleted Records 650
Rebuilding a Table 650
Notes 650
Managing Tuple Mover Operations 651
Understanding the Tuple Mover 652
Moveout 652
ROS Containers 653
Mergeout 653
Mergeout of Deletion Markers 654
Tuning the Tuple Mover 654
Tuple Mover Configuration Parameters 655
Resource Pool Settings 656
Loading Data 657
Using More Threads 657
Active Data Partitions 657
Managing Workloads 659
Statements 659
System Tables 660
The Resource Manager 660
Resource Manager Impact on Query Execution 661
Resource Pool Architecture 662
Modifying and Creating Resource Pools 662
Monitoring Resource Pools and Resource Usage By Queries 662
Examples 662
User Profiles 666
Example 666
Target Memory Determination for Queries in Concurrent Environments 668
Managing Resources At Query Run Time 668
Setting Run-Time Priority for the Resource Pool 669
Prioritizing Queries Within a Resource Pool 669
How to Set Run-Time Priority and Run-Time Priority Threshold 669
Changing Run-Time Priority of a Running Query 670
How To Change the Run-Time Priority of a Running Query 670
Using CHANGE_RUNTIME_PRIORITY 671
Restoring Resource Manager Defaults 671
Best Practices for Managing Workload Resources 672
Basic Principles for Scalability and Concurrency Tuning 672
Guidelines for Setting Pool Parameters 672
Setting a Run-Time Limit for Queries 677
Example: 678
Using User-Defined Pools and User-Profiles for Workload Management 679
Scenario: Periodic Batch Loads 679
Scenario: The CEO Query 680
Scenario: Preventing Run-Away Queries 681
Scenario: Restricting Resource Usage of Ad Hoc Query Application 682
Scenario: Setting a Hard Limit on Concurrency For An Application 683
Scenario: Handling Mixed Workloads (Batch vs. Interactive) 684
Scenario: Setting Priorities on Queries Issued By Different Users 685
Scenario: Continuous Load and Query 686
Scenario: Prioritizing Short Queries At Run Time 687
Scenario: Dropping the Runtime Priority of Long Queries 687
Tuning the Built-In Pools 689
Scenario: Restricting HP Vertica to Take Only 60% of Memory 689
Scenario: Tuning for Recovery 689
Scenario: Tuning for Refresh 689
Scenario: Tuning Tuple Mover Pool Settings 690
Reducing Query Run Time 690
Real-Time Profiling 691
Managing System Resource Usage 692
Managing Sessions 692
Viewing Sessions 693
Interrupting and Closing Sessions 693
Controlling Sessions 694
Managing Load Streams 695
Working With Storage Locations 697
How HP Vertica Uses Storage Locations 697
Viewing Storage Locations and Policies 698
Viewing Disk Storage Information 698
Viewing Location Labels 698
Viewing Storage Tiers 699
Viewing Storage Policies 700
Adding Storage Locations 700
Planning Storage Locations 700
Adding the Location 701
Storage Location Subdirectories 702
Adding Labeled Storage Locations 702
Adding a Storage Location for USER Access 703
Altering Storage Location Use 704
USER Storage Location Restrictions 704
Effects of Altering Storage Location Use 704
Altering Location Labels 705
Adding a Location Label 705
Removing a Location Label 706
Effects of Altering a Location Label 706
Creating Storage Policies 707
Creating Policies Based on Storage Performance 707
Storage Levels and Priorities 708
Using the SET_OBJECT_STORAGE_POLICY Function 708
Effects of Creating Storage Policies 709
Moving Data Storage Locations 710
Moving Data Storage While Setting a Storage Policy 710
Effects of Moving a Storage Location 711
Clearing Storage Policies 711
Effects on Same-Name Storage Policies 712
Measuring Storage Performance 713
Measuring Performance on a Running HP Vertica Database 714
Measuring Performance Before a Cluster Is Set Up 714
Setting Storage Performance 714
How HP Vertica Uses Location Performance Settings 715
Using Location Performance Settings With Storage Policies 715
Dropping Storage Locations 716
Altering Storage Locations Before Dropping Them 716
Dropping USER Storage Locations 716
Retiring Storage Locations 716
Restoring Retired Storage Locations 717
Backing Up and Restoring the Database 719
Compatibility Requirements for Using vbr.py 719
Automating Regular Backups 719
Types of Backups 719
Full Backups 720
Object-Level Backups 720
Hard Link Local Backups 721
When to Back up the Database 721
Configuring Backup Hosts 721
Configuring Single-Node Database Hosts for Backup 722
Creating Configuration Files for Backup Hosts 722
Estimating Backup Host Disk Requirements 723
Estimating Log File Disk Requirements 723
Making Backup Hosts Accessible 723
Setting Up Passwordless SSH Access 724
Testing SSH Access 724
Changing the Default SSH Port on Backup Hosts 725
Increasing the SSH Maximum Connection Settings for a Backup Host 725
Copying Rsync and Python to the Backup Hosts 726
Configuring Hard Link Local Backup Hosts 726
Listing Host Names 726
Creating vbr.py Configuration Files 728
Specifying a Backup Name 728
Backing Up the Vertica Configuration File 729
Saving Multiple Restore Points 729
Specifying Full or Object-Level Backups 729
Entering the User Name 730
Saving the Account Password 730
Specifying the Backup Host and Directory 730
Saving the Configuration File 731
Continuing to Advanced Settings 731
Sample Configuration File 731
Changing the Overwrite Parameter Value 732
Configuring Required VBR Parameters 732
Sample Session Configuring Required Parameters 733
Configuring Advanced VBR Parameters 733
Example of Configuring Advanced Parameters 734
Configuring the Hard Link Local Parameter 734
Restrictions for Backup Encryption Option 735
Example Backup Configuration File 735
Using Hard File Link Local Backups 737
Planning Hard Link Local Backups 737
Specifying Backup Directory Locations 737
Understanding Hard Link Local Backups and Disaster Recovery 738
Creating Full and Incremental Backups 738
Running Vbr Without Optional Commands 739
Best Practices for Creating Backups 739
Object-Level Backups 740
Backup Locations and Storage 740
Saving Incremental Backups 740
When vbr.py Deletes Older Backups 741
Backup Directory Structure and Contents 741
Directory Tree 742
Multiple Restore Points 742
Creating Object-Level Backups 744
Invoking vbr.py Backup 744
Backup Locations and Naming 744
Best Practices for Object-Level Backups 745
Naming Conventions 745
Creating Backups Concurrently 746
Determining Backup Frequency 746
Understanding Object-Level Backup Contents 746
Making Changes After an Object-Level Backup 747
Understanding the Overwrite Parameter 747
Changing Principal and Dependent Objects 748
Considering Constraint References 748
Configuration Files for Object-Level Backups 748
Backup Epochs 749
Maximum Number of Backups 749
Creating Hard Link Local Backups 749
Specifying the Hard Link Local Backup Location 750
Creating Hard Link Local Backups for Tape Storage 750
Interrupting the Backup Utility 751
Viewing Backups 751
List Backups With vbr.py 752
Monitor database_snapshots 752
Query database_backups 752
Restoring Full Database Backups 753
Restoring the Most Recent Backup 754
Restoring an Archive 754
Attempting to Restore a Node That Is UP 755
Attempting to Restore to an Alternate Cluster 755
Restoring Object-Level Backups 755
Backup Locations 755
Cluster Requirements for Object-Level Restore 756
Restoring Objects to a Changed Cluster Topology 756
Projection Epoch After Restore 756
Catalog Locks During Backup Restore 757
Catalog Restore Events 757
Restoring Hard Link Local Backups 758
Restoring Full- and Object-Level Hard Link Local Backups 758
Avoiding OID and Epoch Conflicts 758
Transferring Backups to and From Remote Storage 759
Restoring to the Same Cluster 760
Removing Backups 761
Deleting Backup Directories 761
Copying the Database to Another Cluster 762
Identifying Node Names for Target Cluster 763
Configuring the Target Cluster 763
Creating a Configuration File for CopyCluster 764
Copying the Database 765
Backup and Restore Utility Reference 766
VBR Utility Reference 766
Syntax 766
Parameters 767
VBR Configuration File Reference 767
[Misc] Miscellaneous Settings 767
[Database] Database Access Settings 769
[Transmission] Data Transmission During Backup Process 770
[Mapping] 771
Recovering the Database 773
Failure Recovery 773
Recovery Scenarios 774
Notes 775
Restarting HP Vertica on a Host 775
Restarting HP Vertica on a Host Using the Administration Tools 776
Restarting HP Vertica on a Host Using the Management Console 776
Restarting the Database 776
Recovering the Cluster From a Backup 779
Monitoring Recovery 779
Viewing Log Files on Each Node 779
Viewing the Cluster State and Recover Status 779
Using System Tables to Monitor Recovery 780
Monitoring Cluster Status After Recovery 780
Exporting a Catalog 781
Best Practices for Disaster Recovery 781
Monitoring HP Vertica 785
Monitoring Log Files 785
When a Database Is Running 785
When the Database / Node Is Starting up 785
Rotating Log Files 786
Using Administration Tools Logrotate Utility 786
Manually Rotating Logs 786
Manually Creating Logrotate Scripts 787
Monitoring Process Status (ps) 789
Monitoring Linux Resource Usage 790
Monitoring Disk Space Usage 791
Monitoring Database Size for License Compliance 791
Viewing Your License Compliance Status 792
Manually Auditing Columnar Data Usage 792
Manually Auditing Flex Table Data Usage 793
Targeted Auditing 793
Using Management Console to Monitor License Compliance 793
Monitoring Shared Node Connections 794
Monitoring Elastic Cluster Rebalancing 795
Historical Rebalance Information 796
Monitoring Parameters 796
Monitoring Events 798
Event Logging Mechanisms 798
Event Severity Types 798
Event Data 802
Configuring Event Reporting 805
Configuring Reporting for Syslog 805
Enabling HP Vertica to Trap Events for Syslog 805
Defining Events to Trap for Syslog 806
Defining the SyslogFacility to Use for Reporting 807
Configuring Reporting for SNMP 808
Configuring Event Trapping for SNMP 809
To Configure HP Vertica to Trap Events for SNMP 809
To Enable Event Trapping for SNMP 810
To Define Where HP Vertica Sends Traps 810
To Define Which Events HP Vertica Traps 810
Verifying SNMP Configuration 811
Event Reporting Examples 812
Vertica.log 812
SNMP 812
Syslog 812
Using System Tables 814
Where System Tables Reside 814
How System Tables Are Organized 814
Querying Case-Sensitive Data in System Tables 815
Examples 816
Retaining Monitoring Information 818
Data Collector 818
Where Is DC Information Retained? 818
DC Tables 819
Enabling and Disabling Data Collector 819
Viewing Current Data Retention Policy 819
Configuring Data Retention Policies 820
Working with Data Collection Logs 821
Clearing the Data Collector 822
Flushing Data Collector Logs 823
Monitoring Data Collection Components 823
Related Topics 824
Querying Data Collector Tables 824
Clearing PROJECTION_REFRESHES History 825
Monitoring Query Plan Profiles 826
Monitoring Partition Reorganization 826
Monitoring Resource Pools and Resource Usage By Queries 827
Examples 827
Monitoring Recovery 830
Viewing Log Files on Each Node 830
Viewing the Cluster State and Recover Status 831
Using System Tables to Monitor Recovery 831
Monitoring Cluster Status After Recovery 832
Monitoring HP Vertica Using MC 833
About Chart Updates 833
Viewing MC Home Page 834
Tasks 834
Recent Databases 835
Monitoring Same-Name Databases on MC 835
Monitoring Cluster Resources 836
Database 836
Messages 836
Performance 837
CPU/Memory Usage 837
User Query Type Distribution 837
Monitoring Cluster Nodes 838
Filtering What You See 838
If You Don't See What You Expect 838
Monitoring Cluster CPU/Memory 839
Investigating Areas of Concern 839
Monitoring Cluster Performance 839
How to Get Metrics on Your Cluster 839
Node Colors and What They Mean 840
Filtering Nodes From the View 840
Monitoring System Resources 841
How Up to Date Is the Information? 841
Monitoring Query Activity 841
Monitoring Key Events 842
Filtering Chart Results 843
Viewing More Detail 843
Monitoring Internal Sessions 844
Filtering Chart Results 844
Monitoring User Sessions 844
What Chart Colors Mean 844
Chart Results 845
Monitoring System Memory Usage 845
Types of System Memory 845
Monitoring System Bottlenecks 846
How MC Gathers System Bottleneck Data 846
The Components MC Reports on 846
How MC Handles Conflicts in Resources 846
Example 847
Monitoring User Query Phases 847
Filtering Chart Results 848
Viewing More Detail 848
Monitoring Table Utilization 849
Viewing More Detail 850
Monitoring Node Activity 850
Monitoring MC-managed Database Messages 853
Message Severity 854
Viewing Message Details 854
Search and Export Messages 854
Searching MC-managed Database Messages 854
Changing Message Search Criteria 855
Specifying Date Range Searches 855
Filtering Messages Client Side 856
Exporting MC-managed Database Messages and Logs 856
Monitoring MC User Activity 859
Background Cleanup of Audit Records 860
Filter and Export Results 861
If You Perform a Factory Reset 861
Analyzing Workloads 862
About the Workload Analyzer 862
Getting Tuning Recommendations Through an API 862
What and When 862
Record the Events 863
Observation Count and Time 864
Knowing What to Tune 864
The Tuning Description (recommended action) and Command 864
What a Tuning Operation Costs 864
Examples 864
Getting Recommendations From System Tables 866
Understanding WLA's Triggering Events 866
Getting Tuning Recommendations Through MC 866
Understanding WLA Triggering Conditions 867
Collecting Database Statistics 875
Statistics Used By the Query Optimizer 876
How Statistics Are Collected 876
Using the ANALYZE ROW COUNT Operation 877
Using ANALYZE_STATISTICS 877
Using ANALYZE_HISTOGRAM 877
Examples 878
How Statistics Are Computed 879
How Statistics Are Reported 879
Determining When Statistics Were Last Updated 880
Reacting to Stale Statistics 884
Example 885
Canceling Statistics Collection 886
Best Practices for Statistics Collection 886
When to Gather Full Statistics 887
Save Statistics 888
Using Diagnostic Tools 889
Determining Your Version of HP Vertica 889
Collecting Diagnostics (scrutinize Command) 889
How to Run Scrutinize 894
How Scrutinize Collects and Packages Diagnostics 894
How to Upload Scrutinize Results to Support 896
Examples for the Scrutinize Command 898
Get Help with scrutinize Options 898
Collect Defaults in Your Cluster 898
Collect Information for a Database 898
Collect Information from a Specific Node 899
Use a Staging Area Other Than /tmp 899
Include Gzipped Log Files 899
Include a Message in Your File 899
Send Results to Support 900
Collecting Diagnostics (diagnostics Command) 900
Syntax 900
Arguments 900
Using the Diagnostics Utility 901
Examples 901
Exporting a Catalog 902
Exporting Profiling Data 902
Syntax 902
Parameters 903
Example 903
Understanding Query Plans 904
How to Get Query Plan Information 905
How to save query plan information 906
Viewing EXPLAIN Output in Management Console 907
About the Query Plan in Management Console 908
Expanding and Collapsing Query Paths in EXPLAIN Output 909
Clearing Query Data 909
Viewing Projection and Column Metadata in EXPLAIN output 909
Viewing EXPLAIN Output in vsql 910
About EXPLAIN Output 911
Textual output of query plans 911
Viewing the Statistics Query Plan Output 912
Viewing the Cost and Rows Path 914
Viewing the Projection Path 915
Viewing the Join Path 916
Outer versus inner join 917
Hash and merge joins 917
Inequality joins 919
Event series joins 920
Viewing the Path ID 920
Viewing the Filter Path 921
Viewing the GROUP BY Paths 922
GROUPBY HASH Query Plan Example 922
GROUPBY PIPELINED Query Plan Example 923
Partially Sorted GROUPBY Query Plan Example 924
Viewing the Sort Path 925
Viewing the Limit Path 926
Viewing the Data Redistribution Path 926
Viewing the Analytic Function Path 928
Viewing Node Down Information 929
Viewing the MERGE Path 930
Linking EXPLAIN Output to Error Messages and Profiling Information 931
Using the QUERY_PLAN_PROFILES table 933
Profiling Database Performance 935
How to Determine If Profiling Is Enabled 936
How to Enable Profiling 936
How to Disable Profiling 937
About Real-Time Profiling 938
About profiling counters 938
About query plan profiles 938
System tables with profile data 939
What to look for in query profiles 939
Viewing Profile Data in Management Console 940
Monitoring Profiling Progress 941
Viewing Updated Profile Metrics 941
Expanding and collapsing query path profile data 942
About Profile Data in Management Console 942
Projection metadata 943
Query phase duration 944
Profile metrics 944
Execution events 945
Optimizer events 946
Clearing Query Data 947
Viewing Profile Data in vsql 947
How to profile a single statement 948
Real-Time Profiling Example 948
How to Use the Linux watch Command 948
How to Find Out Which Counters are Available 949
Sample views for counter information 950
Running scripts to create the sample views 950
Viewing counter values using the sample views 950
Combining sample views 950
Viewing real-time profile data 951
How to label queries for profiling 951
Label syntax 952
Profiling query plans 953
What you need for query plan profiling 953
How to get query plan status for small queries 954
How to get query plan status for large queries 955
Improving the readability of QUERY_PLAN_PROFILES output 957
Managing query profile data 958
Configuring data retention policies 958
Reacting to suboptimal query plans 958
About Locales 961
Unicode Character Encoding: UTF-8 (8-bit UCS/Unicode Transformation Format) 961
Locales 961
Notes 962
Locale Aware String Functions 962
UTF-8 String Functions 963
Locale Specification 965
Long Form 965
Syntax 965
Parameters 965
Collation Keyword Parameters 969
Notes 971
Examples 971
Short Form 972
Determining the Short Form of a Locale 972
Specifying a Short Form Locale 972
Supported Locales 973
Locale Restrictions and Workarounds 984
Appendix: Binary File Formats 987
Creating Native Binary Format Files 987
File Signature 987
Column Definitions 987
Row Data 990
Example 991
We appreciate your feedback! 997
Administration Overview
This document describes the functions performed by an HP Vertica database administrator (DBA).
Perform these tasks using only the dedicated database administrator account that was created
when you installed HP Vertica. The examples in this documentation set assume that the
administrative account name is dbadmin.
• To perform certain cluster configuration and administration tasks, the DBA (users of the administrative account) must be able to supply the root password for those hosts. If this requirement conflicts with your organization's security policies, these functions must be performed by your IT staff.

• If you perform administrative functions using an account other than the one provided during installation, HP Vertica encounters file ownership problems.

• If you share the administrative account password, make sure that only one user runs the Administration Tools at any time. Otherwise, automatic configuration propagation does not work correctly.

• The Administration Tools require that the calling user's shell be /bin/bash. Other shells give unexpected results and are not supported.
Managing Licenses
You must license HP Vertica in order to use it. Hewlett-Packard supplies your license information
to you in the form of one or more license files, which encode the terms of your license. Two licenses
are available:

• vlicense.dat, for columnar tables
• flextables.key, for Flex Zone flexible tables
To prevent introducing special characters (such as line endings or file terminators) into the license
key file, do not open the file in an editor or e-mail client. Though special characters are not always
visible in an editor, their presence invalidates the license.
Copying Enterprise, Evaluation, and Flex Zone
License Files
For ease of HP Vertica Enterprise Edition installation, HP recommends that you copy the license
file to /tmp/vlicense.dat on the Administration host.
If you have a license for Flex Zone, HP recommends that you copy the license file to
/opt/vertica/config/share/license.com.vertica.flextables.key.
Be careful not to change the license key file in any way when copying the file between Windows
and Linux, or to any other location. To help prevent applications from trying to alter the file, enclose
the license file in an archive file (such as a .zip or .tar file).
After copying the license file from one location to another, check that the copied file size is identical
to that of the one you received from HP Vertica.
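Checking the size alone can miss a silent change such as converted line endings, so a byte-for-byte comparison with cmp is a stricter check. The following is a sketch only; the /tmp paths and file contents are placeholders standing in for your actual license files:

```shell
# Stand-in for the license file received from HP; substitute your real paths.
printf 'LICENSE-KEY-DATA' > /tmp/vlicense.dat.orig
cp /tmp/vlicense.dat.orig /tmp/vlicense.dat

# Compare the copy against the original byte for byte.
if cmp -s /tmp/vlicense.dat.orig /tmp/vlicense.dat; then
    echo "license copy verified"
else
    echo "license copy differs from original" >&2
fi

# Optionally wrap the key in an archive so applications cannot alter it in transit.
tar -cf /tmp/vlicense.tar -C /tmp vlicense.dat
```

cmp exits zero only when the two files are identical, which catches any corruption that leaves the file size unchanged.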
Obtaining a License Key File
To obtain an Enterprise Edition, Evaluation, or Flex Zone flex table license key, contact HP Vertica
at: http://www.vertica.com/about/contact-us/
Your HP Vertica Community Edition download package includes the Community Edition license,
which allows three nodes and 1TB of data. The HP Vertica Community Edition license does not
expire.
Understanding HP Vertica Licenses
HP Vertica has flexible licensing terms. It can be licensed on the following bases:
• Term-based (valid until a specific date)
• Raw data size based (valid to store up to some amount of raw data)
• Both term-based and data-size-based
• Unlimited duration and data storage
• Raw data size based and a limit of 3 nodes (HP Vertica Community Edition)
Your license key has your licensing bases encoded into it. If you are unsure of your current license,
you can view your license information from within HP Vertica.
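For example, the DISPLAY_LICENSE function (described in the SQL Reference Manual) reports the terms encoded in your installed license; run it from any vsql session:

```sql
=> SELECT DISPLAY_LICENSE();
```

Typical output lists the licensee, the license start and end dates, and the raw data size allowance.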
License Types
HP Vertica Community Edition. You can download and start using Community Edition for free.
The Community Edition license allows customers the following:
• 3-node limit
• 1 terabyte columnar table data limit
• 1 terabyte Flex table data limit
HP Vertica Enterprise Edition. You can purchase the Enterprise Edition license. The Enterprise
Edition license entitles customers to:
• No node limit
• Columnar data, amount specified by the license
• 1 terabyte Flex table data
Flex Zone. Flex Zone is a license for the flex tables technology, available in version 7.0.
Customers can separately purchase and apply a Flex Zone license to their installation. The Flex
Zone license entitles customers to the licensed amount of Flex table data and removes the 3 node
restriction imposed by the Community Edition.
Customers whose primary goal is to work with flex tables can purchase a Flex Zone license. When
they purchase Flex Zone, customers receive a complimentary Enterprise License, which entitles
them to one terabyte of columnar data and imposes no node limit.
Note: Customers who purchase a Flex Zone license must apply two licenses: their Enterprise
Edition license and their Flex Zone license.
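One way to apply both keys is to call INSTALL_LICENSE once per file from vsql, assuming each file was copied to the location recommended earlier in this chapter:

```sql
=> SELECT INSTALL_LICENSE('/tmp/vlicense.dat');
=> SELECT INSTALL_LICENSE('/opt/vertica/config/share/license.com.vertica.flextables.key');
```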
Allowances      Community Edition   Enterprise Edition   Enterprise Edition + Flex Zone   Flex Zone
Node Limit      3 nodes             Unlimited            Unlimited                        Unlimited
Columnar Data   1 terabyte          Per license          Per license                      1 terabyte
Flex Data       1 terabyte          1 terabyte           Per license                      Per license
Installing or Upgrading a License Key
The steps you follow to apply your HP Vertica license key vary, depending on the type of license
you are applying and whether you are upgrading your license. This section describes the following:
• New HP Vertica License Installations
• HP Vertica License Renewals or Upgrades
• Flex Zone License Installations
New HP Vertica License Installations
1. Copy the license key file to your Administration Host.
2. Ensure the license key's file permissions are set to at least 666 (read and write permissions for
all users).
3. Install HP Vertica as described in the Installation Guide if you have not already done so. The
interface prompts you for the license key file.
4. To install Community Edition, leave the default path blank and press OK. To apply your
evaluation or Enterprise Edition license, enter the absolute path of the license key file you
downloaded to your Administration Host and press OK. The first time you log in as the
Database Administrator and run the Administration Tools, the interface prompts you to
accept the End-User License Agreement (EULA).
Note: If you installed Management Console, the MC administrator can point to the
location of the license key during Management Console configuration.
5. Choose View EULA to review the EULA.
6. Exit the EULA and choose Accept EULA to officially accept the EULA and continue installing
the license, or choose Reject EULA to reject the EULA and return to the Advanced Menu.
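Steps 1 and 2 above can be performed from the shell. A minimal sketch, assuming the key is staged at the default /tmp/vlicense.dat location (the file contents here are a stand-in for the real key):

```shell
# Stage the license key on the Administration Host and open its permissions.
printf 'LICENSE-KEY-DATA' > /tmp/vlicense.dat   # stand-in for the downloaded key
chmod 666 /tmp/vlicense.dat                     # read/write for all users
ls -l /tmp/vlicense.dat                         # shows -rw-rw-rw-
```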
HP Vertica License Renewals or Upgrades
If your license is expiring or you want your database to grow beyond your licensed data size, you
must renew or upgrade your license. Once you have obtained your renewal or upgraded license key
file, you can install it using Administration Tools or Management Console.
Uploading or Upgrading a License Key Using
Administration Tools
1. Copy the license key file to your Administration Host.
2. Ensure the license key's file permissions are set to at least 666 (read and write permissions for
all users).
3. Start your database, if it is not already running.
4. In the Administration Tools, select Advanced > Upgrade License Key and click OK.
5. Enter the path to your new license key file and click OK. The interface prompts you to accept
the End-User License Agreement (EULA).
6. Choose View EULA to review the EULA.
7. Exit the EULA and choose Accept EULA to officially accept the EULA and continue installing
the license, or choose Reject EULA to reject the EULA and return to the Advanced Tools
menu.
Uploading or Upgrading a License Key Using
Management Console
1. From your database's Overview page in Management Console, click the License tab. The
License page displays. You can view your installed licenses on this page.
2. Click the Install New License button at the top of the License page.
3. Browse to the location of the license key from your local computer (where the web browser is
installed) and upload the file.
4. Click the Apply button at the top of the page. The interface prompts you to accept the End-User
License Agreement (EULA).
5. Select the check box to officially accept the EULA and continue installing the license, or click
Cancel to exit.
Note: As soon as you renew or upgrade your license key from either your Administration
Host or Management Console, HP Vertica applies the license update. No further warnings
appear.
Flex Table License Installations
Installing a Flex Table license using vsql
1. Install HP Vertica as described in the Installation Guide if you have not already done so.
2. Copy the Flex Zone flex tables license key file to your Administration Host. HP recommends
that you copy the license file to
/opt/vertica/config/share/license.com.vertica.flextables.key.
3. Start your database, if it is not already running.
4. In the Administration Tools, connect to your database.
5. At the vsql prompt, call the INSTALL_LICENSE function as described in the SQL Reference Manual.
=> SELECT INSTALL_LICENSE('/opt/vertica/config/share/license.com.vertica.flextables.key');
Installing a Flex Table license using Management
Console
1. Start Management Console.
2. From your database's Overview page in Management Console, click the License tab. The
License page displays. You can view your installed licenses on this page.
3. Click the Install New License button at the top of the License page.
4. Browse to the location of the license key from your local computer (where the web browser is
installed) and upload the file.
5. Click the Apply button at the top of the page. The interface prompts you to accept the End-User
License Agreement (EULA).
6. Select the check box to officially accept the EULA and continue installing the license, or click
Cancel to exit.
Viewing Your License Status
HP Vertica has several functions to display your license terms and current status.
Examining Your License Key
Use the DISPLAY_LICENSE SQL function described in the SQL Reference Manual to display the
license information. This function displays the dates for which your license is valid (or Perpetual if
your license does not expire) and any raw data allowance. For example:
=> SELECT DISPLAY_LICENSE();
DISPLAY_LICENSE
----------------------------------------------------
HP Vertica Systems, Inc.
1/1/2011
12/31/2011
30
50TB
(1 row)
Or, use the LICENSES table described in the SQL Reference Manual to view information about all
your installed licenses. This table displays your license types, the dates for which your licenses are
valid, and the size and node limits your licenses impose. In the example below, the licenses table
displays the Community Edition license and the default license that controls HP Vertica's flex data
capacity.
=> SELECT * FROM licenses;
-[ RECORD 1 ]--------+----------------------------------------
license_id | 45035996273704986
name | vertica
licensee | Vertica Community Edition
start_date | 2011-11-22
end_date | Perpetual
size | 1TB
is_community_edition | t
node_restriction | 3
-[ RECORD 2 ]--------+----------------------------------------
license_id | 45035996274085644
name | com.vertica.flextable
licensee | Vertica Community Edition, Flextable
start_date | 2013-10-29
end_date | Perpetual
size | 1TB
is_community_edition | t
node_restriction |
You can also view the LICENSES table in Management Console. On your database's Overview
page in Management Console, click the License tab. The License page displays information about
your installed licenses.
Viewing Your License Status
If your license includes a raw data size allowance, HP Vertica periodically audits your database's
size to ensure it remains compliant with the license agreement. If your license has an end date, HP
Vertica also periodically checks to see if the license has expired. You can see the result of the
latest audits using the GET_COMPLIANCE_STATUS function.
=> SELECT GET_COMPLIANCE_STATUS();
GET_COMPLIANCE_STATUS
---------------------------------------------------------------------------------
Raw Data Size: 2.00GB +/- 0.003GB
License Size : 4.000GB
Utilization : 50%
Audit Time : 2011-03-09 09:54:09.538704+00
Compliance Status : The database is in compliance with respect to raw data size.
License End Date: 04/06/2011
Days Remaining: 28.59
(1 row)
Viewing Your License Status Through MC
Information about license usage is on the Settings page. See Monitoring Database Size for License
Compliance.
Calculating the Database Size
You can use your HP Vertica software until your columnar data reaches the maximum raw data
size that the license agreement provides. This section describes when data is monitored, what data
is included in the estimate, and the general methodology used to produce an estimate. For more
information about monitoring for data size, see Monitoring Database Size for License Compliance.
How HP Vertica Estimates Raw Data Size
HP Vertica uses statistical sampling to calculate an accurate estimate of the raw data size of the
database. In this context, raw data means the uncompressed, unfederated data stored in a single
HP Vertica database. For the purpose of license size audit and enforcement, HP Vertica evaluates
the raw data size as if the data had been exported from the database in text format, rather than as
compressed data.
HP Vertica conducts your database size audit using statistical sampling. This method allows HP
Vertica to estimate the size of the database without significantly impacting database performance.
The trade-off between accuracy and impact on performance is a small margin of error, inherent in
statistical sampling. Reports on your database size include the margin of error, so you can assess
the accuracy of the estimate. To learn more about simple random sampling, see Simple Random
Sampling.
Excluding Data From Raw Data Size Estimate
Not all data in the HP Vertica database is evaluated as part of the raw data size. Specifically, HP
Vertica excludes the following data:
- Multiple projections (underlying physical copies) of data from a logical database entity (table). Data appearing in multiple projections of the same table is counted only once.
- Data stored in temporary tables.
- Data accessible through external table definitions.
- Data that has been deleted, but which remains in the database. To understand more about deleting and purging data, see Purging Deleted Data.
- Data stored in the WOS.
- Data stored in system and work tables such as monitoring tables, Data Collector tables, and Database Designer tables.
Evaluating Data Type Footprint Size
The data sampled for the estimate is treated as if it had been exported from the database in text
format (such as printed from vsql). This means that HP Vertica evaluates the data type footprint
sizes as follows:
- Strings and binary types (CHAR, VARCHAR, BINARY, VARBINARY) are counted as their actual size in bytes using UTF-8 encoding. NULL values are counted as 1-byte values (zero bytes for the NULL, and 1 byte for the delimiter).
- Numeric data types are counted as if they had been printed. Each digit counts as a byte, as does any decimal point, sign, or scientific notation. For example, -123.456 counts as eight bytes (six digits plus the decimal point and minus sign).
- Date/time data types are counted as if they had been converted to text, including any hyphens or other separators. For example, a timestamp column containing the value for noon on July 4th, 2011 would be 19 bytes. As text, vsql would print the value as 2011-07-04 12:00:00, which is 19 characters, including the space between the date and the time.
Note: Each column has an additional byte for the column delimiter.
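You can check this arithmetic interactively with the standard OCTET_LENGTH function. This is only an illustration of the counting rules; the audit itself uses its own internal accounting.
=> SELECT OCTET_LENGTH('-123.456') AS numeric_bytes,
          OCTET_LENGTH('2011-07-04 12:00:00') AS timestamp_bytes;
 numeric_bytes | timestamp_bytes
---------------+-----------------
             8 |              19
(1 row)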
Using AUDIT to Estimate Database Size
To supply a more accurate database size estimate than statistical sampling can provide, use the
AUDIT function to perform a full audit. This function has parameters to set both the error_
tolerance and confidence_level. Using one or both of these parameters increases or decreases
the function's performance impact.
For instance, lowering the error_tolerance to zero (0) and raising the confidence_level to 100,
provides the most accurate size estimate, and increases the performance impact of calling the
AUDIT function. During a detailed, low error-tolerant audit, all of the data in the database is dumped
to a raw format to calculate its size. Since performing a stringent audit can significantly impact
database performance, never perform a full audit of a production database. See AUDIT for details.
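For example, a call along the following lines requests the stringent audit described above, with zero error tolerance and 100 percent confidence. This is a sketch; confirm the exact parameter order in the AUDIT entry of the SQL Reference Manual.
=> SELECT AUDIT('', 0, 100);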
Note: Unlike estimating raw data size using statistical sampling, a full audit performs SQL
queries on the full database contents, including the contents of the WOS.
Monitoring Database Size for License Compliance
Your HP Vertica license can include a data storage allowance. The allowance can be for columnar
data, for flex table data, or for both types of data (two separate licenses). The audit() function
estimates the columnar table data size, while the audit_flex() function calculates the amount of
flex table data storage. Monitoring data sizes for columnar and flex tables lets you plan either to
schedule deleting old data to keep your database in compliance with your license, or to budget for a
license upgrade for additional data storage.
Note: An audit of columnar data includes any materialized columns in flex tables.
Viewing Your License Compliance Status
HP Vertica periodically runs an audit of the columnar data size to verify that your database remains
compliant with your license. You can view the results of the most recent audit by calling the GET_
COMPLIANCE_STATUS function.
=> SELECT GET_COMPLIANCE_STATUS();
GET_COMPLIANCE_STATUS
---------------------------------------------------------------------------------
Raw Data Size: 2.00GB +/- 0.003GB
License Size : 4.000GB
Utilization : 50%
Audit Time : 2011-03-09 09:54:09.538704+00
Compliance Status : The database is in compliance with respect to raw data size.
License End Date: 04/06/2011
Days Remaining: 28.59
(1 row)
Periodically running GET_COMPLIANCE_STATUS to monitor your database's license status is
usually enough to ensure that your database remains compliant with your license. If your database
begins to near its columnar data allowance, you can use the other auditing functions described
below to determine where your database is growing and how recent deletes have affected the size
of your database.
Manually Auditing Columnar Data Usage
You can manually check license compliance for all columnar data in your database using the
AUDIT_LICENSE_SIZE SQL function. This function performs the same audit that HP Vertica
periodically performs automatically. The AUDIT_LICENSE_SIZE SQL check runs in the
background, so the function returns immediately. You can then view the audit results using GET_
COMPLIANCE_STATUS.
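A typical sequence is sketched below: the first call starts the audit in the background and returns immediately; the second reads the most recently stored result once the audit completes.
=> SELECT AUDIT_LICENSE_SIZE();
=> SELECT GET_COMPLIANCE_STATUS();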
Note: When you audit columnar data, the results include any flexible table virtual columns that
you have materialized. Materialized columns include columns that you specify when creating a
flex table, and any that promote from virtual columns to real columns.
An alternative to AUDIT_LICENSE_SIZE is to use the AUDIT SQL function to audit the size of the
columnar tables in your entire database by passing an empty string to the function. Unlike AUDIT_
LICENSE_SIZE, this function operates synchronously, returning when it has estimated the size of
the database.
=> SELECT AUDIT('');
AUDIT
----------
76376696
(1 row)
The size of the database is reported in bytes. The AUDIT function also allows you to control the
accuracy of the estimated database size using additional parameters. See the entry for the AUDIT
function in the SQL Reference Manual for full details. HP Vertica does not count the AUDIT
function results as an official audit. It takes no license compliance actions based on the results.
Note: The results of the AUDIT function do not include flexible table data. Use the AUDIT_
FLEX function to monitor data usage for your Flex Tables license.
Manually Auditing Flex Table Data Usage
You can use the AUDIT_FLEX function to manually audit data usage for one or more flexible tables.
The function measures encoded, compressed data stored in ROS containers for the __raw__
column of one or more flexible tables. The audit results include only virtual columns in flex tables,
not data included in materialized columns. Temporary flex tables are not included in the audit.
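For example, the following sketch audits flex table data usage; the assumption that an empty string audits all flex tables mirrors the AUDIT function's behavior, so confirm it in the AUDIT_FLEX entry of the SQL Reference Manual.
=> SELECT AUDIT_FLEX('');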
Targeted Auditing
If audits determine that the columnar table estimates are unexpectedly large, consider which
schemas, tables, or partitions are using the most storage. You can use the AUDIT function to perform
targeted audits of schemas, tables, or partitions by supplying the name of the entity whose size you
want to find. For example, to find the size of the online_sales schema in the VMart example
database, run the following command:
VMart=> SELECT AUDIT('online_sales');
AUDIT
----------
35716504
(1 row)
You can also change the granularity of an audit to report the size of each entity in a larger entity (for
example, each table in a schema) by using the granularity argument of the AUDIT function. See the
AUDIT function's entry in the SQL Reference Manual.
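A sketch of such a call, assuming the granularity argument accepts values such as 'table' (confirm the exact form and where the per-entity results are reported in the AUDIT entry of the SQL Reference Manual):
VMart=> SELECT AUDIT('online_sales', 'table');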
Using Management Console to Monitor License
Compliance
You can also get information about data storage of columnar data (for columnar tables and for
materialized columns in flex tables) through the Management Console. This information is available
in the database Overview page, which displays a grid view of the database's overall health.
- The needle in the license meter adjusts to reflect the amount used in megabytes.
- The grace period represents the term portion of the license.
- The Audit button returns the same information as the AUDIT() function in a graphical representation.
- The Details link within the License grid (next to the Audit button) provides historical information about license usage. This page also shows a progress meter of percent used toward your license limit.
Managing License Warnings and Limits
Term License Warnings and Expiration
The term portion of an HP Vertica license is easy to manage—you are licensed to use HP Vertica
until a specific date. If the term of your license expires, HP Vertica alerts you with messages
appearing in the Administration Tools and vsql. For example:
=> CREATE TABLE T (A INT);
NOTICE: Vertica license is in its grace period
HINT: Renew at http://www.vertica.com/
CREATE TABLE
Contact HP Vertica at http://www.vertica.com/about/contact-us/ as soon as possible to renew
your license, and then install the new license. After the grace period expires, HP Vertica stops
processing queries.
Data Size License Warnings and Remedies
If your HP Vertica license includes a raw data size allowance, HP Vertica periodically audits the
size of your database to ensure it remains compliant with the license agreement. For details of this
audit, see Calculating the Database Size. You should also monitor your database size to know
when it will approach licensed usage. Monitoring the database size helps you plan to either upgrade
your license to allow for continued database growth or delete data from the database so you remain
compliant with your license. See Monitoring Database Size for License Compliance for details.
If your database's size approaches your licensed usage allowance, you will see warnings in the
Administration Tools and vsql. You have two options to eliminate these warnings:
- Upgrade your license to a larger data size allowance.
- Delete data from your database to remain under your licensed raw data size allowance. The warnings disappear after HP Vertica's next audit of the database size shows that it is no longer close to or over the licensed amount. You can also manually run a database audit (see Monitoring Database Size for License Compliance for details).
If your database continues to grow after you receive warnings that its size is approaching your
licensed size allowance, HP Vertica displays additional warnings in more parts of the system after
a grace period passes.
If Your HP Vertica Enterprise Edition Database Size
Exceeds Your Licensed Limits
If your Enterprise Edition database size exceeds your licensed data allowance, all successful
queries from ODBC and JDBC clients return with a status of SUCCESS_WITH_INFO instead of
the usual SUCCESS. The message sent with the results contains a warning about the database
size. Your ODBC and JDBC clients should be prepared to handle these messages instead of
assuming that successful requests always return SUCCESS.
If Your HP Vertica Community Edition Database Size
Exceeds Your Licensed Limits
If your Community Edition database size exceeds your licensed data allowance, you will no longer
be able to load or modify data in your database. In addition, you will not be able to delete data from
your database.
To bring your database under compliance, you can choose to:
- Drop database tables
- Upgrade to HP Vertica Enterprise Edition (or an evaluation license)
Configuring the Database
This section provides information about:
- The Configuration Procedure
- Configuration Parameters
- Designing a logical schema
- Creating the physical schema
You'll also want to set up a security scheme. See Implementing Security.
See also Implementing Locales for International Data Sets.
Note: Before you begin this section, HP strongly recommends that you follow the Tutorial in
the Getting Started Guide to quickly familiarize yourself with creating and configuring a fully-
functioning example database.
Configuration Procedure
This section describes the tasks required to set up an HP Vertica database. It assumes that you
have obtained a valid license key file, installed the HP Vertica rpm package, and run the installation
script as described in the Installation Guide.
You'll complete the configuration procedure using the:
- Administration Tools
  If you are unfamiliar with Dialog-based user interfaces, read Using the Administration Tools Interface before you begin. See also the Administration Tools Reference for details.
- vsql interactive interface
- The Database Designer, described fully in Creating a Database Design
Note: Users can also perform certain tasks using the Management Console. Those tasks will
point to the appropriate topic.
IMPORTANT NOTES
- Follow the configuration procedure in the order presented in this book.
- HP strongly recommends that you first use the Tutorial in the Getting Started Guide to experiment with creating and configuring a database.
- Although you may create more than one database (for example, one for production and one for testing), you may create only one active database for each installation of HP Vertica Analytic Database.
- The generic configuration procedure described here can be used several times during the development process and modified each time to fit changing goals. You can omit steps such as preparing actual data files and sample queries, and run the Database Designer without optimizing for queries. For example, you can create, load, and query a database several times for development and testing purposes, then one final time to create and load the production database.
Prepare Disk Storage Locations
You must create and specify directories in which to store your catalog and data files (physical
schema). You can specify these locations when you install or configure the database, or later
during database operations.
The directory you specify for your catalog files (the catalog path) is used across all nodes. That is, if
you specify /home/catalog for your catalog files, HP Vertica will use /home/catalog as the catalog
path on all nodes. The catalog directory should always be separate from any data files.
Note: Do not use a shared directory for more than one node. Data and catalog directories must
be distinct for each node. Multiple nodes must not be allowed to write to the same data or
catalog directory.
The same is true for your data path. If you specify that your data should be stored in /home/data,
HP Vertica ensures this is the data path used on all database nodes.
Do not use a single directory to contain both catalog and data files. You can store the catalog and
data directories on different drives, which can be either on drives local to the host (recommended for
the catalog directory) or on a shared storage location, such as an external disk enclosure or a SAN.
Both the catalog and data directories must be owned by the database administrator.
Before you specify a catalog or data path, be sure the parent directory exists on all nodes of your
database. The database creation process in admintools creates the actual directories, but the
parent directory must exist on all nodes.
You do not need to specify a disk storage location during installation. However, you can do so by
using the --data-dir parameter of the install_vertica script. See Specifying Disk Storage
Location During Installation.
See Also
- Specifying Disk Storage Location on MC
- Specifying Disk Storage Location During Database Creation
- Configuring Disk Usage to Optimize Performance
- Using Shared Storage With HP Vertica
Specifying Disk Storage Location During Installation
There are three ways to specify the disk storage location. You can specify the location when you:
- Install HP Vertica
- Create a database using the Administration Tools
- Install and configure Management Console
To Specify the Disk Storage Location When You Install:
When you install HP Vertica, the --data-dir parameter in the install_vertica script (see
Installing with the Script) lets you specify a directory to contain database data and catalog files. The
script defaults to the Database Administrator's default home directory: /home/dbadmin.
You should replace this default with a directory that has adequate space to hold your data and
catalog files.
Before you create a database, verify that the data and catalog directory exists on each node in the
cluster. Also verify that the directory on each node is owned by the database administrator.
Notes
- Catalog and data path names must contain only alphanumeric characters and cannot have leading space characters. Failure to comply with these restrictions will result in database creation failure.
- HP Vertica refuses to overwrite a directory if it appears to be in use by another database. Therefore, if you created a database for evaluation purposes, dropped the database, and want to reuse the database name, make sure that the disk storage location previously used has been completely cleaned up. See Working With Storage Locations for details.
Specifying Disk Storage Location During Database
Creation
When you invoke the Create Database command in the Administration Tools, a dialog box allows
you to specify the catalog and data locations. These locations must exist on each host in the
cluster and must be owned by the database administrator.
When you click OK, HP Vertica automatically creates the following subdirectories:
catalog-pathname/database-name/node-name_catalog/
data-pathname/database-name/node-name_data/
For example, if you use the default value (the database administrator's home directory) of
/home/dbadmin for the Stock Exchange example database, the catalog and data directories are
created on each node in the cluster as follows:
/home/dbadmin/Stock_Schema/stock_schema_node1_host01_catalog
/home/dbadmin/Stock_Schema/stock_schema_node1_host01_data
Notes
- Catalog and data path names must contain only alphanumeric characters and cannot have leading space characters. Failure to comply with these restrictions will result in database creation failure.
- HP Vertica refuses to overwrite a directory if it appears to be in use by another database. Therefore, if you created a database for evaluation purposes, dropped the database, and want to reuse the database name, make sure that the disk storage location previously used has been completely cleaned up. See Working With Storage Locations for details.
Specifying Disk Storage Location on MC
You can use the MC interface to specify where you want to store database metadata on the cluster
in the following ways:
l When you configure MC the first time
l When you create new databases using on MC
See Configuring Management Console.
Configuring Disk Usage to Optimize Performance
Once you have created your initial storage location, you can add additional storage locations to the
database later. Not only does this provide additional space, it lets you control disk usage and
increase I/O performance by isolating files that have different I/O or access patterns. For example,
consider:
- Isolating execution engine temporary files from data files by creating a separate storage location for temp space.
- Creating labeled storage locations and storage policies, in which selected database objects are stored on different storage locations based on measured performance statistics or predicted access patterns.
See Working With Storage Locations for details.
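For example, a separate temp location might be added with the ADD_LOCATION function. The node name and path below are hypothetical; see the SQL Reference Manual for the exact signature.
=> SELECT ADD_LOCATION('/secondfs/temp/', 'v_vmartdb_node0001', 'TEMP');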
Using Shared Storage With HP Vertica
If using shared SAN storage, ensure there is no contention among the nodes for disk space or
bandwidth.
- Each host must have its own catalog and data locations. Hosts cannot share catalog or data locations.
- Configure the storage so that there is enough I/O bandwidth for each node to access the storage independently.
Viewing Database Storage Information
You can view node-specific information on your HP Vertica cluster through the Management
Console. See Monitoring HP Vertica Using MC for details.
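From vsql, the configured locations can also be listed through the system tables. This is a sketch; confirm the table and column names in the SQL Reference Manual.
=> SELECT node_name, location_path, location_usage FROM storage_locations;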
Disk Space Requirements for HP Vertica
HP Vertica requires disk space for several data reorganization operations. For best results, HP
recommends that disk utilization per node not exceed sixty (60) percent.
Disk Space Requirements for Management Console
You can install MC on any node in the cluster, so there are no special disk requirements for MC—
other than disk space you would normally allocate for your database cluster. See Disk Space
Requirements for HP Vertica.
Prepare the Logical Schema Script
Designing a logical schema for an HP Vertica database is no different from designing one for any
other SQL database. Details are described more fully in Designing a Logical Schema.
To create your logical schema, prepare a SQL script (plain text file, typically with an extension of
.sql) that:
1. Creates additional schemas (as necessary). See Using Multiple Schemas.
2. Creates the tables and column constraints in your database using the CREATE TABLE
command.
3. Defines the necessary table constraints using the ALTER TABLE command.
4. Defines any views on the table using the CREATE VIEW command.
You can generate a script file using:
- A schema designer application.
- A schema extracted from an existing database.
- A text editor.
- One of the example database example-name_define_schema.sql scripts as a template. (See the example database directories in /opt/vertica/examples.)
In your script file, make sure that:
- Each statement ends with a semicolon.
- You use data types supported by HP Vertica, as described in the SQL Reference Manual.
Once you have created a database, you can test your schema script by executing it as described in
Create the Logical Schema. If you encounter errors, drop all tables, correct the errors, and run the
script again.
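A minimal sketch of such a script follows; the schema, table, constraint, and view names are hypothetical examples, not part of any shipped example database.
CREATE SCHEMA sales;

CREATE TABLE sales.orders (
    order_id   INT NOT NULL,
    order_date DATE,
    amount     NUMERIC(12,2)
);

ALTER TABLE sales.orders ADD CONSTRAINT orders_pk PRIMARY KEY (order_id);

CREATE VIEW sales.daily_totals AS
    SELECT order_date, SUM(amount) AS total_amount
    FROM sales.orders
    GROUP BY order_date;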
Prepare Data Files
Prepare two sets of data files:
- Test data files. Use test files to test the database after the partial data load. If possible, use part of the actual data files to prepare the test data files.
- Actual data files. Once the database has been tested and optimized, use your data files for your initial bulk load. See Bulk Loading Data.
How to Name Data Files
Name each data file to match the corresponding table in the logical schema. Case does not matter.
Use the extension .tbl or whatever you prefer. For example, if a table is named Stock_Dimension,
name the corresponding data file stock_dimension.tbl. When using multiple data files, append _
nnn (where nnn is a positive integer in the range 001 to 999) to the file name. For example, stock_
dimension.tbl_001, stock_dimension.tbl_002, and so on.
Prepare Load Scripts
Note: You can postpone this step if your goal is to test a logical schema design for validity.
Prepare SQL scripts to load data directly into physical storage using the COPY...DIRECT
statement in vsql, or through ODBC as described in the Programmer's Guide.
You need scripts that load the:
- Large tables
- Small tables
HP recommends that you load large tables using multiple files. To test the load process, use files of
10GB to 50GB in size. This size provides several advantages:
- You can use one of the data files as a sample data file for the Database Designer.
- You can load just enough data to Perform a Partial Data Load before you load the remainder.
- If a single load fails and rolls back, you do not lose an excessive amount of time.
- Once the load process is tested, for multi-terabyte tables, break up the full load in file sizes of 250–500GB.
See Bulk Loading Data and the following additional topics for details:
- Bulk Loading Data
- Using Load Scripts
- Using Parallel Load Streams
- Loading Data into Pre-Join Projections
- Enforcing Constraints
- About Load Errors
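A minimal load statement of this kind is sketched below; the file path and delimiter are hypothetical, and Stock_Dimension is the example table named in How to Name Data Files.
=> COPY Stock_Dimension FROM '/data/stock_dimension.tbl_001' DELIMITER '|' DIRECT;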
Tip: You can use the load scripts included in the example databases in the Getting Started
Guide as templates.
Create an Optional Sample Query Script
The purpose of a sample query script is to test your schema and load scripts for errors.
Include a sample of queries your users are likely to run against the database. If you don't have any
real queries, just write simple SQL that collects counts on each of your tables. Alternatively, you
can skip this step.
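A sketch of such a script, collecting a row count from each table (the table names are hypothetical):
SELECT COUNT(*) FROM Stock_Dimension;
SELECT COUNT(*) FROM Trade_Fact;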
Create an Empty Database
Two options are available for creating an empty database:
l Using the Management Console
l Using Administration Tools
Creating a Database Name and Password
The database name must conform to the following rules:
l Be between 1 and 30 characters long
l Begin with a letter
l Follow with any combination of letters (uppercase and lowercase), numbers, and underscores
Database names are case sensitive; however, HP strongly recommends that you do not create
databases whose names differ only in case; for example, do not create a database called
mydatabase and another database called MyDataBase.
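As a sketch, the naming rules above can be checked before you run the creation tools. This helper is not part of HP Vertica; it is just an illustrative pre-flight test:

```shell
# Hypothetical pre-flight check for the database-name rules above:
# 1-30 characters, begins with a letter, then letters/digits/underscores.
valid_db_name() {
    printf '%s' "$1" | grep -Eq '^[A-Za-z][A-Za-z0-9_]{0,29}$'
}
valid_db_name mydatabase && echo "mydatabase: ok"
valid_db_name 9mydatabase || echo "9mydatabase: rejected"
```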
Database Passwords
Database passwords may contain letters, digits, and certain special characters; however, no non-
ASCII Unicode characters may be used. The following table lists special (ASCII) characters that
HP Vertica permits in database passwords. Special characters can appear anywhere within a
password string; for example, mypas$word or $mypassword or mypassword$ are all permitted.
Caution: Using special characters in database passwords that are not listed in the following
table could cause database instability.
Character Description
# pound sign
! exclamation point
+ plus sign
* asterisk
? question mark
, comma
. period
/ forward slash
= equals sign
~ tilde
- minus sign
$ dollar sign
_ underscore
: colon
space
" double quote
' single quote
% percent sign
& ampersand
( parenthesis
) parenthesis
; semicolon
< less than sign
> greater than sign
@ at sign
` back quote
[ square bracket
] square bracket
\ backslash
^ caret
| vertical bar
{ curly bracket
} curly bracket
See Also
l Password Guidelines
Create an Empty Database Using MC
You can create a new database on an existing HP Vertica cluster through the Management
Console interface.
Database creation can be a long-running process, lasting from minutes to hours, depending on the
size of the target database. You can close the web browser during the process and sign back in to
MC later; the creation process continues unless an unexpected error occurs. See the Notes
section below the procedure on this page.
You currently need to use command line scripts to define the database schema and load data.
Refer to the topics in Configuration Procedure. You should also run the Database Designer, which
you access through the Administration Tools, to create either a comprehensive or incremental
design. Consider using the Tutorial in the Getting Started Guide to create a sample database you
can start monitoring immediately.
How to Create an Empty Database on an MC-managed Cluster
1. If you are already on the Databases and Clusters page, skip to the next step; otherwise:
a. Connect to MC and sign in as an MC administrator.
b. On the Home page, click the Databases and Clusters task.
2. If no databases exist on the cluster, continue to the next step; otherwise:
a. If a database is running on the cluster on which you want to add a new database, select the
database and click Stop.
b. Wait for the running database to have a status of Stopped.
3. Click the cluster on which you want to create the new database and click Create Database.
4. The Create Database wizard opens. Provide the following information:
n Database name and password. See Creating a Database Name and Password for rules.
n Optionally click Advanced to open the advanced settings and change the port and catalog,
data, and temporary data paths. By default the MC application/web server port is 5450 and
paths are /home/dbadmin, or whatever you defined for the paths when you ran the Cluster
Creation Wizard or the install_vertica script. Do not use the default agent port 5444 as a
new setting for the MC port. See MC Settings > Configuration for port values.
5. Click Continue.
6. Select nodes to include in the database.
The Database Configuration window opens with the options you provided and a graphical
representation of the nodes appears on the page. By default, all nodes are selected to be part of
this database (denoted by a green check mark). You can optionally click each node and clear
Include host in new database to exclude that node from the database. Excluded nodes are
gray. If you change your mind, click the node and select the Include check box.
7. Click Create in the Database Configuration window to create the database on the nodes.
The creation process takes a few moments, after which the database starts and a Success
message appears on the interface.
8. Click OK to close the success message.
MC's Manage page opens and displays the database nodes. Nodes not included in the database
are colored gray, which means they are standby nodes you can include later. To add nodes to or
remove nodes from the HP Vertica cluster itself (nodes not shown in standby mode), you must run
the install_vertica script.
Notes
l If warnings occur during database creation, nodes will be marked on the UI with an Alert icon
and a message.
n Warnings do not prevent the database from being created, but you should address warnings
after the database creation process completes by viewing the database Message Center
from the MC Home page.
n Failure messages display on the database Manage page with a link to more detailed
information and a hint with an actionable task that you must complete before you can
continue. Problem nodes are colored red for quick identification.
n To view more detailed information about a node in the cluster, double-click the node from the
Manage page, which opens the Node Details page.
l To create MC users and grant them access to an MC-managed database, see About MC Users
and Creating an MC User.
See Also
l Creating a Cluster Using MC
l Troubleshooting Management Console
l Restarting MC
Create a Database Using Administration Tools
1. Run the Administration Tools from your Administration Host as follows:
$ /opt/vertica/bin/admintools
If you are using a remote terminal application, such as PuTTY or a Cygwin bash shell, see
Notes for Remote Terminal Users.
2. Accept the license agreement and specify the location of your license file. See Managing
Licenses for more information.
This step is necessary only if it is the first time you have run the Administration Tools.
3. On the Main Menu, click Configuration Menu, and click OK.
4. On the Configuration Menu, click Create Database, and click OK.
5. Enter the name of the database and an optional comment, and click OK.
6. Establish the superuser password for your database.
n To provide a password enter the password and click OK. Confirm the password by entering
it again, and then click OK.
n If you don't want to provide the password, leave it blank and click OK. If you don't set a
password, HP Vertica prompts you to verify that you truly do not want to establish a
superuser password for this database. Click Yes to create the database without a password
or No to establish the password.
Caution: If you do not enter a password at this point, the superuser password is set to
empty. Unless the database is for evaluation or academic purposes, HP strongly
recommends that you enter a superuser password. See Creating a Database Name and
Password for guidelines.
7. Select the hosts to include in the database from the list of hosts specified when HP Vertica
was installed (install_vertica -s), and click OK.
8. Specify the directories in which to store the data and catalog files, and click OK.
Note: Do not use a shared directory for more than one node. Data and catalog directories
must be distinct for each node. Multiple nodes must not be allowed to write to the same
data or catalog directory.
9. Catalog and data pathnames must contain only alphanumeric characters and cannot have
leading spaces. Failure to comply with these restrictions results in database creation failure.
For example:
Catalog pathname: /home/dbadmin
Data Pathname: /home/dbadmin
10. Review the Current Database Definition screen to verify that it represents the database you
want to create, and then click Yes to proceed or No to modify the database definition.
11. If you click Yes, HP Vertica creates the database you defined and then displays a message to
indicate that the database was successfully created.
Note: For databases created with 3 or more nodes, HP Vertica automatically sets
K-safety to 1 to ensure that the database is fault tolerant in case a node fails. For more
information, see Failure Recovery in the Administrator's Guide and MARK_DESIGN_KSAFE
in the SQL Reference Manual.
12. Click OK to acknowledge the message.
If you receive an error message, see Startup Problems.
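If you script database creation rather than stepping through the menus, admintools also accepts tool-mode syntax. The flags below are a hedged sketch based on typical admintools usage, not a definitive reference; verify the exact flag names for your version first:

```shell
# Hedged sketch of a non-interactive equivalent of the procedure above.
# Verify flag names with: /opt/vertica/bin/admintools -t create_db --help
/opt/vertica/bin/admintools -t create_db \
    -d mydb \
    -s host01,host02,host03 \
    -c /home/dbadmin \
    -D /home/dbadmin \
    -p 'superuser_password'
```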
Create the Logical Schema
1. Connect to the database.
In the Administration Tools Main Menu, click Connect to Database and click OK.
See Connecting to the Database for details.
The vsql welcome script appears:
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
=>
2. Run the logical schema script.
Use the \i meta-command in vsql to run the SQL logical schema script that you prepared
earlier.
3. Disconnect from the database.
Use the \q meta-command in vsql to return to the Administration Tools.
Perform a Partial Data Load
HP recommends that for large tables, you perform a partial data load and then test your database
before completing a full data load. This load should load a representative amount of data.
1. Load the small tables.
Load the small table data files using the SQL load scripts and data files you prepared earlier.
2. Partially load the large tables.
Load 10GB to 50GB of table data for each table using the SQL load scripts and data files that
you prepared earlier.
For more information about projections, see Physical Schema in the Concepts Guide.
Test the Database
Test the database to verify that it is running as expected.
Check queries for syntax errors and execution times.
1. Use the vsql \timing meta-command to enable the display of query execution time in
milliseconds.
2. Execute the SQL sample query script that you prepared earlier.
3. Execute several ad hoc queries.
Optimize Query Performance
Optimizing the database consists of optimizing for compression and tuning for queries. (See
Creating a Database Design.)
To optimize the database, use the Database Designer to create and deploy a design for optimizing
the database. See the Tutorial in the Getting Started Guide for an example of using the Database
Designer to create a Comprehensive Design.
After you have run the Database Designer, use the techniques described in Optimizing Query
Performance in the Programmer's Guide to improve the performance of certain types of queries.
Note: The database response time depends on factors such as type and size of the application
query, database design, data size and data types stored, available computational power, and
network bandwidth. Adding nodes to a database cluster does not necessarily improve the
system response time for every query, especially if the response time is already short (for
example, less than 10 seconds) or the response time is not hardware bound.
Complete the Data Load
To complete the load:
1. Monitor system resource usage
Continue to run the top, free, and df utilities and watch them while your load scripts are
running (as described in Monitoring Linux Resource Usage). You can do this on any or all
nodes in the cluster. Make sure that the system is not swapping excessively (watch kswapd in
top) or running out of swap space (watch for a large amount of used swap space in free).
Note: HP Vertica requires a dedicated server. If your loader or other processes take up
significant amounts of RAM, it can result in swapping.
2. Complete the large table loads
Run the remainder of the large table load scripts.
Test the Optimized Database
Check query execution times to test your optimized design:
1. Use the vsql \timing meta-command to enable the display of query execution time in
milliseconds.
Execute a SQL sample query script to test your schema and load scripts for errors.
Note: Include a sample of queries your users are likely to run against the database. If you
don't have any real queries, just write simple SQL that collects counts on each of your
tables. Alternatively, you can skip this step.
2. Execute several ad hoc queries
a. Run Administration Tools and select Connect to Database.
b. Use the \i meta-command to execute the query script; for example:
vmartdb=> \i vmart_query_03.sql
customer_name | annual_income
------------------+---------------
James M. McNulty | 999979
Emily G. Vogel | 999998
(2 rows)
Time: First fetch (2 rows): 58.411 ms. All rows formatted: 58.448 ms
vmartdb=> \i vmart_query_06.sql
store_key | order_number | date_ordered
-----------+--------------+--------------
45 | 202416 | 2004-01-04
113 | 66017 | 2004-01-04
121 | 251417 | 2004-01-04
24 | 250295 | 2004-01-04
9 | 188567 | 2004-01-04
166 | 36008 | 2004-01-04
27 | 150241 | 2004-01-04
148 | 182207 | 2004-01-04
198 | 75716 | 2004-01-04
(9 rows)
Time: First fetch (9 rows): 25.342 ms. All rows formatted: 25.383 ms
Once the database is optimized, it should run queries efficiently. If you discover queries that you
want to optimize, you can modify and update the design. See Incremental Design in the
Administrator's Guide.
Set Up Incremental (Trickle) Loads
Once you have a working database, you can use trickle loading to load new data while concurrent
queries are running.
Trickle load is accomplished by using the COPY command (without the DIRECT keyword) to load
10,000 to 100,000 rows per transaction into the WOS. This allows HP Vertica to batch multiple
loads when it writes data to disk. While the COPY command defaults to loading into the WOS, it
writes to the ROS if the WOS is full.
See Trickle Loading Data for details.
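A trickle-load batch along these lines might look like the following sketch; the table name, file path, and delimiter are illustrative assumptions:

```sql
-- Hypothetical trickle load: omitting DIRECT targets the WOS, so
-- HP Vertica can batch many small commits before writing to disk.
COPY store.store_sales_fact
    FROM '/data/incremental/sales_20140101.tbl'
    DELIMITER '|' NULL '';
COMMIT;
```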
See Also
l COPY
l Loading Data Through ODBC
Implement Locales for International Data Sets
The locale is a parameter that defines the user's language, country, and any special variant
preferences, such as collation. HP Vertica uses the locale to determine the behavior of various
string functions, as well as collation for various SQL commands that require ordering and
comparison; for example, GROUP BY, ORDER BY, joins, the analytic ORDER BY clause, and so
forth.
By default, the locale for the database is en_US@collation=binary (English US). You can
establish a new default locale that is used for all sessions on the database, as well as override
individual sessions with different locales. Additionally the locale can be set through ODBC, JDBC,
and ADO.net.
ICU Locale Support
HP Vertica uses the ICU library for locale support; thus, you must specify locale using the ICU
Locale syntax. While the locale used by the database session is not derived from the operating
system (through the LANG variable), HP Vertica does recommend that you set the LANG
appropriately for each node running vsql, as described in the next section.
While ICU library services can specify collation, currency, and calendar preferences, HP Vertica
supports only the collation component. Any keywords not relating to collation are rejected.
Projections are always collated using the en_US@collation=binary collation regardless of the
session collation. Any locale-specific collation is applied at query time.
The SET DATESTYLE TO ... command provides some aspects of the calendar, but HP Vertica
supports only dollars as currency. 
Changing DB Locale for a Session
This example sets the session locale to Thai.
1. At the OS level for each node running vsql, set the LANG variable to the locale language as
follows:
export LANG=th_TH.UTF-8
Note: If setting the LANG variable as noted does not work, operating system support for
locales may not be installed.
2. For each HP Vertica session (from ODBC/JDBC or vsql) set the language locale.
From vsql:
\locale th_TH
3. From ODBC/JDBC:
"SET LOCALE TO th_TH;"
4. In PUTTY (or ssh terminal), change the settings as follows:
settings > window > translation > UTF-8
5. Click Apply, and Save.
All data being loaded must be in UTF-8 format, not an ISO format, as described in Loading UTF-8
Format Data. Character sets like ISO 8859-1 (Latin1), which are incompatible with UTF-8, are not
supported, so functions like SUBSTRING do not work correctly for multi-byte characters. Thus, ISO
locale settings do not work correctly. If the translation setting ISO-8859-11:2001 (Latin/Thai)
appears to work, the data was loaded incorrectly. To convert data correctly, use a utility program
such as Linux iconv (see the man page).
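For example, a Latin-1 file can be re-encoded with iconv before loading. The file names are illustrative; the byte counts show the single-byte Latin-1 eszett growing to two UTF-8 bytes:

```shell
# Re-encode a Latin-1 extract to UTF-8 before loading it into HP Vertica.
printf 'stra\337e\n' > dim_latin1.txt        # \337 is the Latin-1 eszett
iconv -f ISO-8859-1 -t UTF-8 dim_latin1.txt > dim_utf8.txt
wc -c dim_latin1.txt dim_utf8.txt            # the eszett grows to 2 bytes
```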
Note: The maximum length parameter for the VARCHAR and CHAR data types refers to the
number of octets (bytes) that can be stored in that field, not the number of characters. When
using multi-byte UTF-8 characters, size fields to accommodate from 1 to 4 bytes per character,
depending on the data.
See Also
l Supported Locales
l About Locales
l SET LOCALE
l ICU User Guide
Specify the Default Locale for the Database
The default locale configuration parameter sets the initial locale for every database session once
the database has been restarted. Sessions may override this value.
To set the locale for the database, use the configuration parameter as follows:
SELECT SET_CONFIG_PARAMETER('DefaultSessionLocale', '<ICU-locale-identifier>');
For example:
mydb=> SELECT SET_CONFIG_PARAMETER('DefaultSessionLocale','en_GB');
SET_CONFIG_PARAMETER
----------------------------
Parameter set successfully
(1 row)
Override the Default Locale for a Session
To override the default locale for a specific session, use one of the following commands:
l The vsql command \locale <ICU-locale-identifier>.
For example:
\locale en_GB
INFO: Locale: 'en_GB'
INFO: English (United Kingdom)
INFO: Short form: 'LEN'
l The statement SET LOCALE TO <ICU-locale-identifier>.
SET LOCALE TO en_GB;
INFO: Locale: 'en_GB'
INFO: English (United Kingdom)
INFO: Short form: 'LEN'
You can also use the Short Form of a locale in either of these commands:
SET LOCALE TO LEN;
INFO: Locale: 'en'
INFO: English
INFO: Short form: 'LEN'
\locale LEN
INFO: Locale: 'en'
INFO: English
INFO: Short form: 'LEN'
You can use these commands to override the locale as many times as needed within a session.
The session locale setting applies to any subsequent commands issued in the session.
See Also
l SET LOCALE
Best Practices for Working with Locales
It is important to understand the distinction between the locale settings on the database server and
locale settings at the client application level. The server locale settings impact only the collation
behavior for server-side query processing. The client application is responsible for ensuring that the
correct locale is set in order to display the characters correctly. Below are the best practices
recommended by HP to ensure predictable results:
Server Locale
The server session locale should be set as described in Specify the Default Locale for the
Database. If you use different locales in different sessions, set the server locale at the start of each
session from your client.
vsql Client
l If there is no default session locale at the database level, set the server locale for the session to
the desired locale, as described in Override the Default Locale for a Session.
l The locale setting in the terminal emulator where the vsql client runs should match the session
locale setting on the server side (ICU locale) so data is collated correctly on the server and
displayed correctly on the client.
l All input data for vsql should be in UTF-8, and all output data is encoded in UTF-8.
l Non-UTF-8 encodings and associated locale values should not be used because they are not
supported.
l Refer to the documentation of your terminal emulator for instructions on setting locale and
encoding.
ODBC Clients
l ODBC applications can be in either ANSI or Unicode mode. In Unicode mode, the encoding used
by ODBC is UCS-2. In ANSI mode, the data must be in single-byte ASCII, which is compatible
with the UTF-8 used on the database server. The ODBC driver converts UCS-2 to UTF-8 when
passing data to the HP Vertica server and converts data sent by the HP Vertica server from
UTF-8 to UCS-2.
l If the user application is not already in UCS-2, the application is responsible for converting the
input data to UCS-2, or unexpected results could occur. For example:
n If non-UCS-2 data passed to the ODBC APIs is interpreted as UCS-2, it could result in
an invalid UCS-2 symbol being passed to the APIs, resulting in errors.
n The symbol provided in the alternate encoding could be a valid UCS-2 symbol; in this case,
incorrect data is inserted into the database.
l If there is no default session locale at database level, ODBC applications should set the desired
server session locale using SQLSetConnectAttr (if different from database wide setting) in
order to get expected collation and string functions behavior on the server.
JDBC and ADO.NET Clients
l JDBC and ADO.NET applications use a UTF-16 character set encoding and are responsible for
converting any non-UTF-16 encoded data to UTF-16. The same cautions apply as for ODBC if
this encoding is violated.
l The JDBC and ADO.NET drivers convert UTF-16 data to UTF-8 when passing to the HP
Vertica server and convert data sent by HP Vertica server from UTF-8 to UTF-16.
l If there is no default session locale at the database level, JDBC and ADO.NET applications
should set the correct server session locale by executing the SET LOCALE TO command in
order to get expected collation and string functions behavior on the server. See the SET
LOCALE command in the SQL Reference Manual.
Notes and Restrictions
Session related:
l The locale setting is session scoped and applies only to queries (not DML/DDL) run in that
session. You cannot specify a locale for an individual query.
l The default locale for new sessions can be set using a configuration parameter.
Query related:
The following restrictions apply when queries are run with a locale other than the default
en_US@collation=binary:
l Multicolumn NOT IN subqueries are not supported when one or more of the left-side NOT IN
columns is of CHAR or VARCHAR data type. For example:
=> CREATE TABLE test (x VARCHAR(10), y INT);
=> SELECT ... FROM test WHERE (x,y) NOT IN (SELECT ...);
ERROR: Multi-expression NOT IN subquery is not supported because a left hand
expression could be NULL
Note: An error is reported even if columns test.x and test.y have a "NOT NULL"
constraint.
l Correlated HAVING clause subqueries are not supported if the outer query contains a GROUP
BY on a CHAR or a VARCHAR column. In the following example, the GROUP BY x in the outer
query causes the error:
=> DROP TABLE test CASCADE;
=> CREATE TABLE test (x VARCHAR(10));
=> SELECT COUNT(*) FROM test t GROUP BY x HAVING x
IN (SELECT x FROM test WHERE t.x||'a' = test.x||'a' );
ERROR: subquery uses ungrouped column "t.x" from outer query
l Subqueries that use analytic functions in the HAVING clause are not supported. For example:
=> DROP TABLE test CASCADE;
=> CREATE TABLE test (x VARCHAR(10));
=> SELECT MAX(x)OVER(PARTITION BY 1 ORDER BY 1)
FROM test GROUP BY x HAVING x IN (SELECT MAX(x) FROM test);
ERROR: Analytics query with having clause expression that involves aggregates
and subquery is not supported
DML/DDL related:
l SQL identifiers (such as table names, column names, and so on) can use UTF-8 Unicode
characters. For example, the following CREATE TABLE statement uses the ß (German eszett)
in the table name:
=> CREATE TABLE straße(x int, y int);
CREATE TABLE
l Projection sort orders are made according to the default en_US@collation=binary collation.
Thus, regardless of the session setting, issuing the following command creates a projection
sorted by col1 according to the binary collation:
=> CREATE PROJECTION p1 AS SELECT * FROM table1 ORDER BY col1;
Note that in such cases, straße and strasse would not be near each other on disk.
Sorting by binary collation also means that sort optimizations do not work in locales other than
binary. HP Vertica returns the following warning if you create tables or projections in a
non-binary locale:
WARNING: Projections are always created and persisted in the default HP Vertica
locale. The current locale is de_DE
l When creating pre-join projections, the projection definition query does not respect the locale or
collation setting. This means that when you insert data into the fact table of a pre-join projection,
referential integrity checks are not locale or collation aware.
For example:
\locale LDE_S1 -- German
=> CREATE TABLE dim (col1 varchar(20) primary key);
=> CREATE TABLE fact (col1 varchar(20) references dim(col1));
=> CREATE PROJECTION pj AS SELECT * FROM fact JOIN dim
ON fact.col1 = dim.col1 UNSEGMENTED ALL NODES;
=> INSERT INTO dim VALUES('ß');
=> COMMIT;
The following INSERT statement fails with a "nonexistent FK" error even though 'ß' is in the dim
table, and in the German locale 'SS' and 'ß' refer to the same character.
=> INSERT INTO fact VALUES('SS');
ERROR: Nonexistent foreign key value detected in FK-PK join (fact x dim)
using subquery and dim_node0001; value SS
=> ROLLBACK;
=> DROP TABLE dim, fact CASCADE;
l When the locale is non-binary, the collation function is used to transform the input to a binary
string which sorts in the proper order.
This transformation increases the number of bytes required for the input according to this
formula:
result_column_width = input_octet_width * CollationExpansion + 4
CollationExpansion defaults to 5.
l CHAR fields are displayed as fixed length, including any trailing spaces. When CHAR fields are
processed internally, they are first stripped of trailing spaces. For VARCHAR fields, trailing
spaces are usually treated as significant characters; however, trailing spaces are ignored when
sorting or comparing either type of character string field using a non-BINARY locale.
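As a quick check of the result_column_width formula above, shell arithmetic for a 20-octet input with the default CollationExpansion of 5 gives:

```shell
# result_column_width = input_octet_width * CollationExpansion + 4
input_octet_width=20
collation_expansion=5
echo $(( input_octet_width * collation_expansion + 4 ))   # prints 104
```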
Change Transaction Isolation Levels
By default, HP Vertica uses the READ COMMITTED isolation level for every session. If you prefer,
you can change the default isolation level for the database or for a specific session.
To change the isolation level for a specific session, use the SET SESSION CHARACTERISTICS
command.
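For example, the session-level command takes the standard form below (shown as a sketch; see the SQL Reference Manual for the full syntax):

```sql
-- Change only the current session; other sessions keep the default.
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;
```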
To change the isolation level for the database, use the TransactionIsolationLevel configuration
parameter. Once modified, HP Vertica uses the new transaction level for every new session.
The following examples set the default isolation for the database to SERIALIZABLE and then back
to READ COMMITTED:
=> SELECT SET_CONFIG_PARAMETER('TransactionIsolationLevel','SERIALIZABLE');
=> SELECT SET_CONFIG_PARAMETER('TransactionIsolationLevel','READ COMMITTED');
Notes
l A change to isolation level only applies to future sessions. Existing sessions and their
transactions continue to use the original isolation level.
A transaction retains its isolation level until it completes, even if the session's transaction
isolation level changes mid-transaction. HP Vertica internal processes (such as the Tuple
Mover and refresh operations) and DDL operations are always run at SERIALIZABLE isolation
level to ensure consistency.
See Also
l Transactions
l Configuration Parameters
Configuration Parameters
You can modify certain parameters to configure your HP Vertica database using one of the
following options:
l Dynamically through the Management Console browser-based interface
l At the command line directly
l From vsql
Important: Before you modify a database parameter, review all documentation about the
parameter to determine the context under which you can change it. Parameter changes take
effect after you restart the database.
Configuring HP Vertica Settings Using MC
To change database settings for any MC-managed database, click the Settings tab at the bottom
of the Overview, Activity, or Manage pages. The database must be running.
The Settings page defaults to parameters in the General category. To change other parameters,
click an option from the tab panel on the left.
Some settings require that you restart the database, and MC will prompt you to do so. You can
ignore the prompt, but those changes will not take effect until after you restart the database.
If you want to change settings that are specific to Management Console, such as change MC or
agent port assignments, see Managing MC Settings for more information.
See Also
l Configuration Parameters
Configuring HP Vertica At the Command Line
The tables in this section list parameters for configuring HP Vertica at the command line.
General Parameters
The following table describes the general parameters for configuring HP Vertica.
Parameters Default Description
AnalyzeRowCountInterval 60 seconds Automatically runs every 60 seconds
to collect the number of rows in the
projection and aggregates row counts
calculated during loads. See
Collecting Statistics.
CompressCatalogOnDisk 0 When enabled (set value to 1 or 2)
compresses the size of the catalog on
disk. Values can be:
l 1—Compress checkpoints, but not
logs
l 2—Compress checkpoints and
logs
Consider enabling this parameter if the
catalog disk partition is small (<50GB)
and the metadata is large (hundreds of
tables, partitions, or nodes).
CompressNetworkData 0 When enabled (set to 1), HP
Vertica compresses all of the data it
sends over the network. This speeds
up network traffic at the expense of
added CPU load. You can enable this
parameter if you find that the network is
throttling your database performance.
EnableCooperativeParse 1 Enabled by default. Implements multi-
threaded cooperative parsing
capabilities for delimited and fixed-
width loads.
EnableResourcePoolCPUAffinity 1 Enabled by default. When disabled
(set value to 0), queries run on any
CPU, regardless of the CPU_
AFFINITY_SET of the resource pool.
ExternalTablesExceptionsLimit
Determines the maximum number of COPY exceptions and rejections that can
occur when a SELECT statement references an external table. Setting this
parameter to -1 removes any exceptions limit. For more information, see
Validating External Tables.
Default Value: 100

FencedUDxMemoryLimitMB
Sets the maximum amount of memory (in MB) that a fenced-mode UDF can use.
Any UDF that attempts to allocate more memory than this limit triggers an
exception. When set to -1, there is no limit on the amount of memory a UDF
can allocate. For more information, see Fenced Mode in the Programmer's
Guide.
Default Value: -1

FlexTableDataTypeGuessMultiplier
Specifies the multiplier to use for a key value when creating a view for a
flex keys table. See Specifying Unstructured Parameters.
Default Value: 2.0

FlexTableRawSize
The default size (in bytes) of the __raw__ column of a flex table. The
maximum value is 32000000. You can change the default value as with other
configuration parameters, or update the __raw__ column size on a per-table
basis using ALTER TABLE once an unstructured table exists. See Specifying
Unstructured Parameters.
Default Value: 130000

JavaBinaryForUDx
The full path to the Java executable that HP Vertica uses to execute Java
UDxs. See Installing Java on HP Vertica Hosts in the Programmer's Guide.

JavaClassPathForUDx
Sets the Java classpath for the JVM that executes Java UDxs. This parameter
must list all directories containing JAR files that Java UDxs import. See
Handling Java UDx Dependencies in the Programmer's Guide.
Default Value: ${vertica_home}/packages/hcat/lib/*
MaxAutoSegColumns
Specifies the number of columns (0 - 1024) to segment automatically when
creating auto-projections from COPY and INSERT INTO statements. Setting
this parameter to zero (0) uses all columns in the hash segmentation
expression.
Default Value: 32

MaxClientSessions
Determines the maximum number of client sessions that can run on a single
node of the database. The default value allows for 5 additional
administrative logins, which prevents DBAs from being locked out of the
system if the limit is reached by non-dbadmin users.
Tip: Setting this parameter to 0 is useful for preventing new client
sessions from being opened while you are shutting down the database. Be
sure to restore the parameter to its original setting once you've restarted
the database. See the section "Interrupting and Closing Sessions" in
Managing Sessions.
Default Value: 50

PatternMatchAllocator
If set to 1, overrides the heap memory allocator for the Perl Compatible
Regular Expressions (PCRE) pattern-match library, which evaluates regular
expressions. You must restart the database for this parameter to take
effect. For more information, see Regular Expression Functions.
Default Value: 0

PatternMatchStackAllocator
If set to 1, overrides the stack memory allocator for the Perl Compatible
Regular Expressions (PCRE) pattern-match library, which evaluates regular
expressions. You must restart the database for this parameter to take
effect. For more information, see Regular Expression Functions.
Default Value: 1
SegmentAutoProjection
Determines whether auto-projections are segmented by default. Setting this
parameter to zero (0) disables the feature.
Default Value: 1

TransactionIsolationLevel
Changes the isolation level for the database. Once modified, HP Vertica
uses the new isolation level for every new session. Existing sessions and
their transactions continue to use the original isolation level. See
Change Transaction Isolation Levels.
Default Value: READ COMMITTED

TransactionMode
Controls whether transactions are read/write or read-only. Read/write is
the default. Existing sessions and their transactions continue to use the
original transaction mode.
Default Value: READ WRITE
Setting Configuration Parameters
You can set a new value for a configuration parameter with a SELECT statement that calls the
SET_CONFIG_PARAMETER function. The following examples illustrate changing the parameters
listed in the table:
SELECT SET_CONFIG_PARAMETER ('AnalyzeRowCountInterval',3600);
SELECT SET_CONFIG_PARAMETER ('CompressNetworkData',1);
SELECT SET_CONFIG_PARAMETER ('ExternalTablesExceptionsLimit',-1);
SELECT SET_CONFIG_PARAMETER ('MaxClientSessions', 100);
SELECT SET_CONFIG_PARAMETER ('PatternMatchAllocator',1);
SELECT SET_CONFIG_PARAMETER ('PatternMatchStackAllocator',0);
SELECT SET_CONFIG_PARAMETER ('TransactionIsolationLevel','SERIALIZABLE');
SELECT SET_CONFIG_PARAMETER ('TransactionMode','READ ONLY');
SELECT SET_CONFIG_PARAMETER ('FlexTableRawSize',16000000);
SELECT SET_CONFIG_PARAMETER ('CompressCatalogOnDisk',2);
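To confirm a parameter's current setting after changing it, you can query the
CONFIGURATION_PARAMETERS system table. The following is a sketch; the exact column
names may vary between releases:

```sql
=> SELECT parameter_name, current_value, default_value
   FROM V_MONITOR.CONFIGURATION_PARAMETERS
   WHERE parameter_name = 'MaxClientSessions';
```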
Tuple Mover Parameters
These parameters control how the Tuple Mover operates.
Parameters Description
ActivePartitionCount Sets the number of partitions, called active partitions, that are
currently being loaded. For information about how the Tuple
Mover treats active (and inactive) partitions during a
mergeout operation, see Understanding the Tuple Mover.
Default Value: 1
Example:
SELECT SET_CONFIG_PARAMETER
('ActivePartitionCount', 2);
MergeOutInterval The number of seconds the Tuple Mover waits between
checks for new ROS files to merge out. If ROS containers
are added frequently, you may need to decrease this value.
Default Value: 600
Example:
SELECT SET_CONFIG_PARAMETER
('MergeOutInterval', 1200);
MoveOutInterval The number of seconds the Tuple Mover waits between
checks for new data in the WOS to move to ROS.
Default Value: 300
Example:
SELECT SET_CONFIG_PARAMETER
('MoveOutInterval', 600);
MoveOutMaxAgeTime The interval (in seconds) after which the Tuple Mover
is forced to write the WOS to disk. The default interval
is 30 minutes (1800 seconds).
Tip: If you had been running the force_moveout.sh script in
previous releases, you no longer need to run it.
Default Value: 1800
Example:
SELECT SET_CONFIG_PARAMETER
('MoveOutMaxAgeTime', 1200);
MoveOutSizePct The percentage of the WOS that can be filled with data before
the Tuple Mover performs a moveout operation.
Default Value: 0
Example:
SELECT SET_CONFIG_PARAMETER
('MoveOutSizePct', 50);
Epoch Management Parameters
The following table describes the epoch management parameters for configuring HP Vertica.
Parameters Description
AdvanceAHMInterval Determines how frequently (in seconds) HP Vertica checks
the history retention status. By default the AHM interval is
set to 180 seconds (3 minutes).
Note: AdvanceAHMInterval cannot be set to a value less
than the EpochMapInterval.
Default Value: 180
Example:
SELECT SET_CONFIG_PARAMETER
('AdvanceAHMInterval', '3600');
EpochMapInterval Determines the granularity of mapping between epochs and
time available to historical queries. When a historical
query AT TIME T is issued, HP Vertica maps it to an
epoch within a granularity of EpochMapInterval seconds. It
similarly affects the time reported for Last Good Epoch
during Failure Recovery. Note that it does not affect
internal precision of epochs themselves.
By default, EpochMapInterval is set to 180 seconds (3
minutes).
Tip: Decreasing this interval increases the number of
epochs saved on disk. Therefore, you might want to reduce
the HistoryRetentionTime parameter to limit the number of
history epochs that HP Vertica retains.
Default Value: 180
Example:
SELECT SET_CONFIG_PARAMETER
('EpochMapInterval', '300');
HistoryRetentionTime Determines how long deleted data is saved (in seconds) as
a historical reference. The default is 0, which means that
HP Vertica saves historical data only when nodes are
down. Once the specified time has passed since the
delete, the data is eligible to be purged. Use the -1 setting if
you prefer to use HistoryRetentionEpochs for
determining which deleted data can be purged.
Note: The default setting of 0 effectively prevents the use
of the Administration Tools 'Roll Back Database to Last
Good Epoch' option because the AHM remains close to the
current epoch and a rollback is not permitted to an epoch
prior to the AHM.
Tip: If you rely on the Roll Back option to remove recently
loaded data, consider setting a day-wide window for
removing loaded data; for example:
SELECT SET_CONFIG_PARAMETER
('HistoryRetentionTime', '86400');
Default Value: 0
Example:
SELECT SET_CONFIG_PARAMETER
('HistoryRetentionTime', '240');
HistoryRetentionEpochs Specifies the number of historical epochs to save, and
therefore, the amount of deleted data.
Unless you have a reason to limit the number of epochs,
HP recommends that you specify the time over which
deleted data is saved. The -1 setting disables this
configuration parameter.
If both History parameters are specified,
HistoryRetentionTime takes precedence, and if both
parameters are set to -1, all historical data is preserved.
See Setting a Purge Policy.
Default Value: -1
Example:
SELECT SET_CONFIG_PARAMETER
('HistoryRetentionEpochs','40');
Monitoring Parameters
The following table describes the monitoring parameters for configuring HP Vertica.
Parameters Description
SnmpTrapDestinationsList Defines where HP Vertica sends traps for
SNMP. See Configuring Reporting for
SNMP.
Default Value: none
Example:
SELECT SET_CONFIG_PARAMETER
('SnmpTrapDestinationsList',
'localhost 162 public' );
SnmpTrapsEnabled Enables event trapping for SNMP. See
Configuring Reporting for SNMP.
Default Value: 0
Example:
SELECT SET_CONFIG_PARAMETER
('SnmpTrapsEnabled', 1 );
SnmpTrapEvents Defines which events HP Vertica traps
through SNMP. See Configuring Reporting
for SNMP.
Default Value: Low Disk Space, Read
Only File System, Loss of K Safety,
Current Fault Tolerance at Critical Level,
Too Many ROS Containers, WOS Over
Flow, Node State Change, Recovery
Failure, and Stale Checkpoint
Example:
SELECT SET_CONFIG_PARAMETER
('SnmpTrapEvents', 'Low Disk
Space, Recovery Failure');
SyslogEnabled Enables event trapping for syslog. See
Configuring Reporting for Syslog.
Default Value: 0
Example:
SELECT SET_CONFIG_PARAMETER
('SyslogEnabled', 1 );
SyslogEvents Defines events that generate a syslog
entry. See Configuring Reporting for
Syslog.
Default Value: none
Example:
SELECT SET_CONFIG_PARAMETER
('SyslogEvents', 'Low Disk
Space, Recovery Failure');
SyslogFacility Defines which SyslogFacility HP Vertica
uses. See Configuring Reporting for
Syslog.
Default Value: user
Example:
SELECT SET_CONFIG_PARAMETER
('SyslogFacility' , 'ftp');
Profiling Parameters
The following table describes the profiling parameters for configuring HP Vertica. See Profiling
Database Performance for more information on profiling queries.
Parameters Description
GlobalEEProfiling Enables profiling for query execution runs in all sessions,
on all nodes.
Default Value: 0
Example:
SELECT SET_CONFIG_PARAMETER
('GlobalEEProfiling',1);
GlobalQueryProfiling Enables query profiling for all sessions on all nodes.
Default Value: 0
Example:
SELECT SET_CONFIG_PARAMETER
('GlobalQueryProfiling',1);
GlobalSessionProfiling Enables session profiling for all sessions on all nodes.
Default Value: 0
Example:
SELECT SET_CONFIG_PARAMETER
('GlobalSessionProfiling',1);
Security Parameters
The following table describes the parameters for configuring the client authentication method and
enabling SSL for HP Vertica.
Parameters Description
ClientAuthentication Configures client authentication. By default, HP Vertica uses user name
and password (if supplied) to grant access to the database.
The preferred method for establishing client authentication is to use the
Administration Tools. See Implementing Client Authentication and How
to Create Authentication Records.
Default Value: local all trust
Example:
SELECT SET_CONFIG_PARAMETER ('ClientAuthentication',
'hostnossl dbadmin 0.0.0.0/0 trust');
EnableSSL Configures SSL for the server. See Implementing SSL.
Default Value: 0
Example:
SELECT SET_CONFIG_PARAMETER('EnableSSL', '1');
See Also
Kerberos Authentication Parameters
Database Designer Parameters
The following table describes the parameters for configuring the HP Vertica Database Designer.
Parameter Description
DBDCorrelationSampleRowCount Minimum number of table rows at which Database Designer
discovers and records correlated columns.
Default value: 4000
Example:
SELECT SET_CONFIG_PARAMETER ('DBDCorrelationSampleRowCount', 3000);
Internationalization Parameters
The following table describes the internationalization parameters for configuring HP Vertica.
Parameters Description
DefaultIntervalStyle Sets the default interval style to use. If set to 0 (the default),
intervals are output in PLAIN style (the SQL standard), with no
interval units. If set to 1, intervals are output with UNITS. This
parameter does not take effect until the database is restarted.
Default Value: 0
Example:
SELECT SET_CONFIG_PARAMETER
('DefaultIntervalStyle', 1);
DefaultSessionLocale Sets the default session startup locale for the database. This
parameter does not take effect until the database is restarted.
Default Value: en_US@collation=binary
Example:
SELECT SET_CONFIG_PARAMETER
('DefaultSessionLocale','en_GB');
EscapeStringWarning Issues a warning when backslashes are used in a string literal.
This helps you locate backslashes that are being treated as
escape characters so that you can fix them to follow standard-
conforming string syntax instead.
Default Value: 1
Example:
SELECT SET_CONFIG_PARAMETER
('EscapeStringWarning','1');
StandardConformingStrings In HP Vertica 4.0, determines whether ordinary string literals ('...')
treat backslashes (\) as string literals or as escape characters.
When set to '1', backslashes are treated as string literals; when set
to '0', backslashes are treated as escape characters.
Tip: To treat backslashes as escape characters, use the
Extended string syntax:
(E'...');
See String Literals (Character) in the SQL Reference Manual.
Default Value: 1
Example:
SELECT SET_CONFIG_PARAMETER('StandardConformingStrings'
,'0');
Data Collector Parameters
The following table lists the Data Collector parameter for configuring HP Vertica.
Parameter Description
EnableDataCollector Enables and disables the Data Collector (the Workload Analyzer's
internal diagnostics utility) for all sessions on all nodes. Default is 1,
enabled. Use 0 to turn off data collection.
Default value: 1
Example:
SELECT SET_CONFIG_PARAMETER ('EnableDataCollector', 0);
For more information, see the following topics in the SQL Reference Manual:
l Data Collector Functions
l ANALYZE_WORKLOAD
l V_MONITOR.DATA_COLLECTOR
l V_MONITOR.TUNING_RECOMMENDATIONS
See also the following topics in the Administrator's Guide:
l Retaining Monitoring Information
l Analyzing Workloads and Tuning Recommendations
l Analyzing Workloads Through Management Console and Through an API
Kerberos Authentication Parameters
The following parameters let you configure the HP Vertica principal for Kerberos authentication and
specify the location of the Kerberos keytab file.
Parameter Description
KerberosServiceName Provides the service name portion of the HP Vertica Kerberos principal.
By default, this parameter is 'vertica'. For example:
vertica/host@EXAMPLE.COM.
KerberosHostname [Optional] Provides the instance or host name portion of the HP Vertica
Kerberos principal. For example: vertica/host@EXAMPLE.COM
Notes:
l If you omit the optional KerberosHostname parameter, HP Vertica
uses the return value from the gethostname() function. Assuming
each cluster node has a different host name, those nodes will each
have a different principal, which you must manage in that node's
keytab file.
l Consider specifying the KerberosHostname parameter to get a
single, cluster-wide principal that is easier to manage than multiple
principals.
KerberosRealm Provides the realm portion of the HP Vertica Kerberos principal. A realm
is the authentication administrative domain and is usually formed in
uppercase letters; for example: vertica/host@EXAMPLE.COM.
KerberosKeytabFile Provides the location of the keytab file that contains credentials for the
HP Vertica Kerberos principal. By default, this file is located in /etc.
For example: KerberosKeytabFile=/etc/krb5.keytab.
Notes:
l The principal must take the form
KerberosServiceName/KerberosHostName@KerberosRealm
l The keytab file must be readable by the user who runs the HP
Vertica process (typically the Linux dbadmin user) and should
have file permissions 0600.
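As an illustration of the permissions requirement above, the following sketch sets and
verifies 0600 permissions on a keytab file. The path here is a local stand-in; a real
deployment typically uses /etc/krb5.keytab owned by the dbadmin user:

```shell
# Stand-in path for illustration; production keytabs normally live in /etc.
KEYTAB=./krb5.keytab
touch "$KEYTAB"            # create a placeholder file for the demonstration
chmod 0600 "$KEYTAB"       # readable and writable only by the owning user
stat -c '%a' "$KEYTAB"     # confirm the mode
# On a real host, also confirm the stored principal with: klist -k /etc/krb5.keytab
```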
HCatalog Connector Parameters
The following table describes the parameters for configuring the HCatalog Connector. See Using
the HCatalog Connector in the Hadoop Integration Guide for more information.
Parameter Description
HCatConnectionTimeout The number of seconds the HCatalog Connector waits for a
successful connection to the WebHCat server before returning a
timeout error. A value of 0 (the default) means wait indefinitely.
Default Value: 0
Requires Restart: No
Example:
SELECT SET_CONFIG_PARAMETER('HCatConnectionTimeout', 30);
HCatSlowTransferLimit The lowest transfer speed (in bytes per second) that the HCatalog
Connector allows when retrieving data from the WebHCat server. If
the data transfer rate from the WebHCat server to HP Vertica is
below this threshold after the number of seconds set in the
HCatSlowTransferTime parameter, the HCatalog Connector cancels
the query and closes the connection.
Default Value: 65536
Requires Restart: No
Example: 
SELECT SET_CONFIG_PARAMETER('HCatSlowTransferLimit', 32000);
HCatSlowTransferTime The number of seconds the HCatalog Connector waits before testing
whether the data transfer from the WebHCat server is too slow. See
the HCatSlowTransferLimit parameter.
Default Value: 60
Requires Restart: No
Example:
SELECT SET_CONFIG_PARAMETER('HCatSlowTransferTime', 90);
Note: These configuration parameters can be overridden when creating an HCatalog schema.
See CREATE HCATALOG SCHEMA in the SQL Reference Manual for an explanation.
Designing a Logical Schema
Designing a logical schema for an HP Vertica database is no different from designing one for any
other SQL database. A logical schema consists of objects such as schemas, tables, views, and
referential integrity constraints that are visible to SQL users. HP Vertica supports any relational
schema design of your choice.
Using Multiple Schemas
Using a single schema is effective if there is only one database user or if a few users cooperate in
sharing the database. In many cases, however, it makes sense to use additional schemas to allow
users and their applications to create and access tables in separate namespaces. For example,
using additional schemas allows:
l Many users to access the database without interfering with one another.
Individual schemas can be configured to grant specific users access to the schema and its
tables while restricting others.
l Third-party applications to create tables that have the same name in different schemas,
preventing table collisions.
Unlike in other RDBMSs, a schema in an HP Vertica database is not a collection of objects bound
to one user.
Multiple Schema Examples
This section provides examples of when and how you might want to use multiple schemas to
separate database users. These examples fall into two categories: using multiple private schemas
and using a combination of private schemas (i.e. schemas limited to a single user) and shared
schemas (i.e. schemas shared across multiple users).
Using Multiple Private Schemas
Using multiple private schemas is an effective way of separating database users from one another
when sensitive information is involved. Typically a user is granted access to only one schema and
its contents, thus providing database security at the schema level. Database users can be running
different applications, multiple copies of the same application, or even multiple instances of the
same application. This enables you to consolidate applications on one database to reduce
management overhead and use resources more effectively. The following examples highlight using
multiple private schemas.
l Using Multiple Schemas to Separate Users and Their Unique Applications
In this example, both database users work for the same company. One user (HRUser) uses a
Human Resource (HR) application with access to sensitive personal data, such as salaries,
while another user (MedUser) accesses information regarding company healthcare costs
through a healthcare management application. HRUser should not be able to access company
healthcare cost information and MedUser should not be able to view personal employee data.
To grant these users access to data they need while restricting them from data they should not
see, two schemas are created with appropriate user access, as follows:
n HRSchema—A schema owned by HRUser that is accessed by the HR application.
n HealthSchema—A schema owned by MedUser that is accessed by the healthcare
management application.
l Using Multiple Schemas to Support Multitenancy
This example is similar to the last example in that access to sensitive data is limited by
separating users into different schemas. In this case, however, each user is using a virtual
instance of the same application.
An example of this is a retail marketing analytics company that provides data and software as a
service (SaaS) to large retailers to help them determine which of their promotional methods are
most effective at driving customer sales.
In this example, each database user equates to a retailer, and each user only has access to its
own schema. The retail marketing analytics company provides a virtual instance of the same
application to each retail customer, and each instance points to the user’s specific schema in
which to create and update tables. The tables in these schemas use the same names because
they are created by instances of the same application, but they do not conflict because they are
in separate schemas.
Examples of schemas in this database could be:
n MartSchema—A schema owned by MartUser, a large department store chain.
n PharmSchema—A schema owned by PharmUser, a large drug store chain.
l Using Multiple Schemas to Migrate to a Newer Version of an Application
Using multiple schemas is an effective way of migrating to a new version of a software
application. In this case, a new schema is created to support the new version of the software,
and the old schema is kept as long as necessary to support the original version of the software.
This is called a “rolling application upgrade.”
For example, a company might use an HR application to store employee data. The following
schemas could be used for the original and updated versions of the software:
n HRSchema—A schema owned by HRUser, the schema user for the original HR application.
n V2HRSchema—A schema owned by V2HRUser, the schema user for the new version of the
HR application.
Using Combinations of Private and Shared Schemas
The previous examples illustrate cases in which all schemas in the database are private and no
information is shared between users. However, users might want to share common data. In the
retail case, for example, MartUser and PharmUser might want to compare their per store sales of a
particular product against the industry per store sales average. Since this information is an industry
average and is not specific to any retail chain, it can be placed in a schema on which both users are
granted USAGE privileges. (For more information about schema privileges, see Schema
Privileges.)
Examples of schemas in this database could be:
l MartSchema—A schema owned by MartUser, a large department store chain.
l PharmSchema—A schema owned by PharmUser, a large drug store chain.
l IndustrySchema—A schema owned by DBUser (from the retail marketing analytics company)
on which both MartUser and PharmUser have USAGE privileges. It is unlikely that retailers
would be given any privileges beyond USAGE on the schema and SELECT on one or more of its
tables.
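A sketch of how such a private/shared layout might be set up follows. All schema, table,
and user names here are hypothetical, and the users are assumed to already exist:

```sql
CREATE SCHEMA MartSchema;
CREATE SCHEMA PharmSchema;
CREATE SCHEMA IndustrySchema;
-- Let both retailers resolve objects in the shared schema:
GRANT USAGE ON SCHEMA IndustrySchema TO MartUser, PharmUser;
-- Grant read access only on the shared table they need:
GRANT SELECT ON IndustrySchema.store_sales_avg TO MartUser, PharmUser;
```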
Creating Schemas
You can create as many schemas as necessary for your database. For example, you could create a
schema for each database user. However, schemas and users are not synonymous as they are in
Oracle.
By default, only a superuser can create a schema or give a user the right to create a schema. (See
GRANT (Database) in the SQL Reference Manual.)
To create a schema, use the CREATE SCHEMA statement, as described in the SQL Reference
Manual.
Specifying Objects in Multiple Schemas
Once you create two or more schemas, each SQL statement or function must identify the schema
associated with the object you are referencing. You can specify an object within multiple schemas
by:
l Qualifying the object name by using the schema name and object name separated by a dot. For
example, to specify MyTable, located in Schema1, qualify the name as Schema1.MyTable.
l Using a search path that includes the desired schemas when a referenced object is unqualified.
When you set a search path (see Setting Search Paths), HP Vertica automatically searches the
specified schemas to find the object.
Setting Search Paths
The search path is a list of schemas where HP Vertica looks for tables and User Defined Functions
(UDFs) that are referenced without a schema name. For example, if a statement references a table
named Customers without naming the schema that contains the table, and the search path is
public, Schema1, and Schema2, HP Vertica first searches the public schema for a table named
Customers. If it does not find a table named Customers in public, it searches Schema1 and then
Schema2.
HP Vertica uses the first table or UDF it finds that matches the unqualified reference. If the table or
UDF is not found in any schema in the search path, HP Vertica reports an error.
Note: HP Vertica only searches for tables and UDFs in schemas to which the user has access
privileges. If the user does not have access to a schema in the search path, HP Vertica silently
skips the schema. It does not report an error or warning if the user's search path contains one
or more schemas to which the user does not have access privileges. Any schemas in the
search path that do not exist (for example, schemas that have been deleted since being added
to the search path) are also silently ignored.
The first schema in the search path to which the user has access is called the current schema. This
is the schema where HP Vertica creates tables if a CREATE TABLE statement does not specify a
schema name.
The default schema search path is "$user", public, v_catalog, v_monitor, v_internal.
=> SHOW SEARCH_PATH;
name | setting
-------------+---------------------------------------------------
search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)
The $user entry in the search path is a placeholder that resolves to the current user name, and
public references the public schema. The v_catalog and v_monitor schemas contain HP
Vertica system tables, and the v_internal schema is for HP Vertica's internal use.
Note: HP Vertica always ensures that the v_catalog, v_monitor, and v_internal schemas are
part of the schema search path.
The default search path has HP Vertica search for unqualified tables first in the user’s schema,
assuming that a separate schema exists for each user and that the schema uses their user name. If
such a user schema does not exist, or if HP Vertica cannot find the table there, HP Vertica next
searches the public schema, and then the v_catalog and v_monitor built-in schemas.
A database administrator can set a user's default search schema when creating the user by using
the SEARCH_PATH parameter of the CREATE USER statement. An administrator or the user can
change the user's default search path using the ALTER USER statement's SEARCH_PATH
parameter. Changes made to the default search path using ALTER USER affect new user
sessions—they do not affect any current sessions.
A user can use the SET SEARCH_PATH statement to override the schema search path for the
current session.
Tip: The SET SEARCH_PATH statement is equivalent in function to the CURRENT_
SCHEMA statement found in some other databases.
To see the current search path, use the SHOW SEARCH_PATH statement. To view the current
schema, use SELECT CURRENT_SCHEMA(). The function SELECT CURRENT_SCHEMA()
also shows the resolved name of $user.
The following example demonstrates displaying and altering the schema search path for the current
user session:
=> SHOW SEARCH_PATH;
name | setting
-------------+---------------------------------------------------
search_path | "$user", PUBLIC, v_catalog, v_monitor, v_internal
(1 row)
=> SET SEARCH_PATH TO SchemaA, "$user", public;
SET
=> SHOW SEARCH_PATH;
name | setting
-------------+------------------------------------------------------------
search_path | SchemaA, "$user", public, v_catalog, v_monitor, v_internal
(1 row)
You can use the DEFAULT keyword to reset the schema search path to the default.
=> SET SEARCH_PATH TO DEFAULT;
SET
=> SHOW SEARCH_PATH;
name | setting
-------------+---------------------------------------------------
search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)
To view the default schema search path for a user, query the search_path column of the V_
CATALOG.USERS system table:
=> SELECT search_path from USERS WHERE user_name = 'ExampleUser';
 search_path
---------------------------------------------------
 "$user", public, v_catalog, v_monitor, v_internal
(1 row)
=> ALTER USER ExampleUser SEARCH_PATH SchemaA,"$user",public;
ALTER USER
=> SELECT search_path from USERS WHERE user_name = 'ExampleUser';
search_path
------------------------------------------------------------
SchemaA, "$user", public, v_catalog, v_monitor, v_internal
(1 row)
=> SHOW SEARCH_PATH;
name | setting
-------------+---------------------------------------------------
search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)
Note that changing the default search path has no effect on the user's current session. Even using
the SET SEARCH_PATH DEFAULT statement does not set the search path to the newly defined
default value. It only has an effect in new sessions.
See Also
l System Tables
Creating Objects That Span Multiple Schemas
HP Vertica supports views or pre-join projections that reference tables across multiple schemas.
For example, a user might need to compare employee salaries to industry averages. In this case,
the application would query a shared schema (IndustrySchema) for salary averages in addition to
its own private schema (HRSchema) for company-specific salary information.
Best Practice: When creating objects that span schemas, use qualified table names. This
naming convention avoids confusion if the query path or table structure within the schemas
changes at a later date.
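For example, a view that spans a private and a shared schema might be sketched as follows,
using fully qualified table names throughout (the table and column names are hypothetical):

```sql
CREATE VIEW HRSchema.salary_vs_industry AS
SELECT e.employee_id,
       e.salary,
       i.industry_avg_salary
FROM HRSchema.employee_dimension e
JOIN IndustrySchema.salary_averages i
  ON e.job_title = i.job_title;
```

Because every table reference is qualified, the view continues to resolve correctly even if a
user's search path changes.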
Tables in Schemas
In HP Vertica you can create both base tables and temporary tables, depending on what you are
trying to accomplish. For example, base tables are created in the HP Vertica logical schema while
temporary tables are useful for dividing complex query processing into multiple steps.
For more information, see Creating Tables and Creating Temporary Tables.
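As a sketch, a temporary table that stages an intermediate result for later query steps might
be created as follows. The table name and predicate are hypothetical; vendor_dimension is the
sample base table shown in the next section:

```sql
=> CREATE LOCAL TEMPORARY TABLE tmp_big_deals
   ON COMMIT PRESERVE ROWS
   AS SELECT vendor_key, deal_size
      FROM vendor_dimension
      WHERE deal_size > 1000000;
```

ON COMMIT PRESERVE ROWS keeps the staged rows available across transactions for the life of
the session.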
About Base Tables
The CREATE TABLE statement creates a table in the HP Vertica logical schema. The example
databases described in the Getting Started Guide include sample SQL scripts that demonstrate
this procedure. For example:
CREATE TABLE vendor_dimension (
vendor_key INTEGER NOT NULL PRIMARY KEY,
vendor_name VARCHAR(64),
vendor_address VARCHAR(64),
vendor_city VARCHAR(64),
vendor_state CHAR(2),
vendor_region VARCHAR(32),
deal_size INTEGER,
last_deal_update DATE
);
Automatic Projection Creation
To get your database up and running quickly, HP Vertica automatically creates a default projection
for each table created through the CREATE TABLE and CREATE TEMPORARY TABLE
statements. Each projection created automatically (or manually) includes a base projection name
prefix. You must use the projection prefix when altering or dropping a projection (ALTER
PROJECTION RENAME, DROP PROJECTION).
How you use the CREATE TABLE statement determines when the projection is created:
l If you create a table without providing the projection-related clauses, HP Vertica automatically
creates a superprojection for the table when you use an INSERT INTO or COPY statement to
load data into the table for the first time. The projection is created in the same schema as the
table. Once HP Vertica has created the projection, it loads the data.
l If you use CREATE TABLE AS SELECT to create a table from the results of a query, the table
is created first and a projection is created immediately after, using some of the properties of the
underlying SELECT query.
l (Advanced users only) If you use any of the following parameters, the default projection is
created immediately upon table creation using the specified properties:
n column-definition (ENCODING encoding-type and ACCESSRANK integer)
n ORDER BY table-column
n hash-segmentation-clause
n UNSEGMENTED { NODE node | ALL NODES }
n KSAFE
Note: Before you define a superprojection in the above manner, read Creating Custom
Designs in the Administrator's Guide.
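After the first load into a table, you can confirm the automatically created superprojection by querying the PROJECTIONS system table. This example assumes the vendor_dimension table shown earlier has been loaded:
=> SELECT projection_name, anchor_table_name
   FROM v_catalog.projections
   WHERE anchor_table_name = 'vendor_dimension';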
See Also
l Creating Base Tables
l Projection Concepts
l CREATE TABLE
About Temporary Tables
A common use case for a temporary table is to divide complex query processing into multiple steps.
Typically, a reporting tool holds intermediate results while reports are generated (for example, first
get a result set, then query the result set, and so on). You can also write subqueries.
Note: The default retention when creating temporary tables is ON COMMIT DELETE ROWS,
which discards data at transaction completion. The non-default value is ON COMMIT PRESERVE
ROWS, which preserves data across transactions and discards it when the current session ends.
You create temporary tables using the CREATE TEMPORARY TABLE statement.
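For example, a minimal sketch of a temporary table that keeps its rows until the session ends (the table and column names are illustrative):
=> CREATE TEMPORARY TABLE temp_results (customer_key INTEGER, total NUMERIC)
   ON COMMIT PRESERVE ROWS;
Without the ON COMMIT PRESERVE ROWS clause, the table's data would be discarded as soon as the transaction completes.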
Global Temporary Tables
HP Vertica creates global temporary tables in the public schema, with the data contents private to
the transaction or session through which data is inserted.
Global temporary table definitions are accessible to all users and sessions, so that two (or more)
users can access the same global table concurrently. However, whenever a user commits or rolls
back a transaction, or ends the session, HP Vertica removes the global temporary table data
automatically, so users see only data specific to their own transactions or session.
Global temporary table definitions persist in the database catalogs until they are removed explicitly
through a DROP TABLE statement.
Local Temporary Tables
Local temporary tables are created in the V_TEMP_SCHEMA namespace and inserted into the user's
search path transparently. Each local temporary table is visible only to the user who creates it, and
only for the duration of the session in which the table is created.
When the session ends, HP Vertica automatically drops the table definition from the database
catalogs. You cannot preserve non-empty, session-scoped temporary tables using the ON
COMMIT PRESERVE ROWS statement.
Creating local temporary tables is significantly faster than creating regular tables, so you should
make use of them whenever possible.
Automatic Projection Creation and Characteristics
Once a local or global temporary table exists, HP Vertica creates auto-projections for it whenever
you load or insert data.
The default auto-projection for a temporary table has the following characteristics:
l It is a superprojection.
l It uses the default encoding-type AUTO.
l It is automatically unsegmented on the initiator node, if you do not specify a hash-segmentation-
clause.
l The projection is not pinned.
l Temp tables are not recoverable, so the superprojection is not K-Safe (K-SAFE=0), and you
cannot make it so.
Auto-projections are defined by the table properties and creation methods, as follows:
The rules are as follows:

If the table is created from an input stream (COPY or INSERT INTO):
Sort order: Same as the input stream, if sorted.
Segmentation: On the PK column (if any), on all FK columns (if any), then on the first 31
configurable columns of the table.

If the table is created from a CREATE TABLE AS SELECT query:
Sort order: Same as the input stream, if sorted. If not sorted, sorted using the following rules.
Segmentation: Same segmentation columns, if the query output is segmented; the same as the
load, if the query output is unsegmented or unknown.

If the table has FK and PK constraints:
Sort order: FK columns first, then PK columns.
Segmentation: PK columns.

If the table has FK constraints only (no PK):
Sort order: FK columns first, then the remaining columns.
Segmentation: Small data type (< 8 byte) columns first, then large data type columns.

If the table has PK constraints only (no FK):
Sort order: PK columns.
Segmentation: PK columns.

If the table has no FK or PK constraints:
Sort order: On all columns.
Segmentation: Small data type (< 8 byte) columns first, then large data type columns.
Advanced users can modify the default projection created through the CREATE TEMPORARY TABLE
statement by defining one or more of the following parameters:
l column-definition (temp table) (ENCODING encoding-type and ACCESSRANK integer)
l ORDER BY table-column
l hash-segmentation-clause
l UNSEGMENTED { NODE node | ALL NODES }
l NO PROJECTION
Note: Before you define the superprojection in this manner, read Creating Custom Designs in
the Administrator's Guide.
See Also
l Creating Temporary Tables
l Projection Concepts
l CREATE TEMPORARY TABLE
Implementing Views
A view is a stored query that dynamically accesses and computes data from the database at
execution time. It differs from a projection in that it is not materialized: it does not store data on
disk. This means that it doesn't need to be refreshed whenever the data in the underlying tables
change, but it does require additional time to access and compute data.
Views are read-only and they support references to tables, temp tables, and other views. They do
not support inserts, deletes, or updates. You can use a view as an abstraction mechanism to:
l Hide the complexity of SELECT statements from users for support or security purposes. For
example, you could create a view that selects specific columns from specific tables to ensure
that users have easy access to the information they need while restricting them from
confidential information.
l Encapsulate the details of the structure of your tables, which could change as your application
evolves, behind a consistent user interface.
Creating Views
A view contains one or more SELECT statements that reference any combination of one or more
tables, temp tables, or views. Additionally, views can specify the column names used to display
results.
The user who creates the view must be a superuser or have the following privileges:
l CREATE on the schema in which the view is created.
l SELECT on all the tables and views referenced within the view's defining query.
l USAGE on all the schemas that contain the tables and views referenced within the view's
defining query.
To create a view:
1. Use the CREATE VIEW statement to create the view.
2. Use the GRANT (View) statement to grant users the privilege to use the view.
Note: Once created, a view cannot be altered; it can only be dropped and recreated.
Using Views
Views can be used in the FROM clause of any SQL query or subquery. At execution, HP Vertica
internally substitutes the name of the view used in the query with the actual contents of the view.
The following example defines a view (ship) and illustrates how a query that refers to the view is
transformed internally at execution time.
l New view
=> CREATE VIEW ship AS SELECT * FROM public.shipping_dimension;
l Original query
=> SELECT * FROM ship;
l Transformed query
=> SELECT * FROM (SELECT * FROM public.shipping_dimension) AS ship;
Tip: To use a view, a user must be granted SELECT permissions on the view. See GRANT
(View).
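For example, to let another user query the ship view defined above (the user name here is illustrative):
=> GRANT SELECT ON ship TO ExampleUser;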
The following example creates a view named myview that sums all individual incomes of
customers listed in the store.store_sales_fact table by state. The results are grouped in
ascending order by state.
=> CREATE VIEW myview AS SELECT SUM(annual_income), customer_state
FROM public.customer_dimension
WHERE customer_key IN
(SELECT customer_key
FROM store.store_sales_fact)
GROUP BY customer_state
ORDER BY customer_state ASC;
The following example uses the myview view with a WHERE clause that limits the results to
combined salaries of greater than 2,000,000,000.
=> SELECT * FROM myview where sum > 2000000000;
SUM | customer_state
-------------+----------------
2723441590 | AZ
29253817091 | CA
4907216137 | CO
3769455689 | CT
3330524215 | FL
4581840709 | IL
3310667307 | IN
2793284639 | MA
5225333668 | MI
2128169759 | NV
2806150503 | PA
2832710696 | TN
14215397659 | TX
2642551509 | UT
(14 rows)
Notes
If HP Vertica does not have to evaluate an expression that would generate a run-time error in order
to answer a query, the run-time error might not occur. See the following sequence of commands for
an example of this scenario.
If you run a query like the following, HP Vertica returns an error:
=> SELECT TO_DATE('F','dd mm yyyy') FROM customer_dimension;
ERROR: Invalid input for DD: "F"
Now create a view using the same query. Note that the view is created successfully, even though
you might expect it to return the same error:
=> CREATE VIEW temp AS SELECT TO_DATE('F','dd mm yyyy')
FROM customer_dimension;
CREATE VIEW
The view, however, cannot be used in all queries without generating the same error message. For
example, the following query returns the same error, which is what you would expect:
=> SELECT * FROM temp;
ERROR: Invalid input for DD: "F"
When you then issue a COUNT command, the returned rowcount is correct:
=> SELECT COUNT(*) FROM temp;
COUNT
-------
100
(1 row)
This behavior works as intended. You might want to create views that contain subqueries, where
not every row is intended to pass the predicate.
Creating a Database Design
Data in HP Vertica is physically stored in projections. When you initially load data into a table using
INSERT, COPY (or COPY LOCAL), HP Vertica creates a default superprojection for the table. This
superprojection ensures that all of the data is available for queries. However, these
superprojections might not optimize database performance, resulting in slow query performance
and low data compression.
To improve performance, create a physical design for your database that optimizes both query
performance and data compression. You can use the Database Designer or create this design by
hand.
Database Designer is a tool that recommends a physical design (projections) that provides the
best query performance. Using Database Designer minimizes the time you spend on manual
database tuning and lets you redesign the database incrementally to optimize for changing
workloads over time.
Database Designer runs as a background process. If multiple non-superusers run or deploy
Database Designer for the same tables at the same time, Database Designer may not be able to
complete.
Tip: HP recommends that you first globally optimize your database using the Comprehensive
setting in Database Designer. If the performance of the comprehensive design is not adequate,
you can design custom projections using an incremental design or manually, as described in
Creating Custom Designs.
What Is a Design?
A design is a physical storage plan that optimizes query performance. Database Designer uses
sophisticated strategies to create a design that provides excellent performance for ad-hoc queries
and specific queries while using disk space efficiently. Database Designer bases the design on the
following information that you provide:
l Design type (comprehensive or incremental)
l Optimization objective (query, load, or balanced)
l K-safety
l Design queries: Typical queries that you run during normal database operations. Each query can
be assigned a weight that indicates its relative importance so that Database Designer can
prioritize it when creating the design. Database Designer groups queries that affect the design in
the same way and treats them as a single weighted query when creating the design.
l Design tables that contain sample data.
l Setting that specifies that Database Designer be guided to create only unsegmented
projections.
l Setting that specifies that Database Designer analyze statistics before creating the design.
The result of a Database Designer run is:
l A design script that creates the projections for the design in a way that meets the optimization
objectives and distributes data uniformly across the cluster.
l A deployment script that creates and refreshes the projections for your design. For
comprehensive designs, the deployment script contains commands that remove non-optimized
projections. The deployment script includes the full design script.
l A backup script that contains SQL statements to deploy the design that existed on the system
before deployment. This file is useful in case you need to revert to the pre-deployment design.
While running Database Designer, you can choose to deploy your design automatically after the
deployment script is created, or to deploy it manually, after you have reviewed and tested the
design. HP Vertica recommends that you test the design on a non-production server before
deploying the design to your production server.
How Database Designer Creates a Design
During the design process, Database Designer analyzes the logical schema definition, sample
data, and sample queries, and creates a physical schema (projections) in the form of a SQL script
that you deploy automatically or manually. The script creates a minimal set of superprojections to
ensure K-safety.
In most cases, the projections that Database Designer creates provide excellent query
performance within physical constraints while using disk space efficiently. Database Designer:
l Recommends buddy projections with the same sort order, which can significantly improve
load, recovery, and site node performance. All buddy projections have the same base name so
that they can be identified as a group.
Note: If you manually create projections, Database Designer recommends a buddy with the
same sort order, if one does not already exist. By default, Database Designer recommends
both super and non-super segmented projections with a buddy of the same sort order and
segmentation.
l Automatically rebalances data after you add or remove nodes.
l Accepts queries as design input.
l Runs the design and deployment processes in the background.
This is useful if you have a large design that you want to run overnight. An active SSH session is
not required, so design/deploy operations continue to run uninterrupted, even if the session is
terminated.
l Accepts a file of sample queries for Database Designer to consider when creating a design.
Providing this file is optional for comprehensive designs. If you do not provide this file, Database
Designer recommends a generic design that does not consider specific queries. For incremental
designs, you must provide sample queries; the query file can contain up to 100 queries.
l Accepts unlimited queries for a comprehensive design.
l Allows you to analyze column correlations, which the Database Designer and query optimizer
exploit to improve data compression and query performance. Correlation analysis typically only
needs to be performed once, and only if the table has more than
DBDCorrelationSampleRowCount (default: 4000) rows.
By default, Database Designer does not analyze column correlations. To set the correlation
analysis mode, use DESIGNER_SET_ANALYZE_CORRELATIONS_MODE.
l Identifies similar design queries and assigns them a signature. Of queries with the same
signature, Database Designer weights the queries depending on how many have that signature
and considers the weighted query when creating a design.
l Creates projections in a way that minimizes data skew by distributing data uniformly across the
cluster.
l Produces higher quality designs by considering UPDATE and DELETE statements, as well as
SELECT statements.
l Does not sort, segment, or partition projections on LONG VARBINARY and LONG VARCHAR
columns.
Who Can Run Database Designer
To use Administration Tools to run Database Designer and create an optimal database design, you
must be a DBADMIN user.
To run Database Designer programmatically or using Management Console, you must either:
l Be a DBADMIN user
l Have been assigned the DBDUSER role and be the owner of the database tables for which you
are creating a design
Granting and Enabling the DBDUSER Role
For a non-DBADMIN user to be able to run Database Designer using Management Console, follow
the steps described in Allowing the DBDUSER to Run Database Designer Using Management
Console.
For a non-DBADMIN user to be able to run Database Designer programmatically, follow the
steps described in Allowing the DBDUSER to Run Database Designer Programmatically.
Important: When you grant the DBDUSER role, make sure to associate a resource pool with
that user to manage resources during Database Designer runs. (For instructions about how to
associate a resource pool with a user, see User Profiles.)
Multiple users can run Database Designer concurrently without interfering with each other or
using up all the cluster resources. When a user runs Database Designer, either using the
Management Console or programmatically, its execution is mostly contained by the user's
resource pool, but may spill over into system resource pools for less-intensive tasks.
Allowing the DBDUSER to Run Database Designer Using
Management Console
To allow a user with the DBDUSER role to run Database Designer using Management Console,
you first need to create the user on the HP Vertica server.
As DBADMIN, take these steps on the server:
1. Add the /tmp folder as a storage location on all cluster nodes.
=> SELECT ADD_LOCATION('/tmp');
2. Create the user who needs access to Database Designer.
=> CREATE USER new_user;
3. Grant the user the privilege to create schemas on the database for which they want to create a
design.
=> GRANT CREATE on new_database TO new_user;
4. Grant the DBDUSER role to the new user.
=> GRANT DBDUSER TO new_user;
5. On all nodes in the cluster, grant the user access to the /tmp folder.
=> GRANT ALL ON LOCATION '/tmp' TO new_user;
6. Grant the new user access to the database schema and its tables.
=> GRANT ALL ON SCHEMA user_schema TO new_user;
=> GRANT ALL ON ALL TABLES IN SCHEMA user_schema TO new_user;
After you have completed this task, you need to do the following to map the MC user to the
new_user you created in the previous steps:
1. Log in to Management Console as an MC Super user.
2. Click MC Settings.
3. Click User Management.
4. To create a new MC user, click Add. To use an existing MC user, select the user and click
Edit.
5. Next to the DB access level window, click Add.
6. In the Add Permissions window, do the following:
a. From the Choose a database drop-down list, select the database for which you want the
user to be able to create a design.
b. In the Database username field, enter the user name you created on the HP Vertica
server, new_user in this example.
c. In the Database password field, enter the password for the database you selected in step
a.
d. In the Restrict access drop-down list, select the level of MC user you want for this user.
7. Click OK to save your changes.
8. Log out of the MC Super user account.
The MC user is now mapped to the user that you created on the HP Vertica server. Log in as the
MC user and use Database Designer to create an optimized design for your database.
For more information about mapping MC users, see Mapping an MC User to a Database User's
Privileges.
Allowing the DBDUSER to Run Database Designer
Programmatically
To allow a user with the DBDUSER role to run Database Designer programmatically, take these
steps:
1. The DBADMIN user must grant the DBDUSER role:
=> GRANT DBDUSER TO <username>;
This role persists until the DBADMIN user revokes it.
2. For a non-DBADMIN user to run the Database Designer programmatically or using
Management Console, one of the following two steps must happen first:
n If the user's default role is already DBDUSER, skip this step. Otherwise, the user must
enable the DBDUSER role:
=> SET ROLE DBDUSER;
n The DBADMIN must add DBDUSER as the default role for that user:
=> ALTER USER <username> DEFAULT ROLE DBDUSER;
DBDUSER Capabilities and Limitations
The DBDUSER role has the following capabilities and limitations:
l A DBDUSER cannot create a design with a K-safety less than the system K-safety. If the
design violates the current K-safety by not having enough buddy projections for the tables, the
design does not complete.
l A DBDUSER cannot explicitly change the ancient history mark (AHM), even during deployment
of their design.
When you create a design, you automatically have privileges to manipulate the design. Other tasks
may require that the DBDUSER have additional privileges:
To add design tables, the DBDUSER must have:
l USAGE privilege on the design table schema
l OWNER privilege on the design table
To add a single design query, the DBDUSER must have:
l Privilege to execute the design query
To add a file of design queries, the DBDUSER must have:
l Read privilege on the storage location that contains the query file
l Privilege to execute all the queries in the file
To add design queries from the result of a user query, the DBDUSER must have:
l Privilege to execute the user query
l Privilege to execute each design query retrieved from the results of the user query
To create the design and deployment scripts, the DBDUSER must have:
l WRITE privilege on the storage location of the design script
l WRITE privilege on the storage location of the deployment script
Workflow for Running Database Designer
HP Vertica provides three ways to run Database Designer:
l Using Management Console to Create a Design
l Using Administration Tools to Create a Design
l About Running Database Designer Programmatically
The same general workflow applies to all of these methods: specify the design parameters, build
the design, and then deploy it, as described in the following sections.
Specifying Parameters for Database Designer
Before you run Database Designer to create a design, provide information that allows Database
Designer to create the optimal physical schema:
l Design Name
l Design Types
l Optimization Objectives
l Design Tables with Sample Data
l Design Queries
l K-Safety for Design
l Replicated and Unsegmented Projections
l Statistics Analysis
Design Name
All designs that Database Designer creates must have a name that you specify. The design name
must be alphanumeric or underscore (_) characters, and can be no more than 32 characters long.
(Administrative Tools and Management Console limit the design name to 16 characters.)
The design name becomes part of the files that Database Designer generates, including the
deployment script, allowing the files to be easily associated with a particular Database Designer
run.
Design Types
The Database Designer can create two distinct design types. The design you choose depends on
what you are trying to accomplish:
l Comprehensive Design
l Incremental Design
Comprehensive Design
A comprehensive design creates an initial or replacement design for all the tables in the specified
schemas. Create a comprehensive design when you are creating a new database.
To help Database Designer create an efficient design, load representative data into the tables
before you begin the design process. When you load data into a table, HP Vertica creates an
unoptimized superprojection so that Database Designer has projections to optimize. If a table has
no data, Database Designer cannot optimize it.
Optionally, supply Database Designer with representative queries that you plan to use so Database
Designer can optimize the design for them. If you do not supply any queries, Database Designer
creates a generic optimization of the superprojections that minimizes storage, with no query-
specific projections.
During a comprehensive design, Database Designer creates deployment scripts that:
l Create new projections to optimize query performance, only when they do not already exist.
l Create replacement buddy projections when Database Designer changes the encoding of pre-
existing projections that it has decided to keep.
Incremental Design
An incremental design creates an enhanced design with additional projections, if required, that are
optimized specifically for the queries that you provide. Create an incremental design when you have
one or more queries that you want to optimize.
Optimization Objectives
When creating a design, Database Designer can optimize the design for one of three objectives:
l Load: Database Designer creates a design that is optimized for loads, minimizing database
size, potentially at the expense of query performance.
l Performance: Database Designer creates a design that is optimized for fast query performance.
Because it optimizes for query performance, this design might recommend more projections
than the Load or Balanced objectives, potentially resulting in a larger database storage size.
l Balanced: Database Designer creates a design whose objectives are balanced between
database size and query performance.
Design Tables with Sample Data
You must specify one or more design tables for Database Designer to deploy a design. If your
schema is empty, it does not appear as a design table option.
When you specify design tables, consider the following:
l To create the most efficient projections for your database, load a moderate amount of
representative data into tables before running Database Designer. Database Designer considers
the data in this table when creating the design.
l If your design tables have a large amount of data, the Database Designer run takes a long time;
if your tables have too little data, the design is not optimized. HP Vertica considers 10 GB of
sample data sufficient for creating an optimal design.
l If you submit a design table with no data, Database Designer ignores it.
l If one of your design tables has been dropped, you will not be able to build or deploy your design.
Design Queries
If you supply representative queries that you run on your database to Database Designer, it
optimizes the performance of those queries.
If you are creating an incremental design, you must supply design queries; if you are creating a
comprehensive design, HP Vertica recommends you supply design queries to create an optimal
design.
Database Designer checks the validity of all queries when you add them to your design and again
when it builds the design. If a query is invalid, Database Designer ignores it.
Query Repository
Using Management Console, you can submit design queries from the QUERY_
REQUESTS system table. This is called the query repository.
The QUERY_REQUESTS table contains queries that users have run recently. For a
comprehensive design, you can submit up to 200 queries from the QUERY_REQUESTS table to
Database Designer to be considered when creating the design. For an incremental design, you can
submit up to 100 queries from the QUERY_REQUESTS table.
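For example, you can review recent candidate queries in vsql before submitting them to Database Designer. This sketch assumes the column names documented for V_MONITOR.QUERY_REQUESTS; adjust the filter to your workload:
=> SELECT request FROM v_monitor.query_requests
   WHERE request_type = 'QUERY'
   ORDER BY start_timestamp DESC
   LIMIT 10;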
K-Safety for Design
When you create a comprehensive design, you can set a K-safety value for your design. Valid
values are 0, 1, and 2. The value you specify is limited by the maximum K-safety allowed by the
number of nodes in your cluster.
Note: If you are not a DBADMIN user, you cannot set the design K-safety to a value less than
the system K-safety.
The default K-safety is as follows:
l If your cluster has one or two nodes, the default K-safety is 0.
l If your cluster has three or more nodes, the default K-safety is 1.
For a comprehensive design, you can make the following changes to the design K-safety before
deploying the design:
l If your cluster has one or two nodes, you cannot change the K-safety.
l If your cluster has three or four nodes, you can change the K-safety to 1 or 0.
l If your cluster has five or more nodes, you can change the K-safety to 2, 1, or 0.
You cannot change the K-safety value of an incremental design. Incremental designs assume the
K-safety value of your cluster.
For more information about K-safety, see K-Safety in the Concepts Guide.
Replicated and Unsegmented Projections
When creating a comprehensive design, Database Designer creates projections based on data
statistics and queries. It also reviews the submitted design tables to decide whether projections
should be segmented (distributed across the cluster nodes) or replicated (duplicated on all cluster
nodes).
For detailed information, see the following sections:
l Replicated Projections
l Unsegmented Projections
Replicated Projections
Replication occurs when HP Vertica stores identical copies of data across all nodes in a cluster.
If you are running on a single-node database, all projections are replicated because segmentation is
not possible in a single-node database.
Assuming that largest-row-count equals the number of rows in the design table with the largest
number of rows, Database Designer recommends that a projection be replicated if any one of the
following is true:
l Condition 1: largest-row-count < 1,000,000 and the number of rows in the table <= 10% of
largest-row-count.
l Condition 2: largest-row-count >= 10,000,000 and the number of rows in the table <= 1% of
largest-row-count.
l Condition 3: The number of rows in the table <= 100,000.
For example, if the largest design table has 50,000,000 rows, Condition 2 applies: any table with
at most 500,000 rows (1% of 50,000,000) is a candidate for replication.
For more information about replication, see High Availability Through Projections in the Concepts
Guide.
Unsegmented Projections
Segmentation occurs when HP Vertica distributes data evenly across multiple database nodes so
that all nodes participate in query execution. Projection segmentation provides high availability and
recovery, and optimizes query execution.
When running Database Designer programmatically or using Management Console, you can
allow Database Designer to recommend unsegmented projections in the design. If you do not,
Database Designer recommends only segmented projections.
Database Designer recommends segmented superprojections for large tables when deploying to
multiple node clusters, and recommends replicated superprojections for smaller tables.
Database Designer does not segment projections on:
l Single-node clusters
l LONG VARCHAR and LONG VARBINARY columns
For more information about segmentation, see High Availability Through Projections in the
Concepts Guide.
Statistics Analysis
By default, Database Designer analyzes statistics for the design tables when adding them to the
design. Analyzing statistics is optional, but HP Vertica recommends it because accurate statistics
help Database Designer optimize compression and query performance.
Analyzing statistics takes time and resources. If the current statistics for the design tables are up to
date, you can skip this step. When in doubt, analyze the statistics to make sure they are current.
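If you prefer to refresh statistics yourself before adding tables to the design, you can call
ANALYZE_STATISTICS directly; the table name here is illustrative:

=> SELECT ANALYZE_STATISTICS('public.store_dimension');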
For more information, see Collecting Statistics.
Building a Design
After you have created design tables and loaded data into them, and then specified the parameters
you want Database Designer to use when creating the physical schema, direct Database Designer
to create the scripts necessary to build the design.
Note: You cannot stop a running database if Database Designer is building a database design.
When you build a database design, HP Vertica generates two scripts:
l Deployment script—<design_name>_deploy.sql—Contains the SQL statements that create
projections for the design you are deploying, deploy the design, and drop unused projections.
When the deployment script runs, it creates the optimized design. For details about how to run
this script and deploy the design, see Deploying a Design.
l Design script—<design_name>_design.sql—Contains the
CREATE PROJECTION statements that Database Designer uses to create the design. Review
this script to make sure you are satisfied with the design.
The design script is a subset of the deployment script. It serves as a backup of the DDL for the
projections that the deployment script creates.
If you run Database Designer from Administrative Tools, HP Vertica also creates a backup
script named <design_name>_projection_backup_<unique id #>.sql. This script contains
SQL statements to deploy the design that existed on the system before deployment. This file is
useful in case you need to revert to the pre-deployment design.
When you create a design using Management Console:
l If you submit a large number of queries to your design and build it immediately, a timing
issue could cause the queries not to load before deployment starts. If this occurs, you may see
one of the following errors:
n No queries to optimize for
n No tables to design projections for
To accommodate this timing issue, you may need to reset the design, check the Queries tab to
make sure the queries have been loaded, and then rebuild the design. Detailed instructions are
in:
n Using the Wizard to Create a Design
n Creating a Design Manually
l The scripts are deleted when deployment completes. To save a copy of the deployment script
after the design is built but before the deployment completes, go to the Output window and copy
and paste the SQL statements to a file.
Resetting a Design
You must reset a design when:
l You build a design and the output scripts described in Building a Design are not created.
l You build a design but Database Designer cannot complete the design because the queries it
expects are not loaded.
Resetting a design discards all the run-specific information of the previous Database Designer
build, but retains its configuration (design type, optimization objectives, K-safety, etc.) and tables
and queries.
After you reset a design, review the design to see what changes you need to make. For example,
you can fix errors, change parameters, or check for and add additional tables or queries. Then you
can rebuild the design.
You can only reset a design in Management Console or by using the DESIGNER_RESET_
DESIGN function.
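To reset a design programmatically, call the function with the design name (the name here is
illustrative):

=> SELECT DESIGNER_RESET_DESIGN('vmart_design');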
Deploying a Design
After running Database Designer to generate a deployment script, HP Vertica recommends that
you test your design on a non-production server before you deploy it to your production server.
Both the design and deployment processes run in the background. This is useful if you have a large
design that you want to run overnight. Because an active SSH session is not required, the
design/deploy operations continue to run uninterrupted, even if the session is terminated.
Note: You cannot stop a running database if Database Designer is building or deploying a
database design.
Database Designer runs as a background process. Multiple users can run Database Designer
concurrently without interfering with each other or using up all the cluster resources. However, if
multiple users are deploying a design on the same tables at the same time, Database Designer may
not be able to complete the deployment. To avoid problems, consider the following:
l Schedule potentially conflicting Database Designer processes to run sequentially overnight so
that there are no concurrency problems.
l Avoid scheduling Database Designer runs on the same set of tables at the same time.
There are two ways to deploy your design:
l Deploying Designs Using Database Designer
l Deploying Designs Manually
Deploying Designs Using Database Designer
HP recommends that you run Database Designer and deploy optimized projections right after
loading your tables with sample data because Database Designer provides projections optimized
for the current state of your database.
If you choose to allow Database Designer to automatically deploy your script during a
comprehensive design and are running Administrative Tools, Database Designer creates a backup
script of your database's current design. This script helps you re-create the design of projections
that may have been dropped by the new design. The backup script is located in the output directory
you specified during the design process.
If you choose not to have Database Designer automatically run the deployment script (for example,
if you want to maintain projections from a pre-existing deployment), you can manually run the
deployment script later. See Deploying Designs Manually.
To deploy a design while running Database Designer, do one of the following:
l In Management Console, select the design and click Deploy Design.
l In the Administration Tools, select Deploy design in the Design Options window.
If you are running Database Designer programmatically, use DESIGNER_RUN_POPULATE_
DESIGN_AND_DEPLOY and set the deploy parameter to 'true'.
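A sketch of a programmatic run with deployment enabled follows. The design name, file paths, and
the exact parameter list shown here are assumptions; see DESIGNER_RUN_POPULATE_
DESIGN_AND_DEPLOY in the SQL Reference Manual for the full signature:

=> SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY(
      'vmart_design',
      '/home/dbadmin/designs/vmart_design_design.sql',
      '/home/dbadmin/designs/vmart_design_deploy.sql',
      true,  -- analyze statistics
      true   -- deploy
   );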
Once you have deployed your design, query the DEPLOY_STATUS system table to see the steps
that the deployment took:
vmartdb=> SELECT * FROM V_MONITOR.DEPLOY_STATUS;
Deploying Designs Manually
If you chose not to have Database Designer deploy your design at design time, you can deploy the
design later using the deployment script:
1. Make sure that you have a database that contains the same tables and projections as the
database on which you ran Database Designer. The database should also contain sample
data.
2. To deploy the projections to a test or production environment, use the following vsql command
to execute the deployment script, where <design_name> is the name of the database design:
=> \i <design_name>_deploy.sql
How to Create a Design
There are three ways to create a design using Database Designer:
l From Management Console, open a database and select the Design page at the bottom of the
window.
For details about using Management Console to create a design, see Using Management
Console to Create a Design
l Programmatically, using the techniques described in About Running Database Designer
Programmatically in the Programmer's Guide. To run Database Designer programmatically, you
must be a DBADMIN or have been granted the DBDUSER role and enabled that role.
l From the Administration Tools menu, by selecting Configuration Menu > Run Database
Designer. You must be a DBADMIN user to run Database Designer from the Administration
Tools.
For details about using Administration Tools to create a design, see Using Administration Tools
to Create a Design.
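As noted above, a user who has been granted the DBDUSER role must also enable it in the current
session before running Database Designer programmatically; for example:

=> SET ROLE DBDUSER;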
The following table shows what Database Designer capabilities are available in each tool:
Database Designer Capability                    Management Console  Programmatically  Administrative Tools
Create design                                   Yes                 Yes               Yes
Design name length (# of characters)            16                  32                16
Build design (create design and
  deployment scripts)                           Yes                 Yes               Yes
Create backup script                                                                  Yes
Set design type (comprehensive or incremental)  Yes                 Yes               Yes
Set optimization objective                      Yes                 Yes               Yes
Add design tables                               Yes                 Yes               Yes
Add design queries file                         Yes                 Yes               Yes
Add single design query                                             Yes
Use query repository                            Yes                 Yes
Set K-safety                                    Yes                 Yes               Yes
Analyze statistics                              Yes                 Yes               Yes
Require all unsegmented projections             Yes                 Yes
View event history                              Yes                 Yes
Set correlation analysis mode (Default = 0)                         Yes
Using Management Console to Create a Design
To use Management Console to create an optimized design for your database, you must be a
DBADMIN user or have been assigned the DBDUSER role.
Management Console provides two ways to create a design:
l Wizard—This option walks you through the process of configuring a new design. Click Back
and Next to navigate through the Wizard steps, or Cancel to cancel creating a new design.
To learn how to use the Wizard to create a design, see Using the Wizard to Create a Design.
l Manual—This option creates and saves a design with the default parameters.
To learn how to create a design manually, see Creating a Design Manually.
Tip: If you have many design tables that you want Database Designer to consider, it might be
easier to use the Wizard to create your design. In the Wizard, you can submit all the tables in a
schema at once; creating a design manually requires that you submit the design tables one at a
time.
Using the Wizard to Create a Design
Take these steps to create a design using the Management Console's Wizard:
1. Log in to Management Console, select and start your database, and click Design at the bottom
of the window. The Database Designer window appears. If there are no existing designs, the
New Design window appears.
The left-hand side of the Database Designer window lists the database designs for which you
are the owner, with the most recent design you worked on selected. That pane also lists the
current status of the design.
The main pane contains details about the selected design.
2. To create a new design, click New Design.
3. Enter a name for your design, and click Wizard.
For more information, see Design Name.
4. Navigate through the Wizard using the Back and Next buttons.
5. To build the design immediately after exiting the Wizard, on the Execution Options window,
select Auto-build.
Important: HP does not recommend that you auto-deploy the design from
the Wizard. There may be a delay in adding the queries to the design, so if the design is
deployed but the queries have not yet loaded, deployment may fail. If this happens, reset
the design, check the Queries tab to make sure the queries have been loaded, and deploy
the design.
6. When you have entered all the information, the Wizard displays a summary of your choices.
Click Submit Design to build your design.
Creating a Design Manually
To create a design and specify its configuration using Management Console, take these steps.
1. Log in to Management Console, select and start your database, and click Design at the bottom
of the window. The Database Designer window appears.
The left-hand side of the Database Designer window lists the database designs for which you
are the owner, with the most recent design you worked on highlighted. That pane also lists the
current status of the design.
The main pane contains details about the selected design.
2. To create a new design, click New Design.
3. Enter a name for your design and select Manual.
After a few seconds, the main Database Design window opens, displaying the default design
parameters. HP Vertica has created and saved a design with the name you specified, and
assigned it the default parameters.
For more information, see Design Name.
4. On the General window, modify the design type, optimization objectives, K-safety, and the
setting that allows Database Designer to create unsegmented projections.
If you choose Incremental, the design automatically optimizes for load and the K-safety
defaults to the value of the cluster K-safety; you cannot change these values for an
incremental design.
5. Click the Tables tab. You must submit tables to your design.
6. To add tables of sample data to your design, click Add Tables. A list of available tables
appears; select the tables you want and click Save. If you want to remove tables from your
design, click the tables you want to remove, and click Remove Selected.
If a design table has been dropped from the database, a red circle with a white exclamation
point appears next to the table name. Before you can build or deploy the design, you must
remove any dropped tables from the design. To do this, select the dropped tables and click
Remove Selected. You cannot build or deploy a design if any of the design tables have been
dropped.
7. Click the Queries tab. To add queries to your design, do one of the following:
n To add queries from the QUERY_REQUESTS system table, click Query Repository,
select the desired queries and click Save. All valid queries that you selected appear in
the Queries window.
n To add queries from a file, select Choose File. All valid queries in the file that you select are
added to the design and appear in the Queries window.
Database Designer checks the validity of the queries when you add the queries to the design
and again when you build the design. If it finds invalid queries, it ignores them.
If you have a large number of queries, it may take time to load them. Make sure that all the
queries you want Database Designer to consider when creating the design are listed in the
Queries window.
8. Once you have specified all the parameters for your design, you should build the design. To do
this, select your design and click Build Design.
9. Select Analyze Statistics if you want Database Designer to analyze the statistics before
building the design.
For more information see Statistics Analysis.
10. If you do not need to review the design before deploying it, select Deploy Immediately.
Otherwise, leave that option unselected.
11. Click Start. On the left-hand pane, the status of your design displays as Building until it is
complete.
12. To follow the progress of a build, click Event History. Status messages appear in this window
and you can see the current phase of the build operation. The information in the Event History
tab contains data from the OUTPUT_EVENT_HISTORY system table.
13. When the build completes, the left-hand pane displays Built. To view the deployment script,
select your design and click Output.
14. After you deploy the design using Management Console, the deployment script is deleted. To
keep a permanent copy of the deployment script, copy and paste the SQL commands from the
Output window to a file.
15. Once you have reviewed your design and are ready to deploy it, select the design and click
Deploy Design.
16. To follow the progress of the deployment, click Event History. Status messages appear in this
window and you can see the current phase of the deployment operation.
In the Event History window, while the design is running, you can do one of the following:
n Click the blue button next to the design name to refresh the event history listing.
n Click Cancel Design Run to cancel the design in progress.
n Click Force Delete Design to cancel and delete the design in progress.
17. When the deployment completes, the left-hand pane displays Deployment Completed. To
view the deployment script, select your design and click Output.
Your database is now optimized according to the parameters you set.
Using Administration Tools to Create a Design
To use the Administration Tools interface to create an optimized design for your database, you
must be a DBADMIN user. Follow these steps:
1. Log in as the dbadmin user and start Administration Tools.
2. From the main menu, start the database for which you want to create a design. The database
must be running before you can create a design for it.
3. On the main menu, select Configuration Menu and click OK.
4. On the Configuration Menu, select Run Database Designer and click OK.
5. On the Select a database to design window, enter the name of the database for which you
are creating a design and click OK.
6. On the Enter the directory for Database Designer output window, enter the full path to the
directory to contain the design script, deployment script, backup script, and log files, and click
OK.
For information about the scripts, see Building a Design.
7. On the Database Designer window, enter a name for the design and click OK.
For more information about design names, see Design Name.
8. On the Design Type window, choose which type of design to create and click OK.
For a description of the design types, see Design Types.
9. The Select schema(s) to add to query search path window lists all the schemas in the
database that you selected. Select the schemas that contain representative data that you want
Database Designer to consider when creating the design and click OK.
For more information about choosing schema and tables to submit to Database Designer, see
Design Tables with Sample Data.
10. On the Optimization Objectives window, select the objective you want for the database
optimization:
n Optimize with Queries
For more information, see Design Queries.
n Update statistics
For more information see Statistics Analysis.
n Deploy design
For more information, see Deploying a Design.
For details about these objectives, see Optimization Objectives.
11. The final window summarizes the choices you have made and offers you two choices:
n Proceed with building the design, and deploying it if you specified to deploy it immediately.
If you did not specify to deploy, you can review the design and deployment scripts and
deploy them manually, as described in Deploying Designs Manually.
n Cancel the design and go back to change some of the parameters as needed.
12. Creating a design can take a long time. To cancel a running design from the Administration
Tools window, enter Ctrl+C.
To create a design for the VMart example database, see Using Database Designer to Create a
Comprehensive Design in the Getting Started Guide.
Creating Custom Designs
HP strongly recommends that you use the physical schema design produced by Database
Designer, which provides K-safety, excellent query performance, and efficient use of storage
space. If you find that any of your queries are not running as efficiently as you would like, you can
use the Database Designer incremental design process to optimize the database design for the
query.
If the projections created by Database Designer still do not meet your needs, you can write custom
projections, from scratch or based on projection designs created by Database Designer.
If you are unfamiliar with writing custom projections, start by modifying an existing design
generated by Database Designer.
The Design Process
To customize an existing design or create a new one, take these steps:
1. Plan the design or design modification.
As with most successful projects, a good design requires some up-front planning. See
Planning Your Design.
2. Create or modify projections.
For an overview of the CREATE PROJECTION statement and guidelines for creating
common projections, see Design Fundamentals. The CREATE PROJECTION section in the
SQL Reference Manual also provides more detail.
3. Deploy the projections to a test environment. See Writing and Deploying Custom Projections.
4. Test the projections.
5. Modify the projections as necessary.
6. Once you have finalized the design, deploy the projections to the production environment.
Planning Your Design
The syntax for creating a design is easy for anyone who is familiar with SQL. As with any
successful project, however, a successful design requires some initial planning. Before you create
your first design:
l Become familiar with standard design requirements and plan your design to include them. See
Design Requirements.
l Determine how many projections you need to include in the design. See Determining the
Number of Projections to Use.
l Determine the type of compression and encoding to use for columns. See Data Encoding and
Compression.
l Determine whether or not you want the database to be K-safe. HP Vertica recommends that all
production databases have a minimum K-safety of one (K=1). Valid K-safety values are 0, 1, and
2. See Designing for K-Safety.
Design Requirements
A physical schema design is a script that contains CREATE PROJECTION statements. These
statements determine which columns are included in projections and how they are optimized.
If you use Database Designer as a starting point, it automatically creates designs that meet all
fundamental design requirements. If you intend to create or modify designs manually, be aware that
all designs must meet the following requirements:
l Every design must create at least one superprojection for every table in the database that is
used by the client application. These projections provide complete coverage that enables users
to perform ad-hoc queries as needed. They can contain joins and they are usually configured to
maximize performance through sort order, compression, and encoding.
l Query-specific projections are optional. If you are satisfied with the performance provided
through superprojections, you do not need to create additional projections. However, you can
maximize performance by tuning for specific query work loads.
l HP recommends that all production databases have a minimum K-safety of one (K=1) to support
high availability and recovery. (K-safety can be set to 0, 1, or 2.) See High Availability Through
Projections in the Concepts Guide and Designing for K-Safety.
Determining the Number of Projections to Use
In many cases, a design that consists of a set of superprojections (and their buddies) provides
satisfactory performance through compression and encoding. This is especially true if the sort
orders for the projections have been used to maximize performance for one or more query
predicates (WHERE clauses).
However, you might want to add additional query-specific projections to increase the performance
of queries that run slowly, are used frequently, or are run as part of business-critical reporting. The
number of additional projections (and their buddies) that you create should be determined by:
l Your organization's needs
l The amount of disk space you have available on each node in the cluster
l The amount of time available for loading data into the database
As the number of projections that are tuned for specific queries increases, the performance of these
queries improves. However, the amount of disk space used and the amount of time required to load
data increases as well. Therefore, you should create and test designs to determine the optimum
number of projections for your database configuration. On average, organizations that choose to
implement query-specific projections achieve optimal performance through the addition of a few
query-specific projections.
Designing for K-Safety
Before creating custom physical schema designs, determine whether you want the database to be
K-safe and adhere to the appropriate design requirements for K-safe databases or databases with
no K-safety. HP requires that all production databases have a minimum K-safety of one (K=1).
Valid K-safety values for production databases are 1 and 2. Non-production databases do not have
to be K-safe and can be set to 0. You can start by creating a physical schema design with no K-
safety, and then modify it to be K-safe at a later point in time. See High Availability and Recovery
and High Availability Through Projections in the Concepts Guide for an explanation of how HP
Vertica implements high availability and recovery through replication and segmentation.
Requirements for a K-Safe Physical Schema Design
Database Designer automatically generates designs with a K-safety of 1 for clusters that contain at
least three nodes. (If your cluster has one or two nodes, it generates designs with a K-safety of 0.)
You can modify a design created for a three-node (or greater) cluster, and the K-safe requirements
are already set.
If you create custom projections, your physical schema design must meet the following
requirements to be able to successfully recover the database in the event of a failure:
l Segmented projections must be segmented across all nodes. Refer to Designing for
Segmentation and Designing Segmented Projections for K-Safety.
l Replicated projections must be replicated on all nodes. See Designing Replicated Projections
for K-Safety.
l Segmented projections must have K buddy projections (projections that have identical columns
and segmentation criteria, except that corresponding segments are placed on different nodes).
You can use the MARK_DESIGN_KSAFE function to find out whether your schema design meets
requirements for K-safety.
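For example, to confirm that the physical schema design supports a K-safety of 1, pass the K value
you want to validate; the function returns an error if the projections do not meet the requirements:

=> SELECT MARK_DESIGN_KSAFE(1);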
Requirements for a Physical Schema Design with No K-Safety
If you use Database Designer to generate a comprehensive design that you can modify and you
do not want the design to be K-safe, set the K-safety to 0 (zero).
If you want to start from scratch, do the following to establish minimal projection requirements for a
functioning database with no K-safety (K=0):
1. Define at least one superprojection for each table in the logical schema.
2. Replicate (define an exact copy of) each dimension table superprojection on each node.
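A minimal sketch of these two steps for a hypothetical two-column dimension table (the table and
column names are illustrative, not part of any sample schema):

=> CREATE PROJECTION product_dimension_super (
      product_key,
      product_description )
   AS SELECT product_key, product_description
   FROM product_dimension
   ORDER BY product_key
   UNSEGMENTED ALL NODES;

The UNSEGMENTED ALL NODES clause satisfies step 2 by defining an exact copy of the
superprojection on every node.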
Designing Replicated Projections for K-Safety
If you are creating or modifying a design for a K-safe database, make sure that projections for
dimension tables are replicated on each node in the database.
You can accomplish this using a single CREATE PROJECTION command for each dimension
table. The UNSEGMENTED ALL NODES syntax within the segmentation clause automatically
creates an unsegmented projection on each node in the database.
When you run your design script, HP Vertica generates a list of nodes based on the number of
nodes in the database and replicates the projection accordingly. Replicated projections have the
name:
projection-name_node-name
If, for example, the nodes are named NODE01, NODE02, and NODE03, the projections are named
ABC_NODE01, ABC_NODE02, and ABC_NODE03.
Note: This naming convention can affect functions that provide information about projections,
for example, GET_PROJECTIONS or GET_PROJECTION_STATUS, where you must
provide the name ABC_NODE01 instead of just ABC. To view a list of the nodes in a database,
use the View Database command in the Administration Tools.
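Given this naming convention, you can verify the per-node replicas that exist for a table with
GET_PROJECTIONS; for example, for the store_dimension table used in the script that follows:

=> SELECT GET_PROJECTIONS('store_dimension');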
The following script uses the UNSEGMENTED ALL NODES syntax to create one unsegmented
superprojection for the store_dimension table on each node.
CREATE PROJECTION store_dimension(
C0_store_dimension_floor_plan_type ENCODING RLE ,
C1_store_dimension_photo_processing_type ENCODING RLE ,
C2_store_dimension_store_key ,
C3_store_dimension_store_name ,
C4_store_dimension_store_number ,
C5_store_dimension_store_street_address ,
C6_store_dimension_store_city ,
C7_store_dimension_store_state ,
C8_store_dimension_store_region ,
C9_store_dimension_financial_service_type ,
C10_store_dimension_selling_square_footage ,
C11_store_dimension_total_square_footage ,
C12_store_dimension_first_open_date ,
C13_store_dimension_last_remodel_date )
AS SELECT T_store_dimension.floor_plan_type,
T_store_dimension.photo_processing_type,
T_store_dimension.store_key,
T_store_dimension.store_name,
T_store_dimension.store_number,
T_store_dimension.store_street_address,
T_store_dimension.store_city,
T_store_dimension.store_state,
T_store_dimension.store_region,
T_store_dimension.financial_service_type,
T_store_dimension.selling_square_footage,
T_store_dimension.total_square_footage,
T_store_dimension.first_open_date,
T_store_dimension.last_remodel_date
FROM store_dimension T_store_dimension
ORDER BY T_store_dimension.floor_plan_type, T_store_dimension.photo_processing_type
UNSEGMENTED ALL NODES;
Note: Large dimension tables can be segmented. A dimension table is considered to be large
when it is approximately the same size as a fact table.
Designing Segmented Projections for K-Safety
If you are creating or modifying a design for a K-safe database, you need to create K-safe
projections for fact tables and large dimension tables. (A dimension table is considered to be large if
it is similar in size to a fact table.) To accomplish this, you must:
l Create a segmented projection for each fact and large dimension table.
l Create segmented buddy projections for each of these projections. The total number of
projections in a buddy set must be two for a K=1 database or three for a K=2 database.
For an overview of segmented projections and their buddies, see Projection Segmentation in the
Concepts Guide. For information about designing for K-safety, see Designing for K-Safety and
Designing for Segmentation.
Segmenting Projections
To segment a projection, use the segmentation clause to specify the:
l Segmentation method to use.
l Column to use to segment the projection.
l Nodes on which to segment the projection. You can segment projections across all the nodes, or
just the number of nodes necessary to maintain K-safety, either three for a K=1 database or five
for a K=2 database.
See the CREATE PROJECTION statement in the SQL Reference Manual.
The following segmentation clause uses hash segmentation to segment the projection across all
nodes based on the T_retail_sales_fact.pos_transaction_number column:
CREATE PROJECTION retail_sales_fact_P1...
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES;
Creating Buddy Projections
To create a buddy projection, copy the original projection and modify it as follows:
l Rename it to something similar to the name of the original projection. For example, a projection
named retail_sales_fact_P1 could have buddies named retail_sales_fact_P1_B1 and
retail_sales_fact_P1_B2.
l Modify the sort order as needed.
l Create an offset to store the segments for the buddy on different nodes. For example, the first
buddy in a projection set would have an offset of one (OFFSET 1), the second buddy in a
projection set would have an offset of two (OFFSET 2), and so on.
To create a buddy for the projection created in the previous example:
CREATE PROJECTION retail_sales_fact_P1_B1...
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES OFFSET 1;
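The effect of OFFSET can be pictured with a short sketch (illustrative Python, not Vertica internals; the node count and modular hash placement are simplifying assumptions): each buddy rotates the hash-based node assignment, so the copies of any segment land on distinct nodes.

```python
# Sketch (not Vertica internals): hash segmentation with OFFSET.
# Each buddy projection rotates the node assignment, so the copies
# of a row's segment are stored on different nodes.

def segment_node(key, num_nodes, offset=0):
    """Map a row key to a node index; OFFSET rotates the assignment."""
    return (hash(key) + offset) % num_nodes

nodes = 5  # a K=2 database requires at least five nodes
for key in (101, 102, 103):
    primary = segment_node(key, nodes)            # projection P1
    buddy1  = segment_node(key, nodes, offset=1)  # buddy P1_B1
    buddy2  = segment_node(key, nodes, offset=2)  # buddy P1_B2
    # The three copies land on three distinct nodes, so losing any
    # two nodes still leaves one copy reachable.
    assert len({primary, buddy1, buddy2}) == 3
```

This is why identical offsets in a buddy set would defeat K-safety: the copies would be co-located and lost together.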
Designing for Segmentation
You segment projections using hash segmentation. Hash segmentation allows you to segment a
projection based on a built-in hash function that provides even distribution of data across multiple
nodes, resulting in optimal query execution. In a projection, the data to be hashed consists of one or
more column values, each having a large number of unique values and an acceptable amount of
skew in the value distribution. Primary key columns that meet the criteria could be an excellent
choice for hash segmentation.
Note: For detailed information about using hash segmentation in a projection, see CREATE
PROJECTION in the SQL Reference Manual.
When segmenting projections, determine which columns to use to segment the projection. Choose
one or more columns that have a large number of unique data values and acceptable skew in their
data distribution. Primary key columns are an excellent choice for hash segmentation. The columns
must be unique across all the tables being used in a query.
Design Fundamentals
Although you can write custom projections from scratch, HP Vertica recommends that you use
Database Designer to create a design to use as a starting point. This ensures that you have
projections that meet basic requirements.
Writing and Deploying Custom Projections
Before you write custom projections, be sure to review the topics in Planning Your Design carefully.
Failure to follow these considerations can result in non-functional projections.
To manually modify or create a projection:
1. Write a script to create the projection, using the CREATE PROJECTION statement.
2. Use the \i meta-command in vsql to run the script.
Note: You must have a database loaded with a logical schema.
3. For a K-safe database, run SELECT GET_PROJECTIONS('table_name') to verify that the
projections were properly created. Good projections are noted as being "safe." This
means that the projection has enough buddies to be K-safe.
4. If you added the new projection to a database that already has projections that contain data,
you need to update the newly created projection to work with the existing projections. By
default, the new projection is out-of-date (not available for query processing) until you refresh
it.
5. Use the MAKE_AHM_NOW function to set the Ancient History Mark (AHM) to the greatest
allowable epoch (now).
6. Use the DROP_PROJECTION function to drop any previous projections that are no longer
needed.
These projections can waste disk space and reduce load speed if they remain in the database.
7. Run the ANALYZE_STATISTICS function on all projections in the database. This function
collects and aggregates data samples and storage information from all nodes on which a
projection is stored, and then writes statistics into the catalog. For example:
=> SELECT ANALYZE_STATISTICS ('');
Anatomy of a Projection
The CREATE PROJECTION statement defines the individual elements of a projection, as the
following graphic shows.
The previous example contains the following significant elements:
Column List and Encoding
Lists every column in the projection and defines the encoding for each column. Unlike traditional
database architectures, HP Vertica operates on encoded data representations. Therefore, HP
recommends that you use data encoding because it results in less disk I/O.
Base Query
Identifies all the columns to incorporate in the projection through column name and table name
references. The base query for large table projections can contain PK/FK joins to smaller tables.
Sort Order
The sort order optimizes for a specific query or commonalities in a class of queries based on the
query predicate. The best sort orders are determined by the WHERE clauses. For example, if a
projection's sort order is (x, y), and the query's WHERE clause specifies (x=1 AND y=2), all of
the needed data is found together in the sort order, so the query runs almost instantaneously.
You can also optimize a query by matching the projection's sort order to the query's GROUP BY
clause. If you do not specify a sort order, HP Vertica uses the order in which columns are specified
in the column definition as the projection's sort order.
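The reason the needed data is "found together" can be illustrated with a small sketch (illustrative Python, not Vertica internals): when rows are sorted on (x, y), every row satisfying x=1 AND y=2 sits in one contiguous range, which two binary searches can locate without a full scan.

```python
from bisect import bisect_left, bisect_right

# Illustrative sketch: rows sorted on (x, y). All rows matching
# x=1 AND y=2 are contiguous, so two binary searches locate them.
rows = sorted([(1, 2), (1, 1), (2, 2), (1, 2), (3, 1), (1, 3)])

lo = bisect_left(rows, (1, 2))   # first row with (x, y) == (1, 2)
hi = bisect_right(rows, (1, 2))  # one past the last such row
assert rows[lo:hi] == [(1, 2), (1, 2)]
```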
The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values
so that a disk read can pick up many results at once. For maximum performance, do not sort
projections on LONG VARBINARY and LONG VARCHAR columns.
Segmentation
The segmentation clause determines whether a projection is segmented across nodes within the
database. Segmentation distributes contiguous pieces of projections, called segments, for large
and medium tables across database nodes. Segmentation maximizes database performance by
distributing the load. Use SEGMENTED BY HASH to segment large table projections.
For small tables, use the UNSEGMENTED keyword to direct HP Vertica to replicate these tables,
rather than segment them. Replication creates and stores identical copies of projections for small
tables across all nodes in the cluster. Replication ensures high availability and recovery.
For maximum performance, do not segment projections on LONG VARBINARY and LONG
VARCHAR columns.
Designing Superprojections
Superprojections have the following requirements:
l They must contain every column within the table.
l For a K-safe design, superprojections must either be replicated on all nodes within the database
cluster (for dimension tables) or paired with buddies and segmented across all nodes (for very
large tables and medium large tables). See Physical Schema and High Availability Through
Projections in the Concepts Guide for an overview of projections and how they are stored. See
Designing for K-Safety for design specifics.
To provide maximum usability, superprojections need to minimize storage requirements while
maximizing query performance. To achieve this, the sort order for columns in superprojections is
based on storage requirements and commonly used queries.
Minimizing Storage Requirements
Minimizing storage not only saves on physical resources, it increases performance by requiring the
database to perform less disk I/O. To minimize storage space for a projection:
l Analyze the type of data stored in each projection column and choose the most effective
encoding method. See the CREATE PROJECTION statement and encoding-type in the SQL
Reference Manual.
The HP Vertica optimizer gives Run-Length Encoding (RLE) preference, so be sure to use it
whenever appropriate. RLE replaces sequences (runs) of identical values with a single pair
that contains the value and the number of occurrences. It is effective only when the run
length is large, such as when sorting low-cardinality columns.
l Prioritize low-cardinality columns in the column sort order. This minimizes the number of rows
that HP Vertica stores and accesses to retrieve query results.
For more information about minimizing storage requirements, see Choosing Sort Order: Best
Practices.
Maximizing Query Performance
In addition to minimizing storage requirements, the column sort order facilitates the most commonly
used queries for the table. This means that the column sort order prioritizes the lowest cardinality
columns that are actually used in queries. For examples that take into account both storage and
query requirements, see Choosing Sort Order: Best Practices.
Note: For maximum performance, do not sort projections on LONG VARBINARY and LONG
VARCHAR columns.
Projections within a buddy set can all have different sort orders. This enables you to maximize
query performance for groups of queries with common WHERE clauses, but different sort orders.
If, for example, you have a three-node cluster, your buddy set contains three interrelated
projections, each having its own sort order.
In a database with a K-safety of 1 or 2, buddy projections are used for data recovery. If a node fails,
it queries the other nodes to recover data through buddy projections. (See How Result Sets are
Stored in the Concepts Guide.) If a projection's buddies use different sort orders, it takes longer to
recover the projection because the data has to be resorted during recovery to match the sort order of
the projection. Therefore, consider using identical sort orders for tables that are rarely queried or that
are repeatedly accessed by the same query, and use multiple sort orders for tables that are
accessed by queries with common WHERE clauses, but different sort orders.
If you have queries that access multiple tables or you want to maintain the same sort order for
projections within buddy sets, create query-specific projections. Designs that contain projections
for specific queries are called optimized designs.
Projection Design for Merge Operations
The HP Vertica query optimizer automatically picks the best projections to use for queries, but you
can help improve the performance of MERGE operations by ensuring projections are designed for
optimal use.
Good projection design lets HP Vertica choose the faster merge join between the target and source
tables without having to perform additional sort and data transfer operations.
HP recommends that you first use Database Designer to generate a comprehensive design and
then customize projections, as needed. Be sure to first review the topics in Planning Your Design.
Failure to follow those considerations could result in non-functioning projections.
In the following MERGE statement, HP Vertica inserts and/or updates records from the source
table's column b into the target table's column a:
=> MERGE INTO target t USING source s ON t.a = s.b WHEN ....
HP Vertica can use a local merge join if tables target and source use one of the following
projection designs, where their inputs are pre-sorted through the CREATE PROJECTION ORDER BY
clause:
l Replicated projections that are sorted on:
n Column a for target
n Column b for source
l Segmented projections that are identically segmented on:
n Column a for target
n Column b for source
n Corresponding segmented columns
Tip: For best merge performance, the source table should be smaller than the target table.
See Also
l Optimized Versus Non-Optimized MERGE
l Best Practices for Optimizing MERGE Statements
Maximizing Projection Performance
This section explains how to design your projections in order to optimize their performance.
Choosing Sort Order: Best Practices
When choosing sort orders for your projections, HP Vertica has several recommendations that can
help you achieve maximum query performance, as illustrated in the following examples.
Combine RLE and Sort Order
When dealing with predicates on low-cardinality columns, use a combination of RLE and sorting to
minimize storage requirements and maximize query performance.
Suppose you have a students table containing the following values and encoding types:
Column # of Distinct Values Encoded With
gender 2 (M or F) RLE
pass_fail 2 (P or F) RLE
class 4 (freshman, sophomore, junior, or senior) RLE
name 10000 (too many to list) Auto
You might have queries similar to this one:
=> SELECT name FROM students WHERE gender = 'M' AND pass_fail = 'P' AND class = 'senior';
The fastest way to access the data is to work through the low-cardinality columns with the smallest
number of distinct values before the high-cardinality columns. The following sort order minimizes
storage and maximizes query performance for queries that have equality restrictions on gender,
class, pass_fail, and name. Specify the ORDER BY clause of the projection as follows:
ORDER BY students.gender, students.pass_fail, students.class, students.name
In this example, the gender column is represented by two RLE entries, the pass_fail column is
represented by four entries, and the class column is represented by 16 entries, regardless of the
cardinality of the students table. HP Vertica efficiently finds the set of rows that satisfy all the
predicates, resulting in a huge reduction of search effort for RLE encoded columns that occur early
in the sort order. Consequently, if you use low-cardinality columns in local predicates, as in the
previous example, put those columns early in the projection sort order, in increasing order of distinct
cardinality (that is, in increasing order of the number of distinct values in each column).
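The entry counts above follow from simple arithmetic, sketched here (illustrative Python; the cardinalities come from the students table above, and the sketch assumes every combination of values occurs): with low-cardinality columns first in the sort order, each column's RLE run count is the product of the cardinalities up to and including that column, independent of the table's row count.

```python
# Illustrative sketch (assumption: every combination of values occurs):
# each column's RLE run count in a fully sorted table is the product of
# the cardinalities of the sort columns up to and including itself.

cardinalities = {"gender": 2, "pass_fail": 2, "class": 4}  # in sort order

runs = {}
total = 1
for column, cardinality in cardinalities.items():
    total *= cardinality
    runs[column] = total

print(runs)  # {'gender': 2, 'pass_fail': 4, 'class': 16}
```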
If you sort this table with students.class first, you improve the performance of queries that restrict
only on the students.class column, and you improve the compression of the students.class
column (which contains the largest number of distinct values of the three RLE columns), but the other columns do not
compress as well. Determining which projection is better depends on the specific queries in your
workload, and their relative importance.
Storage savings with compression decrease as the cardinality of the column increases; however,
storage savings with compression increase as the number of bytes required to store values in that
column increases.
Maximize the Advantages of RLE
To maximize the advantages of RLE encoding, use it only when the average run length of a column
is greater than 10 when sorted. For example, suppose you have a table with the following columns,
sorted in order of cardinality from low to high:
address.country, address.region, address.state, address.city, address.zipcode
The zipcode column might not have 10 sorted entries in a row with the same zip code, so there is
probably no advantage to run-length encoding that column, and it could make compression worse.
But the country column is likely to have sorted runs much longer than 10 identical values, so
applying RLE to the country column can improve performance.
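A quick way to apply this guideline is to measure a candidate column's average run length once it is sorted. The following is an illustrative Python sketch with made-up sample data:

```python
from itertools import groupby

# Illustrative sketch with made-up data: compute a sorted column's
# average run length to judge whether RLE is worthwhile (> 10).

def avg_run_length(sorted_values):
    num_runs = sum(1 for _ in groupby(sorted_values))
    return len(sorted_values) / num_runs

country = ["DE"] * 15 + ["UK"] * 25 + ["US"] * 40  # sorted; long runs
zipcode = [f"{z:05d}" for z in range(80)]          # sorted; all unique

assert avg_run_length(country) > 10   # good RLE candidate
assert avg_run_length(zipcode) <= 10  # RLE could hurt compression here
```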
Put Lower Cardinality Column First for Functional
Dependencies
In general, put columns that you use for local predicates (as in the previous example) earlier in the
join order to make predicate evaluation more efficient. In addition, if a lower cardinality column is
uniquely determined by a higher cardinality column (like city_id uniquely determining a state_id), it
is always better to put the lower cardinality, functionally determined column earlier in the sort order
than the higher cardinality column.
For example, in the following sort order for the customer_info table, the Area_Code column is
sorted before the Number column:
ORDER BY customer_info.Area_Code, customer_info.Number, customer_info.Address
In the following query, the Area_Code predicate narrows the scan, so only the Number values for
rows with area code 978 are examined:
SELECT Address FROM customer_info WHERE Area_Code='978' AND Number='9780123457';
Sort for Merge Joins
When processing a join, the HP Vertica optimizer chooses from two algorithms:
l Merge join—If both inputs are pre-sorted on the join column, the optimizer chooses a merge
join, which is faster and uses less memory.
l Hash join—Using the hash join algorithm, HP Vertica uses the smaller (inner) joined table to
build an in-memory hash table on the join column. A hash join has no sort requirement, but it
consumes more memory because Vertica builds a hash table with the values in the inner table.
The optimizer chooses a hash join when projections are not sorted on the join columns.
If both inputs are pre-sorted, merge joins do not have to do any pre-processing, making the join
perform faster. HP Vertica uses the term sort-merge join to refer to the case when at least one of
the inputs must be sorted prior to the merge join. HP Vertica sorts the inner input side but only if the
outer input side is already sorted on the join columns.
To give the Vertica query optimizer the option to use an efficient merge join for a particular join,
create projections on both sides of the join that put the join column first in their respective
projections. This is primarily important to do if both tables are so large that neither table fits into
memory. If all tables that a table will be joined to can be expected to fit into memory simultaneously,
the benefits of merge join over hash join are sufficiently small that it probably isn't worth creating a
projection for any one join column.
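The trade-off between the two algorithms can be sketched in miniature (illustrative Python, not HP Vertica's implementation): the merge join walks two pre-sorted inputs with no extra memory, while the hash join builds an in-memory table on the inner input and imposes no sort requirement.

```python
# Illustrative sketch (not Vertica's implementation) contrasting the
# two join algorithms. Rows are (join_key, payload) tuples.

def merge_join(outer, inner):
    """Both inputs must already be sorted on the join key (index 0)."""
    result, i = [], 0
    for okey, oval in outer:
        while i < len(inner) and inner[i][0] < okey:
            i += 1                      # advance past smaller inner keys
        j = i
        while j < len(inner) and inner[j][0] == okey:
            result.append((okey, oval, inner[j][1]))
            j += 1
    return result

def hash_join(outer, inner):
    """No sort requirement; builds an in-memory table on the inner input."""
    table = {}
    for key, val in inner:
        table.setdefault(key, []).append(val)
    return [(k, ov, iv) for k, ov in outer for iv in table.get(k, [])]

target = [(1, "a"), (2, "b"), (3, "c")]   # sorted on the join key
source = [(2, "x"), (3, "y"), (3, "z")]   # sorted on the join key
assert merge_join(target, source) == hash_join(target, source)
```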
Sort on Columns in Important Queries
If you have an important query, one that you run on a regular basis, you can save time by putting the
columns specified in the WHERE clause or the GROUP BY clause of that query early in the sort
order.
If that query uses a high-cardinality column such as Social Security number, you may sacrifice
storage by placing this column early in the sort order of a projection, but your most important query
will be optimized.
Sort Columns of Equal Cardinality By Size
If you have two columns of equal cardinality, put the column that is larger first in the sort order. For
example, a CHAR(20) column takes up 20 bytes, but an INTEGER column takes up 8 bytes. By
putting the CHAR(20) column ahead of the INTEGER column, your projection compresses better.
Sort Foreign Key Columns First, From Low to High
Distinct Cardinality
Suppose you have a fact table where the first four columns in the sort order make up a foreign key
to another table. For best compression, choose a sort order for the fact table such that the foreign
keys appear first, and in increasing order of distinct cardinality. Other factors also apply to the
design of projections for fact tables, such as partitioning by a time dimension, if any.
In the following example, the table inventory stores inventory data, and product_key and
warehouse_key are foreign keys to the product_dimension and warehouse_dimension tables:
=> CREATE TABLE inventory (
date_key INTEGER NOT NULL,
product_key INTEGER NOT NULL,
warehouse_key INTEGER NOT NULL,
...
);
=> ALTER TABLE inventory
ADD CONSTRAINT fk_inventory_warehouse FOREIGN KEY(warehouse_key)
REFERENCES warehouse_dimension(warehouse_key);
=> ALTER TABLE inventory
ADD CONSTRAINT fk_inventory_product FOREIGN KEY(product_key)
REFERENCES product_dimension(product_key);
The inventory table should be sorted by warehouse_key and then product_key, since the cardinality
of the warehouse_key column is probably lower than the cardinality of the product_key column.
Prioritizing Column Access Speed
If you measure and set the performance of storage locations within your cluster, HP Vertica uses
this information to determine where to store columns based on their rank. For more information, see
Setting Storage Performance.
How Columns are Ranked
HP Vertica stores columns included in the projection sort order on the fastest storage locations.
Columns not included in the projection sort order are stored on slower disks. Columns for each
projection are ranked as follows:
l Columns in the sort order are given the highest priority (numbers > 1000).
l The last column in the sort order is given the rank number 1001.
l The next-to-last column in the sort order is given the rank number 1002, and so on until the first
column in the sort order is given 1000 + # of sort columns.
l The remaining columns are given numbers from 1000–1, starting with 1000 and decrementing by
one per column.
HP Vertica then stores columns on disk from the highest ranking to the lowest ranking, with the
highest ranking columns placed on the fastest disks, and the lowest ranking columns placed on the
slowest disks.
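The ranking rules above can be expressed as a short sketch (illustrative Python, not Vertica source code; the column names are made up):

```python
# Illustrative sketch of the default ranking rules described above
# (not Vertica source code; column names are made up). Sort-order
# columns rank above 1000, first sort column highest; the remaining
# columns count down from 1000.

def default_access_ranks(sort_columns, other_columns):
    ranks = {}
    n = len(sort_columns)
    for pos, col in enumerate(sort_columns):
        ranks[col] = 1000 + (n - pos)   # first sort column: 1000 + n
    for i, col in enumerate(other_columns):
        ranks[col] = 1000 - i           # 1000, 999, 998, ...
    return ranks

ranks = default_access_ranks(["store_key", "txn_number"], ["amount", "cost"])
print(ranks)  # {'store_key': 1002, 'txn_number': 1001, 'amount': 1000, 'cost': 999}
```

Columns would then be placed on storage from the highest rank down, fastest disks first.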
Overriding Default Column Ranking
You can modify which columns are stored on fast disks by manually overriding the default ranks for
these columns. To accomplish this, set the ACCESSRANK keyword in the column list. Make sure to
use an integer that is not already being used for another column. For example, if you want to give a
column the fastest access rank, use a number that is significantly higher than 1000 + the number of
sort columns. This allows you to enter more columns over time without bumping into the access
rank you set.
The following example sets the access rank for the C1_retail_sales_fact_store_key column to
1500.
CREATE PROJECTION retail_sales_fact_P1 (
C1_retail_sales_fact_store_key ENCODING RLE ACCESSRANK 1500,
C2_retail_sales_fact_pos_transaction_number ,
C3_retail_sales_fact_sales_dollar_amount ,
C4_retail_sales_fact_cost_dollar_amount )
Projection Examples
This section provides examples that show you how to create projections.
New K-Safe=2 Database
In this example, projections are created for a new five-node database with a K-safety of 2. To
simplify the example, this database contains only two tables: retail_sales_fact and
store_dimension. Creating projections for this database consists of creating the following
segmented and unsegmented (replicated) superprojections:
l Segmented projections
To support K-safety=2, the database requires three segmented projections (one projection and
two buddy projections) for each fact table. In this case, it requires three segmented projections
for the retail_sales_fact table:
Projection Description
P1 The primary projection for the retail_sales_fact table.
P1_B1 The first buddy projection for P1. This buddy is required to provide K-safety=1.
P1_B2 The second buddy projection for P1. This buddy is required to provide K-safety=2.
l Unsegmented Projections
To support the database, one unsegmented superprojection must be created for each dimension
table on each node. In this case, one unsegmented superprojection must be created on each
node for the store_dimension table:
Node Unsegmented Projection
Node01 store_dimension_Node01
Node02 store_dimension_Node02
Node03 store_dimension_Node03
Node04 store_dimension_Node04
Node05 store_dimension_Node05
Creating Segmented Projections Example
The following SQL script creates the P1 projection and its buddies, P1_B1 and P1_B2, for the
retail_sales_fact table. The following syntax is significant:
l CREATE PROJECTION creates the named projection (retail_sales_fact_P1,
retail_sales_fact_P1_B1, or retail_sales_fact_P1_B2).
l ALL NODES automatically segments the projections across all five nodes in the cluster without
specifically referring to each node.
l HASH evenly distributes the data across these nodes.
l OFFSET ensures that the same data is not stored on the same nodes for each of the buddies.
The first buddy uses OFFSET 1 to shift the storage locations by 1 and the second buddy uses
OFFSET 2 to shift the storage locations by 2. This is critical to ensure K-safety.
CREATE PROJECTION retail_sales_fact_P1 (
C1_retail_sales_fact_store_key ENCODING RLE ,
C2_retail_sales_fact_pos_transaction_number ,
C3_retail_sales_fact_sales_dollar_amount ,
C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
T_retail_sales_fact.pos_transaction_number,
T_retail_sales_fact.sales_dollar_amount,
T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES;
----------------------------------------------------------
-- Projection # : 6
-- Projection storage (KBytes) : 4.8e+06
-- Note: This is a super projection for table: retail_sales_fact
CREATE PROJECTION retail_sales_fact_P1_B1 (
C1_retail_sales_fact_store_key ENCODING RLE ,
C2_retail_sales_fact_pos_transaction_number ,
C3_retail_sales_fact_sales_dollar_amount ,
C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
T_retail_sales_fact.pos_transaction_number,
T_retail_sales_fact.sales_dollar_amount,
T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES OFFSET 1;
----------------------------------------------------------
-- Projection # : 6
-- Projection storage (KBytes) : 4.8e+06
-- Note: This is a super projection for table: retail_sales_fact
CREATE PROJECTION retail_sales_fact_P1_B2 (
C1_retail_sales_fact_store_key ENCODING RLE ,
C2_retail_sales_fact_pos_transaction_number ,
C3_retail_sales_fact_sales_dollar_amount ,
C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
T_retail_sales_fact.pos_transaction_number,
T_retail_sales_fact.sales_dollar_amount,
T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES OFFSET 2;
----------------------------------------------------------
Creating Unsegmented Projections Example
The following script uses the UNSEGMENTED ALL NODES syntax to create one unsegmented
superprojection for the store_dimension table on each node.
CREATE PROJECTION store_dimension (
C0_store_dimension_floor_plan_type ENCODING RLE ,
C1_store_dimension_photo_processing_type ENCODING RLE ,
C2_store_dimension_store_key ,
C3_store_dimension_store_name ,
C4_store_dimension_store_number ,
C5_store_dimension_store_street_address ,
C6_store_dimension_store_city ,
C7_store_dimension_store_state ,
C8_store_dimension_store_region ,
C9_store_dimension_financial_service_type ,
C10_store_dimension_selling_square_footage ,
C11_store_dimension_total_square_footage ,
C12_store_dimension_first_open_date ,
C13_store_dimension_last_remodel_date )
AS SELECT T_store_dimension.floor_plan_type,
T_store_dimension.photo_processing_type,
T_store_dimension.store_key,
T_store_dimension.store_name,
T_store_dimension.store_number,
T_store_dimension.store_street_address,
T_store_dimension.store_city,
T_store_dimension.store_state,
T_store_dimension.store_region,
T_store_dimension.financial_service_type,
T_store_dimension.selling_square_footage,
T_store_dimension.total_square_footage,
T_store_dimension.first_open_date,
T_store_dimension.last_remodel_date
FROM store_dimension T_store_dimension
ORDER BY T_store_dimension.floor_plan_type, T_store_dimension.photo_processing_type
UNSEGMENTED ALL NODES;
Adding Node to a Database
In this example, a fourth node (Node04) is being added to a three-node database cluster. The
database contains two tables: retail_sales_fact and store_dimension. It also contains the
following segmented and unsegmented (replicated) superprojections:
l Segmented projections
P1 and its buddy, B1, are projections for the retail_sales_fact table. They were created using
the ALL NODES syntax, so HP Vertica automatically segments the projections across all three
nodes.
l Unsegmented Projections
Currently three unsegmented superprojections exist for the store_dimension table, one for
each node, as follows:
Node Unsegmented Projection
Node01 store_dimension_Node01
Node02 store_dimension_Node02
Node03 store_dimension_Node03
To support an additional node, replacement projections need to be created for the segmented
projections, P1 and B1. The new projections could be called P2 and B2, respectively. Additionally, an
unsegmented superprojection (store_dimension_Node04) needs to be created for the dimension
table on the new node (Node04).
Creating Segmented Projections Example
The following SQL script creates the original P1 projection and its buddy, B1, for the
retail_sales_fact table. Since the script uses the ALL NODES syntax, creating a new projection
that includes the fourth node is as easy as copying the script and changing the names of the
projection and its buddy to unique names (for example, P2 for the projection and P2_B2 for its
buddy). The projection names are the only parts of the script that need to change.
CREATE PROJECTION retail_sales_fact_P1 (
C1_retail_sales_fact_store_key ENCODING RLE ,
C2_retail_sales_fact_pos_transaction_number ,
C3_retail_sales_fact_sales_dollar_amount ,
C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
  T_retail_sales_fact.pos_transaction_number,
T_retail_sales_fact.sales_dollar_amount,
T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES;
----------------------------------------------------------
-- Projection #                : 6
-- Projection storage (KBytes) : 4.8e+06
-- Note: This is a super projection for table: retail_sales_fact
CREATE PROJECTION retail_sales_fact_P1_B1 (
 C1_retail_sales_fact_store_key ENCODING RLE ,
 C2_retail_sales_fact_pos_transaction_number ,
 C3_retail_sales_fact_sales_dollar_amount ,
 C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
  T_retail_sales_fact.pos_transaction_number,
T_retail_sales_fact.sales_dollar_amount,
T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES
OFFSET 1;
----------------------------------------------------------
Creating Unsegmented Projections Example
The following script used the ALL NODES syntax to create the original three unsegmented
superprojections for the store_dimension table, one per node.
The following syntax is significant:
l CREATE PROJECTION creates a superprojection called store_dimension.
l ALL NODES automatically places a complete copy of the superprojection on each of the three
original nodes.
CREATE PROJECTION store_dimension (
C0_store_dimension_floor_plan_type ENCODING RLE ,
C1_store_dimension_photo_processing_type ENCODING RLE ,
C2_store_dimension_store_key ,
C3_store_dimension_store_name ,
C4_store_dimension_store_number ,
C5_store_dimension_store_street_address ,
C6_store_dimension_store_city ,
C7_store_dimension_store_state ,
C8_store_dimension_store_region ,
C9_store_dimension_financial_service_type ,
C10_store_dimension_selling_square_footage ,
C11_store_dimension_total_square_footage ,
C12_store_dimension_first_open_date ,
C13_store_dimension_last_remodel_date )
AS SELECT T_store_dimension.floor_plan_type,
T_store_dimension.photo_processing_type,
T_store_dimension.store_key,
T_store_dimension.store_name,
T_store_dimension.store_number,
T_store_dimension.store_street_address,
T_store_dimension.store_city,
T_store_dimension.store_state,
T_store_dimension.store_region,
T_store_dimension.financial_service_type,
T_store_dimension.selling_square_footage,
T_store_dimension.total_square_footage,
T_store_dimension.first_open_date,
T_store_dimension.last_remodel_date
FROM store_dimension T_store_dimension
ORDER BY T_store_dimension.floor_plan_type, T_store_dimension.photo_processing_type
UNSEGMENTED ALL NODES;
To create another copy of the superprojection on the fourth node (Node04), the best approach is to
create a copy of that projection on Node04 only. This means avoiding the ALL NODES syntax. The
following script shows how to create the fourth superprojection.
The following syntax is significant:
- CREATE PROJECTION creates a superprojection called store_dimension_Node04.
- UNSEGMENTED NODE Node04 creates the projection on just Node04.
CREATE PROJECTION store_dimension_Node04 (
C0_store_dimension_floor_plan_type ENCODING RLE ,
C1_store_dimension_photo_processing_type ENCODING RLE ,
C2_store_dimension_store_key ,
C3_store_dimension_store_name ,
C4_store_dimension_store_number ,
C5_store_dimension_store_street_address ,
C6_store_dimension_store_city ,
C7_store_dimension_store_state ,
C8_store_dimension_store_region ,
C9_store_dimension_financial_service_type ,
C10_store_dimension_selling_square_footage ,
C11_store_dimension_total_square_footage ,
C12_store_dimension_first_open_date ,
C13_store_dimension_last_remodel_date )
AS SELECT T_store_dimension.floor_plan_type,
T_store_dimension.photo_processing_type,
T_store_dimension.store_key,
T_store_dimension.store_name,
T_store_dimension.store_number,
T_store_dimension.store_street_address,
T_store_dimension.store_city,
T_store_dimension.store_state,
T_store_dimension.store_region,
T_store_dimension.financial_service_type,
T_store_dimension.selling_square_footage,
T_store_dimension.total_square_footage,
T_store_dimension.first_open_date,
T_store_dimension.last_remodel_date
FROM store_dimension T_store_dimension
ORDER BY T_store_dimension.floor_plan_type, T_store_dimension.photo_processing_type
UNSEGMENTED NODE Node04;
Implementing Security
In HP Vertica, there are three primary security concerns:
- Client authentication prevents unauthorized access to the database.
- Connection encryption prevents the interception of data and authenticates the identity of the server and the client.
- Client authorization (managing users and privileges) controls what users can access and change in the database.
Client Authentication
To gain access to HP Vertica, a user or client application must supply the name of a valid user
account. You can configure HP Vertica to require just a user name, but a more common practice is
to require an additional means of authentication, such as a password. There are several ways to
implement this added authentication:
- Password Authentication using passwords stored in the database.
- Authentication using outside means, such as LDAP or Kerberos.
You can use different authentication methods based on:
- Connection type
- Client IP address range
- User name for the client that is attempting to access the server
See Implementing Client Authentication.
Connection Encryption
To secure the connection between the client and the server, you can configure HP Vertica and
database clients to use Secure Socket Layer (SSL) to communicate. HP Vertica uses SSL to:
- Authenticate the server so the client can confirm the server's identity. HP Vertica supports mutual authentication, in which the server can also confirm the identity of the client. This authentication helps prevent "man-in-the-middle" attacks.
- Encrypt data sent between the client and database server to significantly reduce the likelihood that the data can be read if the connection between the client and server is compromised.
- Verify that data sent between the client and server has not been altered during transmission.
See Implementing SSL.
Client Authorization
Database users should have access to just the database resources they need to perform their
tasks. For example, some users need to query only specific sets of data. To prevent unauthorized
access to additional data, you can limit their access to just the data that they need to perform their
queries. Other users should be able to read the data but not be able to modify or insert new data.
Still other users might need more permissive access, such as the right to create and modify
schemas, tables, and views or even grant other users access to database resources.
A collection of SQL statements controls authorization for the resources users can access. See
Managing Users and Privileges, in particular About Database Privileges. You can also use roles to
grant users access to a set of privileges, rather than directly granting the privileges for each user. See
About Database Roles.
Use the GRANT Statements to assign privileges to users and the REVOKE Statements to repeal
privileges. See the SQL Reference Manual for details.
Implementing Client Authentication
When a client (the user who runs a client application or the client application itself) connects to the
HP Vertica database server, it supplies the HP Vertica database user name to gain access. HP
Vertica restricts which database users can connect through client authentication, a process
where the database server establishes the identity of the requesting client and determines whether
that client is authorized to connect to the HP Vertica server using the supplied credentials.
HP Vertica offers several client authentication methods, which you set up using the
Administration Tools (see How to Create Authentication Records). Although you can configure
HP Vertica to require just a user name for connections, you likely require more secure means of
authentication, such as a password at a minimum.
Supported Client Authentication Types
HP Vertica supports the following types of authentication to prove a client's identity. For information
about syntax and formatting rules, see Authentication Record Format and Rules.
- Trust authentication—Authorizes any user that connects to the server using a valid user name.
- Reject authentication—Blocks the connection and prevents additional records from being evaluated for the requesting client.
- Kerberos authentication—Uses a secure, single-sign-on, trusted third-party, mutual authentication service to connect to HP Vertica using one of the following methods:
  - krb5 uses the MIT Kerberos APIs (deprecated in HP Vertica 7.0; use the gss method).
  - gss uses the GSSAPI standard and provides better compatibility with non-MIT Kerberos implementations, such as for Java and Windows clients.
- Password authentication—Uses either the md5 or password method, which are similar except for the manner in which the password is sent across the network:
  - The md5 method sends MD5-hashed passwords over the network, and the server provides the client with salt.
  - The password method sends passwords over the network in clear text.
- LDAP authentication—Works like password authentication except that the ldap method authenticates the client against a Lightweight Directory Access Protocol server.
- Ident-based authentication—Authenticates the client against the username in an Ident server.
The method HP Vertica uses to authenticate a particular client connection can be automatically
selected on the basis of the connection type, client IP address, and user name. See About External
Authentication for more information.
If You Want Communication Layer Authentication
Topics in this section describe authentication methods supported at the database server layer. For
communication layer authentication between server and client, see Implementing SSL.
Password Authentication
The simplest method to authenticate a client connection is to assign the user account a password
in HP Vertica. If a user account has a password set, then the user or client using the account to
connect to the database must supply the correct password. If the user account does not have a
password set and HP Vertica is not configured to use another form of client authentication, the user
account is always allowed to log in.
Passwords are stored in the database in an encrypted format to prevent others from potentially
stealing them. However, the transmission of the password to HP Vertica is in plain text. This
means it is possible for a "man-in-the-middle" attack to intercept the password. To secure the login,
consider implementing SSL security or MD5 authentication.
About Password Creation and Modification
A superuser creates passwords for user accounts when he or she runs the CREATE USER
statement. You can add a password afterward using the ALTER USER statement. To change a
password, use ALTER USER or the vsql \password command. A superuser can set any user
account's password. Users can change their own passwords.
To make password authentication more effective, enforce password policies that control how often
users are forced to change passwords and the required content of a password. These policies are
set using Profiles.
Default Password Authentication
By default, the vertica.conf file does not contain any authentication records. When the file is
empty, HP Vertica defaults to using password authentication for user accounts that have
passwords.
If you add authentication methods to vertica.conf, even for remote hosts, password
authentication is disabled. You must explicitly enable password authentication. To always enable
local users to log in using password authentication, you would add the following to the
vertica.conf file:
ClientAuthentication = local all password
The above entry allows users logged into a database host to connect to the database using HP
Vertica-based passwords, rather than some other form of authentication.
Profiles
You set password policies using profiles. A profile is a group of parameters that sets requirements
for user passwords. You assign users to a profile to set their password policy.
A profile controls:
- How often users must change their passwords.
- How many times users must change their passwords before they can reuse an old password.
- How many times users can fail to log in before their account is locked.
- The required length and content of the password (maximum and minimum number of characters and the minimum number of letters, capital letters, lowercase letters, digits, and symbols that must be in a password).
You can create multiple profiles to enforce different password policies for different users. For
example, you might create one profile for interactive users that requires them to change their
passwords frequently, and another profile for accounts that applications use to access the
database, which are not required to change their passwords.
How You Create and Modify Profiles
You create profiles using the CREATE PROFILE statement and change profiles using ALTER
PROFILE. You can assign a user to a profile when you create the user (CREATE USER), or
afterwards using the ALTER USER statement. A user can be assigned to only one profile at a time.
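A minimal sketch of these statements (the profile name, limits, and user names are illustrative, not from this guide):

```sql
-- Create a profile; any parameter you do not set, or set to
-- DEFAULT, inherits its value from the DEFAULT profile.
CREATE PROFILE interactive_users LIMIT
    PASSWORD_LIFE_TIME 90      -- days before the password expires
    PASSWORD_MIN_LENGTH 8;

-- Assign a user to the profile, at creation or afterward.
CREATE USER alice PROFILE interactive_users;
ALTER USER bob PROFILE interactive_users;
```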
All newly-created databases contain an initial profile named DEFAULT. All database users are
assigned to the DEFAULT profile if:
- You do not explicitly assign users a profile when you create them.
- You drop the profile to which a user is currently assigned.
You can change the policy parameters in the DEFAULT profile, but you cannot delete it.
Note: When upgrading from versions of HP Vertica prior to version 5.0, a DEFAULT profile is
added to each database, and all users are assigned to it.
The profiles you create can inherit some or all of their policy parameters from the DEFAULT profile.
When you create a profile using the CREATE PROFILE statement, any parameter you set to the
special value DEFAULT, or any parameter to which you do not assign a value, inherits its value from
the DEFAULT profile. Changing a parameter in the DEFAULT profile changes that parameter's
value in every profile that inherited the parameter from DEFAULT.
When you assign users to a profile (or alter an existing profile that has users assigned to it), the
profile's policies for password content (maximum and minimum length, number of specific types of
characters) do not have an immediate effect on the users—HP Vertica does not test users'
passwords to ensure they comply with the new password criteria. These settings affect the
users only the next time they change their passwords. If you want to ensure users comply with the new
password policy, use the ALTER USER statement to expire user passwords. Users with expired
passwords are prompted to change their passwords when they next log in.
Note: Only the profile settings for how many failed login attempts trigger Account Locking and
how long accounts are locked have an effect on external password authentication methods
such as LDAP or Kerberos. All password complexity, reuse, and lifetime settings have an
effect on passwords managed by HP Vertica only.
See Also
- PROFILES
Password Expiration
Use profiles to control how often users must change their passwords. Initially, the DEFAULT profile
is set so that passwords never expire. You can change this default value, or you can create
additional profiles that set time limits for passwords and assign users to them.
When a password expires, the user must change it at the next login, unless the profile to which
the user is assigned has PASSWORD_GRACE_TIME set. In that case, the user is allowed to log
in after the expiration, but HP Vertica warns that the password has expired. Once the grace period
elapses, the user is forced to change the password, unless it was changed manually during the
grace period.
Password expiration has no effect on any of the user's current sessions.
Note: You can expire a user's password immediately using the ALTER USER statement's
PASSWORD EXPIRE argument. Expiring a password is useful to force users to comply with a
change to their password policy, or when setting a new password for users who have forgotten
their old one.
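A minimal sketch of these expiration controls (the profile name, limits, and user name are illustrative):

```sql
-- Passwords expire after 60 days, with a 7-day grace period
-- during which the user can still log in but is warned.
ALTER PROFILE interactive_users LIMIT
    PASSWORD_LIFE_TIME 60
    PASSWORD_GRACE_TIME 7;

-- Force a user to choose a new password at the next login.
ALTER USER alice PASSWORD EXPIRE;
```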
Account Locking
One password policy you can set in a profile is how many consecutive failed login attempts (giving
the wrong password when trying to log in) a user account is allowed before the account is locked.
You set this value using the FAILED_LOGIN_ATTEMPTS parameter in the CREATE PROFILE or
ALTER PROFILE statement.
HP Vertica locks any user account that has more sequential failed login attempts than the value to
which you set FAILED_LOGIN_ATTEMPTS. A locked account is not allowed to log in, even if the user
supplies the correct password.
How to Unlock a Locked Account
There are two ways to unlock an account:
l A superuser can manually unlock the account using the ALTER USER command.
l HP Vertica automatically unlocks the account after the number of days set in the PASSWORD_
LOCK_TIME parameter of the user's profile has passed. However, if this parameter is set to
UNLIMITED, the user's account is never automatically unlocked, and a superuser must manually
unlock it.
This locking mechanism helps prevent dictionary-style brute-force attempts to crack users'
passwords.
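A sketch of a locking policy and a manual unlock (the profile and user names are illustrative):

```sql
-- Lock the account after 3 consecutive failed logins,
-- and keep it locked for 1 day.
CREATE PROFILE cautious LIMIT
    FAILED_LOGIN_ATTEMPTS 3
    PASSWORD_LOCK_TIME 1;

ALTER USER dbuser PROFILE cautious;

-- A superuser can unlock the account before the day elapses.
ALTER USER dbuser ACCOUNT UNLOCK;
```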
Note: A superuser account cannot be locked, because a superuser is the only user who can unlock
accounts. For this reason, ensure that you choose a very secure password for superuser
accounts. See Password Guidelines for suggestions on choosing a secure password.
The following example demonstrates failing to log in to an account whose profile is set to lock
accounts after three failed tries:
> vsql -U dbuser
Password:
vsql: FATAL: Invalid username or password
> vsql -U dbuser
Password:
vsql: FATAL: Invalid username or password
> vsql -U dbuser
Password:
vsql: FATAL: The user account "dbuser" is locked due to too many invalid logins
HINT: Please contact the database administrator
> vsql -U dbuser
Password:
vsql: FATAL: The user account "dbuser" is locked due to too many invalid logins
HINT: Please contact the database administrator
Password Guidelines
For passwords to be effective, they must be hard to guess. You need to protect passwords from:
- Dictionary-style brute-force attacks
- Users who have knowledge of the password holder (family names, dates of birth, etc.)
Use Profiles to enforce good password practices (password length and required content), and make
sure database users know not to use personal information in their passwords.
What to Use
Consider the following password guidelines, published by the Internet Engineering Task Force
(IETF), when you create passwords:
- Use mixed-case characters.
- Use non-alphabetic characters (for example, numeric digits and punctuation).
- Use a password that is easy to remember, so you don't need to write it down; for example, i3atSandw1ches! instead of !a#^*!$&D)z.
- Use a password that you can type quickly without having to look at the keyboard.
What to Avoid
Avoid using the following practices to create a password:
- Do not use your login or user name in any form (as-is, reversed, capitalized, doubled, and so on).
- Do not use your first, middle, or last name in any form.
- Do not use your spouse's, partner's, child's, parent's, friend's, or pet's name in any form.
- Do not use other information easily obtained about you, including your date of birth, license plate number, telephone number, Social Security number, make of your automobile, house address, and so on.
- Do not use a password of all digits or all the same letter.
- Do not use a word contained in English or foreign language dictionaries, spelling lists, acronym or abbreviation lists, or other lists of words.
- Do not use a password that contains fewer than six characters.
- Do not give your password to another person for any reason.
See Also
- Creating a Database Name and Password
About External Authentication
To help you implement external authentication methods, HP Vertica provides an editing
environment within the Administration Tools that lets you create, edit, and maintain
authentication records. The Administration Tools also verifies that the authentication records are
correctly formed, inserts the records into the vertica.conf configuration file, and implements the
changes on all cluster nodes.
The vertica.conf file supports multiple records, one per line, to provide options for client sessions
that might require a variety of authentication methods. Each record establishes the authentication
method to use based on:
- Connection type
- Client IP address range
- User name for the client that is attempting to access the database
For example, you could use multiple records to have application logins authenticated using HP
Vertica-based passwords, and interactive users authenticated using LDAP. See Example
Authentication Records.
HP Vertica uses the first record with a matching connection type, client address, and user name to
authenticate that connection. If authentication fails, the client is denied access to HP Vertica.
Access is also denied if no records match the client session. If, however, there are no records (the
DBA did not configure vertica.conf), HP Vertica reverts to using the user name and password (if
created) to control client access to the database.
Setting up Your Environment to Create Authentication
Records
Editing of vertica.conf is performed by the text editor set in your Linux or UNIX account's
VISUAL or EDITOR environment variable. If you have not specified a text editor, HP Vertica uses
the vim (vi) editor.
To switch your editor from vi to GNU Emacs, run the following command before you run the
Administration Tools:
$ export EDITOR=/usr/bin/emacs
You can also add the above line to the .profile file in your home directory to always use GNU
Emacs to edit the authentication records.
Caution: Never edit vertica.conf directly. The Administration Tools performs error checking on
your entries before adding them to vertica.conf; edits made outside the Administration Tools
bypass that checking.
About Local Password Authentication
If you add authentication methods to vertica.conf but still want password authentication to work
locally, you must explicitly add a password authentication entry. See Password Authentication for
details.
How to Create Authentication Records
In this procedure, you'll use the Administration Tools to specify the authentication methods to use
for various client sessions.
1. On the Main Menu in the Administration Tools, select View Database Cluster State, verify
that all cluster nodes are UP, and click OK.
2. Select Configuration Menu, and click OK.
3. On the Configuration Menu, select Edit Authentication, and click OK.
4. Select the database you want to create authentication records for and click OK.
Your system's default editor opens the vertica.conf file.
5. Enter one or more authentication records.
Tip: See Authentication Record Format and Rules for information about the content and rules
required to create a record.
6. When you have finished entering authentication records, exit the editor. For example, in vi,
press the Esc key and type :wq to complete your editing session.
The Administration Tools verifies that the records are correctly formed and does one of the
following:
  - If the records are properly formed, they are inserted into the vertica.conf file, and the file is automatically copied to other nodes in the database cluster. You are prompted to restart the database. Click OK and go to step 7.
  - If the records are not properly formed, a message describes the problem and gives you the opportunity to edit your errors (e), exit without saving your changes (a), or save and implement your changes anyway (s). Saving your changes is not recommended because it can cause client authentication to fail.
7. Restart the database.
If You Do Not Specify a Client Authentication Method
If you do not insert records into the vertica.conf file, HP Vertica defaults to the username and
password (if supplied) to grant access to the database. If you later add authentication methods, the
username/password default is no longer enabled. To continue using password authentication, you
must explicitly add it as described in Password Authentication.
See Also
- How to Modify Authentication Records
Authentication Record Format and Rules
If the ClientAuthentication record introduced in Security Parameters does not exist, HP Vertica
uses the password method to authenticate client connections. Otherwise, each authentication
record takes the following format:
Syntax
ClientAuthentication = connection_type user_name address method
Arguments
connection_type—The access method the client uses to connect to an instance. Valid values are:
- local — Matches connection attempts made using local domain sockets. When using the local connection type, do not specify the <address> parameter.
- host — Matches connection attempts made using TCP/IP. Connection attempts can be made using a plain (non-SSL) or SSL-wrapped TCP socket.
- hostssl — Matches an SSL TCP connection only.
- hostnossl — Matches a plain TCP socket only.
Notes about client connections:
- Avoid using -h <hostname> from the client if a "local" connection type is specified and you want to match the client authentication entry.
- Use -h <hostname> from the client if you specify a Kerberos (gss or krb5) connection method.
See the vsql command-line option -h hostname (--host hostname).
user_name—Identifies the client user names that match this record. Valid user names are:
- all — Matches all users.
- One or more specific user names.
The user_name argument accepts either a single value or concatenated values. To concatenate
values, use a plus sign between the values, for example: user1+user2.
address—Identifies the client machine IP address range that matches this record. Use the
format <IP_Address>/<netmask_value>. You must specify the IP address numerically, not as
domain or host names. HP Vertica supports the following formats:
- w.x.y.z/<mask_length> (for example, 10.10.0.25/24). The mask length indicates the number of high-order bits of the client IP address that must match. Do not insert white space between the IP address, the slash (/), and the Classless Inter-Domain Routing (CIDR) mask length.
- Separate dotted-IP address and mask values (for example, 10.10.0.25 255.255.255.0).
- To allow users to connect from any IP address, use the value 0.0.0.0/0.
Note: If you are working with a multi-node cluster, be aware that any IP/netmask settings in
host-based ClientAuthentication parameters (host, hostssl, or hostnossl) must match all nodes
in the cluster. This setup allows the database owner to authenticate with and administer every
node in the cluster. For example, specifying 10.10.0.8/30 allows a CIDR address range of
10.10.0.8–10.10.0.11.
method—Identifies the authentication method to use for clients that match this record. Use one
of the following methods:
- trust — Authenticates clients based on valid user names only. You might implement trust if a user connection has already been authenticated through some external means, such as SSL or a firewall.
- reject — Rejects the connection and prevents additional records from being evaluated for the client. This method is useful for blocking clients by user name or IP address.
- gss — Authenticates the client using the GSSAPI standard, allowing for better compatibility with non-MIT Kerberos implementations, such as Java and Windows clients. (HP Vertica follows RFC 1964.)
- krb5 — Authenticates the client using the MIT Kerberos APIs. This method is deprecated in HP Vertica 7.0. Use the gss method for all new records and modify existing krb5 records to use gss as soon as possible.
- password — Requires that the client supply an unencrypted password for authentication. Because the password is sent over the network in clear text, never use this method on untrusted networks.
- md5 — Requires that the client supply a Message-Digest Algorithm 5 (MD5) password across the network for authentication. By default, passwords are MD5-hashed and the server provides the client with salt.
- ldap — Authenticates the client against a Lightweight Directory Access Protocol (LDAP) server. This method is useful if your application uses LDAP to query directory services.
- ident — Authenticates the client using an Ident server (HP Vertica follows RFC 1413). Use this method only when the Ident server is installed on the same system as the HP Vertica database server.
Formatting Rules
When you create authentication records, be aware of the following formatting rules:
- Only one authentication record is allowed per line.
- Each authentication record must be on one line.
- Fields that make up the authentication record can be separated by white space or tabs.
- Other than IP addresses and mask columns, field values cannot contain white space.
- Place more specific rules (a specific user or IP address) before broader rules (all users or a range
of IP addresses).
Note: The order of rules is important. HP Vertica scans the list of rules from top to bottom
and uses the first rule that matches the incoming connection.
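As a sketch of records following this format (the user names and address range are illustrative; your records will differ):

```
# Most specific rules first; HP Vertica uses the first matching record.
ClientAuthentication = local dbadmin trust
ClientAuthentication = host all 10.10.0.0/24 md5
ClientAuthentication = hostnossl baduser 0.0.0.0/0 reject
```

Here the local record omits the address field, as required for the local connection type, and the reject record blocks a single user from any address over plain TCP.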
See Also
- Security Parameters
- How to Create Authentication Records
- Example Authentication Records
Configuring LDAP Authentication
Lightweight Directory Access Protocol (LDAP) authentication works like password authentication.
The main difference is that the LDAP method authenticates clients trying to access your HP
Vertica database against an LDAP or Active Directory server. Use LDAP authentication when your
database needs to authenticate a user with an LDAP or Active Directory server.
For details about configuring LDAP authentication, see:
- What You Need to Know to Configure LDAP Authentication
- LDAP Configuration Considerations
- Workflow for Configuring LDAP Bind
- Workflow for Configuring LDAP Bind and Search
- Configuring Multiple LDAP Servers
What You Need to Know to Configure LDAP Authentication
Before you configure LDAP authentication for your HP Vertica database, review the following
information:
- Prerequisites for LDAP Authentication
- Terminology for LDAP Authentication
- Bind vs. Bind and Search
- LDAP Anonymous Binding
- Using LDAP over SSL and TLS
- LDAP Configuration Considerations
- Workflow for Configuring LDAP Bind
- Workflow for Configuring LDAP Bind and Search
- Configuring Multiple LDAP Servers
Prerequisites for LDAP Authentication
Before you configure LDAP authentication for your HP Vertica database, you must have:
- The IP address and host name of the LDAP server
- Your organization's Active Directory information
- A service account for bind and search
- Administrative access to your HP Vertica database
- The open-ldap-tools package installed on at least one node (this package includes ldapsearch)
Terminology for LDAP Authentication
An LDAP authentication record contains the following components:
- Host—IP address or host name of the LDAP server.
- Common name (CN)—Depending on your LDAP environment, this value can be the user's username or the user's full first and last name.
- Distinguished name (DN)—domain.com.
- Organizational unit (OU)—Unit in the organization with which the user is associated, for example, Vertica Users.
- Domain component (DC)—Comma-separated list that contains your organization's domain component broken up into separate values, for example: dc=vertica, dc=com
- sAMAccountName—An Active Directory user account field. This value is usually the attribute to be searched when you use bind and search against a Microsoft Active Directory server.
- UID—A commonly used LDAP account attribute that stores a username.
- Bind—LDAP authentication method that allows basic binding using the CN or UID.
- Bind and search—LDAP authentication method that needs to log in to the LDAP server to search on the specified attribute.
- Service account—An LDAP user account that can be used to log in to the LDAP server during bind and search. This account's password is usually shared.
- Anonymous binding—Allows a client to connect and search the directory (bind and search) without needing to log in.
- ldapsearch—A command-line utility to search the LDAP directory. It returns information that you use to configure LDAP bind and search.
- basedn—Distinguished name where the directory search should begin.
- binddn—Domain name to find in the directory search.
- searchattribute—UID of the user trying to connect.
DBADMIN Authentication Access and LDAP
The DBADMIN user should have easy access to the database at all times.
Typically, the DBADMIN account does not use LDAP authentication to access the HP Vertica
database. The DBADMIN account should authenticate against the database using local trust or
local password authentication.
The Administration Tools utility uses host authentication. Hewlett-Packard recommends that you
also configure host authentication for the DBADMIN user. Otherwise, you may run into
authentication problems when using the Administration Tools with LDAP authentication.
For example, add lines like the following to the vertica.conf file:
ClientAuthentication = local dbadmin trust;
ClientAuthentication = host dbadmin password;
l The first line configures local authentication for the DBADMIN user. The user can use vsql
without the -h option and does not need to enter a password.
l The second line configures host authentication for DBADMIN, allowing the user to access the
HP Vertica database using the assigned password. The DBADMIN user can access the
database using vsql -h, the Administrative Tools, or any other tool that connects to HP
Vertica.
Bind vs. Bind and Search
There are two LDAP methods that you use to authenticate your HP Vertica database against an
LDAP server.
l Bind—Use LDAP bind when HP Vertica connects to the LDAP server and binds using the CN
and password (the username and password of the user logging into the database). Use the bind
method when your LDAP account's CN field matches that of the username defined in your
database.
l Bind and search—Use LDAP bind and search when your LDAP account's CN field is a user's
full name or does not match the username defined in your database. For bind and search, the
username is usually in another field such as UID or sAMAccountName in a standard Active
Directory environment. Bind and search requires your organization's Active Directory
information to enable HP Vertica to log into the LDAP server and search on the specified
username field.
If you are using bind and search, having a service account simplifies your server side
configuration. In addition, you do not need to store your Active Directory password in clear text.
LDAP Anonymous Binding
Anonymous binding is an LDAP server function. Anonymous binding allows a client to connect and
search the directory (bind and search) without logging in because binddn and bindpasswd are not
needed.
You also do not need to log in when you configure LDAP authentication using Management
Console.
Using LDAP over SSL and TLS
If the URL in the authentication record in vertica.conf includes ldaps, the HP Vertica server
uses SSL on the specified port or on the LDAPS port (636). If the LDAP server does not support
SSL on that port, authentication fails.
If the method in the authentication record is ldap, the HP Vertica server sends a StartTLS
request. This request determines if the LDAP server supports TLS on the specified port or on the
default LDAP port (389). If the LDAP server does not support TLS on that port, the HP Vertica
server proceeds with the authentication without TLS.
To use LDAP over SSL and TLS, you must specify the location of your certificate file in your
ldap.conf file on all nodes:
TLS_CACERT /full-path-to-ca-certificate-file.crt
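As a sketch, the pair of records below shows both cases; the server name ldap.example.com and the base DN are hypothetical placeholders, not values from this guide:

```
ClientAuthentication = host all 0.0.0.0/0 ldap "ldaps://ldap.example.com/basedn;cn=;,dc=example,dc=com"
ClientAuthentication = host all 0.0.0.0/0 ldap "ldap://ldap.example.com/basedn;cn=;,dc=example,dc=com"
```

With the first record, authentication fails outright if the server cannot negotiate SSL on the LDAPS port; with the second, HP Vertica attempts StartTLS and proceeds without TLS if the server does not support it.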
LDAP Configuration Considerations
Before you configure LDAP authentication for your HP Vertica database, consider the following
steps. These recommendations can improve the effectiveness of LDAP-based security on your
system:
l Create a service account with your LDAP server. A service account is a single account that
is specifically set up so that individual account names and passwords are not used for
LDAP access. You create a service account and use that in your LDAP URL to avoid use of
account names and passwords, which change often. If you add, remove, or change users, you
do not have to modify the LDAP URL. Having a service account allows you to restrict individual
users from searching the LDAP server, but it allows applications like HP Vertica to search the
server.
l Set up an organizational unit (OU). Create an Active Directory OU, add all the HP Vertica
users to the OU, and specify it in the LDAP URL. Doing so allows the LDAP server to search
just the HP Vertica OU for the user, minimizing the search time. In addition, using OUs ensures
that changes to the users' OUs for other applications have no impact on HP Vertica.
l Make sure that the DBADMIN user can always access the database locally. The
DBADMIN user should not be required to authenticate through LDAP. If a problem occurs with
LDAP authentication that blocks all users from logging in, the DBADMIN user needs access to
correct the problem.
Workflow for Configuring LDAP Bind
To configure your HP Vertica database to authenticate clients using LDAP bind, follow these steps:
1. Verify that the user's LDAP account attribute that you search for matches their Vertica
username.
For example, if John Smith's Active Directory (AD) sAMAccountName = jsmith, his HP
Vertica username must also be jsmith.
Note: For detailed information about creating database users, see CREATE USER.
2. Run ldapsearch from an HP Vertica node against your LDAP or AD server. Verify the
connection to the server and identify the value of relevant fields. Running ldapsearch helps
you build the client authentication string needed to configure LDAP authentication.
In the following example, ldapsearch returns the CN, DN, and sAMAccountName fields (if
they exist) for any user whose CN contains the username jsmith. This search succeeds only
for LDAP servers that allow anonymous binding:
ldapsearch -x -h 10.10.10.10 -b "ou=Vertica Users,dc=CompanyCorp,dc=com"
'(cn=jsmith*)' cn dn uid sAMAccountName
ldapsearch returns the following results. The relevant information for LDAP bind is the dn entry:
# extended LDIF
#
# LDAPv3
# base <ou=Vertica Users,dc=CompanyCorp,dc=com> with scope subtree
# filter: (cn=jsmith*)
# requesting: cn dn uid sAMAccountName
#
# jsmith, Users, CompanyCorp.com
dn:cn=jsmith,ou=Vertica Users,dc=CompanyCorp,dc=com
cn: jsmith
uid: jsmith
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1
3. Assemble the ClientAuthentication string as follows:
ClientAuthentication = host all 0.0.0.0/0 ldap "ldap://10.10.10.10/basedn;cn=;,OU=Vertica Users,DC=CompanyCorp,DC=com"
Because the CN in the LDAP entry is the username jsmith, you do not need to set it. HP Vertica
automatically sets the CN to the username of the user who is trying to connect. HP Vertica
uses that CN to bind against the LDAP server. You can also use the UID for this account to
perform the same bind operation.
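As a sanity check, the record above can be assembled from its parts in a short shell sketch; the server address and OU are the example values from the ldapsearch output, so substitute your own:

```shell
# Assemble the bind-method ClientAuthentication record from its parts.
LDAP_HOST="10.10.10.10"                              # example LDAP server
BASEDN="OU=Vertica Users,DC=CompanyCorp,DC=com"      # example base DN
RECORD="ClientAuthentication = host all 0.0.0.0/0 ldap \"ldap://${LDAP_HOST}/basedn;cn=;,${BASEDN}\""
printf '%s\n' "$RECORD"
```

The printed line is what you place in vertica.conf (or pass to SET_CONFIG_PARAMETER).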
Workflow for Configuring LDAP Bind and Search
To configure your HP Vertica database to authenticate clients using LDAP bind and search, follow
these steps:
1. If required, obtain a service account, as described in LDAP Configuration Considerations.
2. Verify that the user's LDAP account attribute that you search on matches their Vertica
username.
For example, if John Smith's Active Directory (AD) sAMAccountName = jsmith, his HP
Vertica username must also be jsmith. HP Vertica usernames cannot have spaces.
3. Run ldapsearch from an HP Vertica node against your LDAP or AD server. Verify the
connection to the server and identify the value of relevant fields. Running ldapsearch helps
you build the client authentication string needed to configure LDAP authentication.
In the following example, ldapsearch returns the CN, DN, and sAMAccountName fields (if
they exist) for any user whose CN contains the username John. This search succeeds only for
LDAP servers that allow anonymous binding:
ldapsearch -x -h 10.10.10.10 -b 'OU=Vertica Users,DC=CompanyCorp,DC=com' -s sub -D
'CompanyCorp\jsmith' -W '(cn=John*)' cn dn uid sAMAccountName
4. ldapsearch returns the following results. The relevant information for bind and search is the
dn and sAMAccountName entries:
# extended LDIF
#
# LDAPv3
# base <OU=Vertica Users,DC=CompanyCorp,DC=com> with scope subtree
# filter: (cn=John*)
# requesting: cn dn sAMAccountName
#
# John Smith, Vertica Users, CompanyCorp.com
dn: CN=John Smith,OU=Vertica Users,DC=CompanyCorp,DC=com
cn: John Smith
sAMAccountName: jsmith
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1
5. Assemble the ClientAuthentication string. Because the sAMAccountName attribute contains
the username you want, jsmith, set your search attribute to that field for the search to find the
appropriate account.
These strings act just like any other ClientAuthentication string. See Authentication
Record Format and Rules for detailed information about setting up these strings in your
database configuration.
ClientAuthentication = host all 0.0.0.0/0 ldap "ldap://10.10.10.10/search;basedn=OU=Vertica Users,DC=CompanyCorp,DC=com;binddn=COMPANYCORP\jsmith;bindpasswd=<password>;searchattribute=sAMAccountName"
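The bind-and-search record can be assembled the same way. The host, binddn, and search attribute below are the example values from the preceding steps, and <password> is deliberately left as a placeholder:

```shell
# Assemble the bind-and-search ClientAuthentication record from its parts.
LDAP_HOST="10.10.10.10"
BASEDN="OU=Vertica Users,DC=CompanyCorp,DC=com"
BINDDN='COMPANYCORP\jsmith'                          # service/bind account
RECORD="ClientAuthentication = host all 0.0.0.0/0 ldap \"ldap://${LDAP_HOST}/search;basedn=${BASEDN};binddn=${BINDDN};bindpasswd=<password>;searchattribute=sAMAccountName\""
printf '%s\n' "$RECORD"
```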
Configuring Multiple LDAP Servers
In the vertica.conf file, the ClientAuthentication record can contain multiple LDAP URLs,
separated by single spaces.
The following record specifies that the LDAP server search the entire directory
(basedn=dc=example,dc=com) for a DN with an OU attribute that matches Sales. If the search
returns no results or otherwise fails, the LDAP server searches for a DN with the OU attribute that
matches Marketing:
ClientAuthentication = host all 10.0.0.0/8 ldap
"ldap://ldap.example.com/search;
basedn=dc=example,dc=com"
"ldap://ldap.example.com/search;
basedn=dc=example,dc=com"
Configuring Ident Authentication
The Ident protocol, defined in RFC 1413, identifies the system user of a particular connection. You
configure HP Vertica client authentication to query an Ident server to see if that system user can log
in as a certain database user without specifying a password. With this feature, system users can
run automated scripts to execute tasks on the HP Vertica server.
Caution: Ident responses can be easily spoofed by untrusted servers. Ident authentication
should take place only on local connections, where the Ident server is installed on the same
computer as the HP Vertica database server.
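The Ident exchange itself is a single request/response over TCP port 113. The sketch below only simulates the RFC 1413 message format; the port numbers and username are made up, and no real Ident server is contacted:

```shell
# An Ident client sends the "server-port, client-port" pair of an existing
# TCP connection; the Ident server replies with the owning system user.
QUERY="5433, 40000"                            # e.g., Vertica port, client port
RESPONSE="${QUERY} : USERID : UNIX : jsmith"   # a successful reply
printf '%s\n' "$RESPONSE"
```

HP Vertica uses the username in such a reply to decide whether the system user may log in as the requested database user.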
ClientAuthentication Records for Ident Authentication
To configure Ident authentication, the ClientAuthentication record in the vertica.conf file
must have one of the following formats:
ClientAuthentication = local <database_user> ident systemusers=<systemuser1:systemuser2:...> [ continue ]
ClientAuthentication = local <database_user> ident [ continue ]
Where:
l local indicates that the Ident server is installed on the same computer as the database, a
requirement for Ident authentication on HP Vertica.
l <database_user>: The name of any valid user of the database. To allow the specified system
users to log in as any database user, use the word all instead of a database user name.
l <systemuser1:systemuser2:...>: Colon-delimited list of system user names.
l continue: Allows system users not specified in the systemusers list to authenticate using
methods specified in subsequent ClientAuthentication records. The continue keyword can
be used with or without the systemusers list.
The following examples show how to configure Ident authentication in HP Vertica:
l Allow the system's root user to log in to the database as the dbadmin user:
ClientAuthentication = local dbadmin ident systemusers=root
l Allow system users jsmith, tbrown, and root to log in as database user user1:
ClientAuthentication = local user1 ident systemusers=jsmith:tbrown:root
l Allow system user jsmith to log in as any database user:
ClientAuthentication = local all ident systemusers=jsmith
l Allow any system user to log in as the database user of the same name:
ClientAuthentication = local all ident
l Allow any system user to log in as user1:
ClientAuthentication = local user1 ident systemusers=*
l Allow the system user backup to log in as dbadmin without a password and allow all other
system users to log in as dbadmin with a password:
ClientAuthentication = local dbadmin ident systemusers=backup continue, local dbadmin
password
l Allow all system users to log in as the database user with the same name without a password,
and log in as other database users with a password:
ClientAuthentication = local all ident continue, local all password
Installing and Configuring an Ident Server
To use Ident authentication, you must install the oidentd server and enable it on your HP Vertica
server. oidentd is an Ident daemon that is compatible with HP Vertica and compliant with RFC
1413.
To install and configure oidentd on Red Hat Linux for use with your HP Vertica database, take these
steps:
1. To install oidentd on Red Hat Linux, run this command:
$ yum install oidentd
Note: The source code and installation instructions for oidentd are available at the oidentd
website.
2. For Ident authentication to work, the Ident server must accept IPv6 connections. To make sure
this happens, you need to start oidentd with the argument -a ::. In the script
/etc/init.d/oidentd, change the line
exec="/usr/sbin/oidentd"
to
exec="/usr/sbin/oidentd -a ::"
3. Restart the server with the following command:
/etc/init.d/oidentd restart
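If you prefer to script step 2 rather than hand-edit the file, the substitution can be done with sed. The variable below stands in for the init script's exec= line; on a real system you would run sed -i against /etc/init.d/oidentd itself:

```shell
# Rewrite the exec= line to add "-a ::" so oidentd accepts IPv6 connections.
OLD='exec="/usr/sbin/oidentd"'
NEW=$(printf '%s\n' "$OLD" | sed 's|oidentd"$|oidentd -a ::"|')
printf '%s\n' "$NEW"
```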
Example Authentication Records
The following examples show several different authentication records.
Using an IP Range and Trust Authentication Method
The following example allows the dbadmin account to connect from any IP address in the range of
10.0.0.0 to 10.255.255.255 without a password, as long as the connection is made without using
SSL:
ClientAuthentication = hostnossl dbadmin 10.0.0.0/8 trust
Note: If this is the only authentication record in vertica.conf file, dbadmin will be the only
user that is able to log in.
Using Multiple Authentication Records
When the vertica.conf file contains multiple authentication records, HP Vertica scans them from
top to bottom and uses the first entry that matches the incoming connection to authenticate (or
reject) the user. If the user fails to authenticate using the method specified in the record, HP Vertica
denies access to that user. You can use this behavior to include records that enable or reject
specific connections and end with one or more "catch-all" records. The following example
demonstrates setting up some specific records, followed by some catch-all records:
ClientAuthentication = host alice 192.168.1.100/32 reject
ClientAuthentication = host alice 192.168.1.101/32 trust
ClientAuthentication = host all 0.0.0.0/0 password
ClientAuthentication = local all password
The first two records apply only to the user alice. If alice attempts to connect from 192.168.1.100,
the first record is used to authenticate her, which rejects her connection attempt. If she attempts to
connect from 192.168.1.101, she is allowed to connect automatically. If alice attempts to log in
from any other remote system, the third record matches, and she must enter a password. Finally, if
she attempts to connect locally from a node in the cluster, the fourth record applies, and she again
has to enter a password to authenticate herself. For all other users, the third and fourth records
are used to authenticate them using password authentication. The first two records are ignored,
since
their user name doesn't match the name in the record.
Record Order
The ordering of the records is important. If the order of the records were reversed, so that the
wildcard rule was first, the rules that are specific to alice would never be used. The wildcard or local
rule would always match, and HP Vertica would use the password authentication, no matter where
alice connected from.
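The first-match behavior described above can be sketched in a few lines of shell. Records are simplified here to "user address method" with exact address matching; real records also carry connection types and CIDR masks:

```shell
# Sketch of HP Vertica's top-to-bottom, first-match record scan.
# "all" matches any user; "any" stands in for a catch-all address.
records='alice 192.168.1.100 reject
alice 192.168.1.101 trust
all any password'

method_for() {  # $1 = connecting user, $2 = client address
  printf '%s\n' "$records" | while read -r user addr method; do
    if [ "$user" = "$1" ] || [ "$user" = all ]; then
      if [ "$addr" = "$2" ] || [ "$addr" = any ]; then
        printf '%s\n' "$method"
        break                      # first matching record wins
      fi
    fi
  done
}

method_for alice 192.168.1.100   # reject
method_for alice 10.1.2.3        # password
method_for bob   192.168.1.101   # password
```

Reversing the record order in the sketch reproduces the problem described above: the catch-all record would always match first and the alice-specific records would never apply.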
How to Modify Authentication Records
To modify an existing authentication record, use the Administration Tools or set the
ClientAuthentication configuration parameter.
Using the Administration Tools
The advantages of using the Administration Tools are:
l You do not have to connect to the database
l The editor verifies that records are correctly formed
l The editor maintains records so they are available to you to edit later
Note: You must restart the database to implement your changes.
For information about using the Administration Tools to create and edit authentication records, see
How to Create Authentication Records.
Using the ClientAuthentication Configuration Parameter
The advantage of using the ClientAuthentication configuration parameter is that the changes
are implemented immediately across all nodes within the database cluster. You do not need to
restart the database.
However, all the database nodes must be up and you must connect to the database before you set
this parameter. Most importantly, this method does not verify that records are correctly formed and
it does not maintain the records so you can modify them later.
New authentication records are appended to the list of existing authentication records. Because HP
Vertica scans the list of records from top to bottom and uses the first record that matches the
incoming connection, you might find your newly-added record does not have an effect if HP Vertica
used an earlier record instead.
To configure client authentication through a connection parameter, use the SET_CONFIG_PARAMETER
function:
=> SELECT SET_CONFIG_PARAMETER('ClientAuthentication',
'connection type user name address method');
When you specify authentication records, make sure to adhere to the following guidelines:
l Fields that make up the record can be separated by white space or tabs
l Other than IP addresses and mask columns, field values cannot contain white space
For more information, see Authentication Record Format and Rules.
Examples
The following example creates an authentication record for the trust method:
=> SELECT SET_CONFIG_PARAMETER('ClientAuthentication',
'hostnossl dbadmin 0.0.0.0/0 trust');
The following example creates an authentication record for the LDAP method:
=> SELECT SET_CONFIG_PARAMETER('ClientAuthentication', 'host all 10.0.0.0/8
ldap "ldap://summit.vertica.com;cn=;,dc=vertica,dc=com"');
The following example specifies three authentication records. In a single command, separate each
authentication record by a comma:
=> SELECT SET_CONFIG_PARAMETER('ClientAuthentication',
'hostnossl dbadmin 0.0.0.0/0 trust, hostnossl all 0.0.0.0/0 md5,
local all trust');
Implementing Kerberos Authentication
Kerberos authentication is different from user name/password authentication. Instead of
authenticating each user to each network service, Kerberos uses symmetric encryption through a
trusted third party, called the Key Distribution Center (KDC). In this environment, clients and
servers validate their authenticity by obtaining a shared secret (ticket) from the KDC, after which
clients and servers can talk to each other directly.
Note: Topics in this section describe how to configure the HP Vertica server and clients for
Kerberos authentication. This section does not describe how to install, configure, or administer
a Key Distribution Center. You can obtain the Kerberos 5 GSSAPI distribution for your
operating system from the MIT Kerberos Distribution Page.
Kerberos Prerequisites
At a minimum, you must meet the following requirements to use Kerberos authentication with the
HP Vertica server and client drivers.
Kerberos server
Your network administrator should have already installed and configured one or more Kerberos Key
Distribution Centers (KDC), and the KDC must be accessible from every node in your Vertica
Analytic Database cluster.
The KDC must support Kerberos 5 via GSSAPI. For details, see the MIT Kerberos Distribution
Page.
Client package
The Kerberos 5 client package contains software that communicates with the KDC server. This
package is not included as part of the HP Vertica Analytics Platform installation. If the Kerberos 5
client package is not present on your system, you must download and install it on all clients and
servers involved in Kerberos authentication (for example, each HP Vertica node and each HP
Vertica client), with the exception of the KDC itself.
Kerberos software is built into Microsoft Windows. If you are using another operating system, you
must obtain and install the client package.
Refer to the Kerberos documentation for installation instructions, such as on the MIT website,
including the MIT Kerberos Distribution page.
Client/server identity
Each client (users or applications that will connect to HP Vertica) and the HP Vertica server must
be configured as Kerberos principals. These principals authenticate using the KDC.
Each client platform has a different security framework, so the steps required to configure and
authenticate against Kerberos differ among clients. See the following topics for more information:
l Configure HP Vertica for Kerberos Authentication
l Configure Clients for Kerberos Authentication.
Configure HP Vertica for Kerberos Authentication
To set up HP Vertica for Kerberos authentication, perform a series of short procedures described in
the following sections:
l Install the Kerberos 5 client package
l Create the HP Vertica principal
l Create the keytab
l Specify the location of the keytab file
l Point machines at the KDC and configure realms
l Inform HP Vertica about the Kerberos principal
l Configure the authentication method for all clients
l Restart the database
l Get the ticket and authenticate HP Vertica with the KDC
Install the Kerberos 5 client package
See Kerberos Prerequisites.
Create the HP Vertica principal
You can create the Vertica Analytic Database principal on any machine in the Kerberos realm,
though generally, you'll perform this task on the KDC. This section describes how to create the
Vertica Analytic Database principal on Linux and Active Directory KDCs.
Creating the Vertica Analytic Database principal on a Linux KDC
Start the Kerberos 5 database administration utility (kadmin or kadmin.local) to create a Vertica
Analytic Database principal on a Linux KDC.
l Use kadmin if you are accessing the KDC on a remote server. You can use kadmin on any
machine that has the Kerberos 5 client package installed, as long as you have access to the
Kerberos administrator's password. When you start kadmin, the utility will prompt you for the
Kerberos administrator's password. You might need root privileges on the client system in order
to run kadmin.
l Use kadmin.local if the KDC is on the machine you're logging in to and you have root
privileges on that server. You might also need to modify your path to include the location of the
kadmin.local command; for example, try setting the following path:
/usr/kerberos/sbin/kadmin.local.
The following example creates the principal vertica on the EXAMPLE.COM Kerberos realm:
$ sudo /usr/kerberos/sbin/kadmin.local
kadmin.local: add_principal vertica/vcluster.example.com
For more information about the kadmin command, refer to the kadmin documentation.
Creating the Vertica Analytic Database principal on an Active Directory KDC
To configure Vertica Analytic Database for Kerberos authentication on Active Directory, you will
most likely add the Vertica Analytic Database server and clients to an existing Active Directory
domain. You'll need to modify the Kerberos configuration file (krb5.conf) on the Vertica
Analytic Database server to make sure all parties support encryption types used by the Active
Directory KDC.
If you need to configure encryption on the KDC:
l Open the Local Group Policy Editor (gpedit.msc) and expand Computer Configuration >
Windows Settings > Security Settings > Local Policies > Security Options, and double-click
Network security: Configure encryption types allowed for Kerberos.
l Refresh the local and Active Directory-based Group Policy settings, including security settings,
by running the command gpupdate /force
l Use the ktpass command to configure the server principal name for the host or service in Active
Directory and generate a .keytab file; for example:
ktpass -out ./host.vcluster.example.com.keytab
-princ host/vcluster.example.com@EXAMPLE.COM
-mapuser vcluster
-mapop set -pass <password>
-crypto <encryption_type> -ptype <principal_type>
ktpass -out ./vertica.vcluster.example.com.keytab
-princ vertica/vcluster.example.com@EXAMPLE.COM
-mapuser vertica
-mapop set -pass <password>
-crypto <encryption_type> -ptype <principal_type>
For more information, see the Technet.Microsoft.com Ktpass page. See also "Create the keytab
" below.
You can view a list of the Service Principal Names that a computer has registered with Active
Directory by running the setspn -L hostname command, where hostname is the host name of
the computer object that you want to query; for example:
setspn -L vertica
Registered ServicePrincipalNames for CN=vertica,CN=Users,
DC=example,DC=com:
vertica/vcluster.example.com
setspn -L vcluster
Registered ServicePrincipalNames for CN=vertica,CN=Users,
DC=example,DC=com:
host/vcluster.example.com
Create the keytab
The keytab is an encrypted, local copy of the host's key that contains the credentials for the HP
Vertica principal (its own principal and key), so the HP Vertica server can authenticate itself to the
KDC. The keytab is required so that Vertica Analytic Database doesn’t have to prompt for a
password.
Before you create the keytab file
You do not need to create a keytab file on the KDC; however, a keytab entry must reside on each
node in the HP Vertica cluster. The absolute path to the keytab file must be the same on every
cluster node.
You can generate a keytab on any machine in the Kerberos realm that already has the Kerberos 5
client package installed, as long as you have access to the Kerberos administrator's password.
Then you can copy that file to each Vertica Analytic Database node.
Generating a keytab file on Linux systems
On Linux, the default location for the keytab file is /etc/krb5.keytab. You might need root
privileges on the client system to run the kadmin utility.
1. To generate a keytab or add a principal to an existing keytab entry, use the ktadd command
from the kadmin utility, as in the following example:
$ sudo /usr/kerberos/sbin/kadmin -p vertica/vcluster.example.com -q
"ktadd vertica/vcluster.example.com" -r EXAMPLE.COM
Authenticating as principal vertica/vcluster.example.com
with password.
2. Make the keytab file readable by the file owner who is running the process (typically the Linux
dbadmin user); for example, you can change ownership of the files to dbadmin as follows:
$ sudo chown dbadmin *.keytab
Important: In a production environment, you must control who can access the keytab file
to prevent unauthorized users from impersonating your server.
3. Copy the keytab file to the /etc folder on each cluster node.
After you create the keytab file, you can use the klist command to view keys stored in the file:
$ sudo /usr/kerberos/bin/klist -ke -t
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- -------------------------------------------------
4 vertica/vcluster.example.com@EXAMPLE.COM
4 vertica/vcluster.example.com@EXAMPLE.COM
Generating a keytab file for Active Directory
Use the ktutil command to read, write, or edit entries in a Kerberos 5 keytab file. The keytab
entries were created in the previous example, and they created a principal name for the host and
service in Active Directory.
1. On any Vertica Analytic Database node, use the ktutil command to read the Kerberos 5
keytab files into the current keylist (from the keytab entries you created using ktpass
-out):
$ /usr/kerberos/sbin/ktutil
ktutil: rkt host.vcluster.example.com.keytab
ktutil: rkt vertica.vcluster.example.com.keytab
ktutil: list
KVNO Principal
---- ----------------------------------------------------
3 host/vcluster.example.com@EXAMPLE.COM
16 vertica/vcluster.example.com@EXAMPLE.COM
ktutil: wkt vcluster.example.com.keytab
ktutil: exit
2. Make the keytab file readable by the file owner who is running the process (the administrator)
with no permissions for group or other:
$ chmod 600 vcluster.example.com.keytab
3. Copy the keytab file to the catalog directory.
Specify the location of the keytab file
Using the Administration Tools, log in to the database as the database administrator (usually
dbadmin) and set the KerberosKeytabFile configuration parameter to point to the location of the
keytab file; for example:
> SELECT set_config_parameter('KerberosKeytabFile', '/etc/krb5.keytab');
See Kerberos Authentication Parameters for more information.
Point machines at the KDC and configure realms
Each client and Vertica Analytic Database server in the Kerberos realm must have a valid,
identically-configured Kerberos configuration (krb5.conf) file in order to know how to reach the
KDC.
If you use Microsoft Active Directory, you won't need to perform this step. Refer to the Kerberos
documentation for your platform for more information about the Kerberos configuration file on Active
Directory.
At a minimum, you must configure the following sections in the krb5.conf file. See the Kerberos
documentation for information about other sections in this configuration file.
l [libdefaults] Settings used by the Kerberos 5 library
l [realms] Realm-specific contact information and settings
l [domain_realm] Maps server hostnames to Kerberos realms
You need to update the /etc/krb5.conf file to reflect your site's Kerberos configuration. The
easiest way to ensure consistency among all clients/servers in the Kerberos realm is to copy the
/etc/krb5.conf file from the KDC to the /etc directory on each HP Vertica cluster node.
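For reference, a minimal krb5.conf sketch covering these three sections might look like the following; the KDC host name and realm here are placeholders, not values from this guide:

```ini
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc01.example.com
        admin_server = kdc01.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
```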
Inform HP Vertica about the Kerberos principal
Follow these steps to inform HP Vertica about the
KerberosServiceName/KerberosHostname@KerberosRealm principal. This procedure provides an
example principal called vertica/vcluster@EXAMPLE.COM.
About the host name parameter
If you omit the optional KerberosHostname parameter in Step 2 below, HP Vertica uses the return
value from the gethostname() function. Assuming each cluster node has a different host name,
those nodes will each have a different principal, which you must manage in that node's keytab file.
HP recommends that you specify the KerberosHostname parameter to get a single, cluster-wide
principal that is easier to manage than multiple principals.
Configure the Vertica Analytic Database principal parameters
For information about the parameters that you'll set in this procedure, see Kerberos Authentication
Parameters.
1. Log in to the database as an administrator (typically dbadmin) and set the service name for the
HP Vertica principal; for example, vertica.
> SELECT set_config_parameter('KerberosServiceName', 'vertica');
2. Optionally provide the instance or hostname portion of the principal; for example, vcluster
(see "About the host name parameter" above this procedure):
> SELECT set_config_parameter('KerberosHostname', 'vcluster.example.com');
3. Provide the realm portion of the principal; for example, EXAMPLE.COM:
> SELECT set_config_parameter('KerberosRealm', 'EXAMPLE.COM');
Configure the authentication method for all clients
To make sure that all clients use the gss Kerberos authentication method, run the following
command:
> SELECT set_config_parameter('ClientAuthentication', 'host all 0.0.0.0/0 gss');
For more information, see Implementing Client Authentication.
Restart the database
For all settings to take effect, you must restart the database.
Get the ticket and authenticate HP Vertica with the KDC
The example below shows how to get the ticket and authenticate Vertica Analytic Database with
the KDC using the kinit command. You'll commonly perform this final step from the vsql client.
/etc/krb5.conf:
[realms]
EXAMPLE.COM = {
kdc = myserver.example.com:88
admin_server = myadminserver.example.com:749
kpasswd_protocol = SET_CHANGE
default_domain = example.com
}
Calling the kinit utility requests a ticket from the KDC server.
$ kinit kuser@EXAMPLE.COM
Password for kuser@EXAMPLE.COM:
Configure Clients for Kerberos Authentication
Each supported platform has a different security framework, so the steps required to configure and
authenticate against Kerberos differ among clients.
On the server side, you construct the HP Vertica Kerberos service name principal using this format:
KerberosServiceName/KerberosHostname@KerberosRealm
On the client side, the GSS libraries require the following format for the HP Vertica service principal:
KerberosServiceName@KerberosHostName
The realm portion of the principal can be omitted because the GSS libraries use the configured default realm (KerberosRealm).
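The two formats above can be sketched as plain string construction. A minimal illustration using the example values from this guide (vertica, vcluster.example.com, EXAMPLE.COM):

```java
public class PrincipalFormats {
    public static void main(String[] args) {
        // Placeholder values taken from this guide's examples.
        String serviceName = "vertica";            // KerberosServiceName
        String hostName = "vcluster.example.com";  // KerberosHostname
        String realm = "EXAMPLE.COM";              // KerberosRealm

        // Server-side form: KerberosServiceName/KerberosHostname@KerberosRealm
        String serverPrincipal = serviceName + "/" + hostName + "@" + realm;

        // Client-side (GSS) form: KerberosServiceName@KerberosHostName;
        // the realm comes from the configured default realm.
        String clientPrincipal = serviceName + "@" + hostName;

        System.out.println(serverPrincipal);  // vertica/vcluster.example.com@EXAMPLE.COM
        System.out.println(clientPrincipal);  // vertica@vcluster.example.com
    }
}
```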
For information about client connection strings, see the following topics in the Programmer's Guide:
- ODBC DSN Parameters
- JDBC Connection Properties
- ADO.NET Connection Properties
- (vsql) Command Line Options
Note: A few scenarios exist in which HP Vertica server's principal name might not match the
host name in the connection string. See Troubleshooting Kerberos Authentication for more
information.
In This Section
- Configure ODBC and vsql Clients on Linux, HP-UX, AIX, Mac OS X, and Solaris
- Configure ODBC and vsql Clients on Windows and ADO.NET
- Configure JDBC Clients on all Platforms
Configure ODBC and vsql Clients on Linux, HP-UX, AIX, Mac OS X,
and Solaris
This topic describes the requirements for configuring an ODBC or vsql client on Linux, HP-UX, AIX,
Mac OS X, or Solaris.
Install the Kerberos 5 client package
See Kerberos Prerequisites.
Provide clients with a valid Kerberos configuration file
The Kerberos configuration (krb5.conf) file contains Kerberos-specific information, such as how to
reach the KDC, default realm name, domain, the path to log files, DNS lookup, encryption types to
use, ticket lifetime, and other settings. The default location for the Kerberos configuration file is
/etc/krb5.conf.
Each client participating in Kerberos authentication must have a valid, identically-configured
krb5.conf file in order to communicate with the KDC. When configured properly, the client can
authenticate with Kerberos and retrieve a ticket through the kinit utility. Likewise, the server can
then use ktutil to store its credentials in a keytab file.
Tip: The easiest way to ensure consistency among clients, Vertica Analytic Database, and
the KDC is to copy the /etc/krb5.conf file from the KDC to the client's /etc directory.
Authenticate and connect clients
ODBC and vsql use the client's ticket established by kinit to perform Kerberos authentication.
These clients rely on the security library's default mechanisms to find the ticket file and the
Kerberos configuration file.
To authenticate against Kerberos, call the kinit utility to obtain a ticket from the Kerberos KDC
server. The following two examples show how to send the ticket request using ODBC and vsql
clients.
ODBC authentication request/connection
1. On an ODBC client, call the kinit utility to acquire a ticket for the kuser user:
$ kinit kuser@EXAMPLE.COM
Password for kuser@EXAMPLE.COM:
2. Connect to HP Vertica and provide the principals in the connection string:
char outStr[1024];
SQLSMALLINT outLen;
SQLDriverConnect(handle, NULL,
    (SQLCHAR *) "Database=VMart;User=kuser;"
    "Server=myserver.example.com;Port=5433;KerberosHostname=vcluster.example.com",
    SQL_NTS, (SQLCHAR *) outStr, sizeof(outStr), &outLen, SQL_DRIVER_NOPROMPT);
vsql authentication request/connection
If the vsql client is on the same machine you're connecting to, vsql connects through a UNIX
domain socket and bypasses Kerberos authentication. When you are authenticating with Kerberos,
especially if the ClientAuthentication record connection_type is 'local', you must include the -h
hostname option, described in Command Line Options in the Programmer's Guide.
1. On the vsql client, call the kinit utility:
$ kinit kuser@EXAMPLE.COM
Password for kuser@EXAMPLE.COM:
2. Connect to HP Vertica and provide the host and user principals in the connection string:
$ ./vsql -K vcluster.example.com -h myserver.example.com -U kuser
Welcome to vsql, the Vertica Analytic Database
interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
In the future, when you log in to vsql as kuser, vsql uses your cached ticket without prompting you
for a password.
You can verify the authentication method by querying the SESSIONS system table:
kuser=> SELECT authentication_method FROM sessions;
authentication_method
-----------------------
GSS-Kerberos
(1 row)
See Also
- Kerberos Client/Server Requirements
- ODBC DSN Parameters in the Programmer's Guide
- (vsql) Command Line Options in the Programmer's Guide
Configure ADO.NET, ODBC, and vsql Clients on Windows
The HP Vertica client drivers support the Windows SSPI library for Kerberos authentication.
Windows Kerberos configuration is stored in the registry.
You can choose between two different setup scenarios for Kerberos authentication on ODBC and
vsql clients on Windows and ADO.NET:
- Windows KDC on Active Directory with Windows built-in Kerberos client and HP Vertica
- Linux KDC with Windows built-in Kerberos client and HP Vertica
Windows KDC on Active Directory with Windows built-in Kerberos client and
HP Vertica
Kerberos authentication on Windows is commonly used with Active Directory, Microsoft's
enterprise directory service/Kerberos implementation, and is most likely set up by your
organization's network or IT administrator.
Windows clients have Kerberos authentication built into the authentication process. This means
when you log in to Windows from a client machine, and your Windows instance has been
configured to use Kerberos through Active Directory, your login credentials authenticate you to the
Kerberos server (KDC). There is no additional software to set up. To use Kerberos authentication
on Windows clients, log in as REALM\user.
ADO.NET IntegratedSecurity
When you use the ADO.NET driver to connect to HP Vertica, you can optionally specify
IntegratedSecurity=true in the connection string. This Boolean setting informs the driver to
authenticate the calling user against his or her Windows credentials. As a result, you do not need to
include a user name or password in the connection string. If you add a user=<username> entry to
the connection string, the ADO.NET driver ignores it.
Linux KDC with Windows-built-in Kerberos client and HP Vertica
A simpler, but less common scenario is to configure Windows to authenticate against a non-
Windows KDC. In this implementation, you use the ksetup utility to point the Windows operating
system's native Kerberos capabilities at a non-Active Directory KDC. The act of logging in to
Windows obtains a ticket granting ticket, similar to the Active Directory implementation, except in
this case, Windows is internally communicating with a Linux KDC. See the Microsoft Windows
Server Ksetup page for more information.
Configuring Windows clients for Kerberos authentication
Depending on which implementation you want to configure, refer to one of the following pages on
the Microsoft Server website:
- To set up Windows clients with Active Directory, refer to Step-by-Step Guide to Kerberos 5 (krb5 1.0) Interoperability.
- To set up Windows clients with the ksetup utility, refer to the Ksetup page.
Authenticate and connect clients
This section shows you how to authenticate an ADO.NET and vsql client to the KDC, respectively.
Note: Use the fully-qualified domain name as the server in your connection string; for example,
use host.example.com instead of just host.
ADO.NET authentication request/connection
The following example uses the IntegratedSecurity=true setting, which instructs the
ADO.NET driver to authenticate the calling user's Windows credentials:
VerticaConnection conn = new
VerticaConnection("Database=VMart;Server=host.example.com;
Port=5433;IntegratedSecurity=true;
KerberosServiceName=vertica;KerberosHostname=vcluster.example.com");
conn.open();
vsql authentication request/connection
1. Log in to your Windows client, for example as EXAMPLE\kuser.
2. Run the vsql client and supply the connection string to HP Vertica:
C:\Users\kuser\Desktop>vsql.exe -h host.example.com -K vcluster -U kuser
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
See Also
- Kerberos Client/Server Requirements
- vsql Command Line Options in the Programmer's Guide
- ADO.NET Connection Properties in the Programmer's Guide
Configure JDBC Clients on All Platforms
Kerberos authentication on JDBC clients uses the Java™ Authentication and Authorization Service
(JAAS) to acquire the initial Kerberos credentials. JAAS is an API framework that hides platform-specific
authentication details and provides a consistent interface for other applications.
The client login process is determined by the JAAS Login Configuration File, which contains
options that specify the authentication method and other settings to use for Kerberos. Options
allowed in the configuration file are defined by a class called the LoginModule.
The JDBC client principal is crafted as jdbc-username@server-from-connection-string.
About the LoginModule
Many vendors can provide a LoginModule implementation that you can use for Kerberos
authentication, but HP recommends that you use the JAAS public class
com.sun.security.auth.module.Krb5LoginModule provided in the Java Runtime Environment
(JRE). The Krb5LoginModule authenticates users using Kerberos protocols and is implemented
differently on non-Windows and Windows platforms:
- On non-Windows platforms: The Krb5LoginModule defers to a native Kerberos client implementation, which means you can use the same /etc/krb5.conf setup as you use to configure ODBC and vsql clients on Linux, HP-UX, AIX, Mac OS X, and Solaris platforms.
- On Windows platforms: The Krb5LoginModule uses a custom Kerberos client implementation bundled with the Java Runtime Environment (JRE). Windows settings are stored in a %WINDIR%\krb5.ini file, which has similar syntax and conventions to the non-Windows krb5.conf file. You can copy a krb5.conf file from a non-Windows client to %WINDIR%\krb5.ini.
Documentation for the LoginModules is in the com.sun.security.auth package, and on the
Krb5LoginModule web page.
Create the JAAS login configuration
The JAASConfigName connection property identifies a specific configuration within a JAAS
configuration that contains the Krb5LoginModule and its settings. The JAASConfigName setting
lets multiple JDBC applications with different Kerberos settings coexist on a single host. The
default configuration name is verticajdbc.
Note: Carefully construct the JAAS login configuration file. If syntax is incorrect,
authentication will fail.
You can configure JAAS-related settings in the java.security master security properties file,
which is located in the lib/security directory of the JRE. For more information, see Appendix A in
the Java™ Authentication and Authorization Service (JAAS) Reference Guide.
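As an alternative to editing java.security, the standard JAAS system property java.security.auth.login.config can point the JVM at a login configuration file directly. A minimal sketch; the file path is hypothetical:

```java
public class JaasSetup {
    public static void main(String[] args) {
        // Point the JVM at a JAAS login configuration file without editing
        // the JRE's java.security file. The path below is hypothetical;
        // substitute the location of your own configuration file.
        System.setProperty("java.security.auth.login.config",
                "/etc/vertica/verticajdbc.conf");
        System.out.println(
                System.getProperty("java.security.auth.login.config"));
    }
}
```

The same effect is available without code by passing -Djava.security.auth.login.config=<path> on the java command line.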
Create a JDBC login context
The following example creates a login context for Kerberos authentication on a JDBC client that
uses the default JAASConfigName of verticajdbc and specifies that:
- The ticket granting ticket will be obtained from the ticket cache
- The user will not be prompted for a password if credentials can't be obtained from the cache, the keytab file, or through shared state
verticajdbc {
com.sun.security.auth.module.Krb5LoginModule
required
useTicketCache=true
doNotPrompt=true;
};
JDBC authentication request/connection
You can configure the Krb5LoginModule to use a cached ticket or a keytab file, or the driver can
acquire a ticket automatically if the calling user provides a password.
In the previous example, the login process uses a cached ticket and won't ask for a password
because both useTicketCache and doNotPrompt are set to true. If doNotPrompt=false and you
provide a user name and password during the login process, the driver provides that information to
the LoginModule and calls the kinit utility on your behalf.
1. On a JDBC client, call the kinit utility to acquire a ticket:
kinit kuser@EXAMPLE.COM
If you prefer to use a password instead of calling the kinit utility, see "Have the driver call
kinit for you".
2. Connect to HP Vertica:
Properties props = new Properties();
props.setProperty("user", "kuser");
props.setProperty("KerberosServiceName", "vertica");
props.setProperty("KerberosHostName", "vcluster.example.com");
props.setProperty("JAASConfigName", "verticajdbc");
Connection conn = DriverManager.getConnection(
"jdbc:vertica://myserver.example.com:5433/VMart", props);
Have the driver call kinit for you
If you want to bypass calling the kinit utility yourself but still benefit from encrypted, mutual
authentication, you can optionally pass the driver a clear text password to acquire the ticket from
the KDC. The password is encrypted when sent across the network. For example,
useTicketCache and doNotPrompt are now both false in the example below, which means that the
calling user's credentials will not be obtained through the ticket cache or keytab.
verticajdbc {
com.sun.security.auth.module.Krb5LoginModule
required
useTicketCache=false
doNotPrompt=false;
};
In the above example, the driver no longer looks for a cached ticket and you don't have to call
kinit. Instead, the driver takes the password and user name and calls kinit on your behalf.
Note: The above is an example to demonstrate the flexibility of JAAS.
See Also
- Kerberos Client/Server Requirements
- JDBC Connection Properties in the Programmer's Guide
- Java™ Authentication and Authorization Service (JAAS) Reference Guide (external website)
Determining the Client Authentication Method
To determine the type of client authentication used for a particular user session,
query the V_MONITOR.SESSIONS system table.
The following example output indicates that the GSS Kerberos authentication method is set:
> SELECT authentication_method FROM sessions;
authentication_method
-----------------------
GSS-Kerberos
(1 row)
For a list of possible values in the authentication_method column, see Implementing Client
Authentication.
Tip: You can also view details about the authentication method by querying the
V_MONITOR.USER_SESSIONS system table.
Troubleshooting Kerberos Authentication
This topic provides tips to help you avoid and troubleshoot issues related to Kerberos authentication
with Vertica Analytic Database.
Server's principal name doesn't match the host name
In some cases during client connection, the HP Vertica server's principal name might not match the
host name in the connection string. (See also Using the ODBC Data Source Configuration utility in
this topic.)
On ODBC, JDBC, and ADO.NET clients, you set the host name portion of the server's principal
using the KerberosHostName connection string. See the following topics in the Programmer's
Guide:
- ODBC DSN Parameters
- JDBC Connection Properties
- ADO.NET Connection Properties
On vsql clients, you set the host name portion of the server's principal name using the -K host
command line option. Its default value is the host specified by the -h switch (the host name of the
machine on which the HP Vertica server is running). -K is equivalent to the drivers'
KerberosHostName connection string value. See Command Line Options in the Programmer's
Guide.
Principal/host mismatch issues and resolutions
Here are some common scenarios where the server's principal name might not match the host
name with a workaround, when available.
- The KerberosHostName configuration parameter has been overridden.
For example, consider the following connection string:
jdbc:vertica://node01.example.com/vmart?user=kuser
Because the above connection string includes no explicit KerberosHostName parameter, the
driver defaults to the host in the URL (node01.example.com). If you set the server-side
KerberosHostName parameter to “abc”, the client would generate an incorrect principal. To
resolve this issue, explicitly set KerberosHostName in the client's connection string; for
example:
jdbc:vertica://node01.example.com/vmart?user=kuser&kerberoshostname=abc
- Connection load balancing is enabled, but the node against which the client authenticates might not be the node in the connection string.
In this situation, consider setting all nodes to use the same KerberosHostName setting. When all nodes share one principal rather than defaulting to the host originally specified in the connection string, load balancing cannot interfere with Kerberos authentication.
- You have a DNS name that does not match the Kerberos hostname.
For example, imagine a cluster of six servers, where you want hr-servers and finance-servers to connect to different nodes on the Vertica Analytic Database cluster. Kerberos authentication, however, occurs on a single (the same) KDC. In this example, the Kerberos service hostname of the servers is server.example.com, and here's the list of example servers:
server1.example.com 192.168.10.11
server2.example.com 192.168.10.12
server3.example.com 192.168.10.13
server4.example.com 192.168.10.14
server5.example.com 192.168.10.15
server6.example.com 192.168.10.16
Now assume you have the following DNS entries:
finance-servers.example.com 192.168.10.11, 192.168.10.12, 192.168.10.13
hr-servers.example.com 192.168.10.14, 192.168.10.15, 192.168.10.16
When you connect to finance-servers.example.com, specify the host with the -h option and the
Kerberos host name, server.example.com, with the -K option; for example:
$ vsql -h finance-servers.example.com -K server.example.com
- You do not have DNS set up on the client machine, so you have to connect by IP only.
To resolve this issue, specify the Kerberos -h hostname option for the IP address and the -K
host option for server.example.com; for example:
$ vsql -h 192.168.1.12 -K server.example.com
- There is a load balancer involved (Virtual IP), but there is no DNS name for the VIP.
Specify the Kerberos -h hostname option for the Virtual IP address and the -K host option for
server.example.com; for example:
$ vsql -h <virtual IP> -K server.example.com
- You connect to HP Vertica using an IP address, but there is no host name to construct the Kerberos principal name.
- You set the server-side KerberosHostName configuration parameter to a name other than the HP Vertica node's host name, but the client can't determine the host name based on the host name in the connection string alone.
JDBC client authentication
If Kerberos authentication fails on a JDBC client, check the JAAS login configuration file for syntax
issues. If syntax is incorrect, authentication will fail.
Working Domain Name Service (DNS)
Make sure that the DNS entries and hosts on the network are all properly configured. Refer to the
Kerberos documentation for your platform for details.
Clock synchronization
System clocks in your network must remain in sync for Kerberos authentication to work properly,
so you need to:
- Install NTP on the Kerberos server (KDC)
- Install NTP on each server in your network
- Synchronize system clocks on all machines that participate in the Kerberos realm within a few minutes of the KDC and each other
Clock skew can be problematic on Linux Virtual Machines that need to sync with Windows Time
Service. You can try the following to keep time in sync:
1. Use any text editor to open /etc/ntp.conf.
2. Under the Undisciplined Local Clock section, add the IP address for the Vertica Analytic
Database server and remove existing server entries.
3. Log in to the server as root and set up a cron job to sync time with the added IP address as
often as needed; for example, every two hours:
0 */2 * * * /etc/init.d/ntpd restart
4. Alternatively, run the following command to force clock sync immediately:
$ sudo /etc/init.d/ntpd restart
For more information, see Set Up Time Synchronization in the Installation Guide and the Network
Time Protocol website.
Encryption algorithm choices
Kerberos is based on symmetric encryption, so be sure that all Kerberos parties involved in the
Kerberos realm agree on the encryption algorithm to use. If they don't agree, authentication fails.
You can review the exceptions in the vertica.log.
On a Windows client, be sure the encryption types match the types set on Active Directory (see
Configure HP Vertica for Kerberos Authentication).
Be aware that Kerberos is used for securing the log-in process only. After the log-in process
completes, information travels between client and server without encryption, by default. If you want
to encrypt traffic, use SSL. For details, see Implementing SSL.
Kerberos passwords
If you change your Kerberos password, you must recreate all of your keytab files.
Using the ODBC Data Source Configuration utility
On Windows vsql clients, if you use the ODBC Data Source Configuration utility and supply a client
Data Source, be sure you enter a Kerberos host name in the Client Settings tab to avoid client
connection failures with the Vertica Analytic Database server.
Implementing SSL
To ensure privacy and verify data integrity, you can configure HP Vertica and database clients to
use Secure Sockets Layer (SSL) to communicate and secure the connection between the client and
the server. The SSL protocol uses a trusted third-party called a Certificate Authority (CA), which
means that both the owner of a certificate and the party that relies on the certificate trust the CA.
Certificate Authority
The CA issues electronic certificates to identify one or both ends of a transaction and to certify
ownership of a public key by the name on the certificate.
Public/private Keys
A CA issues digital certificates that contain a public key and the identity of the owner.
The public key is available to all users through a publicly accessible directory, while private keys
are confidential to their respective owners. The key pair is complementary: what one key encrypts,
only the other key in the pair can decrypt.
- Data encrypted with a public key can be decrypted only by its corresponding private key.
- Data encrypted with a private key can be decrypted only by its corresponding public key.
For example, if Alice wants to send confidential data to Bob and needs to ensure that only Bob can
read it, she will encrypt the data with Bob's public key. Only Bob has access to his corresponding
private key; therefore, he is the only person who can decrypt Alice's encrypted data back into its
original form, even if someone else gains access to the encrypted data.
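The Alice-and-Bob exchange can be sketched with the JDK's built-in RSA support. This illustrates public-key encryption in general, not HP Vertica's SSL internals:

```java
import javax.crypto.Cipher;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;

public class RsaRoundTrip {
    public static void main(String[] args) throws Exception {
        // Generate Bob's RSA key pair.
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair bob = gen.generateKeyPair();

        // Alice encrypts with Bob's public key.
        Cipher enc = Cipher.getInstance("RSA");
        enc.init(Cipher.ENCRYPT_MODE, bob.getPublic());
        byte[] ciphertext =
                enc.doFinal("confidential data".getBytes(StandardCharsets.UTF_8));

        // Only Bob's private key can recover the plaintext.
        Cipher dec = Cipher.getInstance("RSA");
        dec.init(Cipher.DECRYPT_MODE, bob.getPrivate());
        System.out.println(
                new String(dec.doFinal(ciphertext), StandardCharsets.UTF_8));
    }
}
```

Decrypting with the matching private key recovers the original string; attempting the same decryption with any other key fails.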
HP Vertica uses SSL to:
- Authenticate the server so the client can confirm the server's identity. HP Vertica also supports mutual authentication in which the server can confirm the identity of the client. This authentication helps prevent man-in-the-middle attacks.
- Encrypt data sent between the client and database server to significantly reduce the likelihood that the data can be read if the connection between the client and server is compromised.
- Verify that data sent between the client and server has not been altered during transmission.
HP Vertica supports the following authentication methods under SSL v3/Transport Layer Security
(TLS) 1.0 protocol:
- SSL server authentication — Lets the client confirm the server's identity by verifying that the
server's certificate and public key are valid and were issued by a certificate authority (CA) listed
in the client's list of trusted CAs. See "Required Prerequisites for SSL Server Authentication and
SSL Encryption" in SSL Prerequisites and Configuring SSL.
- SSL client authentication — (Optional) Lets the server confirm the client's identity by verifying that the client's certificate and public key are valid and were issued by a certificate authority (CA) listed in the server's list of trusted CAs. Client authentication is optional because HP Vertica can achieve authentication at the application protocol level through user name and password credentials. See "Additional Prerequisites for SSL Server and Client Mutual Authentication" in SSL Prerequisites.
- Encryption — Encrypts data sent between the client and database server to significantly reduce the likelihood that the data can be read if the connection between the client and server is compromised. Encryption works both ways, regardless of whether SSL Client Authentication is enabled. See "Required Prerequisites for SSL Server Authentication and SSL Encryption" in SSL Prerequisites and Configuring SSL.
- Data integrity — Verifies that data sent between the client and server has not been altered during transmission.
Note: For server authentication, HP Vertica supports using RSA encryption with ephemeral
Diffie-Hellman (DH). DH is the key agreement protocol.
SSL Prerequisites
Before you implement SSL security, obtain a certificate signed by a certificate authority (CA) and
the matching private key files, and then copy these files to your system. (See the OpenSSL
documentation.) These files must be in Privacy-Enhanced Mail (PEM) format.
Prerequisites for SSL Server Authentication and SSL
Encryption
Follow these steps to set up SSL authentication of the server by the clients, which is also required
in order to provide encrypted communication between server and client.
1. On each server host in the cluster, copy the server certificate file (server.crt) and private key
(server.key) to the HP Vertica catalog directory. (See Distributing Certificates and Keys.)
The public key contained within the certificate and the corresponding private key allow the SSL
connection to encrypt the data and ensure its integrity.
Note: The server.key file must have read and write permissions for the dbadmin user
only. Do not provide any additional permissions or extend them to any other users. Under
Linux, for example, file permissions would be 0600.
2. If you are using Mutual SSL Authentication, then copy the root.crt file to each client so that
the clients can verify the server's certificate. If you are using vsql, copy the file to:
/home/dbadmin/.vsql/.
This ability is not available for ODBC clients at this time.
The root.crt file contains either the server's certificate or the CA that issued the server certificate.
Note: If you do not perform this step, the SSL connection is set up and ensures message
integrity and confidentiality via encryption; however, the client cannot authenticate the server
and is, therefore, susceptible to problems where a fake server with a valid certificate file
masquerades as the real server. If the root.crt is present but does not match the CA used to
sign the certificate, the database will not start.
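The 0600 permissions requirement for the server.key file (see the note in step 1) can also be enforced programmatically on POSIX systems. A minimal sketch using a temporary stand-in file:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class KeyPermissions {
    public static void main(String[] args) throws Exception {
        // A temporary stand-in for server.key; on a real cluster the file
        // lives in the HP Vertica catalog directory.
        Path key = Files.createTempFile("server", ".key");

        // Owner read/write only, the Java equivalent of chmod 0600.
        Set<PosixFilePermission> perms =
                PosixFilePermissions.fromString("rw-------");
        Files.setPosixFilePermissions(key, perms);

        System.out.println(PosixFilePermissions.toString(
                Files.getPosixFilePermissions(key)));
        Files.delete(key);
    }
}
```

In practice, running chmod 600 server.key as the dbadmin user accomplishes the same thing.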
Optional Prerequisites for SSL Server and Client Mutual
Authentication
Follow these additional steps to optionally configure authentication of clients by the server.
Setting up client authentication by the server is optional because the server can use alternative
techniques, like database-level password authentication, to verify the client's identity. Follow these
steps only if you want to have both server and client mutually authenticate themselves with SSL
keys.
1. On each server host in the cluster, copy the root.crt file to the HP Vertica catalog directory.
(See Distributing Certificates and Keys.)
The root.crt file has the same name on the client and server. However, these files do not
need to be identical. They would be identical only if the client and server certificates were signed
by the same root certificate authority (CA).
2. On each client, copy the client certificate file (client.crt) and private key (client.key) to the
client. If you are using vsql, copy the files to: /home/dbadmin/.vsql/.
If you are using either ODBC or JDBC, you can place the files anywhere on your system and
provide the location in the connection string (ODBC/JDBC) or ODBCINI (ODBC only). See
Configuring SSL for ODBC Clients and Configuring SSL for JDBC Clients.
Note: If you're using ODBC, the private key file (client.key) must have read and write
permissions for the dbadmin user only. Do not provide any additional permissions or
extend them to any other users. Under Linux, for example, file permissions would be 0600.
Generating SSL Certificates and Keys
This section describes the following:
- Generating Certificate Authority (CA) certificates and keys that can then be used to sign server and client keys.
- Creating a server private key and requesting a new server certificate that includes a public key.
- Creating a client private key and requesting a new client certificate that includes a public key.
In a production environment, always use certificates signed by a CA.
For more detailed information on creating signed certificates, refer to the OpenSSL documentation.
What follows are sample procedures for creating certificates and keys. All examples are
hypothetical; the commands shown allow many other possible options not used in these examples.
Create your commands based upon your specific environment.
Create a CA Private Key and Public Certificate
Create a CA private key and public certificate:
1. Use the openssl genrsa command to create a CA private key.
openssl genrsa -out new_servercakey.pem 1024
2. Use the openssl req command to create a CA public certificate.
openssl req -config openssl_req_ca.conf -new -x509 -days 3650
-key new_servercakey.pem -out new_serverca.crt
What follows are sample CA certificate values you enter in response to openssl command line
prompts. Rather than enter these values at command line prompts, you can provide the same
information within .conf files (as shown in the command examples in this section).
Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:Massachusetts
Locality Name (e.g., city) [Newbury]:Cambridge
Organization Name (e.g., company) [My Company Ltd]:HP Vertica
Organizational Unit Name (e.g., section) []:Support_CA
Common Name (e.g., your name or your server's hostname) []:myhost
Email Address []:myhost@vertica.com
Note that, when you create a certificate, there must be one unique name (a Distinguished
Name (DN)), which is different for each certificate that you create. The examples in this
section use the Organizational Unit Name for the DN.
Result: You now have a CA private key, new_servercakey.pem. You also have a CA public
certificate, new_serverca.crt. You use both in the procedures that follow for creating server and
client certificates.
Creating the Server Private Key and Certificate
Create the server’s private key file and certificate request, and sign the server certificate using the
CA private key file:
1. Use the openssl genrsa command to create the server's private key file.
openssl genrsa -out new_server.key 1024
HP Vertica supports unencrypted key files only; do not use a -des3 argument.
2. Use the openssl req command to create the server certificate request.
openssl req -config openssl_req_server.conf -new -key new_server.key -out
new_server_reqout.txt
The configuration file (openssl_req_server.conf) includes information that is incorporated
into your certificate request. (If not for the .conf file, you would enter the information in
response to command line prompts.) In this example, the Organizational Unit Name contains
the unique DN, Support_server.
Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:Massachusetts
Locality Name (e.g., city) [Newbury]:Cambridge
Organization Name (e.g., company) [My Company Ltd]:HP Vertica
Organizational Unit Name (e.g., section) []:Support_server
Common Name (e.g., your name or your server's hostname) []:myhost
Email Address []:myhost@vertica.com
3. Use the openssl command x509 to sign the server’s certificate using the CA private key file
and public certificate.
openssl x509 -req -in new_server_reqout.txt -days 3650 -sha1 -CAcreateserial
-CA new_serverca.crt -CAkey new_servercakey.pem -out new_server.crt
Result: You created the server private key file, new_server.key, and then signed the server
certificate using the CA private key (new_servercakey.pem) and CA public certificate
(new_serverca.crt), outputting a new server certificate, new_server.crt.
Create the Client Private Key and Certificate
Create the client's private key file and certificate request, and sign the client certificate using
the CA private key file:
1. Use the openssl genrsa command to create the client's private key file.
openssl genrsa -out new_client.key 1024
2. Use the openssl req command to create the client certificate request.
openssl req -config openssl_req_client.conf -new -key new_client.key -out
new_client_reqout.txt
The configuration file (openssl_req_client.conf) includes information that is incorporated
into your certificate request. (If not for the .conf file, you would enter the information in
response to command line prompts.) In this example, the Organizational Unit Name contains
the unique DN, Support_client.
Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:Massachusetts
Locality Name (e.g., city) [Newbury]:Cambridge
Organization Name (e.g., company) [My Company Ltd]:HP Vertica
Organizational Unit Name (e.g., section) []:Support_client
Common Name (e.g., your name or your server's hostname) []:myhost
Email Address []:myhost@vertica.com
3. Use the openssl command x509 to sign the client’s certificate using the CA private key file and
public certificate.
openssl x509 -req -in new_client_reqout.txt -days 3650 -sha1 -CAcreateserial
-CA new_serverca.crt -CAkey new_servercakey.pem -out new_client.crt
Result: You created the client private key file, new_client.key, and then signed the client
certificate using the CA private key (new_servercakey.pem) and CA public certificate
(new_serverca.crt), outputting a new client certificate, new_client.crt.
Summary Illustration (Generating Certificates and Keys)
Set Server and Client Key and Certificate Permissions
Set permissions for server and client certificates and keys:
chmod 700 new_server.crt new_server.key
chmod 700 new_client.crt new_client.key
Note that you can create a shell function to generate SSL certificates and keys.
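The steps above lend themselves to such a function. The following is a minimal sketch, assuming OpenSSL is installed and that a CA key and certificate already exist; the function name, argument order, and file-naming conventions are illustrative, not part of the product:

```shell
# Sketch: generate a private key, create a certificate request with the
# given subject (which should contain a unique DN), and sign it with the
# CA key and certificate, producing <name>.key and <name>.crt.
make_signed_cert() {
    name="$1"    # output file prefix, e.g. new_server
    subj="$2"    # subject, e.g. "/O=HP Vertica/OU=Support_server"
    cacrt="$3"   # CA public certificate, e.g. new_serverca.crt
    cakey="$4"   # CA private key, e.g. new_servercakey.pem
    openssl genrsa -out "${name}.key" 1024 &&
    openssl req -new -key "${name}.key" -subj "${subj}" \
        -out "${name}_reqout.txt" &&
    openssl x509 -req -in "${name}_reqout.txt" -days 3650 -sha1 \
        -CAcreateserial -CA "${cacrt}" -CAkey "${cakey}" \
        -out "${name}.crt"
}
```

For example, `make_signed_cert new_client "/O=HP Vertica/OU=Support_client" new_serverca.crt new_servercakey.pem` reproduces the client-certificate steps using the CA files created earlier.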
JDBC Certificates
The Java Runtime Environment (JRE) manages all keys and certificates with a keystore and a
truststore.
l A keystore is a repository that includes private keys and trusted certificates. The private keys
have public key certificates associated with them. The keystore provides credentials for
authentication.
l A truststore contains the certificates of the trusted third parties you might communicate with.
The truststore verifies credentials.
After you have generated the SSL certificates and keys, you need to ensure that the JRE is aware
of the certificate authority’s certificate.
This sample procedure adds a client certificate to a keystore/truststore.
1. Use the openssl x509 command to convert the CA certificate to a form that Java understands.
openssl x509 -in new_serverca.crt -out new_serverca.crt.der -outform der
2. Use the keytool utility with the -keystore option to create and add credentials to a truststore
(truststore). The -noprompt option allows you to proceed without prompts; you could add the
commands given in this procedure to a script. Note that the alias and storepass values in the
following example are arbitrary examples rather than mandatory values you would use in your
environment.
keytool -noprompt -keystore truststore -alias verticasql -storepass
vertica -importcert -file new_serverca.crt.der
3. Use the openssl pkcs12 command to add your client certificate and key into a pkcs12 file. This
interim step is needed as you cannot directly import both the certificate and key into your
keystore.
openssl pkcs12 -export -in new_client.crt -inkey new_client.key -password
pass:vertica -certfile new_client.crt -out keystore.p12
4. Use the keytool utility to import your certificate and key into your keystore.
keytool -noprompt -importkeystore -srckeystore keystore.p12 -srcstoretype
pkcs12 -destkeystore verticastore -deststoretype JKS -deststorepass
vertica -srcstorepass vertica
To complete the set-up for mutual authentication, you must perform a similar procedure to add your
server certificate to a keystore/truststore.
Summary Illustration (JDBC Certificates)
Generating Certificates and Keys for MC
A certificate signing request (CSR) is a block of encrypted text that you generate on the server on
which the certificate will be used. You send the CSR to a certificate authority (CA) in order to apply
for a digital identity certificate. The certificate authority uses the CSR to create your SSL certificate
from the information in the request; for example, organization name, common (domain) name, city,
country, and so on.
MC uses a combination of OAuth (Open Authorization), Secure Socket Layer (SSL), and
locally-encrypted passwords to secure HTTPS requests between a user's browser and MC, as well as
between MC and the agents. Authentication occurs through MC and between agents within the
cluster. Agents also authenticate and authorize jobs.
The MC configuration process sets up SSL automatically, but you must have the openssl package
installed on your Linux environment first.
When you connect to MC through a client browser, HP Vertica assigns each HTTPS request a
self-signed certificate, which includes a timestamp. To increase security and protect against password
replay attacks, the timestamp is valid for several seconds only, after which it expires.
To avoid being blocked out of MC, synchronize time on the hosts in your HP Vertica cluster, as well
as on the MC host if it resides on a dedicated server. To recover from loss or lack of
synchronization, resync system time and the Network Time Protocol. See Set Up Time
Synchronization in the Installation Guide. If you want to generate your own certificates and keys for
MC, see Generating Certificates and Keys for MC.
Signed Certificates
For production, you need to use certificates that are signed by a certificate authority. You can
create and submit one now and import the certificate into MC when the certificate returns from the
CA.
To generate a new CSR, enter the following command in a terminal window:
openssl req -new -key /opt/vertica/config/keystore.key -out server.csr
When you press enter, you are prompted to enter information that will be incorporated into your
certificate request. Some fields contain a default value, which you should change, and some you
can leave blank, like password and optional company name. Enter '.' to leave the field blank.
Important: The keystore.key value for the -key option creates a private key for the keystore. If
you generate a new key and import it using the Management Console interface, the MC
process will not restart properly. You will have to restore the original keystore.jks file and
restart Management Console.
Here's an example of the information contained in the CSR, showing both the default and
replacement values:
Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:Massachusetts
Locality Name (eg, city) [Newbury]: Billerica
Organization Name (eg, company) [My Company Ltd]:HP
Organizational Unit Name (eg, section) []:Information Management
Common Name (eg, your name or your server's hostname) []:console.vertica.com
Email Address []:mcadmin@vertica.com
The Common Name field is the fully qualified domain name of your server. The entry must be an
exact match for what you type in your web browser, or you will receive a name mismatch error.
Self-Signed Certificates
If you want to test your new SSL implementation, you can self-sign a CSR using either a temporary
certificate or your own internal CA, if one is available.
Note: A self-signed certificate will generate a browser-based error notifying you that the
signing certificate authority is unknown and not trusted. For testing purposes, accept the risks
and continue.
The following command generates a temporary certificate, which is good for 365 days:
openssl x509 -req -days 365 -in server.csr -signkey /opt/vertica/config/keystore.key -out
server.crt
Here's an example of the command's output to the terminal window:
Signature ok
subject=/C=US/ST=Massachusetts/L=Billerica/O=HP/OU=IT/CN=console.vertica.com/emailAddress=mcadmin@vertica.com
Getting Private key
You can now import the self-signed certificate, server.crt, into Management Console.
For additional information about certificates and keys, refer to the following external web sites:
Note: At the time of publication, the above links were valid. HP does not control this content,
which could change between HP Vertica documentation releases.
See Also
l How to Configure SSL
l Key and Certificate Management Tool
Importing a New Certificate to MC
To generate a new certificate for Management Console, you must use the keystore.key file,
which is located in /opt/vconsole/config on the server on which you installed MC. Any other
generated key/certificate pair will cause MC to restart incorrectly. You will then have to restore the
original keystore.jks file and restart Management Console. See Generating Certificates and
Keys for MC.
To Import a New Certificate
1. Connect to Management Console and log in as an administrator.
2. On the Home page, click MC Settings.
3. In the button panel at left, click SSL certificates.
4. To the right of "Upload a new SSL certificate" click Browse to import the new certificate.
5. Click Apply.
6. Restart Management Console.
Distributing Certificates and Keys
Once you have created the prerequisite certificates and keys for one host, you can easily
distribute them cluster-wide by using the Administration Tools. Client files cannot be distributed
through Administration Tools.
To distribute certificates and keys to all hosts in a cluster:
1. Log on to a host that contains the certifications and keys you want to distribute and start the
Administration Tools.
See Using the Administration Tools for information about accessing the Administration Tools.
2. On the Main Menu in the Administration Tools, select Configuration Menu, and click OK.
3. On the Configuration Menu, select Distribute Config Files, and click OK.
4. Select SSL Keys and click OK.
5. Select the database where you want to distribute the files and click OK.
6. Fill in the fields, specifying the directory /home/dbadmin/.vsql/ and the root.crt, server.crt,
and server.key files to distribute.
7. Configure SSL.
Configuring SSL
Configure SSL for each server in the cluster.
To Enable SSL:
1. Ensure that you have performed the steps listed in SSL Prerequisites minimally for server
authentication and encryption, and optionally for mutual authentication.
2. Set the EnableSSL parameter to 1. By default, EnableSSL is set to 0 (disabled).
=> SELECT SET_CONFIG_PARAMETER('EnableSSL', '1');
Note: HP Vertica fails to start if SSL has been enabled and the server certificate files
(server.crt, server.key) are not in the expected location.
3. Restart the database.
4. If you are using either ODBC or JDBC, configure SSL for the appropriate client:
n Configuring SSL for ODBC Clients
n Configuring SSL for JDBC Clients
vsql automatically attempts to make connections using SSL. If a connection fails, vsql
attempts to make a second connection over clear text.
See Also
l Configuration Parameters
Configuring SSL for ODBC Clients
Configuring SSL for ODBC clients requires that you set the SSLMode parameter. If you want to
configure optional SSL client authentication, you also need to configure the SSLKeyFile and
SSLCertFile parameters.
The method you use to configure the DSN depends on the type of client operating system you are
using:
l Linux and UNIX — Enter the parameters in the odbc.ini file. See Creating an ODBC DSN for
Linux, Solaris, AIX, and HP-UX Clients.
l Microsoft Windows — Enter the parameters in the Windows Registry. See Creating an ODBC
DSN for Windows Clients.
SSLMode Parameter
Set the SSLMode parameter to one of the following for the DSN:
l require — Requires the server to use SSL. If the server cannot provide an encrypted channel,
the connection fails.
l prefer (the default) — Prefers the server to use SSL. If the server does not offer an encrypted
channel, the client requests one. The first connection to the database tries to use SSL. If that
fails, a second connection is attempted over a clear channel.
l allow — The first connection to the database tries to use a clear channel. If that fails, a second
connection is attempted over SSL.
l disable — Never connects to the server using SSL. This setting is typically used for
troubleshooting.
SSLKeyFile Parameter
To configure optional SSL client authentication, set the SSLKeyFile parameter to the file path and
name of the client's private key. This key can reside anywhere on the client.
SSLCertFile Parameter
To configure optional SSL client authentication, set the SSLCertFile parameter to the file path and
name of the client's public certificate. This file can reside anywhere on the client.
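Putting these parameters together, a Linux odbc.ini DSN entry with SSL configured might look like the following sketch. The DSN name, driver path, host, database, and file locations are illustrative assumptions, not required values:

```ini
[VerticaDSN]
Description = HP Vertica DSN with SSL
Driver = /opt/vertica/lib64/libverticaodbc.so
Database = mydb
Servername = myhost
Port = 5433
SSLMode = require
; Optional, for SSL client authentication:
SSLKeyFile = /home/dbadmin/client.key
SSLCertFile = /home/dbadmin/client.crt
```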
Configuring SSL for JDBC Clients
1. Set required properties.
2. Troubleshoot if necessary.
Setting Required Properties
1. If your keystore/truststore is not in the default location or does not use the default name, set
the following system properties so that the JRE can find your keystore/truststore:
javax.net.ssl.keyStore
javax.net.ssl.trustStore
2. If your keystore/truststore is password protected, set the following system properties so that
the JRE has access to keystore/truststore:
javax.net.ssl.keyStorePassword
javax.net.ssl.trustStorePassword
3. Enable SSL in the JDBC Driver by setting the SSL property to true. There are a number of
ways to set the SSL property. You can set ssl=true in a connection string/URL, call
SslEnabled(true) on the DataSource, or use a Properties object parameter.
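The steps above can be combined on the command line that launches your application. The following is an illustrative fragment only; the class name, keystore paths, and passwords are placeholders from the earlier keystore examples, and the actual JDBC URL options depend on your driver version:

```shell
java -Djavax.net.ssl.keyStore=/home/dbadmin/verticastore \
     -Djavax.net.ssl.keyStorePassword=vertica \
     -Djavax.net.ssl.trustStore=/home/dbadmin/truststore \
     -Djavax.net.ssl.trustStorePassword=vertica \
     MyVerticaApp "jdbc:vertica://myhost:5433/mydb?ssl=true"
```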
Troubleshooting
The following command turns on the debug utility for SSL:
-Djavax.net.debug=ssl
There are a number of debug specifiers (options) you can use with the debug utility. The specifiers
help narrow the scope of the debugging information that is returned. For example, you could specify
one of the options that prints handshake messages or session activity.
For information on the debug utility and its options, see Debugging Utilities in the Oracle document,
JSSE Reference Guide.
For information on interpreting debug information, refer to the Oracle document, Debugging
SSL/TLS Connections.
Requiring SSL for Client Connections
You can require clients to use SSL when connecting to HP Vertica by creating a client
authentication record for them that has a connection_type of hostssl. You can choose to limit
specific users to only connecting using SSL (useful for specific clients that you know are
connecting through an insecure network connection) or require all clients to use SSL.
See Implementing Client Authentication for more information about creating client authentication
records.
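In HP Vertica 7.0, such records are set through the ClientAuthentication configuration parameter. The following is a sketch only; the user, address, and method values are illustrative, and the exact record format is documented in Implementing Client Authentication:

```
=> SELECT SET_CONFIG_PARAMETER('ClientAuthentication',
       'hostssl all 0.0.0.0/0 password');
```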
Managing Users and Privileges
Database users should have access to only the database resources they need to perform their
tasks. For example, most users should be able to read data but not modify or insert new data, while
other users might need more permissive access, such as the right to create and modify schemas,
tables, and views, as well as rebalance nodes on a cluster and start or stop a database. It is also
possible to allow certain users to grant other users access to the appropriate database resources.
Client authentication controls what database objects users can access and change in the
database. To prevent unauthorized access, a superuser limits access to what is needed, granting
privileges directly to users or to roles through a series of GRANT statements. Roles can then be
granted to users, as well as to other roles.
A Management Console administrator can also grant MC users access to one or more HP Vertica
databases through the MC interface. See About MC Users and About MC Privileges and Roles for
details.
This section introduces the privilege role model in HP Vertica and describes how to create and
manage users.
See Also
l About Database Privileges
l About Database Roles
l GRANT Statements
l REVOKE Statements
About Database Users
Every HP Vertica database has one or more users. When users connect to a database, they must
log on with valid credentials (username and password) that a superuser defined in the database.
Database users own the objects they create in a database, such as tables, procedures, and storage
locations.
Note: By default, users have the right to create temporary tables in a database.
See Also
l Creating a Database User
l CREATE USER
l About MC Users
Types of Database Users
In an HP Vertica database, there are three types of users:
l Database administrator (DBADMIN)
l Object owner
l Everyone else (PUBLIC)
Note: External to an HP Vertica database, an MC administrator can create users through the
Management Console and grant them database access. See About MC Users for details.
DBADMIN User
When you create a new database, a single database administrator account, DBADMIN, is
automatically created along with the PUBLIC role. This database superuser bypasses all
permission checks and has the authority to perform all database operations, including bypassing
all GRANT/REVOKE authorizations; the same is true of any user granted the
PSEUDOSUPERUSER role.
Note: Although the dbadmin user has the same name as the Linux database administrator
account, do not confuse the concept of a database superuser with Linux superuser (root)
privilege; they are not the same. A database superuser cannot have Linux superuser privileges.
The DBADMIN user can start and stop a database without a database password. To connect to the
database, a password is required.
See Also
l DBADMIN Role
l PSEUDOSUPERUSER Role
l PUBLIC Role
Object Owner
An object owner is the user who creates a particular database object and can perform any operation
on that object. By default, only an owner (or a superuser) can act on a database object. In order to
allow other users to use an object, the owner or superuser must grant privileges to those users
using one of the GRANT statements.
Note: Object owners are PUBLIC users for objects that other users own.
See About Database Privileges for more information.
PUBLIC User
All users who are not superusers or object owners are PUBLIC users.
Note: Object owners are PUBLIC users for objects that other users own.
Newly-created users do not have access to schema PUBLIC by default. Make sure to GRANT
USAGE ON SCHEMA PUBLIC to all users you create.
See Also
l PUBLIC Role
Creating a Database User
This procedure describes how to create a new user on the database.
1. From vsql, connect to the database as a superuser.
2. Issue the CREATE USER statement with optional parameters.
3. Run a series of GRANT statements to grant the new user privileges.
Notes
l Newly-created users do not have access to schema PUBLIC by default. Make sure to GRANT
USAGE ON SCHEMA PUBLIC to all users you create
l By default, database users have the right to create temporary tables in the database.
l If you plan to create users on Management Console, the database user account needs to exist
before you can associate an MC user with the database.
l You can change information about a user, such as his or her password, by using the ALTER
USER statement. If you want to configure a user to not have any password authentication, you
can set the empty password '' in CREATE or ALTER USER statements, or omit the
IDENTIFIED BY parameter in CREATE USER.
Example
The following series of commands adds user Fred to a database with the password 'password'. The
second command grants USAGE privileges to Fred on the public schema:
=> CREATE USER Fred IDENTIFIED BY 'password';
=> GRANT USAGE ON SCHEMA PUBLIC to Fred;
User names created with double-quotes are case sensitive. For example:
=> CREATE USER "FrEd1";
In the above example, the logon name must be an exact match. If the user name was created
without double-quotes (for example, FRED1), then the user can log on as FRED1, FrEd1, fred1,
and so on.
ALTER USER and DROP USER syntax is not case sensitive.
See Also
l Granting and Revoking Privileges
l Granting Access to Database Roles
l Creating an MC User
Locking/unlocking a user's Database Access
A superuser can manually lock an existing database user's account with the ALTER USER
statement. For example, the following command prevents user Fred from logging in to the
database:
=> ALTER USER Fred ACCOUNT LOCK;
=> \c - Fred
FATAL 4974: The user account "Fred" is locked
HINT: Please contact the database administrator
To grant Fred database access, use UNLOCK syntax with the ALTER USER command:
=> ALTER USER Fred ACCOUNT UNLOCK;
=> \c - Fred
You are now connected as user "Fred".
Using CREATE USER to lock an account
Although not as common, you can create a new user with a locked account; for example, you might
want to set up an account for a user who doesn't need immediate database access, as in the case
of an employee who will join the company at a future date.
=> CREATE USER Bob ACCOUNT LOCK;
CREATE USER
CREATE USER also supports UNLOCK syntax; however, UNLOCK is the default, so you don't
need to specify the keyword when you create a new user to whom you want to grant immediate
database access.
Locking an account automatically
Instead of manually locking an account, a superuser can automate account locking by setting a
maximum number of failed login attempts through the CREATE PROFILE statement. See Profiles.
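For example, a profile that locks an account for one day after three failed login attempts might be created and assigned along these lines (the profile name is illustrative; see CREATE PROFILE for the exact syntax and defaults):

```
=> CREATE PROFILE secure_profile LIMIT FAILED_LOGIN_ATTEMPTS 3 PASSWORD_LOCK_TIME 1;
=> ALTER USER Fred PROFILE secure_profile;
```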
Changing a user's Password
A superuser can change another user's database account, including resetting a password, with the
ALTER USER statement.
Making changes to a database user account does not affect current sessions.
=> ALTER USER Fred IDENTIFIED BY 'newpassword';
In the above command, Fred's password is now newpassword.
Note: Non-DBA users can change their own passwords using the IDENTIFIED BY
'new-password' option along with the REPLACE 'old-password' clause. See ALTER USER for
details.
Changing a user's MC Password
On MC, users with ADMIN or IT privileges can reset a user's non-LDAP password from the MC
interface.
Non-LDAP passwords on MC are for MC access only and are not related to a user's logon
credentials on the HP Vertica database.
1. Sign in to Management Console and navigate to MC Settings > User management.
2. Click to select the user to modify and click Edit.
3. Click Edit password and enter the new password twice.
4. Click OK and then click Save.
About MC Users
Unlike database users, which you create on the HP Vertica database and then grant privileges and
roles through SQL statements, you create MC users on the Management Console interface. MC
users are external to the database; their information is stored on an internal database on the MC
application/web server, and their access to both MC and to MC-managed databases is controlled
by groups of privileges (also referred to as access levels). MC users are not system (Linux) users;
they are entries in the MC internal database.
Permission Group Types
There are two types of permission groups on MC, those that apply to MC configuration and those
that apply to database access:
l MC configuration privileges are made up of roles that control what users can configure on the
MC, such as modify MC settings, create/import HP Vertica databases, restart MC, create an
HP Vertica cluster through the MC interface, and create and manage MC users.
l MC database privileges are made up of roles that control what users can see or do on an MC-
managed HP Vertica database, such as view the database cluster state, query and session
activity, monitor database messages and read log files, replace cluster nodes, and stop
databases.
If you are using MC, you might want to allow one or more users in your organization to configure and
manage MC, and you might want other users to have database access only. You can meet these
requirements by creating MC users and granting them a role from each privileges group. See
Creating an MC User for details.
MC User Types
There are four types of role-based users on MC:
l The default superuser administrator (Linux account) who gets created when you install and
configure MC and oversees the entire MC. See SUPER Role (mc).
l Users who can configure all aspects of MC and control all MC-managed databases. See
ADMIN Role (mc).
l Users who can configure some aspects of MC and monitor all MC-managed databases. See IT
Role (mc).
l Users who cannot configure MC and have access to one or more MC-managed databases only.
See NONE Role (mc).
You create users and grant them privileges (through roles) on the MC Settings page, selecting
User Management to add users who will be authenticated against the MC, or Authentication to
authenticate MC users through your organization's LDAP repository.
Creating Users and Choosing an Authentication Method
You create users and grant them privileges (through roles) on the MC Settings page, where you
can also choose how to authenticate their access to MC; for example:
l To add users who will be authenticated against the MC, click User Management
l To add users who will be authenticated through your organization's LDAP repository, click
Authentication
MC supports only one method for authentication, so if you choose MC, all MC users will be
authenticated using their MC login credentials.
Default MC Users
The MC super account is the only default user. The super administrator or another MC
administrator must create all other MC users.
See Also
l Management Console
l About MC Privileges and Roles
l Granting Database Access to MC Users
l Mapping an MC User to a Database user's Privileges
Creating an MC User
MC provides two authentication schemes for MC users: LDAP or MC (internal). Which method you
choose will be the method MC uses to authenticate all MC users. It is not possible to authenticate
some MC users against LDAP and other MC users against credentials stored in MC's internal database.
l MC (internal) authentication. Internal user authorization is specific to the MC itself, where you
create a user with a username and password combination. This method stores MC user
information in an internal database on the MC application/web server, and encrypts passwords.
Note that these MC users are not system (Linux) users; they are entries in the MC’s internal
database.
l LDAP authentication. All MC users—except for the MC super administrator, which is a Linux
account—will be authenticated based on search criteria against your organization's LDAP
repository. MC uses information from LDAP for authentication purposes only and does not
modify LDAP information. Also, MC does not store LDAP passwords but passes them to the
LDAP server for authentication.
Instructions for creating new MC users are in this topic.
l If you chose MC authentication, follow the instructions under Create a new MC-authenticated
user.
l If you chose LDAP authentication, follow the instructions under Create a new user from
LDAP.
See About MC Users and Configuring LDAP Authentication for more information.
Prerequisites
Before you create an MC user, make sure you have already:
l Created a database directly on the server or through the MC interface, or you imported an
existing database cluster into the MC interface. See Managing Database Clusters on MC.
l Created a database user account (source user) on the server, which has the privileges and/or
roles you want to map to the new (target) MC user. See Creating a Database User.
l Know what MC privileges you want to grant the new MC user. See About MC Privileges and
Roles.
l Are familiar with the concept of mapping MC users to database users.
If you have not yet met the first two prerequisites above, you can still create new MC users; you
just won't be able to map them to a database until the database and target database user exist.
To grant MC users database access later, see Granting Database Access to MC Users.
Create a New MC-authenticated User
1. Sign in to Management Console as an administrator and navigate to MC Settings > User
management.
2. Click Add.
3. Enter the MC username.
Note: It is not necessary to give the MC user the exact same name as the database user
account you'll map the MC user to in Step 7. What matters is that the source database
user has privileges and/or roles similar to the database role you want to grant the MC user.
The most likely scenario is that you will map multiple MC users to a single database user
account. See MC Database Privileges and Mapping an MC User to a Database user's
Privileges for more information.
4. Let MC generate a password or create one by clicking Edit password. If LDAP has been
configured, the MC password field will not appear.
5. Optionally enter the user's e-mail address.
6. Select an MC configuration permissions level. See MC Configuration Privileges.
7. Next to the DB access levels section, click Add to grant this user database permissions. If
you want to grant access later, proceed to Step 8. If you want to grant database access now,
provide the following information:
i. Choose a database. Select a database from the list of MC-discovered databases
(databases that were created on or imported into the MC interface).
ii. Database username. Enter an existing database user name or, if the database is
running, click the ellipses […] to browse for a list of database users, and select a
name from the list.
iii. Database password. Enter the password for the database user account (not this
MC user's password).
iv. Restricted access. Choose a database level (ADMIN, IT, or USER) for this user.
v. Click OK to close the Add permissions dialog box.
See Mapping an MC User to a Database user's Privileges for additional information about
associating the two user accounts.
8. Leave the user's Status as enabled (the default). If you need to prevent this user from
accessing MC, select disabled.
9. Click Add User to finish.
Create a New LDAP-authenticated User
When you add a user from LDAP on the MC interface, options on the Add a new user dialog box
are slightly different from when you create users without LDAP authentication. Because passwords
are stored externally (on the LDAP server), the password field does not appear. An MC administrator can
override the default LDAP search string if the user is found in another branch of the tree. The Add
user field is pre-populated with the default search path entered when LDAP was configured.
1. Sign in to Management Console and navigate to MC Settings > User management.
2. Click Add and provide the following information:
a. LDAP user name.
b. LDAP search string.
c. User attribute, and click Verify user.
d. User's email address.
e. MC configuration role. NONE is the default. See MC Configuration Privileges for details.
f. Database access level. See MC Database Privileges for details.
g. Accept or change the default user's Status (enabled).
3. Click Add user.
If you encounter issues when creating new users from LDAP, you'll need to contact your
organization's IT department.
How MC Validates New Users
After you click OK to close the Add permissions dialog box, MC tries to validate the database
username and password entered against the selected MC-managed database or against your
organization's LDAP directory. If the credentials are found to be invalid, you are asked to re-enter
them.
If the database is not available at the time you create the new user, MC saves the
username/password and prompts for validation when the user accesses the Database and Clusters
page later.
See Also
l Configuring MC
l About MC Users
l About MC Privileges and Roles
l Granting Database Access to MC Users
l Creating a Database User
l Mapping an MC User to a Database user's Privileges
l Adding Multiple Users to MC-managed Databases
Managing MC Users
You manage MC users through the following pages on the Management Console interface:
l MC Settings > User management
l MC Settings > Resource access
Who Manages Users
The MC superuser administrator (SUPER Role (mc)) and users granted ADMIN Role (mc) manage
all aspects of users, including their access to MC and to MC-managed databases.
Users granted IT Role (mc) can enable and disable user accounts.
See About MC Users and About MC Privileges and Roles for more information.
Editing an MC user's information follows the same steps as creating a new user, except the user's
information will be pre-populated, which you then edit and save.
The only user account you cannot alter or remove from the MC interface is the MC super account.
What Kind of User Information You Can Manage
You can change the following user properties:
l MC password
l Email address. This field is optional; if the user is authenticated against LDAP, the email field is
pre-populated with that user's email address if one exists.
l MC Configuration Privileges role
l MC Database Privileges role
You can also change a user's status (enable/disable access to MC) and delete users.
About User Names
After you create and save a user, you cannot change that user's MC user name, but you can delete
the user account and create a new user account under a new name. The only thing you lose by
deleting a user account is its audit activity, but MC immediately resumes logging activity under the
user's new account.
See Also
l About MC Users
l About MC Privileges and Roles
About Database Privileges
When a database object is created, such as a schema, table, or view, that object is assigned an
owner—the person who executed the CREATE statement. By default, database administrators
(superusers) or object owners are the only users who can do anything with the object.
To allow other users to use an object, or to remove a user's right to use it, an authorized user
must grant or revoke privileges on that object.
Privileges are granted (or revoked) through a collection of GRANT/REVOKE statements that
assign the privilege—a type of permission that lets users perform an action on a database object,
such as:
l Create a schema
l Create a table (in a schema)
l Create a view
l View (select) data
l Insert, update, or delete table data
l Drop tables or schemas
l Run procedures
Before HP Vertica executes a statement, it determines if the requesting user has the necessary
privileges to perform the operation.
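As a sketch of this GRANT/REVOKE flow (the user, schema, and table names below are hypothetical, chosen only for illustration):

```sql
-- Allow a hypothetical analyst 'alice' to query a table, then withdraw
-- that right. USAGE on the schema is needed before the table is reachable.
GRANT USAGE ON SCHEMA store TO alice;
GRANT SELECT ON TABLE store.orders TO alice;

-- Later, remove the privilege:
REVOKE SELECT ON TABLE store.orders FROM alice;
```

Each of these grants is recorded in the V_CATALOG.GRANTS system table, where you can audit who granted what to whom.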
For more information about the privileges associated with these resources, see Privileges That Can
Be Granted on Objects.
Note: HP Vertica logs information about each grant (grantor, grantee, privilege, and so on) in
the V_CATALOG.GRANTS system table.
See Also
l GRANT Statements
l REVOKE Statements
Default Privileges for All Users
To set the minimum level of privilege for all users, HP Vertica has the special PUBLIC Role, which
it grants to each user automatically. This role is automatically enabled, but the database
administrator or a superuser can also grant higher privileges to users separately using GRANT
statements.
The following topics discuss those higher privileges.
Default Privileges for MC Users
Privileges on Management Console (MC) are managed through roles, which determine a user's
access to MC and to MC-managed HP Vertica databases through the MC interface. MC privileges
do not alter or override HP Vertica privileges or roles. See About MC Privileges and Roles for
details.
Privileges Required for Common Database Operations
This topic lists the required privileges for database objects in HP Vertica.
Unless otherwise noted, superusers can perform all of the operations shown in the following tables
without any additional privilege requirements. Object owners have the necessary rights to perform
operations on their own objects, by default.
Schemas
The PUBLIC schema is present in any newly-created HP Vertica database, and newly-created
users have only USAGE privilege on PUBLIC. A database superuser must explicitly grant new
users CREATE privileges, as well as grant them individual object privileges so the new users can
create or look up objects in the PUBLIC schema.
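For example, a superuser might set up a hypothetical new user as follows (the user name is illustrative only):

```sql
-- Give a new user the minimum needed to work in the PUBLIC schema:
GRANT USAGE ON SCHEMA public TO new_user;   -- look up objects in PUBLIC
GRANT CREATE ON SCHEMA public TO new_user;  -- create tables, views, etc.
```

Individual object privileges (SELECT, INSERT, and so on) must still be granted separately, as described in the tables that follow.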
Operation Required Privileges
CREATE SCHEMA CREATE privilege on database
DROP SCHEMA Schema owner
ALTER SCHEMA RENAME CREATE privilege on database
Tables
Operation Required Privileges
CREATE TABLE CREATE privilege on schema
Note: Referencing sequences in the CREATE TABLE
statement requires the following privileges:
l SELECT privilege on sequence object
l USAGE privilege on sequence schema
DROP TABLE USAGE privilege on the schema that contains the table or
schema owner
Operation Required Privileges
TRUNCATE TABLE USAGE privilege on the schema that contains the table or
schema owner
ALTER TABLE ADD/DROP/
RENAME/ALTER-TYPE COLUMN
USAGE privilege on the schema that contains the table
ALTER TABLE ADD/DROP CONSTRAINT USAGE privilege on the schema that contains the table
ALTER TABLE PARTITION (REORGANIZE) USAGE privilege on the schema that contains the table
ALTER TABLE RENAME USAGE and CREATE privilege on the schema that
contains the table
ALTER TABLE SET SCHEMA l CREATE privilege on new schema
l USAGE privilege on the old schema
SELECT l SELECT privilege on table
l USAGE privilege on schema that contains the table
INSERT l INSERT privilege on table
l USAGE privilege on schema that contains the table
DELETE l DELETE privilege on table
l USAGE privilege on schema that contains the table
l SELECT privilege on the referenced table when
executing a DELETE statement that references table
column values in a WHERE or SET clause
UPDATE l UPDATE privilege on table
l USAGE privilege on schema that contains the table
l SELECT privilege on the table when executing an
UPDATE statement that references table column
values in a WHERE or SET clause
REFERENCES l REFERENCES privilege on table to create foreign
key constraints that reference this table
l USAGE privileges on schema that contains the
constrained table and the source of the foreign key
ANALYZE_STATISTICS() l INSERT/UPDATE/DELETE privilege on table
l USAGE privilege on schema that contains the table
Operation Required Privileges
ANALYZE_HISTOGRAM() l INSERT/UPDATE/DELETE privilege on table
l USAGE privilege on schema that contains the table
DROP_STATISTICS() l INSERT/UPDATE/DELETE privilege on table
l USAGE privilege on schema that contains the table
DROP_PARTITION() USAGE privilege on schema that contains the table
MERGE_PARTITIONS() USAGE privilege on schema that contains the table
Views
Operation Required Privileges
CREATE VIEW l CREATE privilege on the schema to contain a view
l SELECT privileges on base objects (tables/views)
l USAGE privileges on schema that contains the base objects
DROP VIEW USAGE privilege on schema that contains the view or schema owner
SELECT ... FROM VIEW l SELECT privilege on view
l USAGE privilege on the schema that contains the view
Note: Privileges required on base objects for view owner must be directly
granted, not through roles:
l View owner must have SELECT ... WITH GRANT OPTION privileges
on the view's base tables or views if a non-owner runs a SELECT query
on the view. This privilege must be directly granted to the owner, not
through a role.
l View owner must have SELECT privilege directly granted (not through
a role) on a view's base objects (table or view) if owner runs a
SELECT query on the view.
Projections
Operation Required Privileges
CREATE PROJECTION l SELECT privilege on base tables
l USAGE privilege on schema that contains base tables or schema
owner
l CREATE privilege on schema to contain the projection
Note: If a projection is implicitly created with the table, no additional
privilege is needed other than privileges for table creation.
AUTO/DELAYED PROJECTION On projections created during INSERT..SELECT or COPY
operations:
l SELECT privilege on base tables
l USAGE privilege on schema that contains base tables
ALTER PROJECTION RENAME USAGE and CREATE privilege on schema that contains the
projection
DROP PROJECTION USAGE privilege on schema that contains the projection or schema
owner
External Procedures
Operation Required Privileges
CREATE PROCEDURE Superuser
DROP PROCEDURE Superuser
EXECUTE l EXECUTE privilege on procedure
l USAGE privilege on schema that contains the
procedure
Libraries
Operation Required Privileges
CREATE LIBRARY Superuser
DROP LIBRARY Superuser
User-Defined Functions
The following abbreviations are used in the UDF table:
l UDF = Scalar
l UDT = Transform
l UDAnF= Analytic
l UDAF = Aggregate
Operation Required Privileges
CREATE FUNCTION (SQL)
CREATE FUNCTION (UDF)
CREATE TRANSFORM FUNCTION (UDT)
CREATE ANALYTIC FUNCTION (UDAnF)
CREATE AGGREGATE FUNCTION (UDAF)
l CREATE privilege on schema to contain the
function
l USAGE privilege on base library (if applicable)
DROP FUNCTION
DROP TRANSFORM FUNCTION
DROP ANALYTIC FUNCTION
DROP AGGREGATE FUNCTION
l Superuser or function owner
l USAGE privilege on schema that contains the
function
ALTER FUNCTION RENAME TO USAGE and CREATE privilege on schema that
contains the function
ALTER FUNCTION SET SCHEMA l USAGE privilege on schema that currently
contains the function (old schema)
l CREATE privilege on the schema to which the
function will be moved (new schema)
EXECUTE (SQL/UDF/UDT/UDAF/UDAnF) function l EXECUTE privilege on function
l USAGE privilege on schema that contains the
function
Sequences
Operation Required Privileges
CREATE SEQUENCE CREATE privilege on schema to contain the sequence
Note: Referencing a sequence in a CREATE TABLE statement
requires SELECT privilege on sequence object and USAGE
privilege on sequence schema.
Operation Required Privileges
CREATE TABLE with SEQUENCE l SELECT privilege on sequence
l USAGE privilege on sequence schema
DROP SEQUENCE USAGE privilege on schema containing the sequence or schema
owner
ALTER SEQUENCE RENAME TO USAGE and CREATE privileges on schema
ALTER SEQUENCE SET SCHEMA l USAGE privilege on the schema that currently contains the
sequence (old schema)
l CREATE privilege on new schema to contain the sequence
CURRVAL() / NEXTVAL() l SELECT privilege on sequence
l USAGE privilege on sequence schema
Resource Pools
Operation Required Privileges
CREATE RESOURCE POOL Superuser
ALTER RESOURCE POOL Superuser on the resource pool to alter:
l MAXMEMORYSIZE
l PRIORITY
l QUEUETIMEOUT
UPDATE privilege on the resource pool to alter:
l PLANNEDCONCURRENCY
l SINGLEINITIATOR
l MAXCONCURRENCY
SET SESSION RESOURCE_POOL l USAGE privilege on the resource pool
l Users can only change their own resource pool setting using
ALTER USER syntax
DROP RESOURCE POOL Superuser
Users/Profiles/Roles
Operation Required Privileges
CREATE USER
CREATE PROFILE
CREATE ROLE
Superuser
ALTER USER
ALTER PROFILE
ALTER ROLE RENAME
Superuser
DROP USER
DROP PROFILE
DROP ROLE
Superuser
Object Visibility
You can use one or a combination of vsql \d [pattern] meta-commands and SQL system tables to
view objects on which you have privileges.
l Use \dn [pattern] to view schema names and owners.
l Use \dt [pattern] to view all tables in the database, as well as the system table V_
CATALOG.TABLES.
l Use \dj [pattern] to view projections showing the schema, projection name, owner, and node, as
well as the system table V_CATALOG.PROJECTIONS.
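For instance, from a vsql session you might run the following (the output depends on your catalog and the privileges you hold):

```sql
=> \dn public      -- schema name and owner for PUBLIC
=> \dt public.*    -- tables in the PUBLIC schema visible to you

-- The same information via a system table:
=> SELECT table_schema, table_name, owner_name FROM v_catalog.tables;
```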
Operation Required Privileges
Look up schema At least one privilege on the schema that contains the object
Look up object in schema or in system tables USAGE privilege on the schema, plus at least one
privilege on any of the following objects: TABLE, VIEW, FUNCTION, PROCEDURE, SEQUENCE
Look up projection At least one privilege on all base tables, plus USAGE privilege on the
schemas of all base tables
Look up resource pool SELECT privilege on the resource pool
Existence of object USAGE privilege on the schema that contains the object
I/O Operations
Operation Required Privileges
CONNECT / DISCONNECT None
Operation Required Privileges
EXPORT TO HP Vertica l SELECT privileges on the source table
l USAGE privilege on source table schema
l INSERT privileges for the destination table in target
database
l USAGE privilege on destination table schema
COPY FROM HP Vertica l SELECT privileges on the source table
l USAGE privilege on source table schema
l INSERT privileges for the destination table in target
database
l USAGE privilege on destination table schema
COPY FROM file Superuser
COPY FROM STDIN l INSERT privilege on table
l USAGE privilege on schema
COPY LOCAL l INSERT privilege on table
l USAGE privilege on schema
Comments
Operation Required Privileges
COMMENT ON { object }, where object is one of:
l AGGREGATE FUNCTION
l ANALYTIC FUNCTION
l COLUMN
l CONSTRAINT
l FUNCTION
l LIBRARY
l NODE
l PROJECTION
l SCHEMA
l SEQUENCE
l TABLE
l TRANSFORM FUNCTION
l VIEW
Object owner or superuser
Transactions
Operation Required Privileges
COMMIT None
ROLLBACK None
RELEASE SAVEPOINT None
SAVEPOINT None
Sessions
Operation Required Privileges
SET { parameter }, where parameter is one of:
l DATESTYLE
l ESCAPE_STRING_WARNING
l INTERVALSTYLE
l LOCALE
l ROLE
l SEARCH_PATH
l SESSION AUTOCOMMIT
l SESSION CHARACTERISTICS
l SESSION MEMORYCAP
l SESSION RESOURCE POOL
l SESSION RUNTIMECAP
l SESSION TEMPSPACE
l STANDARD_CONFORMING_STRINGS
l TIMEZONE
None
SHOW { name | ALL } None
Tuning Operations
Operation Required Privileges
PROFILE Same privileges required to run the query being profiled
EXPLAIN Same privileges required to run the query for which you use the EXPLAIN keyword
Privileges That Can Be Granted on Objects
The following topics provide an overview of the privileges that can be granted on (or revoked from)
each type of database object in HP Vertica.
See Also
l GRANT Statements
l REVOKE Statements
Database Privileges
Only a database superuser can create a database. In a new database, the PUBLIC Role is granted
USAGE on the automatically-created PUBLIC schema. It is up to the superuser to grant further
privileges to users and roles.
The only privilege a superuser can grant on the database itself is CREATE, which allows the user
to create a new schema in the database. For details on granting and revoking privileges on a
database, see the GRANT (Database) and REVOKE (Database) topics in the SQL Reference
Manual.
Privilege Grantor Description
CREATE Superuser Allows a user to create a schema.
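A minimal sketch of this grant (the database and user names are hypothetical):

```sql
-- Allow 'dbdesigner' to create schemas in database 'mydb':
GRANT CREATE ON DATABASE mydb TO dbdesigner;

-- Later, withdraw it:
REVOKE CREATE ON DATABASE mydb FROM dbdesigner;
```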
Schema Privileges
By default, only a superuser and the schema owner have privileges to create objects within a
schema. Additionally, only the schema owner or a superuser can drop or alter a schema. See
DROP SCHEMA and ALTER SCHEMA.
You must grant all new users access to the PUBLIC schema by running GRANT USAGE ON
SCHEMA PUBLIC. Then grant new users CREATE privileges and privileges to individual objects
in the schema. This enables new users to create or locate objects in the PUBLIC schema. Without
USAGE privilege, objects in the schema cannot be used or altered, even by the object owner.
CREATE gives the schema owner, or a user granted CREATE WITH GRANT OPTION, permission to create
new objects in the schema, including renaming an object in the schema or moving an object into
this schema from another schema.
Note: The schema owner is typically the user who creates the schema. However, a superuser
can create a schema and assign ownership of the schema to a different user at creation.
All other access to the schema and its objects must be explicitly granted to users or roles by the
superuser or schema owner. This prevents unauthorized users from accessing the schema and its
objects. A user can be granted one of the following privileges through the GRANT statement:
Privilege Description
CREATE Allows the user to create new objects within the schema. This includes the ability to
create a new object, rename existing objects, and move objects into the schema
from other schemas.
USAGE Permission to select, access, alter, and drop objects in the schema. The user must
also be granted access to the individual objects in order to alter them. For example, a
user would need to be granted USAGE on the schema and SELECT on a table to be
able to select data from a table. You receive an error message if you attempt to query
a table that you have SELECT privileges on, but do not have USAGE privileges for
the schema that contains the table.
Note the following error messages related to granting privileges on a schema or an object:
l If you attempt to grant a privilege on a schema but do not have USAGE privilege on that
schema, you receive an error message stating that the schema does not exist.
l If you attempt to grant a privilege on an object within a schema, and you have USAGE privilege
on the schema but no privilege on the individual object, you receive an error denying
permission for that object.
Schema Privileges and the Search Path
The search path determines to which schema unqualified objects in SQL statements belong.
When a user specifies an object name in a statement without supplying the schema in which the
object exists (called an unqualified object name), HP Vertica has two different behaviors, depending
on whether the object is being accessed or created.
Creating an object
When a user creates an object (such as a table, view, sequence, procedure, or function) with an
unqualified name, HP Vertica tries to create the object in the current schema (the first schema
in the schema search path), returning an error if the schema does not exist or if the user does
not have CREATE privileges in that schema.
Use the SHOW search_path command to view the current search path:
=> SHOW search_path;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)
Note: The first schema in the search path is the current schema, and the $user setting is a
placeholder that resolves to the current user's name.
Accessing or altering an object
When a user accesses or alters an object with an unqualified name, HP Vertica searches through
all schemas for a matching object, starting with the current schema, where:
l The object name in the schema matches the object name in the statement.
l The user has USAGE privileges on the schema in order to access objects in it.
l The user has at least one privilege on the object.
See Also
l Setting Search Paths
l GRANT (Schema)
l REVOKE (Schema)
Table Privileges
By default, only a superuser and the table owner (typically the person who creates a table) have
access to a table. The ability to drop or alter a table is also reserved for a superuser or table owner.
This privilege cannot be granted to other users.
All other users or roles (including the user who owns the schema, if he or she does not also own the
table) must be explicitly granted access to the table by a superuser or the table owner.
These are the table privileges a superuser or table owner can grant:
Privilege Description
SELECT Permission to run SELECT queries on the table.
INSERT Permission to INSERT data into the table.
DELETE Permission to DELETE data from the table, as well as SELECT privilege on the
table when executing a DELETE statement that references table column values in a
WHERE or SET clause.
UPDATE Permission to UPDATE and change data in the table, as well as SELECT privilege
on the table when executing an UPDATE statement that references table column
values in a WHERE or SET clause.
REFERENCES Permission to CREATE foreign key constraints that reference this table.
To use any of the above privileges, the user must also have USAGE privileges on the schema that
contains the table. See Schema Privileges for details.
Referencing a sequence in a CREATE TABLE statement requires the following privileges:
l SELECT privilege on sequence object
l USAGE privilege on sequence schema
For details on granting and revoking table privileges, see GRANT (Table) and REVOKE (Table) in
the SQL Reference Manual.
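A sketch of typical table grants (the schema, table, and user names are hypothetical):

```sql
-- 'bob' may read and load data; USAGE on the schema makes the table
-- reachable in the first place.
GRANT USAGE ON SCHEMA sales TO bob;
GRANT SELECT, INSERT ON TABLE sales.fact_orders TO bob;

-- A DELETE whose WHERE clause references column values also needs SELECT:
GRANT DELETE ON TABLE sales.fact_orders TO bob;
```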
Projection Privileges
Because projections are the underlying storage construct for tables, they are atypical in that they
do not have an owner or privileges associated with them directly. Instead, the privileges to create,
access, or alter a projection are based on the anchor and base tables that the projection references,
as well as the schemas that contain them.
To be able run a query involving a projection, a user must have SELECT privileges on the table or
tables that the projection references, and USAGE privileges on all the schemas that contain those
tables.
There are two ways to create a projection: explicitly and implicitly.
Explicit Projection Creation and Privileges
To explicitly create a projection using the CREATE PROJECTION statement, a user must be a
superuser or owner of the anchor table or have the following privileges:
l CREATE privilege on the schema in which the projection is created
l SELECT on all the base tables referenced by the projection
l USAGE on all the schemas that contain the base tables referenced by the projection
Explicitly-created projections can be dropped only by the owner of the table on which the
projection is based (for a single-table projection), or by the owner of the anchor table (for
pre-join projections).
Implicit Projection Creation and Privileges
Projections get implicitly created when you insert data into a table, an operation that automatically
creates a superprojection for the table.
Implicitly-created projections do not require any additional privileges to create or drop, other than
privileges for table creation. Users who can create a table or drop a table can also create and drop
the associated superprojection.
Selecting From Projections
To select from projections requires the following privileges:
l SELECT privilege on each of the base tables
l USAGE privilege on the corresponding containing schemas
HP Vertica does not associate privileges directly with projections since they are the underlying
storage construct. Privileges may only be granted on the logical storage containers: the tables and
views.
Dropping Projections
Dropping projections is handled much the same way as creating them:
l Explicitly with DROP PROJECTION statement
l Implicitly when you drop the table
View Privileges
By default, only a superuser and the view owner (typically the person who creates the view) have
access to the base object for a view. All other users and roles must be directly granted access to
the view. For example:
l If a non-owner runs a SELECT query on the view, the view owner must also have SELECT ...
WITH GRANT OPTION privileges on the view's base tables or views. This privilege must be
directly granted to the owner, rather than through a role.
l If a view owner runs a SELECT query on the view, the owner must also have SELECT privilege
directly granted (not through a role) on a view's base objects (table or view).
The only privilege that can be granted to a user or role is SELECT, which allows the user to execute
SELECT queries on the view. The user or role also needs to have USAGE privilege on the schema
containing the view to be able to run queries on the view.
Privilege Description
SELECT Permission to run SELECT queries on the view.
USAGE Permission on the schema that contains the view
For details on granting and revoking view privileges, see GRANT (View) and REVOKE (View) in the
SQL Reference Manual.
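For example (the names here are hypothetical; note that the view owner must separately hold the required privileges on the view's base tables, as described above):

```sql
-- Let 'carol' query a view. She also needs USAGE on the schema that
-- contains the view.
GRANT USAGE ON SCHEMA reports TO carol;
GRANT SELECT ON reports.monthly_totals TO carol;
```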
Sequence Privileges
To create a sequence, a user must have CREATE privileges on schema that contains the
sequence. Only the owner and superusers can initially access the sequence. All other users must
be granted access to the sequence by a superuser or the owner.
Only the sequence owner (typically the person who creates the sequence) can drop or rename a
sequence, or change the schema in which the sequence resides:
l DROP SEQUENCE: Only a sequence owner or schema owner can drop a sequence.
l ALTER SEQUENCE RENAME TO: A sequence owner must have USAGE and CREATE
privileges on the schema that contains the sequence to be renamed.
l ALTER SEQUENCE SET SCHEMA: A sequence owner must have USAGE privilege on the
schema that currently contains the sequence (old schema), as well as CREATE privilege on the
schema where the sequence will be moved (new schema).
The following table lists the privileges that can be granted to users or roles on sequences.
The only privilege that can be granted to a user or role is SELECT, which allows the user to use
CURRVAL() and NEXTVAL() on the sequence and to reference the sequence in a table. The user or
role also needs USAGE privilege on the schema containing the sequence.
Privilege Description
SELECT Permission to use CURRVAL() and NEXTVAL() on the sequence and to reference it in a table.
USAGE Permissions on the schema that contains the sequence.
Note: Referencing a sequence in a CREATE TABLE statement requires SELECT privilege on the
sequence object and USAGE privilege on the sequence schema.
For details on granting and revoking sequence privileges, see GRANT (Sequence) and REVOKE
(Sequence) in the SQL Reference Manual.
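A minimal sketch (the schema, sequence, and user names are hypothetical):

```sql
-- Allow 'dave' to call CURRVAL()/NEXTVAL() on a sequence:
GRANT USAGE ON SCHEMA app TO dave;
GRANT SELECT ON SEQUENCE app.order_id_seq TO dave;
```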
See Also
l Using Named Sequences
External Procedure Privileges
Only a superuser is allowed to create or drop an external procedure.
By default, users cannot execute external procedures. A superuser must grant users and roles this
right, using the GRANT (Procedure) EXECUTE statement. Additionally, users must have USAGE
privileges on the schema that contains the procedure in order to call it.
Privilege Description
EXECUTE Permission to run an external procedure.
USAGE Permission on the schema that contains the procedure.
For details on granting and revoking external procedure privileges, see GRANT (Procedure) and
REVOKE (Procedure) in the SQL Reference Manual.
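For example, a superuser might grant a hypothetical role the right to run a procedure (the names are illustrative):

```sql
-- Allow role 'ops' to run an external procedure; USAGE on the schema
-- is also required to call it.
GRANT USAGE ON SCHEMA tools TO ops;
GRANT EXECUTE ON PROCEDURE tools.cleanup_logs() TO ops;
```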
User-Defined Function Privileges
User-defined functions (described in CREATE FUNCTION Statements) can be created by
superusers or users with CREATE privileges on the schema that will contain the function, as well
as USAGE privileges on the base library (if applicable).
Users or roles other than the function owner can use a function only if they have been granted
EXECUTE privileges on it. They must also have USAGE privileges on the schema that contains
the function to be able to call it.
Privilege Description
EXECUTE Permission to call a user-defined function.
USAGE Permission on the schema that contains the function.
l DROP FUNCTION: Only a superuser or function owner can drop the function.
l ALTER FUNCTION RENAME TO: A superuser or function owner must have USAGE and
CREATE privileges on the schema that contains the function to be renamed.
l ALTER FUNCTION SET SCHEMA: A superuser or function owner must have USAGE privilege
on the schema that currently contains the function (old schema), as well as CREATE privilege
on the schema where the function will be moved (new schema).
For details on granting and revoking user-defined function privileges, see the following topics in the
SQL Reference Manual:
l GRANT (User Defined Extension)
l REVOKE (User Defined Extension)
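For illustration, a sketch of the grants a non-owner needs before calling a user-defined function (the function and user names are hypothetical):

```sql
-- Hypothetical names; assumes add2ints was created in schema public.
GRANT USAGE ON SCHEMA public TO Ted;              -- needed to resolve the function
GRANT EXECUTE ON FUNCTION add2ints(INT, INT) TO Ted;  -- needed to call it
```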
Library Privileges
Only a superuser can load an external library using the CREATE LIBRARY statement. By default,
only a superuser can create user-defined functions (UDFs) based on a loaded library. A superuser
can use the GRANT USAGE ON LIBRARY statement to allow users to create UDFs based on
classes in the library. The user must also have CREATE privileges on the schema that will contain
the UDF.
Privilege Description
USAGE Permission to create UDFs based on classes in the library
Once created, only a superuser or the user who created a UDF can use it by default. Either of them
can grant other users or roles the ability to call the function using the GRANT EXECUTE ON
FUNCTION statement. See the GRANT (Function) and REVOKE (Function) topics in the SQL
Reference Manual for more information on granting and revoking privileges on functions.
In addition to EXECUTE privilege, users/roles also require USAGE privilege on the schema in
which the function resides in order to execute the function.
For more information about libraries and UDFs, see Developing and Using User Defined Functions
in the Programmer's Guide.
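As a sketch, the two grants a superuser would issue so another user can build UDFs from a loaded library (the library and user names are hypothetical):

```sql
-- Hypothetical names; assumes MyFunctions was loaded with CREATE LIBRARY.
GRANT USAGE ON LIBRARY MyFunctions TO Alice;  -- lets Alice create UDFs from the library's classes
GRANT CREATE ON SCHEMA public TO Alice;       -- required to create the UDF in the schema
```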
Resource Pool Privileges
Only a superuser can create, alter, or drop a resource pool.
By default, users are granted USAGE rights to the GENERAL pool, from which their queries and
other statements allocate memory and get their priorities. A superuser must grant users USAGE
rights to any additional resource pools by using the GRANT USAGE ON RESOURCE POOL
statement. Once granted access to the resource pool, users can use the SET SESSION
RESOURCE POOL statement and the RESOURCE POOL clause of the ALTER USER statement
to have their queries draw their resources from the new pool.
Privilege Description
USAGE Permission to use a resource pool.
SELECT Permission to look up resource pool information/status in system tables.
UPDATE Permission to adjust the tuning parameters of the pool.
For details on granting and revoking resource pool privileges, see GRANT (Resource Pool) and
REVOKE (Resource Pool) in the SQL Reference Manual.
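A sketch of the flow described above, assuming a resource pool named ceo_pool already exists (pool and user names are hypothetical):

```sql
-- Superuser grants access to the pool:
GRANT USAGE ON RESOURCE POOL ceo_pool TO Sue;

-- Sue can then switch her session to the pool:
SET SESSION RESOURCE_POOL = ceo_pool;

-- Or a superuser can make it her default pool:
ALTER USER Sue RESOURCE POOL ceo_pool;
```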
Storage Location Privileges
Users and roles without superuser privileges can copy data to and from storage locations when the
following conditions are met. A superuser:
1. Creates a special class of storage location (ADD_LOCATION) specifying the 'USER'
argument, which indicates that the specified area is accessible to non-dbadmin users.
2. Grants users or roles READ and/or WRITE access to the specified location using the GRANT
(Storage Location) statement.
Note: GRANT/REVOKE (Storage Location) statements are applicable only to 'USER'
storage locations.
Once such storage locations exist and the appropriate privileges are granted, users and roles
granted READ privileges can copy data from files in the storage location into a table. Those granted
WRITE privileges can export data from a table to the storage location on which they have been
granted access. WRITE privileges also let users save COPY statement exceptions and rejected
data files from HP Vertica to the specified storage location.
Only a superuser can add, alter, retire, drop, and restore a location, as well as set and measure
location performance. All non-dbadmin users or roles require READ and/or WRITE permissions on
the location.
Privilege Description
READ Allows the user to copy data from files in the storage location into a table.
WRITE Allows the user to copy data to the specific storage location. Users with WRITE
privileges can also save COPY statement exceptions and rejected data files to the
specified storage location.
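The two steps above might look like this in vsql (the path, node name, and user name are hypothetical):

```sql
-- Step 1 (superuser): create a USER-accessible storage location.
SELECT ADD_LOCATION('/home/dbadmin/UserStorage', 'v_mydb_node0001', 'USER');

-- Step 2 (superuser): grant a user access to the location.
GRANT READ ON LOCATION '/home/dbadmin/UserStorage' TO Bob;
GRANT WRITE ON LOCATION '/home/dbadmin/UserStorage' TO Bob;
```

With READ access, Bob can COPY from files in that location; with WRITE access, he can export data and save COPY exceptions/rejections there.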
See Also
l GRANT (Storage Location)
l Storage Management Functions
l ADD_LOCATION
Role, Profile, and User Privileges
Only a superuser can create, alter, or drop a:
l role
l profile
l user
By default, only the superuser can grant a role to, or revoke a role from, another user or role. A user
or role can be given the privilege to grant and revoke a role by using the WITH ADMIN OPTION
clause of the GRANT statement.
For details on granting and revoking role privileges, see GRANT (Role) and REVOKE (Role) in the
SQL Reference Manual.
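For illustration, a sketch of delegating role administration with WITH ADMIN OPTION (the role and user names are hypothetical):

```sql
-- Superuser creates the role and delegates its administration:
CREATE ROLE appadmin;
GRANT appadmin TO Bob WITH ADMIN OPTION;

-- Bob can now grant (and revoke) the role himself:
GRANT appadmin TO Alice;
```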
See Also
l CREATE USER
l ALTER USER
l DROP USER
l CREATE PROFILE
l ALTER PROFILE
l DROP PROFILE
l CREATE ROLE
l ALTER ROLE RENAME
l DROP ROLE
Metadata Privileges
A superuser has unrestricted access to all database metadata. Other users have significantly
reduced access to metadata based on their privileges, as follows:
Catalog objects (Tables, Columns, Constraints, Sequences, External Procedures, Projections,
ROS containers, WOS):
Users must possess USAGE privilege on the schema and any type of access (SELECT) or modify
privilege on the object to see catalog metadata about the object. See also Schema Privileges.
For internal objects like projections, WOS, and ROS containers that don't have access privileges
directly associated with them, the user must possess the requisite privileges on the associated
schema and table objects instead. For example, to see whether a table has any data in the WOS,
you need USAGE on the table schema and at least SELECT on the table itself. See also Table
Privileges and Projection Privileges.
User sessions and functions, and system tables related to these sessions:
Users can only access information about their own, current sessions.
The following functions provide restricted functionality to users:
l CURRENT_DATABASE
l CURRENT_SCHEMA
l CURRENT_USER
l HAS_TABLE_PRIVILEGE
l SESSION_USER (same as CURRENT_USER)
The SESSIONS system table provides restricted functionality to users.
Storage locations:
Users require READ permissions to copy data from storage locations. Only a superuser can add or
retire storage locations.
I/O Privileges
Users need no special permissions to connect to and disconnect from an HP Vertica database.
To EXPORT TO and COPY FROM HP Vertica, the user must have:
l SELECT privileges on the source table
l USAGE privilege on source table schema
l INSERT privileges for the destination table in target database
l USAGE privilege on destination table schema
To COPY FROM STDIN and use local COPY, a user must have INSERT privileges on the table
and USAGE privilege on the schema.
Note: Only a superuser can COPY from file.
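The privileges listed above translate into grants like the following (the schema, table, and user names are hypothetical):

```sql
-- Grants a superuser might issue so that Bob can run EXPORT TO / COPY FROM:
GRANT USAGE ON SCHEMA src_schema TO Bob;
GRANT SELECT ON src_schema.sales TO Bob;        -- read the source table
GRANT USAGE ON SCHEMA dest_schema TO Bob;
GRANT INSERT ON dest_schema.sales_copy TO Bob;  -- write the destination table
```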
Comment Privileges
A comment lets you add, revise, or remove a textual message on a database object. You must be
an object owner or superuser in order to COMMENT ON one of the following objects:
l COLUMN
l CONSTRAINT
l FUNCTION (including AGGREGATE and ANALYTIC)
l LIBRARY
l NODE
l PROJECTION
l SCHEMA
l SEQUENCE
l TABLE
l TRANSFORM FUNCTION
l VIEW
Other users must have VIEW privileges on an object to view its comments.
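For example, adding and then removing a comment on a table (the table name is hypothetical; the statement must be run by the table owner or a superuser):

```sql
COMMENT ON TABLE public.applog IS 'Application event log';
-- Setting the comment to NULL removes it:
COMMENT ON TABLE public.applog IS NULL;
```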
Transaction Privileges
No special permissions are required for the following database operations:
l COMMIT
l ROLLBACK
l RELEASE SAVEPOINT
l SAVEPOINT
Session Privileges
No special permissions are required for users to use the SHOW statement or any of the SET
statements.
Tuning Privileges
To PROFILE a single SQL statement, or to return a query plan's execution strategy to
standard output using the EXPLAIN command, users must have the same privileges that are
required for them to run the same query without the PROFILE or EXPLAIN keyword.
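For example, either of the following requires only the privileges needed for the bare query (here, SELECT on the hypothetical applog table and USAGE on its schema):

```sql
EXPLAIN SELECT COUNT(*) FROM public.applog;
PROFILE SELECT COUNT(*) FROM public.applog;
```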
Granting and Revoking Privileges
To grant or revoke a privilege using one of the SQL GRANT or REVOKE statements, the user must
have the following permissions for the GRANT/REVOKE statement to succeed:
l Superuser or privilege WITH GRANT OPTION
l USAGE privilege on the schema
l Appropriate privileges on the object
The syntax for granting and revoking privileges is different for each database object, such as
schema, database, table, view, sequence, procedure, function, resource pool, and so on.
Normally, a superuser first creates a user and then uses GRANT syntax to define the user's
privileges or roles or both. For example, the following series of statements creates user Carol and
grants Carol access to the apps database in the PUBLIC schema and also lets Carol grant
SELECT privileges to other users on the applog table:
=> CREATE USER Carol;
=> GRANT USAGE ON SCHEMA PUBLIC to Carol;
=> GRANT ALL ON DATABASE apps TO Carol;
=> GRANT SELECT ON applog TO Carol WITH GRANT OPTION;
See GRANT Statements and REVOKE Statements in the SQL Reference Manual.
About Superuser Privileges
A superuser (DBADMIN) is the automatically-created database user who has the same name as
the Linux database administrator account and who can bypass all GRANT/REVOKE authorization,
as well as supersede any user that has been granted the PSEUDOSUPERUSER role.
Note: A database superuser is not the same as a Linux superuser (root) and does not have
Linux superuser privileges.
A superuser can grant privileges on all database object types to other users, as well as grant
privileges to roles. Users who have been granted the role will then gain the privilege as soon as
they enable it.
Superusers may grant or revoke any object privilege on behalf of the object owner, which means a
superuser can grant or revoke the object privilege if the object owner could have granted or revoked
the same object privilege. A superuser may revoke the privilege that an object owner granted, as
well as the reverse.
Since a superuser is acting on behalf of the object owner, the GRANTOR column of V_
CATALOG.GRANTS table displays the object owner rather than the superuser who issued the
GRANT statement.
A superuser can also alter ownership of table and sequence objects.
See Also
DBADMIN Role
About Schema Owner Privileges
By default, the schema owner has privileges to create objects within a schema. Additionally, the
schema owner can drop any object in the schema, requiring no additional privilege on the object.
The schema owner is typically the user who creates the schema.
Schema owners do not automatically have access to objects in the schema. Access to objects
requires the appropriate privilege at the object level.
All other access to the schema and its objects must be explicitly granted to users or roles by a
superuser or schema owner to prevent unauthorized users from accessing the schema and its
objects.
See Schema Privileges
About Object Owner Privileges
The database, along with every object in it, has an owner. The object owner is usually the person
who created the object, although a superuser can alter ownership of objects, such as table and
sequence.
Object owners who have the appropriate schema privilege (USAGE) can access, alter, rename,
move, or drop any object they own without any additional object-level privileges.
An object owner can also:
l Grant privileges on their own object to other users
The WITH GRANT OPTION clause specifies that a user can grant the permission to other
users. For example, if user Bob creates a table, Bob can grant privileges on that table to users
Ted, Alice, and so on.
l Grant privileges to roles
Users who are granted the role gain the privilege.
How to Grant Privileges
As described in Granting and Revoking Privileges, specific users grant privileges using the GRANT
statement with or without the optional WITH GRANT OPTION, which allows the user to grant the
same privileges to other users.
l A superuser can grant privileges on all object types to other users.
l A superuser or object owner can grant privileges to roles. Users who have been granted the role
then gain the privilege.
l An object owner can grant privileges on the object to other users using the optional WITH
GRANT OPTION clause.
l The user needs to have USAGE privilege on schema and appropriate privileges on the object.
When a user grants an explicit list of privileges, such as GRANT INSERT, DELETE, REFERENCES ON
applog TO Bob:
l The GRANT statement succeeds only if all the privileges are granted successfully. If any grant
operation fails, the entire statement rolls back.
l HP Vertica returns an ERROR if the user does not have grant options for the privileges listed.
When a user grants ALL privileges, such as GRANT ALL ON applog TO Bob, the statement always
succeeds. HP Vertica grants all the privileges on which the grantor has the WITH GRANT OPTION
and skips those privileges without the optional WITH GRANT OPTION.
For example, if the grantor has only DELETE privileges with the optional grant option on the applog
table, only DELETE privileges are granted to Bob, and the statement succeeds:
=> GRANT DELETE ON applog TO Bob WITH GRANT OPTION;
GRANT PRIVILEGE
For details, see the GRANT Statements in the SQL Reference Manual.
How to Revoke Privileges
In general, ONLY the user who originally granted a privilege can revoke it using a REVOKE
statement. That user must have superuser privilege or have the optional WITH GRANT OPTION
on the privilege. The user also must have USAGE privilege on the schema and appropriate
privileges on the object for the REVOKE statement to succeed.
To revoke a privilege, the grantor must previously have granted that privilege to the specified
grantee. If so, the REVOKE statement removes the privilege (and the WITH GRANT OPTION
privilege, if supplied) from the grantee. Otherwise, HP Vertica prints a NOTICE that the operation
failed, as in the following example:
=> REVOKE SELECT ON applog FROM Bob;
NOTICE 0: Cannot revoke "SELECT" privilege(s) for relation "applog" that you did not grant to "Bob"
REVOKE PRIVILEGE
In order to revoke grant option for a privilege, the grantor must have previously granted the grant
option for the privilege to the specified grantee. Otherwise, HP Vertica prints a NOTICE.
The following REVOKE statement removes the GRANT option only but leaves the privilege intact:
=> GRANT INSERT on applog TO Bob WITH GRANT OPTION;
GRANT PRIVILEGE
=> REVOKE GRANT OPTION FOR INSERT ON applog FROM Bob;
REVOKE PRIVILEGE
When a user revokes an explicit list of privileges, such as GRANT INSERT, DELETE, REFERENCES
ON applog TO Bob:
l The REVOKE statement succeeds only if all the privileges are revoked successfully. If any
revoke operation fails, the entire statement rolls back.
l HP Vertica returns an ERROR if the user does not have grant options for the privileges listed.
l HP Vertica returns a NOTICE when revoking privileges that the user had not previously been
granted.
When a user revokes ALL privileges, such as REVOKE ALL ON applog FROM Bob, the statement
always succeeds. HP Vertica revokes all the privileges on which the grantor has the optional WITH
GRANT OPTION and skips those privileges without the WITH GRANT OPTION.
For example, if user Bob has DELETE privileges with the optional grant option on the applog table,
only the grant option is revoked from Bob, and the statement succeeds without a NOTICE:
=> REVOKE GRANT OPTION FOR DELETE ON applog FROM Bob;
For details, see the REVOKE Statements in the SQL Reference Manual.
Privilege Ownership Chains
The ability to revoke privileges on objects can cascade throughout an organization. If the grant
option was revoked from a user, the privilege that this user granted to other users will also be
revoked.
If a privilege was granted to a user or role by multiple grantors, then to completely revoke the
privilege from the grantee, each original grantor must revoke it. The only exception is that a
superuser may revoke privileges granted by an object owner, and the reverse is also true.
In the following example, the SELECT privilege on table t1 is granted through a chain of users, from
a superuser through User3.
l A superuser grants User1 CREATE privileges on the schema s1:
=> \c - dbadmin
You are now connected as user "dbadmin".
=> CREATE USER User1;
CREATE USER
=> CREATE USER User2;
CREATE USER
=> CREATE USER User3;
CREATE USER
=> CREATE SCHEMA s1;
CREATE SCHEMA
=> GRANT USAGE on SCHEMA s1 TO User1, User2, User3;
GRANT PRIVILEGE
=> CREATE ROLE reviewer;
CREATE ROLE
=> GRANT CREATE ON SCHEMA s1 TO User1;
GRANT PRIVILEGE
l User1 creates new table t1 within schema s1 and then grants SELECT WITH GRANT OPTION
privilege on s1.t1 to User2:
=> \c - User1
You are now connected as user "User1".
=> CREATE TABLE s1.t1(id int, sourceID VARCHAR(8));
CREATE TABLE
=> GRANT SELECT on s1.t1 to User2 WITH GRANT OPTION;
GRANT PRIVILEGE
l User2 grants SELECT WITH GRANT OPTION privilege on s1.t1 to User3:
=> \c - User2
You are now connected as user "User2".
=> GRANT SELECT on s1.t1 to User3 WITH GRANT OPTION;
GRANT PRIVILEGE
l User3 grants SELECT privilege on s1.t1 to the reviewer role:
=> \c - User3
You are now connected as user "User3".
=> GRANT SELECT on s1.t1 to reviewer;
GRANT PRIVILEGE
Users cannot revoke privileges upstream in the chain. For example, User2 did not grant privileges
to User1, so when User2 runs the following REVOKE command, HP Vertica rolls back the
command:
=> \c - User2
You are now connected as user "User2".
=> REVOKE CREATE ON SCHEMA s1 FROM User1;
ROLLBACK 0: "CREATE" privilege(s) for schema "s1" could not be revoked from "User1"
Users can revoke privileges indirectly from users who received privileges through a cascading
chain, like the one shown in the example above, by using the CASCADE option to revoke
privileges from all users "downstream" in the chain. For example, a superuser or User1 can
execute the following statement to revoke the SELECT privilege on table s1.t1 from all users and
roles within the chain:
=> \c - User1
You are now connected as user "User1".
=> REVOKE SELECT ON s1.t1 FROM User2 CASCADE;
REVOKE PRIVILEGE
When a superuser or User1 executes the above statement, the SELECT privilege on table s1.t1 is
revoked from User2, User3, and the reviewer role. The GRANT privilege is also revoked from
User2 and User3, which a superuser can verify by querying the V_CATALOG.GRANTS system
table.
=> SELECT * FROM grants WHERE object_name = 's1' AND grantee ILIKE 'User%';
grantor | privileges_description | object_schema | object_name | grantee
---------+------------------------+---------------+-------------+---------
dbadmin | USAGE | | s1 | User1
dbadmin | USAGE | | s1 | User2
dbadmin | USAGE | | s1 | User3
(3 rows)
Modifying Privileges
A superuser or object owner can use one of the ALTER statements to modify a privilege, such as
changing a sequence owner or table owner. Reassignment to the new owner does not transfer
grants from the original owner to the new owner; grants made by the original owner are dropped.
Changing a Table Owner
The ability to change table ownership is useful when moving a table from one schema to another.
Ownership reassignment is also useful when a table owner leaves the company or changes job
responsibilities. Because you can change the table owner, the tables do not have to be completely
rewritten, and you avoid a loss in productivity.
The syntax looks like this:
ALTER TABLE [[db-name.]schema.]table-name OWNER TO new-owner-name
In order to alter table ownership, you must be either the table owner or a superuser.
A change in table ownership transfers just the owner and not privileges; grants made by the original
owner are dropped and all existing privileges on the table are revoked from the previous owner.
However, altering the table owner transfers ownership of dependent sequence objects (associated
IDENTITY/AUTO-INCREMENT sequences) but does not transfer ownership of other referenced
sequences. See ALTER SEQUENCE for details on transferring sequence ownership.
Notes
l Table privileges are separate from schema privileges; therefore, a table privilege change or table
owner change does not result in any schema privilege change.
l Because projections define the physical representation of the table, HP Vertica does not require
separate projection owners. The ability to create or drop projections is based on the table
privileges on which the projection is anchored.
l During the alter operation HP Vertica updates projections anchored on the table owned by the
old owner to reflect the new owner. For pre-join projection operations, HP Vertica checks for
privileges on the referenced table.
Example
In this example, user Bob connects to the database, looks up the tables, and transfers ownership of
table t33 from himself to user Alice.
=> \c - Bob
You are now connected as user "Bob".
=> \d
Schema | Name | Kind | Owner | Comment
--------+--------+-------+---------+---------
public | applog | table | dbadmin |
public | t33 | table | Bob |
(2 rows)
=> ALTER TABLE t33 OWNER TO Alice;
ALTER TABLE
Notice that when Bob looks up database tables again, he no longer sees table t33.
=> \d
List of tables
Schema | Name | Kind | Owner | Comment
--------+--------+-------+---------+---------
public | applog | table | dbadmin |
(1 row)
When user Alice connects to the database and looks up tables, she sees she is the owner of table
t33.
=> \c - Alice
You are now connected as user "Alice".
=> \d
List of tables
Schema | Name | Kind | Owner | Comment
--------+------+-------+-------+---------
public | t33 | table | Alice |
(1 row)
Either Alice or a superuser can transfer table ownership back to Bob. In the following case a
superuser performs the transfer.
=> \c - dbadmin
You are now connected as user "dbadmin".
=> ALTER TABLE t33 OWNER TO Bob;
ALTER TABLE
=> \d
List of tables
Schema | Name | Kind | Owner | Comment
--------+----------+-------+---------+---------
public | applog | table | dbadmin |
public | comments | table | dbadmin |
public | t33 | table | Bob |
s1 | t1 | table | User1 |
(4 rows)
You can also query the V_CATALOG.TABLES system table to view table and owner information.
Note that a change in ownership does not change the table ID.
In the following series of commands, the superuser changes table ownership back to Alice and
queries the TABLES system table.
=> ALTER TABLE t33 OWNER TO Alice;
ALTER TABLE
=> SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
  table_schema_id  | table_schema |     table_id      | table_name |     owner_id      | owner_name
-------------------+--------------+-------------------+------------+-------------------+------------
 45035996273704968 | public       | 45035996273713634 | applog     | 45035996273704962 | dbadmin
 45035996273704968 | public       | 45035996273724496 | comments   | 45035996273704962 | dbadmin
 45035996273730528 | s1           | 45035996273730548 | t1         | 45035996273730516 | User1
 45035996273704968 | public       | 45035996273793876 | foo        | 45035996273724576 | Alice
 45035996273704968 | public       | 45035996273795846 | t33        | 45035996273724576 | Alice
(5 rows)
Now the superuser changes table ownership back to Bob and queries the TABLES table again.
Nothing changes but the owner_name row, from Alice to Bob.
=> ALTER TABLE t33 OWNER TO Bob;
ALTER TABLE
=> SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
  table_schema_id  | table_schema |     table_id      | table_name |     owner_id      | owner_name
-------------------+--------------+-------------------+------------+-------------------+------------
 45035996273704968 | public       | 45035996273713634 | applog     | 45035996273704962 | dbadmin
 45035996273704968 | public       | 45035996273724496 | comments   | 45035996273704962 | dbadmin
 45035996273730528 | s1           | 45035996273730548 | t1         | 45035996273730516 | User1
 45035996273704968 | public       | 45035996273793876 | foo        | 45035996273724576 | Alice
 45035996273704968 | public       | 45035996273795846 | t33        | 45035996273714428 | Bob
(5 rows)
Table Reassignment with Sequences
Altering the table owner transfers ownership of only the associated IDENTITY/AUTO-INCREMENT
sequences, not other referenced sequences. For example, in the following series of commands,
ownership of sequence s1 does not change:
=> CREATE USER u1;
CREATE USER
=> CREATE USER u2;
CREATE USER
=> CREATE SEQUENCE s1 MINVALUE 10 INCREMENT BY 2;
CREATE SEQUENCE
=> CREATE TABLE t1 (a INT, id INT DEFAULT NEXTVAL('s1'));
CREATE TABLE
=> CREATE TABLE t2 (a INT, id INT DEFAULT NEXTVAL('s1'));
CREATE TABLE
=> SELECT sequence_name, owner_name FROM sequences;
sequence_name | owner_name
---------------+------------
s1 | dbadmin
(1 row)
=> ALTER TABLE t1 OWNER TO u1;
ALTER TABLE
=> SELECT sequence_name, owner_name FROM sequences;
sequence_name | owner_name
---------------+------------
s1 | dbadmin
(1 row)
=> ALTER TABLE t2 OWNER TO u2;
ALTER TABLE
=> SELECT sequence_name, owner_name FROM sequences;
sequence_name | owner_name
---------------+------------
s1 | dbadmin
(1 row)
See Also
l Changing a Sequence Owner
Changing a Sequence Owner
The ALTER SEQUENCE command lets you change the attributes of an existing sequence. All
changes take effect immediately, within the same session. Any parameters not set during an ALTER
SEQUENCE statement retain their prior settings.
If you need to change sequence ownership, such as if an employee who owns a sequence leaves
the company, you can do so with the following ALTER SEQUENCE syntax:
ALTER SEQUENCE sequence-name OWNER TO new-owner-name;
This operation immediately reassigns the sequence from the current owner to the specified new
owner.
Only the sequence owner or a superuser can change ownership, and reassignment does not
transfer grants from the original owner to the new owner; grants made by the original owner are
dropped.
Note: Changing a table owner transfers ownership of dependent sequence objects
(associated IDENTITY/AUTO-INCREMENT sequences) but does not transfer ownership of other
referenced sequences. See Changing a Table Owner.
Example
The following example reassigns sequence ownership from the current owner to user Bob:
=> ALTER SEQUENCE sequential OWNER TO Bob;
See ALTER SEQUENCE in the SQL Reference Manual for details.
Viewing Privileges Granted on Objects
HP Vertica logs information about privileges granted on various objects, including the grantor and
grantee, in the V_CATALOG.GRANTS system table. The order of columns in the table
corresponds to the order in which they appear in the GRANT command. An asterisk in the output
means the privilege was granted WITH GRANT OPTION.
The following command queries the GRANTS system table:
=> SELECT * FROM grants ORDER BY grantor, grantee;
 grantor |             privileges_description              | object_schema | object_name | grantee
---------+-------------------------------------------------+---------------+-------------+-----------
 Bob     |                                                 |               | commentor   | Alice
 dbadmin | CREATE                                          |               | schema2     | Bob
 dbadmin |                                                 |               | commentor   | Bob
 dbadmin |                                                 |               | commentor   | Bob
 dbadmin |                                                 |               | logadmin    | Bob
 dbadmin | USAGE                                           |               | general     | Bob
 dbadmin | INSERT, UPDATE, DELETE, REFERENCES              | public        | applog      | Bob
 dbadmin |                                                 |               | logadmin    | Ted
 dbadmin | USAGE                                           |               | general     | Ted
 dbadmin | USAGE                                           |               | general     | Sue
 dbadmin | CREATE, CREATE TEMP                             |               | vmart       | Sue
 dbadmin | USAGE                                           |               | public      | Sue
 dbadmin | SELECT*                                         | public        | applog      | Sue
 dbadmin | USAGE                                           |               | general     | Alice
 dbadmin | INSERT, SELECT                                  | public        | comments    | commentor
 dbadmin | INSERT, SELECT                                  | public        | applog      | commentor
 dbadmin |                                                 |               | logwriter   | logadmin
 dbadmin |                                                 |               | logreader   | logadmin
 dbadmin | DELETE                                          | public        | applog      | logadmin
 dbadmin | SELECT                                          | public        | applog      | logreader
 dbadmin | INSERT                                          | public        | applog      | logwriter
 dbadmin | USAGE                                           |               | v_internal  | public
 dbadmin | CREATE TEMP                                     |               | vmart       | public
 dbadmin | USAGE                                           |               | public      | public
 dbadmin | USAGE                                           |               | v_catalog   | public
 dbadmin | USAGE                                           |               | v_monitor   | public
 dbadmin | CREATE*, CREATE TEMP*                           |               | vmart       | dbadmin
 dbadmin | USAGE*, CREATE*                                 |               | schema2     | dbadmin
 dbadmin | INSERT*, SELECT*, UPDATE*, DELETE*, REFERENCES* | public        | comments    | dbadmin
 dbadmin | INSERT*, SELECT*, UPDATE*, DELETE*, REFERENCES* | public        | applog      | dbadmin
(30 rows)
To quickly find all of the privileges that have been granted to all users on the schema named
myschema, run the following statement:
=> SELECT grantee, privileges_description FROM GRANTS WHERE object_name='myschema';
grantee | privileges_description
---------+------------------------
Bob | USAGE, CREATE
Alice | CREATE
(2 rows)
Note that the vsql commands \dp and \z both return information similar to GRANTS:
=> \dp
                                Access privileges for database "apps"
  Grantee  | Grantor |                   Privileges                    | Schema |    Name
-----------+---------+-------------------------------------------------+--------+------------
 public    | dbadmin | USAGE                                           |        | v_internal
 public    | dbadmin | USAGE                                           |        | v_catalog
 public    | dbadmin | USAGE                                           |        | v_monitor
 logadmin  | dbadmin |                                                 |        | logreader
 logadmin  | dbadmin |                                                 |        | logwriter
 Fred      | dbadmin | USAGE                                           |        | general
 Fred      | dbadmin |                                                 |        | logadmin
 Bob       | dbadmin | USAGE                                           |        | general
 dbadmin   | dbadmin | USAGE*, CREATE*                                 |        | schema2
 Bob       | dbadmin | CREATE                                          |        | schema2
 Sue       | dbadmin | USAGE                                           |        | general
 public    | dbadmin | USAGE                                           |        | public
 Sue       | dbadmin | USAGE                                           |        | public
 public    | dbadmin | CREATE TEMP                                     |        | appdat
 dbadmin   | dbadmin | CREATE*, CREATE TEMP*                           |        | appdat
 Sue       | dbadmin | CREATE, CREATE TEMP                             |        | appdat
 dbadmin   | dbadmin | INSERT*, SELECT*, UPDATE*, DELETE*, REFERENCES* | public | applog
 logreader | dbadmin | SELECT                                          | public | applog
 logwriter | dbadmin | INSERT                                          | public | applog
 logadmin  | dbadmin | DELETE                                          | public | applog
 Sue       | dbadmin | SELECT*                                         | public | applog
(22 rows)
See GRANT Statements in the SQL Reference Manual.
About Database Roles
To make managing permissions easier, use roles. A role is a collection of privileges that a
superuser can grant to (or revoke from) one or more users or other roles. Using roles avoids having
to manually grant sets of privileges user by user. For example, several users might be assigned to
the administrator role. You can grant or revoke privileges to or from the administrator role, and all
users with access to that role are affected by the change.
Note: Users must first enable a role before they gain all of the privileges that have been
granted to it. See Enabling Roles.
Role Hierarchies
You can also use roles to build hierarchies of roles; for example, you can create an administrator
role that has all of the privileges granted to non-administrator roles, as well as any privileges
granted directly to the administrator role. See also Role Hierarchy.
Roles do not supersede manually granted privileges, so privileges directly assigned to a user are
not altered by roles. Roles simply give additional privileges to the user.
Creating and Using a Role
Using a role follows this general flow:
1. A superuser creates a role using the CREATE ROLE statement.
2. A superuser or object owner grants privileges to the role using one of the GRANT statements.
3. A superuser or users with administrator access to the role grant users and other roles access
to the role.
4. Users granted access to the role use the SET ROLE command to enable that role and gain the
role's privileges.
You can do steps 2 and 3 in any order. However, granting access to a role means little until the role
has privileges granted to it.
Tip: You can query the V_CATALOG system tables ROLES, GRANTS, and USERS to see
any directly-assigned roles; however, these tables do not indicate whether a role is available to
a user when roles could be available through other roles (indirectly). See the HAS_ROLE()
function for additional information.
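The flow above can be sketched end to end in vsql. The role, table, and user names in this sketch are illustrative only:
=> CREATE ROLE reportviewer;                  -- step 1: superuser creates the role
CREATE ROLE
=> GRANT SELECT ON mytable TO reportviewer;   -- step 2: grant privileges to the role
GRANT PRIVILEGE
=> GRANT reportviewer TO Alice;               -- step 3: grant the role to a user
GRANT ROLE
=> \c - Alice
You are now connected as user "Alice".
=> SET ROLE reportviewer;                     -- step 4: user enables the role
SET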
Roles on Management Console
When users sign in to the Management Console (MC), what they can view or do is governed by MC
roles. For details, see About MC Users and About MC Privileges and Roles.
Types of Database Roles
HP Vertica has four predefined roles:
l PUBLIC
l PSEUDOSUPERUSER
l DBADMIN
l DBDUSER
Predefined roles cannot be dropped or renamed. You cannot grant other roles to (or revoke other
roles from) predefined roles, except for PUBLIC; however, you can grant predefined roles to other
roles, to users, or both.
Individual privileges may be granted to/revoked from predefined roles. See the SQL Reference
Manual for all of the GRANT and REVOKE statements.
DBADMIN Role
Every database has the special DBADMIN role. A superuser (or someone with the
PSEUDOSUPERUSER Role) can grant this role to or revoke this role from any user or role.
Users who enable the DBADMIN role gain these privileges:
l Create or drop users
l Create or drop schemas
l Create or drop roles
l View all system tables
l View and terminate user sessions
The DBADMIN role does NOT allow users to:
l Start and stop a database
l Change DBADMIN privileges
l Set configuration parameters (set_config_parameter)
You can assign additional privileges to the DBADMIN role, but you cannot assign any additional
roles; for example, the following is not allowed:
=> CREATE ROLE appviewer;
CREATE ROLE
=> GRANT appviewer TO dbadmin;
ROLLBACK 2347: Cannot alter predefined role "dbadmin"
You can, however, grant the DBADMIN role to other roles to augment a set of privileges. See Role
Hierarchy for more information.
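For example, a superuser could fold the DBADMIN privileges into a broader role (the appadmin role name here is illustrative):
=> CREATE ROLE appadmin;
CREATE ROLE
=> GRANT dbadmin TO appadmin;
GRANT ROLE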
View a List of Database Superusers
To see who is a superuser, run the vsql \du meta-command. In this example, only dbadmin is a
superuser.
=> \du
List of users
User name | Is Superuser
-----------+--------------
dbadmin | t
Fred | f
Bob | f
Sue | f
Alice | f
User1 | f
User2 | f
User3 | f
u1 | f
u2 | f
(10 rows)
See Also
DBADMIN User
DBDUSER Role
The special DBDUSER role is a predefined role that a superuser must explicitly grant. The
DBDUSER role allows non-DBADMIN users to access Database Designer using command-line
functions. Users with the DBDUSER role cannot access Database Designer using the
Administration Tools; only DBADMIN users can run the Administration Tools.
You cannot assign any additional privileges to the DBDUSER role, but you can grant the
DBDUSER role to other roles to augment a set of privileges.
Once you have been granted the DBDUSER role, you must enable it before you can run Database
Designer using command-line functions. For more information, see About Running Database
Designer Programmatically.
Important: When you create a DBADMIN user or grant the DBDUSER role, make sure to
associate a resource pool with that user to manage resources during Database Designer runs.
Multiple users can run Database Designer concurrently without interfering with each other or
using up all the cluster resources. When a user runs Database Designer, either using the
Administration Tools or programmatically, its execution is mostly contained by the user's
resource pool, but may spill over into some system resource pools for less-intensive tasks.
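As a sketch of that setup, a superuser might create a pool, associate it with the user, and grant the role as follows. The pool name, user name, and MEMORYSIZE value are illustrative only:
=> CREATE RESOURCE POOL design_pool MEMORYSIZE '1G';
CREATE RESOURCE POOL
=> CREATE USER designer RESOURCE POOL design_pool;
CREATE USER
=> GRANT DBDUSER TO designer;
GRANT ROLE
The designer user must then enable the role with SET ROLE DBDUSER before running the Database Designer functions.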
PSEUDOSUPERUSER Role
The special PSEUDOSUPERUSER role is automatically created in each database. A superuser
(or someone with the PSEUDOSUPERUSER role) can grant this role to another user, or revoke the
role from another user. The PSEUDOSUPERUSER cannot revoke or change any superuser
privileges.
Users with the PSEUDOSUPERUSER role enabled have all of the privileges of the database
superuser, including the ability to:
l Create schemas
l Create and grant privileges to roles
l Bypass all GRANT/REVOKE authorization
l Set user accounts' passwords
l Lock and unlock user accounts
l Create or drop a UDF library
l Create or drop a UDF function
l Create or drop an external procedure
l Add or edit comments on nodes
l Create or drop password profiles
You can assign additional privileges to the PSEUDOSUPERUSER role, but you cannot assign any
additional roles; for example, the following is not allowed:
=> CREATE ROLE appviewer;
CREATE ROLE
=> GRANT appviewer TO pseudosuperuser;
ROLLBACK 2347: Cannot alter predefined role "pseudosuperuser"
PUBLIC Role
By default, every database has the special PUBLIC role. HP Vertica grants this role to each user
automatically, and it is automatically enabled. You grant privileges to this role that every user
should have by default. You can also grant access to roles to PUBLIC, which allows any user to
access the role using the SET ROLE statement.
Note: The PUBLIC role can never be dropped, nor can it be revoked from users or roles.
Example
In the following example, if the superuser had not granted INSERT privileges on the table publicdata
to the PUBLIC role, the INSERT statement executed by user bob would fail:
=> CREATE TABLE publicdata (a INT, b VARCHAR);
CREATE TABLE
=> GRANT INSERT, SELECT ON publicdata TO PUBLIC;
GRANT PRIVILEGE
=> CREATE PROJECTION publicdataproj AS (SELECT * FROM publicdata);
CREATE PROJECTION
dbadmin=> \c - bob
You are now connected as user "bob".
=> INSERT INTO publicdata VALUES (10, 'Hello World');
OUTPUT
--------
1
(1 row)
See Also
PUBLIC User
Default Roles for Database Users
By default, no roles (other than the default PUBLIC Role) are enabled at the start of a user session.
=> SHOW ENABLED_ROLES;
name | setting
---------------+---------
enabled roles |
(1 row)
A superuser can set one or more default roles for a user, which are automatically enabled at the
start of the user's session. Setting a default role is a good idea if users normally rely on the
privileges granted by one or more roles to carry out the majority of their tasks. To set a default role,
use the DEFAULT ROLE parameter of the ALTER USER statement as superuser:
=> \c vmart apps
You are now connected to database "apps" as user "dbadmin".
=> ALTER USER Bob DEFAULT ROLE logadmin;
ALTER USER
=> \c - Bob
You are now connected as user "Bob"
=> SHOW ENABLED_ROLES;
name | setting
---------------+----------
enabled roles | logadmin
(1 row)
Notes
l Only roles that the user already has access to can be made default.
l Unlike granting a role, setting a default role or roles overwrites any previously-set defaults.
l To clear any default roles for a user, use the keyword NONE as the role name in the DEFAULT
ROLE argument.
l Default roles only take effect at the start of a user session. They do not affect the roles enabled
in the user's current session.
l Avoid giving users default roles that have administrative or destructive privileges (the
PSEUDOSUPERUSER role or DROP privileges, for example). By forcing users to explicitly
enable these privileges, you can help prevent accidental data loss.
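For example, a superuser can set a default role for Bob and later clear it with the NONE keyword (continuing the logadmin example):
=> ALTER USER Bob DEFAULT ROLE logadmin;
ALTER USER
=> ALTER USER Bob DEFAULT ROLE NONE;
ALTER USER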
Using Database Roles
There are several steps to using roles:
1. A superuser creates a role using the CREATE ROLE statement.
2. A superuser or object owner grants privileges to the role.
3. A superuser or users with administrator access to the role grant users and other roles access
to the role.
4. Users granted access to the role run the SET ROLE command to make that role active and
gain the role's privileges.
You can do steps 2 and 3 in any order. However, granting access to a role means little until the role
has privileges granted to it.
Tip: Query system tables ROLES, GRANTS, and USERS to see any directly-assigned roles.
Because these tables do not indicate whether a role is available to a user when roles could be
available through other roles (indirectly), see the HAS_ROLE() function for additional
information.
See Also
l About MC Privileges and Roles
Role Hierarchy
In addition to granting roles to users, you can also grant roles to other roles. This lets you build
hierarchies of roles, with more privileged roles (an administrator, for example) being assigned all of
the privileges of lesser-privileged roles (a user of a particular application), in addition to the
privileges you assign to it directly. By organizing your roles this way, any privilege you add to the
application role (reading or writing to a new table, for example) is automatically made available to
the more-privileged administrator role.
Example
The following example creates two roles, assigns them privileges, then assigns them to a new
administrative role.
1. Create new table applog:
=> CREATE TABLE applog (id int, sourceID VARCHAR(32), data TIMESTAMP, event VARCHAR(2
56));
2. Create a new role called logreader:
=> CREATE ROLE logreader;
3. Grant the logreader role read-only access on the applog table:
=> GRANT SELECT ON applog TO logreader;
4. Create a new role called logwriter:
=> CREATE ROLE logwriter;
5. Grant the logwriter write access on the applog table:
=> GRANT INSERT ON applog to logwriter;
6. Create a new role called logadmin, which will combine the privileges of the other two roles:
=> CREATE ROLE logadmin;
7. Grant the logadmin role privileges to delete data:
=> GRANT DELETE ON applog to logadmin;
8. Grant the logadmin role privileges to have the same privileges as the logreader and logwriter
roles:
=> GRANT logreader, logwriter TO logadmin;
9. Create new user Bob:
=> CREATE USER Bob;
10. Give Bob logadmin privileges:
=> GRANT logadmin TO Bob;
The user Bob can now enable the logadmin role, which also includes the logreader and logwriter
roles. Note that Bob cannot enable either the logreader or logwriter role directly. A user can only
enable explicitly-granted roles.
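Continuing the example, Bob enables the logadmin role itself and gains the logreader and logwriter privileges along with it:
=> \c - Bob
You are now connected as user "Bob".
=> SET ROLE logadmin;
SET
=> SHOW ENABLED_ROLES;
     name      | setting
---------------+----------
 enabled roles | logadmin
(1 row)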
Hierarchical roles also work with administrative access to a role:
=> GRANT logreader, logwriter TO logadmin WITH ADMIN OPTION;
GRANT ROLE
=> GRANT logadmin TO Bob;
=> \c - bob -- connect as Bob
You are now connected as user "Bob".
=> SET ROLE logadmin; -- Enable logadmin role
SET
=> GRANT logreader TO Alice;
GRANT ROLE
Note that the user Bob only has administrative access to the logreader and logwriter roles through
the logadmin role. He doesn't have administrative access to the logadmin role, since it wasn't
granted to him with the optional WITH ADMIN OPTION argument:
=> GRANT logadmin TO Alice;
WARNING: Some roles were not granted
GRANT ROLE
For Bob to be able to grant the logadmin role, a superuser would have had to explicitly grant him
administrative access.
See Also
l About MC Privileges and Roles
Creating Database Roles
A superuser creates a new role using the CREATE ROLE statement. Only a superuser can create
or drop roles.
=> CREATE ROLE administrator;
CREATE ROLE
The newly-created role has no privileges assigned to it, and no users or other roles are initially
granted access to it. A superuser must grant privileges and access to the role.
Deleting Database Roles
A superuser can delete a role with the DROP ROLE statement.
Note that if any user or other role has been assigned the role you are trying to delete, the DROP
ROLE statement fails with a dependency message.
=> DROP ROLE administrator;
NOTICE: User Bob depends on Role administrator
ROLLBACK: DROP ROLE failed due to dependencies
DETAIL: Cannot drop Role administrator because other objects depend on it
HINT: Use DROP ROLE ... CASCADE to remove granted roles from the dependent users/roles
Supply the optional CASCADE parameter to drop the role and its dependencies.
=> DROP ROLE administrator CASCADE;
DROP ROLE
Granting Privileges to Roles
A superuser or owner of a schema, table, or other database object can assign privileges to a role,
just as they would assign privileges to an individual user by using the GRANT statements
described in the SQL Reference Manual. See About Database Privileges for information about
which privileges can be granted.
Granting a privilege to a role immediately affects active user sessions. When you grant a new
privilege, it becomes immediately available to every user with the role active.
Example
The following example creates two roles and assigns them different privileges on a single table
called applog.
1. Create a table called applog:
=> CREATE TABLE applog (id int, sourceID VARCHAR(32), data TIMESTAMP, event VARCHAR(2
56));
2. Create a new role called logreader:
=> CREATE ROLE logreader;
3. Assign read-only privileges to the logreader role on table applog:
=> GRANT SELECT ON applog TO logreader;
4. Create a role called logwriter:
=> CREATE ROLE logwriter;
5. Assign write privileges to the logwriter role on table applog:
=> GRANT INSERT ON applog TO logwriter;
See the SQL Reference Manual for the different GRANT statements.
Revoking Privileges From Roles
Use one of the REVOKE statements to revoke a privilege from a role.
=> REVOKE INSERT ON applog FROM logwriter;
REVOKE PRIVILEGE
Revoking a privilege immediately affects any user sessions that have the role active. When you
revoke a privilege, it is immediately removed from users that rely on the role for the privilege.
See the SQL Reference Manual for the different REVOKE statements.
Granting Access to Database Roles
A superuser can assign any role to a user or to another role using the GRANT command. The
simplest form of this command is:
GRANT role [, ...] TO { user | role } [, ...]
HP Vertica returns a NOTICE if you grant a role, with or without the ADMIN OPTION, to a grantee
who has already been granted that role. For example:
=> GRANT commenter to Bob;
NOTICE 4622: Role "commenter" was already granted to user "Bob"
See GRANT (Role) in the SQL Reference Manual for details.
Example
The following example illustrates how to create a role called commenter and grant user Bob
access to that role.
1. Connect to the database as a superuser:
\c - dbadmin
2. Create a table called comments:
=> CREATE TABLE comments (id INT, comment VARCHAR);
3. Create a new role called commenter:
=> CREATE ROLE commenter;
4. Grant privileges to the new role on the comments table:
=> GRANT INSERT, SELECT ON comments TO commenter;
5. Grant the commenter role to user Bob.
=> GRANT commenter TO Bob;
Enable the newly-granted role
1. Connect to the database as user Bob
=> \c - Bob
2. User Bob enables the role:
=> SET ROLE commenter;
3. Now insert some values into the comments table:
=> INSERT INTO comments VALUES (1, 'Hello World');
Based on the privileges granted to Bob by the commenter role, Bob can insert and query the
comments table.
4. Query the comments table:
=> SELECT * FROM comments;
id | comment
----+-------------
1 | Hello World
(1 row)
5. Commit the transaction:
=> COMMIT;
Note that Bob does not have proper permissions to drop the table:
=> DROP TABLE comments;
ROLLBACK 4000: Must be owner of relation comments
See Also
l Granting Database Access to MC Users
Revoking Access From Database Roles
A superuser can revoke any role from a user or from another role using the REVOKE command.
The simplest form of this command is:
REVOKE role [, ...] FROM { user | role | PUBLIC } [, ...]
See REVOKE (Role) in the SQL Reference Manual for details.
Example
To revoke access from a role, use the REVOKE (Role) statement:
1. Connect to the database as a superuser:
\c - dbadmin
2. Revoke the commenter role from user Bob:
=> REVOKE commenter FROM bob;
Granting Administrative Access to a Role
A superuser can assign a user or role administrative access to a role by supplying the optional
WITH ADMIN OPTION argument to the GRANT statement. Administrative access allows the user
to grant and revoke access to the role for other users (including granting them administrative
access). Giving users the ability to grant roles lets a superuser delegate role administration to other
users.
Example
The following example demonstrates granting the user bob administrative access to the commenter
role, then connecting as bob and granting a role to another user.
1. Connect to the database as a superuser (or a user with administrative access):
=> \c - dbadmin
2. Grant Bob administrative access to the commenter role:
=> GRANT commenter TO Bob WITH ADMIN OPTION;
3. Connect to the database as user Bob
=> \c - Bob
4. As user Bob, grant the commenter role to Alice:
=> GRANT commenter TO Alice;
Users with administrative access to a role can also grant other users administrative access:
=> GRANT commenter TO alice WITH ADMIN OPTION;
GRANT ROLE
As with all user privilege models, database superusers should be cautious when granting any user a
role with administrative privileges. For example, if the database superuser grants two users a role
with administrative privileges, both users can revoke the role from the other user. This example
shows granting the appadmin role (with administrative privileges) to users bob and alice. After each
user has been granted the appadmin role, either user can revoke the other's access to the role.
=> GRANT appadmin TO bob, alice WITH ADMIN OPTION;
GRANT ROLE
=> \connect - bob
You are now connected as user "bob".
=> REVOKE appadmin FROM alice;
REVOKE ROLE
Revoking Administrative Access From a Role
A superuser can revoke administrative access from a role using the ADMIN OPTION parameter
with the REVOKE statement. Giving users the ability to revoke roles lets a superuser delegate role
administration to other users.
Example
The following example demonstrates revoking administrative access from Alice for the commenter
role.
1. Connect to the database as a superuser (or a user with administrative access):
\c - dbadmin
2. Issue the REVOKE command with the ADMIN OPTION parameter:
=> REVOKE ADMIN OPTION FOR commenter FROM alice;
Enabling Roles
By default, roles aren't enabled automatically for a user account. (See Default Roles for Database
Users for a way to make roles enabled automatically.) Users must explicitly enable a role using the
SET ROLE statement. When users enable a role in their session, they gain all of the privileges
assigned to that role. Enabling a role does not affect any other roles that the users have active in
their sessions. They can have multiple roles enabled simultaneously, gaining the combined
privileges of all the roles they have enabled, plus any of the privileges that have been granted to
them directly.
=> SELECT * FROM applog;
ERROR:  permission denied for relation applog
=> SET ROLE logreader;
SET
=> SELECT * FROM applog;
 id | sourceID |            data            |                   event
----+----------+----------------------------+----------------------------------------------
  1 | Loader   | 2011-03-31 11:00:38.494226 | Error: Failed to open source file
  2 | Reporter | 2011-03-31 11:00:38.494226 | Warning: Low disk space on volume /scratch-a
(2 rows)
You can enable all of the roles available to your user account using the SET ROLE ALL statement.
=> SET ROLE ALL;
SET
=> SHOW ENABLED_ROLES;
name | setting
---------------+------------------------------
enabled roles | logreader, logwriter
(1 row)
See Also
l Viewing a User's Role
Disabling Roles
To disable all roles, use the SET ROLE NONE statement:
=> SET ROLE NONE;
SET
=> SHOW ENABLED_ROLES;
name | setting
---------------+---------
enabled roles |
(1 row)
Viewing Enabled and Available Roles
You can list the roles you have enabled in your session using the SHOW ENABLED_ROLES
statement:
=> SHOW ENABLED_ROLES;
name | setting
---------------+----------
enabled roles | logreader
(1 row)
You can find the roles available to your account using the SHOW AVAILABLE_ROLES statement:
Bob=> SHOW AVAILABLE_ROLES;
name | setting
-----------------+-----------------------------
available roles | logreader, logwriter
(1 row)
Viewing Named Roles
To view the names of all roles users can access, along with any roles that have been assigned to
those roles, query the V_CATALOG.ROLES system table.
=> SELECT * FROM roles;
role_id | name | assigned_roles
-------------------+-----------------+----------------
45035996273704964 | public |
45035996273704966 | dbduser |
45035996273704968 | dbadmin | dbduser*
45035996273704972 | pseudosuperuser | dbadmin*
45035996273704974 | logreader |
45035996273704976 | logwriter |
45035996273704978 | logadmin | logreader, logwriter
(7 rows)
Note: An asterisk (*) in the output means that role was granted WITH ADMIN OPTION.
Viewing a User's Role
The HAS_ROLE() function lets you see if a role has been granted to a user.
Non-superusers can check their own role membership using HAS_ROLE('role_name'), but only a
superuser can look up other users' memberships using the user_name parameter. Omitting the
user_name parameter will return role results for the superuser who is calling the function.
How to View a User's Role
In this example, user Bob wants to see if he's been assigned the logwriter role. The output
returns the Boolean value t for true, denoting that Bob is assigned the specified logwriter role:
Bob=> SELECT HAS_ROLE('logwriter');
HAS_ROLE
----------
t
(1 row)
In this example, a superuser wants to verify that the logadmin role has been granted to user Ted:
dbadmin=> SELECT HAS_ROLE('Ted', 'logadmin');
The output returns boolean value t for true, denoting that Ted is assigned the specified logadmin
role:
HAS_ROLE
----------
t
(1 row)
Note that if a superuser omits the user_name argument, the function looks up that superuser's role.
The following output indicates that this superuser is not assigned the logadmin role:
dbadmin=> SELECT HAS_ROLE('logadmin');
HAS_ROLE
----------
f
(1 row)
Output of the function call with user Alice indicates that she is not granted the logadmin role:
dbadmin=> SELECT HAS_ROLE('Alice', 'logadmin');
HAS_ROLE
----------
f
(1 row)
To view additional information about users, roles and grants, you can also query the following
system tables in the V_CATALOG schema to show directly-assigned roles:
l ROLES
l GRANTS
l USERS
Note that the system tables do not indicate whether a role is available to a user when roles could be
available through other roles (indirectly). You need to call the HAS_ROLE() function for that
information.
Users
This command returns all columns from the USERS system table:
=> SELECT * FROM users;
-[ RECORD 1 ]-----+---------------------------
user_id           | 45035996273704962
user_name         | dbadmin
is_super_user     | t
profile_name      | default
is_locked         | f
lock_time         |
resource_pool     | general
memory_cap_kb     | unlimited
temp_space_cap_kb | unlimited
run_time_cap      | unlimited
all_roles         | dbadmin*, pseudosuperuser*
default_roles     | dbadmin*, pseudosuperuser*
Note: An asterisk (*) in table output for all_roles and default_roles columns indicates a role
granted WITH ADMIN OPTION.
Roles
The following command returns all columns from the ROLES system table:
=> SELECT * FROM roles;
      role_id      |      name       | assigned_roles
-------------------+-----------------+-------------------
 45035996273704964 | public          |
 45035996273704966 | dbduser         |
 45035996273704968 | dbadmin         | dbduser*
 45035996273704972 | pseudosuperuser | dbadmin*
Grants
The following command returns all columns from the GRANTS system table:
=> SELECT * FROM grants;
grantor | privileges_description | object_schema | object_name | grantee
---------+------------------------+---------------+-------------+---------
dbadmin | USAGE | | public | public
dbadmin | USAGE | | v_internal | public
dbadmin | USAGE | | v_catalog | public
dbadmin | USAGE | | v_monitor | public
(4 rows)
Viewing User Roles on Management Console
You can see an MC user's roles and database resources through the MC Settings > User
management page on the Management Console interface. For more information, see About MC
Privileges and Roles.
About MC Privileges and Roles
As introduced in About MC Users, you control user access to Management Console through groups
of privileges (also referred to as access levels) that fall into two types: those that apply to MC
configuration, and those that apply to MC-managed HP Vertica databases.
MC Permission Groups
l MC configuration privileges are made up of roles that control what users can configure on the
MC, such as modify MC settings, create/import HP Vertica databases, restart MC, create an
HP Vertica cluster through the MC interface, and create and manage MC users.
l MC database privileges are made up of roles that control what users can see or do on an MC-
managed HP Vertica database, such as view the database cluster state, query and session
activity, monitor database messages and read log files, replace cluster nodes, and stop
databases.
Note: When you grant an MC user a database role, that user inherits the privileges assigned to
the database user account to which the MC user is mapped. For maximum access, use the
dbadmin username and password.
MC database privileges cannot alter or override the HP Vertica database user's privileges and
roles. MC user/database user association is described in Mapping an MC User to a Database
user's Privileges.
MC's Configuration Privileges and Database Access
The following table shows MC role-based users and summarizes the levels of access they have on
the MC interface, as well as to any MC-managed databases.
User type: MC administrators (SUPER and ADMIN)
MC config permissions: Perform all administrative operations on MC, including configure and
restart the MC process and add, change, and remove all user accounts.
MC database permissions: Automatically inherit the database privileges of the main database user
account used to set up one or more databases on the MC interface. By default, MC administrators
have access to all MC-managed databases.

User type: IT users (IT)
MC config permissions: Monitor all MC-managed databases, view MC-level (non-database)
messages, logs, and alerts, disable or enable user access to MC, and reset non-LDAP user
passwords.
MC database permissions: Inherit no database privileges. You must grant the IT user access to
one or more MC-managed databases, which you do by mapping this user to the database user
account. The MC IT user then inherits the privileges assigned to the database user to which
he/she is mapped.

User type: Database users (NONE)
MC config permissions: Perform no administrative operations on MC. View and/or manage
databases that you assign them.
MC database permissions: Inherit no database privileges. You must grant the database (NONE)
user access to one or more MC-managed databases, which you do by mapping this user to the
database user account. The database user inherits the privileges assigned to the database user
to which he/she is mapped.

User types are described in About MC Users; permissions are described in MC Configuration
Privileges and MC Database Privileges.
See Also
l About MC Users
l Creating an MC User
l Mapping an MC User to a Database user's Privileges
MC Configuration Privileges
When you create an MC user, you assign them an MC configuration access level (role). For the
most part, MC configuration permissions control a user's ability to create users and manage MC
settings on the MC interface. You can grant a maximum of one role to each MC user, choosing from
one of the following:
l ADMIN Role (mc)—Full access to all MC functionality, including any MC-managed database
l IT Role (mc)—Full access to all MC functionality, but database access is assigned
l NONE Role (mc)—Database access only, according to the databases an administrator assigns
You grant MC configuration permissions at the same time you create the user's account, through
the MC Settings page. You can change MC access levels through the same page later, if
necessary. See Creating an MC User for details.
You will also likely grant non-administrators (users with the IT and NONE roles) access to one or
more MC-managed databases. See MC Database Privileges for details.
MC Configuration Privileges By User Role
The following table summarizes MC configuration permissions by role. For details, see each role in
the above list.
MC access privileges                                 ADMIN  IT   NONE
Configure MC settings:                               Yes
l Configure storage locations and ports
l Upload an HP Vertica license
l Upload new SSL certificates
l Manage LDAP authentication
Create and manage databases and clusters:            Yes
l Create a new database or import an existing one
l Create a new cluster or import an existing one
l Remove a database/cluster from the MC interface
Configure user settings:                             Yes
l Add, edit, and delete users
l Enable/disable user access to MC
l Add, change, and delete user permissions
l Map users to one or more databases
Monitor user activity on MC                          Yes
Reset MC to its original, preconfigured state        Yes
Restart Management Console                           Yes
Disable or enable user access to the MC interface    Yes    Yes
Reset users' (non-LDAP) passwords                    Yes    Yes
Monitor all console-managed databases                Yes    Yes
View the MC log and non-database MC alerts           Yes    Yes
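The configuration-privilege matrix above amounts to a simple role-to-capability lookup. The sketch below is only an illustrative model of that table (the capability strings are paraphrased from the row labels; MC itself exposes no such API):

```python
# Conceptual model of the MC configuration-privilege table above.
# Capability names are paraphrased from the table rows; this is not an
# MC API, just a way to read the matrix programmatically.
MC_CONFIG_PRIVILEGES = {
    "configure MC settings": {"ADMIN"},
    "create and manage databases and clusters": {"ADMIN"},
    "configure user settings": {"ADMIN"},
    "monitor user activity on MC": {"ADMIN"},
    "reset MC to preconfigured state": {"ADMIN"},
    "restart Management Console": {"ADMIN"},
    "disable or enable user access to MC": {"ADMIN", "IT"},
    "reset non-LDAP passwords": {"ADMIN", "IT"},
    "monitor all console-managed databases": {"ADMIN", "IT"},
    "view MC log and non-database alerts": {"ADMIN", "IT"},
}

def can(role, action):
    """Return True if the given MC configuration role may perform action."""
    return role in MC_CONFIG_PRIVILEGES.get(action, set())

# NONE has no configuration privileges at all, matching the empty column.
print(can("IT", "reset non-LDAP passwords"))   # True
print(can("NONE", "restart Management Console"))  # False
```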
See Also
l About MC Users
l About MC Privileges and Roles
l MC Database Privileges
l Creating an MC User
l Granting Database Access to MC Users
l Mapping an MC User to a Database user's Privileges
SUPER Role (mc)
The default superuser administrator, called Super on the MC UI, is a Linux user account that gets
created when you install and configure MC. During the configuration process, you can give
the Super account any name you like; it need not be dbadmin.
The MC SUPER role, a superset of the ADMIN Role (mc), has the following privileges:
l Oversees the entire Management Console, including all MC-managed database clusters
Note: This user inherits the privileges/roles of the user name supplied when importing an
HP Vertica database into MC. HP recommends that you use the database administrator's
credentials.
l Creates the first MC user accounts and assigns them an MC configuration role
l Grants MC users access to one or more MC-managed HP Vertica databases by assigning MC
Database Privileges to each user
The MC super administrator account is unique. Unlike other MC users you create, including other
MC administrators, the MC super account cannot be altered or dropped, and you cannot grant the
SUPER role to other MC users. The only property you can change for the MC super is the
password. Otherwise the SUPER role has the same privileges on MC as the ADMIN Role (mc).
On MC-managed HP Vertica databases, SUPER has the same privileges as ADMIN Role (db).
The MC super account does not exist within the LDAP server. This account is also different from
the special dbadmin account that gets created during an HP Vertica installation, whose privileges
are governed by the DBADMIN Role. The HP Vertica-created dbadmin is a Linux account that
owns the database catalog and storage locations and can bypass database authorization rules,
such as creating or dropping schemas, roles, and users. The MC super does not have the same
privileges as dbadmin.
See Also
l Configuring MC
l About MC Privileges and Roles
l Creating an MC User
l Granting Database Access to MC Users
l Adding Multiple Users to MC-managed Databases
l Mapping an MC User to a Database user's Privileges
l Managing MC Users
ADMIN Role (mc)
A user granted the ADMIN role can perform all administrative operations on Management
Console, including configuring and restarting the MC process and adding, changing, and
removing user accounts. By default, MC administrators inherit the database privileges of the
main database user account used to set up the database on the MC interface. Therefore, MC
administrators have access to all MC-managed databases. Grant the ADMIN role to users you
want to be MC administrators.
The difference between an ADMIN user and the default Linux account, the MC SUPER role, is
that you cannot alter or delete the MC SUPER account, and you cannot grant the SUPER role to
any other MC users. You can, however, change the access level of other MC administrators,
and you can delete their accounts from the MC interface.
The following list highlights privileges granted to the ADMIN role:
l Modify MC settings, such as storage locations and ports, restart the MC process, and reset MC
to its original, unconfigured state
l Audit license activity and install/upgrade an HP Vertica license
l Upload a new SSL certificate
l Use LDAP for user authentication
l View the MC log, alerts and messages
l Add new users and map them to one or more HP Vertica databases by granting an MC
database-level role
l Select a database and add multiple users at once
l Manage user roles and their access to MC
l Remove users from the MC
l Monitor user activity on the MC interface
l Stop and start any MC-managed database
l Create new databases/clusters and import existing databases/clusters into MC
l Remove databases/clusters from the MC interface
l View all databases/clusters imported into MC
About the MC Database Administrator Role
There is also an MC database administrator (ADMIN) role that controls a user's access to MC-
managed databases. The two ADMIN roles are similar, but they are not the same, and you do not
need to grant users with the ADMIN (mc) role an ADMIN (db) role because MC ADMIN users
automatically inherit all database privileges of the main database user account that was created on
or imported into MC.
The following table summarizes the primary difference between the two ADMIN roles, but see
ADMIN Role (db) for details specific to MC-managed database administrators.
MC configuration ADMIN role: Perform all administrative operations on the MC itself,
including restarting the MC process. Privileges extend to monitoring all MC-created and
imported databases, but anything database-related beyond that scope depends on the user's
privileges granted on the database through GRANT statements.

MC database ADMIN role: Perform database-specific activities, such as stopping and starting
the database, and monitoring query and user activity and resources. Other database
operations depend on that user's privileges on the specific database. This ADMIN role
cannot configure MC.
See Also
l About MC Privileges and Roles
l ADMIN Role (db)
l Creating an MC User
l Granting Database Access to MC Users
l Adding Multiple Users to MC-managed Databases
l Mapping an MC User to a Database user's Privileges
l Managing MC Users
IT Role (mc)
MC IT users can monitor all MC-managed databases, view MC-level (non database) messages,
logs, and alerts, disable or enable user access to MC, and reset non-LDAP user passwords. You
can also assign MC IT users specific database privileges, which you do by mapping IT users to a
user on a database. In this way, the MC IT user inherits the privileges assigned to the database
user to which he/she is mapped.
About the MC IT (database) Role
There is also an IT database administrator (IT) role that controls a user's access to MC-managed
databases. If you grant an MC user both IT roles, it means the user can perform some configuration
on MC and also has access to one or more MC-managed databases. The database mapping is not
required, but it gives the IT user wider privileges.
The two IT roles are similar, but they are not the same. The following table summarizes the primary
difference between them, but see IT Role (db) for details.
MC configuration IT role: Monitor MC-managed databases, view non-database messages, and
manage user access.

MC database IT role: Monitor databases on which the user has privileges; view the database
overview and activity pages; monitor the node state; view messages and mark them
read/unread; view database settings. Can also be mapped to one or more HP Vertica
databases.
See Also
l About MC Privileges and Roles
l IT Role (db)
l Mapping an MC User to a Database user's Privileges
NONE Role (mc)
The default role for all newly-created users on MC is NONE, which prevents users granted this role
from configuring the MC. When you create MC users with the NONE role, you grant them an MC
database-level role. This assignment maps the MC user to a user account on a specific
database, and the NONE user inherits the privileges of the database user to which he or she
is mapped.
Which database-level role you grant this user with NONE privileges—whether ADMIN (db) or IT
(db) or USER (db)—depends on the level of access you want the user to have on the MC-managed
database. Database roles have no impact on the ADMIN and IT roles at the MC configuration level.
See Also
l About MC Privileges and Roles
l About MC Users
l MC Database Privileges
l ADMIN Role (db)
l IT Role (db)
l USER Role (db)
MC Database Privileges
When you create MC users, you first assign them MC configuration privileges, which control
what they can do on the MC itself. In the same user-creation operation, you grant access to
one or more
MC-managed databases. MC database access does not give the MC user privileges directly on HP
Vertica; it provides MC users varying levels of access to assigned database functionality through
the MC interface.
Assign users an MC database level through one of the following roles:
l ADMIN Role (db)—Full access to all MC-managed databases. Actual privileges ADMINs inherit
depend on the database user account used to create or import the HP Vertica database into the
MC interface.
l IT Role (db)—Can start and stop a database but cannot remove it from the MC interface or drop
it.
l USER Role (db)—Can only view database information through the database Overview and
Activities pages but is restricted from viewing more detailed data.
When you assign an MC database level to an MC user, you need to map the MC user account to a
database user account. Mapping lets the MC user inherit the privileges assigned to that database
user and ensures that the MC user cannot do or see anything that is not allowed by the privileges
set up for the user account on the server database.
Privileges assigned to the database user always supersede privileges of the MC user if there is a
conflict, such as stopping a database. When the MC user logs in to MC, using his or her MC user
name and password, MC privileges for database-related activities are compared to the user
privileges on the database itself (the account you mapped the MC user to). Only when the user has
both MC privileges and corresponding database privileges will the operations be exposed to that
user in the MC interface.
Tip: As a best practice, you should identify, in advance, the appropriate HP Vertica database
user account that has privileges and/or roles similar to one of the MC database roles.
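The comparison described above behaves like an intersection of two privilege sets: an operation appears in the MC interface only when both sides allow it. The following sketch is a conceptual model of that rule (the privilege names are illustrative; MC exposes no such API):

```python
# Conceptual model: an operation is exposed in the MC interface only when
# both the MC database-level role and the mapped database user account
# allow it. Privilege names here are illustrative examples.
def exposed_operations(mc_privileges, db_privileges):
    """Operations visible to the MC user: the intersection of what the
    MC-level role allows and what the mapped database account allows."""
    return mc_privileges & db_privileges

mc_role = {"view messages", "start database", "stop database"}
db_user = {"view messages", "start database"}  # mapped account lacks stop

# "stop database" is hidden because the database account does not allow it.
print(sorted(exposed_operations(mc_role, db_user)))
# ['start database', 'view messages']
```

This is also why database privileges "supersede" MC privileges on conflict: removing a privilege from either set removes it from the intersection.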
See Creating an MC User and Mapping an MC User to a Database user's Privileges for more
information.
MC Database Privileges By Role
The following tables summarize MC database-level privileges by user role. The first table
shows the default privileges, and the second table shows, for the ADMIN role only, which
operations depend on the privileges and/or roles of the database user account itself.
Default database-level privileges      ADMIN  IT   USER
View messages                          Yes    Yes  Yes
Delete messages and mark read/unread   Yes    Yes
View database Overview page            Yes    Yes  Yes
View database Activity page            Yes    Yes  Yes
View database grid page                Yes    Yes  Yes
Start a database                       Yes
Stop a node                            Yes
View node state                        Yes    Yes
View MC settings                       Yes    Yes
Privileges governed by the HP Vertica database user account:
Database-specific privileges ADMIN
Audit license activity Yes
Install new license Yes
View WLA tuning recommendations Yes
View database query page Yes
Stop a database Yes
Rebalance a database Yes
Drop a database Yes
Start, replace, add, remove nodes Yes
Modify database settings Yes
See Also
l About MC Users
l About MC Privileges and Roles
l MC Configuration Privileges
ADMIN Role (db)
ADMIN is a superuser with full privileges to monitor MC-managed database activity and
messages. Other database privileges (such as stop or drop the database) are governed by the user
account on the HP Vertica database that this ADMIN (db) user is mapped to. ADMIN is the most
permissive role and is a superset of privileges granted to the IT and USER roles.
The ADMIN user has the following database privileges by default:
l View and delete database messages
l Mark messages read or unread
l View the database overview (grid) page
l View the database activity page
l Start the database
l View database cluster node state
l View database settings
The following MC-managed database operations depend on the database user's role that you
mapped this ADMIN user to:
l View license information
l Install a new license
l View Workload Analyzer tuning recommendations
l View query activity and loads
l Stop the database
l Rebalance the database
l Add, stop, replace, or remove nodes
l Manage database settings
Note: Database access granted through Management Console never overrides roles granted
on a specific HP Vertica database.
About the ADMIN (MC configuration) Role
There is also an MC configuration administrator role that defines what the user can change on the
MC itself. The two ADMIN roles are similar, but they are not the same. Unlike the MC configuration
role of ADMIN, which can manage all MC users and all databases imported into the UI, the MC
database ADMIN role has privileges only on the databases you map this user to. The following
table summarizes the primary difference between them, but see ADMIN Role (mc) for additional
details.
MC database ADMIN role: Perform database-specific activities, such as stopping and starting
the database, and monitoring query and user activity and resources. Other database
operations depend on that user's privileges on the specific database. This ADMIN role
cannot configure MC.

MC configuration ADMIN role: Perform all administrative operations on the MC itself,
including restarting the MC process. Privileges extend to monitoring all MC-created and
imported databases, but anything database-related beyond that scope depends on the user's
privileges granted on the database through GRANT statements.
See Also
l About MC Privileges and Roles
l ADMIN Role (mc)
IT Role (db)
IT can view most details about an MC-managed database, such as messages (and mark them
read/unread), the database's overall health and activity/resources, cluster and node state,
and MC settings. You grant and manage user role assignments through the MC Settings > User
management page on the MC.
About the IT (MC configuration) Role
There is also an IT role at the MC configuration access level. The two IT roles are similar, but they
are not the same. If you grant an MC user both IT roles, it means the user can perform some
configuration on MC and also has access to one or more MC-managed databases. The following
table summarizes the primary difference between them, but see IT Role (mc) for additional details.
MC database IT role: Monitor databases on which the user has privileges; view the database
overview and activity pages; monitor the node state; view messages and mark them
read/unread; view database settings.

MC configuration IT role: Monitor MC-managed databases, view non-database messages, and
manage user access.
See Also
l About MC Privileges and Roles
l IT Role (mc)
l Mapping an MC User to a Database user's Privileges
USER Role (db)
USER has limited database privileges, such as viewing database cluster health,
activity/resources, and messages. MC users granted the USER database role might have higher
levels of permission on the MC itself, such as the IT Role (mc). Alternatively, USER users might
have no (NONE) privileges to configure MC. How you combine the two levels is up to you.
See Also
l About MC Privileges and Roles
l MC Configuration Privileges
l Mapping an MC User to a Database user's Privileges
Granting Database Access to MC Users
If you did not grant an MC user a database-level role when you created the user account, this
procedure describes how to do so.
Granting the user an MC database-level role associates the MC user with a database user's
privileges and ensures that the MC user cannot do or see anything that is not allowed by the
privileges set up for the user account on the server database. When that MC user logs in to MC, his
or her MC privileges for database-related activities are compared to that user's privileges on the
database itself. Only when the user has both MC privileges and corresponding database privileges
will the operations be exposed in the MC interface. See Mapping an MC User to a Database user's
Privileges for examples.
Prerequisites
Before you grant database access to an MC user, make sure you have read the prerequisites in
Creating an MC User.
Grant a Database-Level Role to an MC user:
1. Log in to Management Console as an administrator and navigate to MC Settings > User
management.
2. Select an MC user and click Edit.
3. Verify the MC Configuration Privileges are what you want them to be. NONE is the default.
4. Next to the DB access levels section, click Add and provide the following database access
credentials:
i. Choose a database. Select a database from the list of MC-discovered databases
(databases that were created on or imported into the MC interface).
ii. Database username. Enter an existing database user name or, if the database is
running, click the ellipses […] to browse a list of database users, and select a
name from the list.
iii. Database password. Enter the password for the database user account (not the MC
user's password).
iv. Restricted access. Choose a database level (ADMIN, IT, or USER) for this user.
v. Click OK to close the Add permissions dialog box.
5. Optionally change the user's Status (enabled is the default).
6. Click Save.
See Mapping an MC User to a Database user's Privileges for a graphical illustration of how easy it
is to map the two user accounts.
How MC Validates New Users
After you click OK to close the Add permissions dialog box, MC tries to validate the database
username and password entered against the selected MC-managed database or against your
organization's LDAP directory. If the credentials are found to be invalid, you are asked to re-enter
them.
If the database is not available at the time you create the new user, MC saves the
username/password and prompts for validation when the user accesses the Database and Clusters
page later.
See Also
l About MC Users
l About MC Privileges and Roles
l Creating an MC User
l Creating a Database User
l Adding Multiple Users to MC-managed Databases
Mapping an MC User to a Database user's Privileges
Database mapping occurs when you link one or more MC user accounts to a database user
account. After you map users, the MC user inherits privileges granted to the database user, up to
the limitations of the user's database access level on MC.
This topic presents the same mapping information as in Granting Database Access to MC Users
but with graphics. See also MC Database Privileges for an introduction to database mapping
through the MC interface and details about the different database access roles you can grant to an
MC user.
How to Map an MC User to a Database User
The following series of images shows you how easy it is to map an MC user to a database user
account from the MC Settings > User management page.
You view the list of MC users so you can see who has what privileges. You notice that user alice
has no database privileges, which would appear under the Resources column.
To give alice database privileges, click to highlight her MC username, click Edit, and the Edit
existing user page displays with no resources (databases) assigned to MC user alice.
Click Add, and when the Add permissions dialog box opens, choose a database from the menu.
In the same Add permissions dialog box, after you select a database, you need to enter the user
name of the database user account that you want to map alice to. To see a list of database user
names, click the ellipses […] and select a name from the list. In this example, you already know
that database user carol has privileges to stop and start the database, but the alice database
account can only view certain tables. On MC, you want alice to have similar privileges to carol, so
you map MC alice to database carol.
After you click OK, remember to assign MC user alice an MC database level. In this case, choose
IT, a role that has permissions to start and stop the selected database.
Enter the database password, click OK, close the confirmation dialog box, and click Save.
That's it.
What If You Map the Wrong Permissions
In the following mapping example, if you had granted alice MC database access level of ADMIN but
mapped her to a database account with only USER-type privileges, Alice's access to that database
would be limited to USER privileges. This is by design. When Alice logs in to MC using her own
user name and password, MC privileges for her ADMIN-assigned role are compared to the user
privileges on the database itself. Only when the user has both MC privileges and corresponding
database privileges will the appropriate operations be exposed in the MC interface.
Adding Multiple MC Users to a Database
In addition to creating or editing MC users and mapping them to a selected database, you can also
select a database and add users to that database on the MC Settings > Resource access page.
Choose a database from the list, click Add, and select an MC user name, one at a time. Map the
MC user to the database user account, and then grant each MC user the database level you want
him or her to have.
It is possible you will grant the same database access to several MC users.
See Granting Database Access to MC Users and Mapping an MC User to a Database user's
Privileges for details.
How to Find Out an MC user's Database Role
On the User management page, the Resources column lists all of the databases a user is mapped
to. It does not, however, display the user's database access level (role).
You can retrieve that information by highlighting a user and clicking Edit. In the dialog box that
opens (shown in example below), Bob's role on the mcdb database is ADMIN. You can change
Bob's role from this dialog box by clicking Edit and assigning a different database-access role.
Adding Multiple Users to MC-managed Databases
If you are administering one or more MC-managed databases, and several MC users need access
to it, you have two options on the MC Settings page:
l From the User management option, select each user and grant database access, one user at a
time
l From the Resource access option, select a database first and add users to it
This procedure describes how to add several users to one database at once. If you want to add
users one at a time, see Creating an MC User.
Before You Start
Read the prerequisites in Creating an MC User.
How to Add Multiple Users to a Database
1. Log in to MC as an administrator and navigate to MC Settings > Resource access.
2. Choose a database from the list of discovered databases. Selecting the database populates a
table with users who already have privileges on the selected database.
3. To add new users, click Add and select the MC username you want to add to the database
from the drop-down list.
4. Enter an existing Database username on the selected database or click the ellipses button
[…] to browse for names. (This is the database account you want to map the selected user to.)
5. Enter the database password (not this username's password).
Note: The database password is generally the dbadmin superuser's password.
6. Choose a database-access role (ADMIN or IT or USER) for this user.
7. Click OK to close the Add access to resource dialog box.
8. Perform steps 3-7 for each user you want to add to the selected database, and then click Save.
See Also
l About MC Users
l About MC Privileges and Roles
l Mapping an MC User to a Database user's Privileges
MC Mapping Matrix
The following table shows the three MC configuration roles, ADMIN, IT, and NONE, and the
privileges that a user granted each role inherits when also mapped to a specific
database-level role.
ADMIN Role (mc) + all databases (implicit)
l Perform all administrative operations on Management Console, including configuring and
restarting the MC process.
l Maximum access to all databases created and/or imported into the MC interface, governed
by the privileges associated with the database user account used to set up the database
on the MC.

IT Role (mc) + ADMIN Role (db)
l Monitor MC-managed database activity.
l View non-database messages.
l Manage user access (enable/disable).
l Monitor MC-managed database activity and messages.
l Other database privileges (such as stop or drop the database) are governed by the mapped
user account on the database itself.
l Automatically inherits all privileges granted to the NONE:IT combination.

IT Role (mc) + IT Role (db)
l Monitor MC-managed database activity.
l View non-database messages.
l Manage user access (edit/enable/disable).
l On databases where granted privileges, monitor the database overview and activity,
monitor node state, view messages and mark them read/unread, and view database settings.
l Automatically inherits all privileges granted to the IT:USER combination.

IT Role (mc) + USER Role (db)
l Monitor MC-managed database activity.
l View non-database messages.
l Manage user access (enable/disable).
l View database cluster health, activity/resources, and messages and alerts.
NONE Role (mc) + ADMIN Role (db)
l No privileges to monitor/modify anything related to the MC itself.
l Monitor MC-managed database activity, node state, and messages.
l Other database privileges (such as stop or drop the database) are governed by the mapped
user account on the database itself.
l Automatically inherits all privileges granted to the NONE:IT combination.

NONE Role (mc) + IT Role (db)
l No privileges to monitor/modify anything related to the MC itself.
l Monitor MC-managed database activity, node state, and settings.
l View the database overview and activity pages.
l View messages and mark them read/unread.
l Automatically inherits all privileges granted to the NONE:USER combination.

NONE Role (mc) + USER Role (db)
l No privileges to monitor/modify anything related to the MC itself.
l View database cluster health, activity/resources, and messages and alerts.
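The "automatically inherits" notes in the matrix form chains: a stronger (MC configuration role, MC database role) combination includes everything granted to the weaker combination it points at. The sketch below just walks those chains; it is a conceptual model of the matrix, not an MC API:

```python
# Conceptual model of the inheritance noted in the mapping matrix.
# Keys are (MC configuration role, MC database role) combinations;
# each points at the weaker combination whose privileges it includes.
INHERITS = {
    ("IT", "ADMIN"): ("NONE", "IT"),
    ("IT", "IT"): ("IT", "USER"),
    ("NONE", "ADMIN"): ("NONE", "IT"),
    ("NONE", "IT"): ("NONE", "USER"),
}

def inheritance_chain(combo):
    """All combinations whose privileges the given combo includes."""
    chain = [combo]
    while combo in INHERITS:
        combo = INHERITS[combo]
        chain.append(combo)
    return chain

# NONE:ADMIN includes NONE:IT, which in turn includes NONE:USER.
print(inheritance_chain(("NONE", "ADMIN")))
# [('NONE', 'ADMIN'), ('NONE', 'IT'), ('NONE', 'USER')]
```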
Using the Administration Tools
HP Vertica provides a set of tools that allows you to perform administrative tasks quickly and
easily. Most of the database administration tasks in HP Vertica can be done using the
Administration Tools.
Always run the Administration Tools using the Database Administrator account on the
Administration host, if possible. Make sure that no other Administration Tools processes are
running.
If the Administration host is unresponsive, run the Administration Tools on a different node in the
cluster. That node permanently takes over the role of Administration host.
A man page is available for admintools. If you are running as the dbadmin user, simply type: man
admintools. If you are running as a different user, type: man -M /opt/vertica/man admintools.
Running the Administration Tools
At the Linux command line:
$ /opt/vertica/bin/admintools [ -t | --tool ] toolname [ options ]
toolname   One of the tools described in the Administration Tools Reference.
options    -h | --help       Shows a brief help message and exits.
           -a | --help_all   Lists all command-line subcommands and options, as
                             described in Writing Administration Tools Scripts.
If you omit toolname and options parameters, the Main Menu dialog box appears inside your
console or terminal window with a dark blue background and a title on top. The screen captures
used in this documentation set are cropped down to the dialog box itself, as shown below.
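For scripting, the syntax above amounts to assembling an argument list. The helper below is an illustrative sketch of that assembly; the tool name view_cluster is only an example (run admintools -a to list the tools actually available on your installation):

```python
# Illustrative helper that assembles an admintools command line following
# the syntax shown above. "view_cluster" is an example tool name; check
# `admintools -a` for the real list on your system.
def admintools_cmd(toolname=None, options=()):
    cmd = ["/opt/vertica/bin/admintools"]
    if toolname is not None:
        cmd += ["-t", toolname]
    cmd += list(options)
    return cmd

# With no tool name, admintools opens the interactive Main Menu instead.
print(admintools_cmd("view_cluster", ["-d", "mydb"]))
# ['/opt/vertica/bin/admintools', '-t', 'view_cluster', '-d', 'mydb']
```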
If you are unfamiliar with this type of interface, read Using the Administration Tools Interface before
you do anything else.
First Time Only
The first time you log in as the Database Administrator and run the Administration Tools, the user
interface displays.
1. In the EULA (end-user license agreement) window, type accept to proceed.
A window displays, requesting the location of the license key file you downloaded from the HP
Web site. The default path is /tmp/vlicense.dat.
2. Type the absolute path to your license key (for example, /tmp/vlicense.dat) and click OK.
Between Dialogs
While the Administration Tools are working, you see the command line processing in a window
similar to the one shown below. Do not interrupt the processing.
Using the Administration Tools Interface
The HP Vertica Administration Tools are implemented using Dialog, a graphical user interface that
works in terminal (character-cell) windows. The interface responds to mouse clicks in some terminal
windows, particularly local Linux windows, but you might find that it responds only to keystrokes.
Thus, this section describes how to use the Administration Tools using only keystrokes.
Note: This section does not describe every possible combination of keystrokes you can use to
accomplish a particular task. Feel free to experiment and to use whatever keystrokes you
prefer.
Enter [Return]
In all dialogs, when you are ready to run a command, select a file, or cancel the dialog, press the
Enter key. The command descriptions in this section do not explicitly instruct you to press Enter.
OK - Cancel - Help
The OK, Cancel, and Help buttons are present on virtually all dialogs. Use the tab, space
bar, or right and left arrow keys to select an option and then press Enter. The same
keystrokes apply to dialogs that present a choice of Yes or No.
Menu Dialogs
Some dialogs require that you choose one command from a menu. Type the alphanumeric
character shown or use the up and down arrow keys to select a command, and then press
Enter.
List Dialogs
In a list dialog, use the up and down arrow keys to highlight items, then use the space bar
to select the items (which marks them with an X). Some list dialogs allow you to select
multiple items. When you have finished selecting items, press Enter.
Form Dialogs
In a form dialog (also referred to as a dialog box), use the tab key to cycle between OK, Cancel,
Help, and the form field area. Once the cursor is in the form field area, use the up and down arrow
keys to select an individual field (highlighted) and enter information. When you have finished
entering information in all fields, press Enter.
Help Buttons
Online help is provided in the form of text dialogs. If you have trouble viewing the help, see Notes
for Remote Terminal Users in this document.
K-Safety Support in Administration Tools
The Administration Tools allow certain operations on a K-Safe database, even if some nodes are
unresponsive.
The database must have been marked as K-Safe using the MARK_DESIGN_KSAFE function.
The following management functions within the Administration Tools are operational when some
nodes are unresponsive.
Note: HP Vertica users can perform many of these operations using the Management
Console interface. See Management Console and Administration Tools for details.
l View database cluster state
l Connect to database
l Start database (including manual recovery)
l Stop database
l Replace node (assuming the node that is down is the one being replaced)
l View database parameters
l Upgrade license key
The following operations work with unresponsive nodes; however, you might have to repeat the
operation on the failed nodes after they are back in operation:
l Edit authentication
l Distribute config files
l Install external procedure
l Set database parameters
The following management functions within the Administration Tools require that all nodes be UP in
order to be operational:
l Create database
l Run the Database Designer
l Drop database
l Set restart policy
l Roll back database to Last Good Epoch
Notes for Remote Terminal Users
The appearance of the graphical interface depends on the color and font settings used by your
terminal window. The screen captures in this document were made using the default color and font
settings in a PuTTY terminal application running on a Windows platform.
Note: If you are using a remote terminal application, such as PuTTY or a Cygwin bash shell,
make sure your window is at least 81 characters wide and 23 characters high.
If you are using PuTTY, you can make the Administration Tools look like the screen captures in this
document:
1. In a PuTTY window, right click the title area and select Change Settings.
2. Create or load a saved session.
3. In the Category dialog, click Window > Appearance.
4. In the Font settings, click the Change... button.
5. Select Font: Courier New: Regular Size: 10
6. Click Apply.
Repeat these steps for each existing session that you use to run the Administration Tools.
You can also change the translation to support UTF-8:
1. In a PuTTY window, right click the title area and select Change Settings.
2. Create or load a saved session.
3. In the Category dialog, click Window > Translation.
4. In the "Received data assumed to be in which character set" drop-down menu, select UTF-8.
5. Click Apply.
Using the Administration Tools Help
The Help on Using the Administration Tools command displays a help screen about using the
Administration Tools.
Most of the online help in the Administration Tools is context-sensitive. For example, if you use
up/down arrows to select a command, press tab to move to the Help button, and press return, you
get help on the selected command.
In a Menu Dialog
1. Use the up and down arrow keys to choose the command for which you want help.
2. Use the Tab key to move the cursor to the Help button.
3. Press Enter (Return).
In a Dialog Box
1. Use the up and down arrow keys to choose the field on which you want help.
2. Use the Tab key to move the cursor to the Help button.
3. Press Enter (Return).
Scrolling
Some help files are too long for a single screen. Use the up and down arrow keys to scroll through
the text.
Password Authentication
When you create a new user with the CREATE USER command, you can configure the password
or leave it empty. You cannot bypass the password if the user was created with a password
configured. You can change a user's password using the ALTER USER command.
See Implementing Security for more information about controlling database authorization through
passwords.
Tip: Unless the database is used solely for evaluation purposes, HP recommends that all
database users have encrypted passwords.
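For example, passwords are set and changed with standard SQL through vsql. The sketch below shows the two statements; the user name and passwords are placeholders, not values from this guide, and the vsql invocation is left commented so you can review it first.

```shell
# Sketch: set and change a user's password from vsql.
# "analyst" and the passwords are placeholders.
SQL_CREATE="CREATE USER analyst IDENTIFIED BY 'initial_pw';"
SQL_ALTER="ALTER USER analyst IDENTIFIED BY 'changed_pw';"
echo "$SQL_CREATE"
echo "$SQL_ALTER"
# /opt/vertica/bin/vsql -U dbadmin -c "$SQL_CREATE"   # run as the superuser
```

Run the statements as a user with the privileges described in Implementing Security.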
Distributing Changes Made to the Administration
Tools Metadata
Administration Tools-specific metadata for a failed node will fall out of synchronization with other
cluster nodes if you make the following changes:
l Modify the restart policy
l Add one or more nodes
l Drop one or more nodes.
When you restore the node to the database cluster, you can use the Administration Tools to update
the node with the latest Administration Tools metadata:
1. Log on to a host that contains the metadata you want to transfer and start the Administration
Tools. (See Using the Administration Tools.)
2. On the Main Menu in the Administration Tools, select Configuration Menu and click OK.
3. On the Configuration Menu, select Distribute Config Files and click OK.
4. Select AdminTools Meta-Data.
The Administration Tools metadata is distributed to every host in the cluster.
5. Restart the database.
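The same distribution can be scripted. This is a sketch only: the tool name (distribute_config_files) is an assumption about this admintools version; confirm it against the tools listed by your admintools installation before relying on it.

```shell
# Sketch: redistribute configuration files non-interactively.
# The tool name below is an assumption for this admintools version.
ADMINTOOLS=/opt/vertica/bin/admintools
CMD="$ADMINTOOLS -t distribute_config_files"
echo "$CMD"   # review the command before running it
# $CMD        # then restart the database, as in step 5 above
```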
Administration Tools and Management Console
You can perform most database administration tasks using the Administration Tools, but you have
the additional option of using the more visual and dynamic Management Console.
The following table compares the functionality available in both interfaces. Continue to use
Administration Tools and the command line to perform actions not yet supported by Management
Console.
HP Vertica Functionality                                        Management Console   Administration Tools
Use a Web interface for the administration of HP Vertica        Yes                  No
Manage/monitor one or more databases and clusters through a UI  Yes                  No
Manage multiple databases on different clusters                 Yes                  Yes
View database cluster state                                     Yes                  Yes
View multiple cluster states                                    Yes                  No
Connect to the database                                         Yes                  Yes
Start/stop an existing database                                 Yes                  Yes
Stop/restart HP Vertica on host                                 Yes                  Yes
Kill an HP Vertica process on host                              No                   Yes
Create one or more databases                                    Yes                  Yes
View databases                                                  Yes                  Yes
Remove a database from view                                     Yes                  No
Drop a database                                                 Yes                  Yes
Create a physical schema design (Database Designer)             Yes                  Yes
Modify a physical schema design (Database Designer)             Yes                  Yes
Set the restart policy                                          No                   Yes
Roll back database to the Last Good Epoch                       No                   Yes
Manage clusters (add, replace, remove hosts)                    Yes                  Yes
Rebalance data across nodes in the database                     Yes                  Yes
Configure database parameters dynamically                       Yes                  No
View database activity in relation to physical resource usage   Yes                  No
View alerts and messages dynamically                            Yes                  No
View current database size usage statistics                     Yes                  No
View database size usage statistics over time                   Yes                  No
Upload/upgrade a license file                                   Yes                  Yes
Warn users about license violation on login                     Yes                  Yes
Create, edit, manage, and delete users/user information         Yes                  No
Use LDAP to authenticate users with company credentials         Yes                  Yes
Manage user access to MC through roles                          Yes                  No
Map Management Console users to an HP Vertica database          Yes                  No
Enable and disable user access to MC and/or the database        Yes                  No
Audit user activity on database                                 Yes                  No
Hide features unavailable to a user through roles               Yes                  No
Generate new user (non-LDAP) passwords                          Yes                  No

Management Console provides some, but not all, of the functionality provided by the
Administration Tools. MC also provides functionality not available in the Administration Tools.
See Also
l Monitoring HP Vertica Using Management Console
Administration Tools Reference
Viewing Database Cluster State
This tool shows the current state of the nodes in the database.
1. On the Main Menu, select View Database Cluster State, and click OK.
The normal state of a running database is ALL UP. The normal state of a stopped database is
ALL DOWN.
2. If some hosts are UP and some are DOWN, restart the specific hosts that are down using Restart
HP Vertica on Host from the Administration Tools, or start the database as described
in Starting and Stopping the Database (unless you have a known node failure and want to
continue in that state).
Nodes shown as INITIALIZING or RECOVERING indicate that Failure Recovery is in
progress.
Nodes in other states (such as NEEDS_CATCHUP) are transitional and can be ignored unless
they persist.
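The cluster state can also be checked from the command line. The sketch below assumes the default admintools path and a database named VMart; the actual invocation is commented so you can review the command first.

```shell
# Sketch: view cluster state without the menus (paths/names are assumptions).
ADMINTOOLS=/opt/vertica/bin/admintools
DB=VMart
CMD="$ADMINTOOLS -t view_cluster -d $DB"
echo "$CMD"   # review the command before running it
# $CMD        # prints each node and its state (UP, DOWN, RECOVERING, ...)
```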
See Also
l Advanced Menu Options
l Startup Problems
l Shutdown Problems
Connecting to the Database
This tool connects to a running database with vsql. You can use the Administration Tools to
connect to a database from any node within the database while logged in to any user account with
access privileges. You cannot use the Administration Tools to connect from a host that is not a
database node. To connect from other hosts, run vsql as described in Connecting From the
Command Line in the Programmer's Guide.
1. On the Main Menu, click Connect to Database, and then click OK.
2. Supply the database password if asked:
Password:
When you create a new user with the CREATE USER command, you can configure the
password or leave it empty. You cannot bypass the password if the user was created with a
password configured. You can change a user's password using the ALTER USER command.
The Administration Tools connect to the database and transfer control to vsql.
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
=>
See Using vsql for more information.
Note: After entering your password, you may be prompted to change your password if it has
expired. See Implementing Client Authentication for details of password security.
See Also
l CREATE USER
l ALTER USER
Starting the Database
Starting a K-safe database is supported when up to K nodes are down or unavailable. See Failure
Recovery for a discussion on various scenarios encountered during database shutdown, startup
and recovery.
You can start a database using any of these methods:
l The Management Console
l The Administration Tools interface
l The command line
Starting the Database Using MC
On MC's Databases and Clusters page, click a database to select it, and click Start within the
dialog box that displays.
Starting the Database Using the Administration Tools
1. Open the Administration Tools and select View Database Cluster State to make sure that all
nodes are down and that no other database is running. If all nodes are not down, see Shutdown
Problems.
2. Open the Administration Tools. See Using the Administration Tools for information about
accessing the Administration Tools.
3. On the Main Menu, select Start Database, and then select OK.
4. Select the database to start, and then click OK.
Caution: HP strongly recommends that you start only one database at a time. If you start
more than one database at any time, the results are unpredictable. Users could encounter
resource conflicts or perform operations in the wrong database.
5. Enter the database password, and then click OK.
6. When prompted that the database started successfully, click OK.
7. Check the log files to make sure that no startup problems occurred.
If the database does not start successfully, see Startup Problems.
Starting the Database At the Command Line
If you use the admintools command-line tool start_db to start a database, the -p password
argument is only required during database creation, when you install a new license.
As long as the license is valid, the -p argument is not required to start the database and is silently
ignored, even if you introduce a typo or prematurely press the enter key. This is by design, as the
database can only be started by the user who (as part of the verticadba UNIX user group) initially
created the database or who has root or su privileges.
If the license were to become invalid, HP Vertica would use the -p password argument to attempt
to upgrade the license with the license file stored in /opt/vertica/config/share/license.key.
Following is an example of using start_db on a standalone node:
[dbadmin@localhost ~]$ /opt/vertica/bin/admintools -t start_db -d VMart
Info: no password specified, using none
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (UP)
Database VMart started successfully
Stopping a Database
To stop a running database, take these steps:
1. Use View Database Cluster State to make sure that all nodes are up. If all nodes are not up,
see Restarting HP Vertica on Host.
2. On the Main Menu, select Stop Database, and click OK.
3. Select the database you want to stop, and click OK.
4. Enter the password if asked, and click OK.
5. A message confirms that the database has been successfully stopped. Click OK.
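Like start_db, the shutdown can be scripted with the stop_db tool. The sketch below assumes the default admintools path and a database named VMart; the actual invocation is commented so you can review the command first.

```shell
# Sketch: stop a database non-interactively (paths/names are assumptions).
ADMINTOOLS=/opt/vertica/bin/admintools
DB=VMart
CMD="$ADMINTOOLS -t stop_db -d $DB"
echo "$CMD"             # review the command before running it
# $CMD -p <password>    # uncomment and supply the password to stop the database
```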
Error
If users are connected during shutdown operations, you cannot stop a database. The
Administration Tools displays a message similar to the following:
Unable to shutdown database VMart.
Error: NOTICE 2519: Cannot shut down while users are connected
This may be because other users still have active sessions
or the Management Console is still active. You can force
the sessions to terminate and shut down the database, but
any work done in the other sessions may be lost.
Do you want to try a forced shutdown?
Description
The message indicates that there are active user connections (sessions). For example, Database
Designer may be building or deploying a design. See Managing Sessions in the Administrator's
Guide for more information.
Resolution
The following examples were taken from a different database.
1. To see which users are connected, connect to the database and query the SESSIONS system
table described in the SQL Reference Manual. For example:
=> \pset expanded
Expanded display is on.
=> SELECT * FROM SESSIONS;
-[ RECORD 1 ]
node_name | site01
user_name | dbadmin
client_hostname | 127.0.0.1:57141
login_timestamp | 2009-06-07 14:41:26
session_id | rhel5-1-30361:0xd7e3e:994462853
transaction_start | 2009-06-07 14:48:54
transaction_id | 45035996273741092
transaction_description | user dbadmin (select * from session;)
statement_start | 2009-06-07 14:53:31
statement_id | 0
last_statement_duration | 1
current_statement | select * from sessions;
ssl_state | None
authentication_method | Trust
-[ RECORD 2 ]
node_name | site01
user_name | dbadmin
client_hostname | 127.0.0.1:57142
login_timestamp | 2009-06-07 14:52:55
session_id | rhel5-1-30361:0xd83ac:1017578618
transaction_start | 2009-06-07 14:53:26
transaction_id | 45035996273741096
transaction_description | user dbadmin (COPY ClickStream_Fact FROM '/data/clickstream/
1g/ClickStream_Fact.tbl' DELIMITER '|' NULL '\n' DIRECT;)
statement_start | 2009-06-07 14:53:26
statement_id | 17179869528
last_statement_duration | 0
current_statement | COPY ClickStream_Fact FROM '/data/clickstream/1g/ClickStrea
m_Fact.tbl' DELIMITER '|' NULL '\n' DIRECT;
ssl_state | None
authentication_method | Trust
The current_statement column of Record 1 shows that this session is the one you are using to query
the system table. Record 2 shows the session that must end before the database can be shut down.
1. If a statement is running in a session, that session must be closed. Use the function CLOSE_
SESSION or CLOSE_ALL_SESSIONS described in the SQL Reference Manual.
Note: CLOSE_ALL_SESSIONS is the more common command because it forcefully
disconnects all user sessions.
=> SELECT * FROM SESSIONS;
-[ RECORD 1 ]
node_name | site01
user_name | dbadmin
client_hostname | 127.0.0.1:57141
client_pid | 17838
login_timestamp | 2009-06-07 14:41:26
session_id | rhel5-1-30361:0xd7e3e:994462853
client_label |
transaction_start | 2009-06-07 14:48:54
transaction_id | 45035996273741092
transaction_description | user dbadmin (select * from sessions;)
statement_start | 2009-06-07 14:53:31
statement_id | 0
last_statement_duration_us | 1
current_statement | select * from sessions;
ssl_state | None
authentication_method | Trust
-[ RECORD 2 ]
node_name | site01
user_name | dbadmin
client_hostname | 127.0.0.1:57142
client_pid | 17839
login_timestamp | 2009-06-07 14:52:55
session_id | rhel5-1-30361:0xd83ac:1017578618
client_label |
transaction_start | 2009-06-07 14:53:26
transaction_id | 45035996273741096
transaction_description | user dbadmin (COPY ClickStream_Fact FROM
'/data/clickstream/1g/ClickStream_Fact.tbl'
DELIMITER '|' NULL '\n' DIRECT;)
statement_start | 2009-06-07 14:53:26
statement_id | 17179869528
last_statement_duration_us | 0
current_statement | COPY ClickStream_Fact FROM
'/data/clickstream/1g/ClickStream_Fact.tbl'
DELIMITER '|' NULL '\n' DIRECT;
ssl_state | None
authentication_method | Trust
=> SELECT CLOSE_SESSION('rhel5-1-30361:0xd83ac:1017578618');
-[ RECORD 1 ]
close_session | Session close command sent. Check sessions for progress.
=> SELECT * FROM SESSIONS;
-[ RECORD 1 ]
node_name | site01
user_name | dbadmin
client_hostname | 127.0.0.1:57141
client_pid | 17838
login_timestamp | 2009-06-07 14:41:26
session_id | rhel5-1-30361:0xd7e3e:994462853
client_label |
transaction_start | 2009-06-07 14:48:54
transaction_id | 45035996273741092
transaction_description | user dbadmin (select * from sessions;)
statement_start | 2009-06-07 14:54:11
statement_id | 0
last_statement_duration_us | 98
current_statement | select * from sessions;
ssl_state | None
authentication_method | Trust
2. Query the SESSIONS table again. Notice that two columns have changed:
n statement_id is now 0, indicating that no statement is in progress.
n last_statement_duration_us now indicates how long the statement ran, in microseconds,
before being interrupted.
The SELECT statements that call these functions return when the interrupt or close message
has been delivered to all nodes, not after the interrupt or close has completed.
3. Query the SESSIONS table again. When the session no longer appears in the SESSION table,
disconnect and run the Stop Database command.
Controlling Sessions
The database administrator must be able to disallow new incoming connections in order to shut
down the database. On a busy system, database shutdown is prevented if new sessions connect
after the CLOSE_SESSION or CLOSE_ALL_SESSIONS() command is invoked—and before the
database actually shuts down.
One option is for the administrator to issue the SHUTDOWN('true') command, which forces the
database to shut down and disallow new connections. See SHUTDOWN in the SQL Reference
Manual.
Another option is to modify the MaxClientSessions parameter from its original value to 0, in order
to prevent new non-dbadmin users from connecting to the database.
1. Determine the original value for the MaxClientSessions parameter by querying the
V_MONITOR.CONFIGURATION_PARAMETERS system table:
=> SELECT CURRENT_VALUE FROM CONFIGURATION_PARAMETERS
   WHERE parameter_name = 'MaxClientSessions';
CURRENT_VALUE
---------------
50
(1 row)
2. Set the MaxClientSessions parameter to 0 to prevent new non-dbadmin connections:
=> SELECT SET_CONFIG_PARAMETER('MaxClientSessions', 0);
Note: The previous command allows up to five administrators to log in.
3. Issue the CLOSE_ALL_SESSIONS() command to remove existing sessions:
=> SELECT CLOSE_ALL_SESSIONS();
4. Query the SESSIONS table:
=> SELECT * FROM SESSIONS;
When the session no longer appears in the SESSIONS table, disconnect and run the Stop
Database command.
5. Restart the database.
6. Restore the MaxClientSessions parameter to its original value:
=> SELECT SET_CONFIG_PARAMETER('MaxClientSessions', 50);
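The steps above can be sketched as one vsql script. The values are the examples used in the steps (original MaxClientSessions of 50), and the vsql invocations are commented so the sequence can be reviewed before it is run.

```shell
# Sketch of the session-control sequence as one vsql script.
SHUTDOWN_SQL="SELECT SET_CONFIG_PARAMETER('MaxClientSessions', 0);
SELECT CLOSE_ALL_SESSIONS();
SELECT * FROM SESSIONS;"
echo "$SHUTDOWN_SQL"
# /opt/vertica/bin/vsql -U dbadmin -c "$SHUTDOWN_SQL"
# ...stop the database, restart it, then restore the original value:
# /opt/vertica/bin/vsql -U dbadmin -c "SELECT SET_CONFIG_PARAMETER('MaxClientSessions', 50);"
```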
Notes
If the database does not stop successfully, see Shutdown Problems.
You cannot stop databases if your password has expired. The Administration Tools displays an
error message if you attempt to do so. You need to change your expired password using vsql before
you can shut down a database.
Restarting HP Vertica on Host
This tool restarts the HP Vertica process on one or more nodes in a running database. Use this tool
when a cluster host reboots while the database is running. The spread daemon starts automatically,
but the HP Vertica process does not, so the node does not automatically rejoin the cluster.
1. On the Main Menu, select View Database Cluster State, and click OK.
2. If one or more nodes are down, select Restart HP Vertica on Host, and click OK.
3. Select the database that contains the host that you want to restart, and click OK.
4. Select the Host that you want to restart, and click OK.
5. Select View Database Cluster State again to make sure that all nodes are up.
Configuration Menu Item
The Configuration Menu allows you to:
l Create, drop, and view databases
l Use the Database Designer to create or modify a physical schema design
1. On the Main Menu, click Configuration Menu, and then click OK.
Creating a Database
1. On the Configuration Menu, click Create Database and then click OK.
2. Enter the name of the database and an optional comment. Click OK.
3. Enter a password.
If you do not enter a password, you are prompted to indicate whether you want to enter a
password. Click Yes to enter a password or No to create a database without a superuser
password.
Caution: If you do not enter a password at this point, the superuser password is set to empty.
Unless the database is for evaluation or academic purposes, HP strongly recommends
that you enter a superuser password.
4. If you entered a password, enter the password again.
5. Select the hosts to include in the database. The hosts in this list are the ones that were
specified at installation time (install_vertica -s).
6. Specify the directories in which to store the catalog and data files.
Note: Catalog and data paths must contain only alphanumeric characters and cannot
have leading space characters. Failure to comply with these restrictions could result in
database creation failure.
Note: Do not use a shared directory for more than one node. Data and catalog directories
must be distinct for each node. Multiple nodes must not be allowed to write to the same
data or catalog directory.
7. Check the current database definition for correctness, and click Yes to proceed.
8. A message indicates that you have successfully created a database. Click OK.
Note: If you get an error message, see Startup Problems.
Dropping a Database
This tool drops an existing database. Only the Database Administrator is allowed to drop a
database.
1. Stop the database as described in Stopping a Database.
2. On the Configuration Menu, click Drop Database and then click OK.
3. Select the database to drop and click OK.
4. Click Yes to confirm that you want to drop the database.
5. Type yes and click OK to reconfirm that you really want to drop the database.
6. A message indicates that you have successfully dropped the database. Click OK.
Notes
In addition to dropping the database, HP Vertica automatically drops the node definitions that refer
to the database unless:
l Another database uses a node definition. If another database refers to any of these node
definitions, none of the node definitions are dropped.
l A node definition is the only node defined for the host. (HP Vertica uses node definitions to
locate hosts that are available for database creation, so removing the only node defined for a
host would make the host unavailable for new databases.)
Viewing a Database
This tool displays the characteristics of an existing database.
1. On the Configuration Menu, select View Database and click OK.
2. Select the database to view.
3. HP Vertica displays the following information about the database:
n The name of the database.
n The name and location of the log file for the database.
n The hosts within the database cluster.
n The value of the restart policy setting.
Note: This setting determines whether nodes within a K-Safe database are restarted when
they are rebooted. See Setting the Restart Policy.
n The database port.
n The name and location of the catalog directory.
Setting the Restart Policy
The Restart Policy enables you to determine whether or not nodes in a K-Safe database are
automatically restarted when they are rebooted. Since this feature does not automatically restart
nodes if the entire database is DOWN, it is not useful for databases that are not K-Safe.
To set the Restart Policy for a database:
1. Open the Administration Tools.
2. On the Main Menu, select Configuration Menu, and click OK.
3. In the Configuration Menu, select Set Restart Policy, and click OK.
4. Select the database for which you want to set the Restart Policy, and click OK.
5. Select one of the following policies for the database:
n Never — Nodes are never restarted automatically.
n K-Safe — Nodes are automatically restarted if the database cluster is still UP. This is the
default setting.
n Always — The node of a single-node database is restarted automatically.
Note: Always does not work if a single-node database was not shut down cleanly or
crashed.
6. Click OK.
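The restart policy can also be set from the command line. This sketch assumes an admintools tool named set_restart_policy whose -p flag takes the policy name; both the tool name and flags are assumptions about this admintools version, so confirm them against your installation's tool list.

```shell
# Sketch: set the restart policy without the menus.
# Tool name and -p flag are assumptions; valid policies mirror the menu
# choices (never, ksafe, always).
ADMINTOOLS=/opt/vertica/bin/admintools
CMD="$ADMINTOOLS -t set_restart_policy -d VMart -p ksafe"
echo "$CMD"   # review the command before running it
# $CMD
```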
Best Practice for Restoring Failed Hardware
Following this procedure will prevent HP Vertica from misdiagnosing missing disks or bad mounts as
data corruption, which would result in a time-consuming, full-node recovery.
If a server fails due to hardware issues, for example a bad disk or a failed controller, upon repairing
the hardware:
1. Reboot the machine into runlevel 1, which is a root and console-only mode.
Runlevel 1 prevents network connectivity and keeps HP Vertica from attempting to reconnect
to the cluster.
2. In runlevel 1, validate that the hardware has been repaired, the controllers are online, and any
RAID recovery is able to proceed.
Note: You do not need to initiate RAID recovery in runlevel 1; simply validate that it can
recover.
3. Once the hardware is confirmed consistent, only then reboot to runlevel 3 or higher.
At this point, the network activates, and HP Vertica rejoins the cluster and automatically recovers
any missing data. Note that, on a single-node database, if any files that were associated with a
projection have been deleted or corrupted, HP Vertica will delete all files associated with that
projection, which could result in data loss.
Installing External Procedure Executable Files
1. Run the Administration Tools.
$ /opt/vertica/bin/adminTools
2. On the AdminTools Main Menu, click Configuration Menu, and then click OK.
3. On the Configuration Menu, click Install External Procedure and then click OK.
4. Select the database on which you want to install the external procedure.
5. Either select the file to install or manually type the complete file path, and then click OK.
6. If you are not the superuser, you are prompted to enter your password and click OK.
The Administration Tools automatically create the <database_catalog_path>/procedures
directory on each node in the database and install the external procedure in these directories
for you.
7. Click OK in the dialog that indicates that the installation was successful.
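The installation can also be scripted. This sketch assumes an admintools tool named install_procedure with -d/-f flags and uses a hypothetical procedure file name; confirm the tool name and flags against your admintools installation before relying on them.

```shell
# Sketch: install an external procedure non-interactively.
# Tool name, flags, and the file path are assumptions/placeholders.
ADMINTOOLS=/opt/vertica/bin/admintools
CMD="$ADMINTOOLS -t install_procedure -d VMart -f /tmp/helloplanet.sh"
echo "$CMD"   # review the command before running it
# $CMD        # add a password argument if you are not the superuser
```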
Advanced Menu Options
This Advanced Menu provides interactive recovery and repair commands.
1. On the Main Menu, click Advanced Menu and then OK.
Rolling Back Database to the Last Good Epoch
HP Vertica provides the ability to roll the entire database back to a specific epoch primarily to
assist in the correction of human errors during data loads or other accidental corruptions. For
example, suppose that you have been performing a bulk load and the cluster went down during a
particular COPY command. You might want to discard all epochs back to the point at which the
previous COPY command committed and run the one that did not finish again. You can determine
that point by examining the log files (see Monitoring the Log Files).
1. On the Advanced Menu, select Roll Back Database to Last Good Epoch.
2. Select the database to roll back. The database must be stopped.
3. Accept the suggested restart epoch or specify a different one.
4. Confirm that you want to discard the changes after the specified epoch.
The database restarts successfully.
Important note:
In HP Vertica 4.1, the default for the HistoryRetentionTime configuration parameter changed to
0, which means that HP Vertica only keeps historical data when nodes are down. This new setting
effectively prevents the use of the Administration Tools 'Roll Back Database to Last Good
Epoch' option because the AHM remains close to the current epoch and a rollback is not permitted
to an epoch prior to the AHM. If you rely on the Roll Back option to remove recently loaded data,
consider setting a day-wide window for removing loaded data; for example:
=> SELECT SET_CONFIG_PARAMETER ('HistoryRetentionTime', '86400');
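Before choosing a rollback point, it can help to compare the current epoch, the Last Good Epoch, and the AHM. The sketch below uses the corresponding HP Vertica meta-functions as read-only queries; the vsql invocation is commented so nothing runs unreviewed.

```shell
# Sketch: inspect epochs before deciding on a rollback point (read-only).
EPOCH_SQL="SELECT GET_CURRENT_EPOCH(), GET_LAST_GOOD_EPOCH(), GET_AHM_EPOCH();"
echo "$EPOCH_SQL"
# /opt/vertica/bin/vsql -U dbadmin -c "$EPOCH_SQL"
```

A rollback cannot go to an epoch prior to the AHM, which is why the HistoryRetentionTime setting above matters.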
Stopping HP Vertica on Host
This command attempts to gracefully shut down the HP Vertica process on a single node.
Caution: Do not use this command if you are intending to shut down the entire cluster. Use
Stop Database instead, which performs a clean shutdown to minimize data loss.
1. On the Advanced Menu, select Stop HP Vertica on Host and click OK.
2. Select the hosts to stop.
3. Confirm that you want to stop the hosts.
If the command succeeds, View Database Cluster State shows that the selected hosts are
DOWN.
If the command fails to stop any selected nodes, proceed to Killing HP Vertica Process on
Host.
Killing the HP Vertica Process on Host
This command sends a kill signal to the HP Vertica process on a node.
Caution: Do not use this command unless you have already tried Stop Database and Stop HP
Vertica on Host, and both were unsuccessful.
1. On the Advanced menu, select Kill HP Vertica Process on Host and click OK.
2. Select the hosts on which to kill the HP Vertica process.
3. Confirm that you want to stop the processes.
4. If the command succeeds, View Database Cluster State shows that the selected hosts are
DOWN.
5. If the command fails to stop any selected processes, see Shutdown Problems.
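Scripted, the escalation from a clean stop to a kill might look like the sketch below. The host name is a placeholder; stop_host sends SIGTERM and kill_host sends SIGKILL, as the tool help later in this section shows.

```shell
# Try a clean per-host stop first; only if that fails, send SIGKILL.
HOST="host01"
ADMINTOOLS=/opt/vertica/bin/admintools
if [ -x "$ADMINTOOLS" ]; then
  "$ADMINTOOLS" -t stop_host -s "$HOST" || "$ADMINTOOLS" -t kill_host -s "$HOST"
else
  echo "admintools not found; would stop, then kill, on $HOST"
fi
```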
Upgrading an Enterprise or Evaluation License Key
The following steps are for HP Vertica Enterprise Edition or evaluation licensed users only. This
command copies a license key file into the database. See Managing Licenses for more information.
1. On the Advanced menu, select Upgrade License Key and click OK.
2. Select the database for which to upgrade the license key.
3. Enter the absolute pathname of your downloaded license key file (for example,
/tmp/vlicense.dat) and click OK.
4. Click OK when you see a message that indicates that the upgrade succeeded.
Note: If you are using HP Vertica Community Edition, see HP Vertica License Renewals or
Upgrades for instructions on upgrading to an HP Vertica Enterprise Edition or evaluation
license key.
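The same upgrade can be scripted with the upgrade_license_key tool documented later in this section. This is a sketch: the database name and license file path are placeholders, and the guard keeps the script runnable where HP Vertica is absent.

```shell
# Upgrade the license key for a database from a downloaded license file.
DB="mydb"
LICENSE=/tmp/vlicense.dat
ADMINTOOLS=/opt/vertica/bin/admintools
if [ -x "$ADMINTOOLS" ]; then
  "$ADMINTOOLS" -t upgrade_license_key -d "$DB" -l "$LICENSE"
else
  echo "admintools not found; would upgrade $DB with $LICENSE"
fi
```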
Managing Clusters
Cluster Management lets you add, replace, or remove hosts from a database cluster. These
processes are usually part of a larger process of adding, removing, or replacing a database node.
Note: View the database state to verify that it is running. See View Database Cluster State. If
the database isn't running, restart it. See Starting the Database.
Using Cluster Management
To use Cluster Management:
1. From the Main Menu, select Advanced Menu, and then click OK.
2. In the Advanced Menu, select Cluster Management, and then click OK.
3. Select one of the following, and then click OK.
n Add Hosts to Database: See Adding Hosts to a Database.
n Re-balance Data: See Rebalancing Data.
n Replace Host: See Replacing Hosts.
n Remove Host from Database: See Removing Hosts from a Database.
Using the Administration Tools
The Help Using the Administration Tools command displays a help screen about using the
Administration Tools.
Most of the online help in the Administration Tools is context-sensitive. For example, if you
use the up/down arrow keys to select a command, press Tab to move to the Help button, and then
press Return, you get help on the selected command.
Administration Tools Metadata
The Administration Tools configuration data (metadata) contains information that databases need to
start, such as the hostname/IP address of each participating host in the database cluster.
To facilitate hostname resolution within the Administration Tools, at the command line, and
inside the installation utility, HP Vertica converts all hostnames you provide through the
Administration Tools to IP addresses:
l During installation
HP Vertica immediately converts any hostname you provide through the command-line options
--hosts, --add-hosts, or --remove-hosts to its IP address equivalent.
n If you provide a hostname during installation that resolves to multiple IP addresses (such as
in multi-homed systems), the installer prompts you to choose one IP address.
n HP Vertica retains the name you give for messages and prompts only; internally it stores
these hostnames as IP addresses.
l Within the Administration Tools
All hosts are in IP form to allow for direct comparisons (for example db = database =
database.verticacorp.com).
l At the command line
HP Vertica converts any hostname value to an IP address that it uses to look up the host in the
configuration metadata. If a host has multiple IP addresses that are resolved, HP Vertica tests
each IP address to see if it resides in the metadata, choosing the first match. No match
indicates that the host is not part of the database cluster.
Metadata is more portable because HP Vertica does not require the names of the hosts in the
cluster to be exactly the same when you install or upgrade your database.
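You can check what a hostname resolves to, and therefore what the tools will store internally, with a standard lookup. This is a runnable sketch; localhost is used only so the lookup succeeds on any machine.

```shell
# Resolve a hostname to the first IP address the tools would store internally.
HOST=localhost
IP=$(getent hosts "$HOST" 2>/dev/null | awk '{print $1; exit}')
[ -n "$IP" ] || IP="(unresolved)"   # fallback when getent is unavailable
echo "$HOST resolves to $IP"
```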
Writing Administration Tools Scripts
You can invoke most of the Administration Tools from the command line or a shell script.
Syntax
> /opt/vertica/bin/admintools [ -t | --tool ] toolname [ options ]
Note: For convenience, you can add /opt/vertica/bin to your search path.
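For example, a script can extend its search path and then invoke a tool by name. This is a sketch: list_allnodes is one of the tools listed below, and the guard keeps the script runnable on machines where HP Vertica is not installed.

```shell
# Add the HP Vertica binaries to the search path, then call a tool directly.
PATH="/opt/vertica/bin:$PATH"
export PATH
if command -v admintools >/dev/null 2>&1; then
  admintools -t list_allnodes
else
  echo "admintools is not installed on this machine"
fi
```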
Parameters
[ --tool | -t ] Instructs the Administration Tools to run the specified tool.
Note: If you use the --no-log option to run the Administration Tools silently,
--no-log must appear before the --tool option.
toolname Name of one of the tools described in the help output below.
[ options ] -h, --help Shows a brief help message and exits.
-a, --help_all Lists all command-line subcommands and options as shown in the
Tools section below.
Tools
To return a description of the tools you can access, issue the following command at a command
prompt:
$ admintools -a
Usage:
adminTools [-t | --tool] toolName [options]
Valid tools are:
command_host
config_nodes
connect_db
create_db
database_parameters
db_add_node
db_remove_node
db_replace_node
db_status
drop_db
edit_auth
host_to_node
install_package
install_procedure
kill_host
kill_node
list_allnodes
list_db
list_host
list_node
list_packages
logrotate
node_map
rebalance_data
restart_db
restart_node
return_epoch
set_restart_policy
show_active_db
start_db
stop_db
stop_host
stop_node
uninstall_package
upgrade_license_key
view_cluster
-------------------------------------------------------------------------
Usage: command_host [options]
Options:
-h, --help show this help message and exit
-c CMD, --command=CMD
Command to run
-------------------------------------------------------------------------
Usage: config_nodes [options]
Options:
-h, --help show this help message and exit
-f NODEHOSTFILE, --file=NODEHOSTFILE
File containing a list of nodes, hostnames, catalog path, and data path
(node<whitespace>host<whitespace>catalogPath<whitespace>dataPath, one per
line)
-c, --check Check all nodes to make sure they can interconnect
-s SKIPANALYZENODE, --skipanalyzenode=SKIPANALYZENODE
skipanalyzenode
-------------------------------------------------------------------------
Usage: connect_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to connect
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-------------------------------------------------------------------------
Usage: create_db [options]
Options:
-h, --help show this help message and exit
-s NODES, --hosts=NODES
comma-separated list of hosts to participate in
database
-d DB, --database=DB Name of database to be created
-c CATALOG, --catalog_path=CATALOG
Path of catalog directory[optional] if not using
compat21
-D DATA, --data_path=DATA
Path of data directory[optional] if not using compat21
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes [optional]
-l LICENSEFILE, --license=LICENSEFILE
Database license [optional]
-P POLICY, --policy=POLICY
Database restart policy [optional]
--compat21 Use Vertica 2.1 method using node names instead of
hostnames
-------------------------------------------------------------------------
Usage: database_parameters [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database
-P PARAMETER, --parameter=PARAMETER
Database parameter
-c COMPONENT, --component=COMPONENT
Component[optional]
-s SUBCOMPONENT, --subcomponent=SUBCOMPONENT
Sub Component[optional]
-p PASSWORD, --password=PASSWORD
Database password[optional]
-------------------------------------------------------------------------
Usage: db_add_node [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be restarted
-s HOSTS, --hosts=HOSTS
Comma separated list of hosts to add to database
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-a AHOSTS, --add=AHOSTS
Comma separated list of hosts to add to database
-i, --noprompts do not stop and wait for user input(default false)
--compat21 Use Vertica 2.1 method using node names instead of
hostnames
-------------------------------------------------------------------------
Usage: db_remove_node [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be modified
-s HOSTS, --hosts=HOSTS
Name of the host to remove from the db
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-i, --noprompts do not stop and wait for user input(default false)
--compat21 Use Vertica 2.1 method using node names instead of
hostnames
-------------------------------------------------------------------------
Usage: db_replace_node [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be restarted
-o ORIGINAL, --original=ORIGINAL
Name of host you wish to replace
-n NEWHOST, --new=NEWHOST
Name of the replacement host
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-i, --noprompts do not stop and wait for user input(default false)
-------------------------------------------------------------------------
Usage: db_status [options]
Options:
-h, --help show this help message and exit
-s STATUS, --status=STATUS
Database status: UP, DOWN, or ALL (list running dbs - UP,
list down dbs - DOWN, list all dbs - ALL)
-------------------------------------------------------------------------
Usage: drop_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Database to be dropped
-------------------------------------------------------------------------
Usage: edit_auth [options]
Options:
-h, --help show this help message and exit
-d DATABASE, --database=DATABASE
database to edit authentication parameters for
-------------------------------------------------------------------------
Usage: host_to_node [options]
Options:
-h, --help show this help message and exit
-s HOST, --host=HOST comma separated list of hostnames which is to be
converted into its corresponding nodenames
-d DB, --database=DB show only node/host mapping for this database.
-------------------------------------------------------------------------
Usage: install_package [options]
Options:
-h, --help show this help message and exit
-d DBNAME, --dbname=DBNAME
database name
-p PASSWORD, --password=PASSWORD
database admin password
-P PACKAGE, --package=PACKAGE
specify package or 'all' or 'default'
-------------------------------------------------------------------------
Usage: install_procedure [options]
Options:
-h, --help show this help message and exit
-d DBNAME, --database=DBNAME
Name of database for installed procedure
-f PROCPATH, --file=PROCPATH
Path of procedure file to install
-p OWNERPASSWORD, --password=OWNERPASSWORD
Password of procedure file owner
-------------------------------------------------------------------------
Usage: kill_host [options]
Options:
-h, --help show this help message and exit
-s HOSTS, --hosts=HOSTS
comma-separated list of hosts on which the vertica
process is to be killed using a SIGKILL signal
--compat21 Use Vertica 2.1 method using node names instead of
hostnames
-------------------------------------------------------------------------
Usage: kill_node [options]
Options:
-h, --help show this help message and exit
-s HOSTS, --hosts=HOSTS
comma-separated list of hosts on which the vertica
process is to be killed using a SIGKILL signal
--compat21 Use Vertica 2.1 method using node names instead of
hostnames
-------------------------------------------------------------------------
Usage: list_allnodes [options]
Options:
-h, --help show this help message and exit
-------------------------------------------------------------------------
Usage: list_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be listed
-------------------------------------------------------------------------
Usage: list_host [options]
Options:
-h, --help show this help message and exit
-------------------------------------------------------------------------
Usage: list_node [options]
Options:
-h, --help show this help message and exit
-n NODENAME, --node=NODENAME
Name of the node to be listed
-------------------------------------------------------------------------
Usage: list_packages [options]
Options:
-h, --help show this help message and exit
-d DBNAME, --dbname=DBNAME
database name
-p PASSWORD, --password=PASSWORD
database admin password
-P PACKAGE, --package=PACKAGE
specify package or 'all' or 'default'
-------------------------------------------------------------------------
Usage: logrotateconfig [options]
Options:
-h, --help show this help message and exit
-d DBNAME, --dbname=DBNAME
database name
-r ROTATION, --rotation=ROTATION
set how often the log is rotated.[
daily|weekly|monthly ]
-s MAXLOGSZ, --maxsize=MAXLOGSZ
set maximum log size before rotation is forced.
-k KEEP, --keep=KEEP set # of old logs to keep
-------------------------------------------------------------------------
Usage: node_map [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB List only data for this database.
-------------------------------------------------------------------------
Usage: rebalance_data [options]
Options:
-h, --help show this help message and exit
-d DBNAME, --dbname=DBNAME
database name
-k KSAFETY, --ksafety=KSAFETY
specify the new k value to use
-p PASSWORD, --password=PASSWORD
--script Don't re-balance the data, just provide a script for
later use.
-------------------------------------------------------------------------
Usage: restart_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be restarted
-e EPOCH, --epoch=EPOCH
Epoch at which the database is to be restarted. If
'last' is given as argument the db is restarted from
the last good epoch.
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-i, --noprompts do not stop and wait for user input(default false)
-------------------------------------------------------------------------
Usage: restart_node [options]
Options:
-h, --help show this help message and exit
-s NODES, --hosts=NODES
comma-separated list of hosts to be restarted
-d DB, --database=DB Name of database whose node is to be restarted
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-i, --noprompts do not stop and wait for user input(default false)
-F, --force force the node to start and auto recover if necessary
--compat21 Use Vertica 2.1 method using node names instead of
hostnames
-------------------------------------------------------------------------
Usage: return_epoch [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database
-------------------------------------------------------------------------
Usage: set_restart_policy [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database for which to set policy
-p POLICY, --policy=POLICY
Restart policy: ('never', 'ksafe', 'always')
-------------------------------------------------------------------------
Usage: show_active_db [options]
Options:
-h, --help show this help message and exit
-------------------------------------------------------------------------
Usage: start_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be started
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-i, --noprompts do not stop and wait for user input(default false)
-F, --force force the database to start at an epoch before data
consistency problems were detected.
-------------------------------------------------------------------------
Usage: stop_db [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database to be stopped
-p DBPASSWORD, --password=DBPASSWORD
Database password in single quotes
-F, --force Force the databases to shutdown, even if users are
connected.
-i, --noprompts do not stop and wait for user input(default false)
-------------------------------------------------------------------------
Usage: stop_host [options]
Options:
-h, --help show this help message and exit
-s HOSTS, --hosts=HOSTS
comma-separated list of hosts on which the vertica
process is to be killed using a SIGTERM signal
--compat21 Use Vertica 2.1 method using node names instead of
hostnames
-------------------------------------------------------------------------
Usage: stop_node [options]
Options:
-h, --help show this help message and exit
-s HOSTS, --hosts=HOSTS
comma-separated list of hosts on which the vertica
process is to be killed using a SIGTERM signal
--compat21 Use Vertica 2.1 method using node names instead of
hostnames
-------------------------------------------------------------------------
Usage: uninstall_package [options]
Options:
-h, --help show this help message and exit
-d DBNAME, --dbname=DBNAME
database name
-p PASSWORD, --password=PASSWORD
database admin password
-P PACKAGE, --package=PACKAGE
specify package or 'all' or 'default'
-------------------------------------------------------------------------
Usage: upgrade_license_key [options]
Options:
-h, --help show this help message and exit
-d DB, --database=DB Name of database [required if databases exist]
-l LICENSE, --license=LICENSE
Database license
-i INSTALL, --install=INSTALL
Pass '-i install' to install a license; omit it to
upgrade an existing license
-p PASSWORD, --password=PASSWORD
Database password[optional]
-------------------------------------------------------------------------
Usage: view_cluster [options]
Options:
-h, --help show this help message and exit
-x, --xpand show the full cluster state, node by node
-d DB, --database=DB filter the output for a single database
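These tools combine naturally in unattended maintenance scripts. The sketch below checks db_status and restarts a database that is not listed as UP; the database name is a placeholder, the password is taken from an environment variable rather than hard-coded, and -i suppresses interactive prompts so the script can run from cron.

```shell
# Restart a database if db_status does not report it as UP.
VDB="${VDB:-mydb}"
ADMINTOOLS=/opt/vertica/bin/admintools
if [ -x "$ADMINTOOLS" ]; then
  if "$ADMINTOOLS" -t db_status -s UP | grep -qw "$VDB"; then
    echo "$VDB is UP"
  else
    # -i: do not stop and wait for user input
    "$ADMINTOOLS" -t restart_db -d "$VDB" -p "$DBPASSWORD" -i
  fi
else
  echo "admintools not found; would check and restart $VDB"
fi
```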
Using Management Console
Most of the information you need to use MC is available on the MC interface. The topics in this
section augment some areas of the MC interface and provide examples. For an introduction to MC
functionality, architecture, and security, see Management Console in the Concepts Guide.
Management Console provides some, but not all, of the functionality that the Administration Tools
provides. In addition, MC provides extended functionality not available in the Administration Tools,
such as a graphical view of your HP Vertica database and detailed monitoring charts and graphs,
described in Monitoring HP Vertica Using MC. See Administration Tools and Management Console
in the Administrator's Guide.
If you have not yet installed MC, see Installing and Configuring Management Console in the
Installation Guide.
Connecting to MC
To connect to Management Console:
1. Open an HTML5-compliant browser.
2. Enter the IP address or host name of the host on which you installed MC (or any cluster node if
you installed HP Vertica first), followed by the MC port you assigned when you configured MC
(default 5450).
For example, enter one of:
https://00.00.00.00:5450/
or
https://hostname:5450/
3. When the MC logon dialog appears, enter your MC username and password and click Log in.
Note: When MC users log in to the MC interface, MC checks their privileges on HP
Vertica Data Collector (DC) tables on MC-monitored databases. Based on DC table
privileges, along with the role assigned to the MC user, each user's access to MC's
Overview, Activity, and Node Details pages could be limited. See About MC Privileges and
Roles for more information.
If you do not have an MC username/password, contact your MC administrator.
Managing Client Connections on MC
Each client session on MC uses a connection from MaxClientSessions, a database configuration
parameter that determines the maximum number of sessions that can run on a single database
cluster node. If multiple MC users are mapped to the same database account and are concurrently
monitoring the Overview and Activity pages, graphs could be slow to update while MC waits for a
connection from the pool.
Tip: You can increase the value for MaxClientSessions on an MC-monitored database to take
extra sessions into account. See Managing Sessions for details.
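For example, adding ten sessions of headroom might look like the sketch below via vsql. The dbadmin account and the new value of 60 are assumptions here, not recommendations from this guide.

```shell
# Raise MaxClientSessions to leave headroom for MC's pooled connections.
NEW_MAX=60   # example value; pick a limit that suits your workload
VSQL=/opt/vertica/bin/vsql
if [ -x "$VSQL" ]; then
  "$VSQL" -U dbadmin -c \
    "SELECT SET_CONFIG_PARAMETER('MaxClientSessions', $NEW_MAX);"
else
  echo "vsql not found; would set MaxClientSessions=$NEW_MAX"
fi
```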
See Also
l Monitoring HP Vertica Using MC
Managing Database Clusters on MC
To perform database/cluster-specific tasks on one or more MC-managed clusters, navigate to the
Databases and Clusters page.
MC administrators see the Import/Create Database Cluster options, while non-administrative MC
users see only the databases on which they have been assigned the appropriate access levels.
Depending on your access level, the database-related operations you can perform on the MC
interface include:
l Create a new database/cluster.
l Import an existing database/cluster into the MC interface.
l Start the database, unless it is already running (green).
l Stop the database, but only if no users are connected.
l Remove the database from the MC interface.
Note: Remove does not drop the database; it leaves it in the cluster, hidden from the UI. To
add the database back to the MC interface, import it using the IP address of any cluster
node. A Remove operation also stops metrics gathering on that database, but statistics
gathering automatically resumes after you re-import.
l Drop the database after you ensure no users are connected. Drop is a permanent action that
drops the database from the cluster.
l View Database to open the Overview page, a layout that provides a dashboard view into the
health of your database cluster (node state, storage, performance, CPU/memory, and query
concurrency). From this page you can drill down into more detailed database-specific
information by clicking data points in the graphs.
l View Cluster to open the Manage page, which shows all nodes in the cluster, as well as each
node's state. You can also see a list of monitored databases on the selected cluster and its
state; for example, a green arrow indicates a database in an UP state. For node-specific
information, click any node to open the Node Details page.
For more information about what users can see and do on MC, see the following topics:
See Also
l About MC Users
l About MC Privileges and Roles
Create an Empty Database Using MC
You can create a new database on an existing HP Vertica cluster through the Management
Console interface.
Database creation can be a long-running process, lasting from minutes to hours, depending on the
size of the target database. You can close the web browser during the process and sign back in to
MC later; the creation process continues unless an unexpected error occurs. See the Notes
section below the procedure on this page.
You currently need to use command line scripts to define the database schema and load data.
Refer to the topics in Configuration Procedure. You should also run the Database Designer, which
you access through the Administration Tools, to create either a comprehensive or incremental
design. Consider using the Tutorial in the Getting Started Guide to create a sample database you
can start monitoring immediately.
How to Create an Empty Database on an MC-managed
Cluster
1. If you are already on the Databases and Clusters page, skip to the next step; otherwise:
a. Connect to MC and sign in as an MC administrator.
b. On the Home page, click the Databases and Clusters task.
2. If no databases exist on the cluster, continue to the next step; otherwise:
a. If a database is running on the cluster on which you want to add a new database, select the
database and click Stop.
b. Wait for the running database to have a status of Stopped.
3. Click the cluster on which you want to create the new database and click Create Database.
4. The Create Database wizard opens. Provide the following information:
n Database name and password. See Creating a Database Name and Password for rules.
n Optionally click Advanced to open the advanced settings and change the port and catalog,
data, and temporary data paths. By default the MC application/web server port is 5450 and
paths are /home/dbadmin, or whatever you defined for the paths when you ran the Cluster
Creation Wizard or the install_vertica script. Do not use the default agent port 5444 as a
new setting for the MC port. See MC Settings > Configuration for port values.
5. Click Continue.
6. Select nodes to include in the database.
The Database Configuration window opens with the options you provided and a graphical
representation of the nodes appears on the page. By default, all nodes are selected to be part of
this database (denoted by a green check mark). You can optionally click each node and clear
Include host in new database to exclude that node from the database. Excluded nodes are
gray. If you change your mind, click the node and select the Include check box.
7. Click Create in the Database Configuration window to create the database on the nodes.
The creation process takes a few moments, after which the database starts and a Success
message appears on the interface.
8. Click OK to close the success message.
MC's Manage page opens and displays the database nodes. Nodes not included in the database
are colored gray, which means they are standby nodes you can include later. To add nodes to or
remove nodes from the HP Vertica cluster itself (rather than including or excluding standby
nodes), you must run the install_vertica script.
Notes
l If warnings occur during database creation, nodes will be marked on the UI with an Alert icon
and a message.
n Warnings do not prevent the database from being created, but you should address warnings
after the database creation process completes by viewing the database Message Center
from the MC Home page.
n Failure messages display on the database Manage page with a link to more detailed
information and a hint with an actionable task that you must complete before you can
continue. Problem nodes are colored red for quick identification.
n To view more detailed information about a node in the cluster, double-click the node from the
Manage page, which opens the Node Details page.
l To create MC users and grant them access to an MC-managed database, see About MC Users
and Creating an MC User.
See Also
l Creating a Cluster Using MC
l Troubleshooting Management Console
l Restarting MC
Import an Existing Database Into MC
If you have already upgraded your database to the current version of HP Vertica, MC automatically
discovers the cluster and any databases installed on it, regardless of whether those databases are
currently running or are down.
Note: If you haven't created a database and want to create one through the MC, see Create an
Empty Database Using MC.
How to Import an Existing Database on the Cluster
The following procedure describes how to import an MC-discovered existing database into the MC
interface so you can monitor it.
1. Connect to Management Console and sign in as an MC administrator.
2. On the MC Home page, click Databases and Clusters.
3. On the Databases and Clusters page, click the cluster cube and click View in the dialog box
that opens.
4. On the left side of the page, look under the Databases heading and click Import Discovered.
Tip: A running MC-discovered database appears as Monitored, and any non-running
databases appear as Discovered. MC supports only one running database on a single
cluster at a time, so to monitor a different database you must first shut down the
database that is currently running.
5. In the Import Database dialog box:
a. Select the database you want to import.
b. Optionally clear auto-discovered databases you don't want to import.
c. Supply the database username and password and click Import.
After Management Console connects to the database it opens the Manage page, which provides a
view of the cluster nodes. See Monitoring Cluster Status for more information.
You perform the import process once per existing database. Next time you connect to Management
Console, you'll see your database under the Recent Databases section on the Home page, as well
as on the Databases and Clusters page.
Note: The system clocks in your cluster must be synchronized with the system that is running
Management Console to allow automatic discovery of local clusters.
Using MC on an AWS Cluster
If you are running an Amazon Web Services (AWS) cluster on HP Vertica 6.1.2, you can install and
run MC to monitor and manage your database. You cannot, however, use the MC interface to
create or import an HP Vertica cluster.
Managing MC Settings
The MC Settings page allows you to configure properties specific to Management Console. You
can:
l Change the MC and agent default port assignments
l Upload a new SSL certificate
l Use LDAP for user authentication
l Create new MC users and map them to an MC-managed database using user credentials on the
HP Vertica server
l Install HP Vertica on a cluster of hosts through the MC interface
l Customize the look and feel of MC with themes
Modifying Database-Specific Settings
To inspect or modify settings related to an MC-managed database, go to the Databases and
Clusters page. On this page, view a running database, and access that database's Settings page
from a tab at the bottom of the page.
Changing MC or Agent Ports
When you configure MC, the Configuration Wizard sets up the following default ports:
l 5450—Used to connect a web browser session to MC and allows communication from HP
Vertica cluster nodes to the MC application/web server
l 5444—Provides MC-to-node and node-to-node (agent) communications for database
create/import and monitoring activities
If You Need to Change the MC Default Ports
A scenario might arise where you need to change the default port assignments for MC or its agents.
For example, perhaps one of the default ports is not available on your HP Vertica cluster, or you
encounter connection problems between MC and the agents. The following topics describe how to
change port assignments for MC or its agents.
See Also
l Ensure Ports Are Available
How to Change the Agent Port
Changing the agent port takes place in two steps: at the command line, where you modify the
config.py file, and through a browser, where you modify MC settings.
Change the Agent Port in config.py
1. Log in as root on any cluster node and change to the agent directory:
# cd /opt/vertica/agent
2. Use any text editor to open config.py.
3. Scroll down to the agent_port = 5444 entry and replace 5444 with a different port number.
4. Save and close the file.
5. Copy config.py to the /opt/vertica/agent directory on all nodes in the cluster.
6. Restart the agent process by running the following command:
# /etc/init.d/vertica_agent restart
7. Repeat (as root) Step 6 on each cluster node where you copied the config.py file.
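The config.py half of this procedure lends itself to scripting. The following sketch performs the port edit with sed on a local sample copy of config.py; on a real cluster you would apply it to /opt/vertica/agent/config.py on each node and then restart the agents as in steps 6 and 7. The sample file contents and the new port value (5555) are illustrative, not taken from a real installation.

```shell
# Sketch: change agent_port in a sample config.py (contents illustrative).
# On a real cluster, target /opt/vertica/agent/config.py on each node.
workdir=$(mktemp -d)
cat > "$workdir/config.py" <<'EOF'
# agent settings (sample)
agent_port = 5444
EOF

new_port=5555                                  # hypothetical replacement port
sed -i "s/^agent_port = 5444$/agent_port = $new_port/" "$workdir/config.py"
grep '^agent_port' "$workdir/config.py"        # prints: agent_port = 5555
```

Remember that whatever port you substitute here must also be entered on the MC Settings > Configuration page, as described in the next procedure.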
Change the Agent Port on MC
1. Open a web browser and connect to MC as a user with MC ADMIN privileges.
2. Navigate to MC Settings > Configuration.
3. Change Default HP Vertica agent port from 5444 to the new value you specified in the
config.py file.
4. Click Apply and click Done.
5. Restart MC so MC can connect to the agent at its new port.
How to Change the MC Port
Use this procedure to change the default port for MC's application server from 5450 to a different
value.
1. Open a web browser and connect to MC as a user with MC ADMIN privileges.
2. On the MC Home page, navigate to MC Settings > Configuration and change the Application
server running port value from 5450 to a new value.
3. In the change-port dialog, click OK.
4. Restart MC.
5. Reconnect your browser session using the new port. For example, if you changed the port from
5450 to 5555, use one of the following formats:
https://00.00.00.00:5555/
OR
https://hostname:5555/
Backing Up MC
Before you upgrade MC, HP recommends that you back up your MC metadata (configuration and
user settings) on a storage location external to the server on which you installed MC.
1. On the target server (where you want to store MC metadata), log on as root or a user with sudo
privileges.
2. Create a backup directory; for example:
# mkdir /backups/mc/mc-backup-20130425
3. Copy the /opt/vconsole directory to the new backup folder:
# cp -r /opt/vconsole /backups/mc/mc-backup-20130425
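The two steps above can be combined into a short script that stamps the backup directory with the current date. In this sketch, mktemp directories stand in for /opt/vconsole and the external backup location, so the commands can be tried safely anywhere:

```shell
# Sketch: timestamped backup of MC metadata. The mktemp directories are
# stand-ins for /opt/vconsole (source) and /backups/mc (target).
src=$(mktemp -d)                    # stands in for /opt/vconsole
echo "mc-user-settings" > "$src/mc.conf"

backup_root=$(mktemp -d)            # stands in for /backups/mc
backup_dir="$backup_root/mc-backup-$(date +%Y%m%d)"
mkdir -p "$backup_dir"
cp -r "$src" "$backup_dir/vconsole"

ls "$backup_dir/vconsole"           # prints: mc.conf
```

On a real MC host, substitute /opt/vconsole for the source and a directory on external storage for the target.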
Troubleshooting Management Console
The Management Console Diagnostics page, which you access from the Home page, helps you
resolve issues within the MC process, not the database.
What You Can Diagnose:
l View Management Console logs, which you can sort by column headings (such as type,
component, or message).
l Search within messages for key words or phrases and search for log entries within a specific
time frame.
l Export database messages to a file.
l Reset console parameters to their original configuration.
Caution: Reset removes all data (monitoring and configuration information) from storage
and forces you to reconfigure MC as if it were the first time.
l Restart the Management Console process. When the process completes, you are directed back
to the login page.
Viewing the MC Log
If you want to browse MC logs (not database logs), navigate to the Diagnostics > MC Log page.
This page provides a tabular view of the contents at /opt/vconsole/log/mc/mconsole.log,
letting you more easily identify and troubleshoot issues related to MC.
You can sort log entries by clicking the column header and search within messages for key words,
phrases, and log entries within a specific time frame. You can also export log messages to a file.
See Also
l Exporting MC-managed Database Messages and Logs
Exporting the User Audit Log
When an MC user makes changes on Management Console, whether to an MC-managed database
or to the MC itself, their action generates a log entry that contains data you can export to a file.
If you perform an MC factory reset (restore MC to its pre-configured state), you automatically have
the opportunity to export audit records before the reset occurs.
To Manually Export MC User Activity
1. From the MC Home page, click Diagnostics and then click Audit Log.
2. On the Audit log viewer page, click Export and save the file to a location on the server.
To see what types of user operations the audit logger records, see Monitoring MC User Activity.
Restarting MC
You might need to restart the MC web/application server for a number of reasons, such as after you
change port assignments, use the MC interface to import a new SSL certificate, or if the MC
interface or HP Vertica-related tasks become unresponsive.
Restarting MC requires ADMIN Role (mc) or SUPER Role (mc) privileges.
How to Restart MC Through the MC Interface (using Your
browser)
1. Open a web browser and connect to MC as an administrator.
2. On MC's Home page, click Diagnostics.
3. Click Restart Console and then click OK to continue or Cancel to return to the Diagnostics
page.
The MC process shuts down for a few seconds and automatically restarts. After the process
completes, you are directed back to the sign-in page.
How to Restart MC At the Command Line
If you are unable to connect to MC through a web browser for any reason, such as if the MC
interface or HP Vertica-related tasks become unresponsive, you can run the vertica-consoled
script with start, stop, or restart arguments.
Follow these steps to start, stop, or restart MC.
1. As root, open a terminal window on the server on which MC is installed.
2. Run the vertica-consoled script:
# /etc/init.d/vertica-consoled { stop | start | restart }
stop Stops the MC application/web server.
start Starts the MC application/web server.
Caution: Use start only if you are certain MC is not already running. As a best
practice, stop MC before you issue the start command.
restart Restarts the MC application/web server. If MC is not already running, this process
reports that the stop attempt did not work.
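The stop-before-start practice noted above can be captured in a small wrapper. In this sketch, the consoled function is a stub standing in for /etc/init.d/vertica-consoled so the logic can run anywhere; on an MC host you would call the real script instead:

```shell
# Sketch: "safe start" -- stop MC first (harmless if it is not running),
# then start it. consoled is a stub for /etc/init.d/vertica-consoled.
consoled() { echo "vertica-consoled $1"; }

safe_start() {
    consoled stop || true   # ignore failure when MC is already stopped
    consoled start
}

safe_start
```

This follows the caution above: start is only ever issued after a stop, so a second MC instance is never launched over a running one.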
Starting Over
If you need to return MC to its original state (a "factory reset"), see Resetting MC to Pre-Configured
State.
Resetting MC to Pre-Configured State
If you decide to reset MC to its original, preconfigured state, you can do so on the Diagnostics
page by clicking Factory Reset.
Tip: Consider trying one of the options described in Restarting MC first.
A factory reset removes all metadata (about a week's worth of database monitoring/configuration
information and MC users) from storage and forces you to reconfigure MC again, as described in
Configuring MC in the Installation Guide.
After you click Factory Reset, you have the chance to export audit records to a file by clicking Yes.
If you click No (do not export audit records), the process begins. There is no undo.
Keep the following in mind concerning user accounts and the MC:
l When you first configure MC, you create an MC super user (a Linux account). Issuing a
Factory Reset on the MC does not create a new MC super user, nor does it delete the existing
MC super user. When initializing after a Factory Reset, you must log on using the original MC
super user account.
l Once MC is configured, you can add users that are specific to MC. Users created through the
MC interface are MC specific, so when you subsequently change a password through MC, you
change the password for that MC user only. Passwords external to MC (that is, Linux system
users and HP Vertica database passwords) remain unchanged.
For information on MC users, refer to Creating an MC User and MC configuration privileges.
Avoiding MC Self-Signed Certificate Expiration
When you connect to MC through a client browser, HP Vertica assigns each HTTPS request a self-
signed certificate, which includes a timestamp. To increase security and protect against password
replay attacks, the timestamp is valid for several seconds only, after which it expires.
To avoid being locked out of MC, synchronize time on the hosts in your HP Vertica cluster, as well
as on the MC host if it resides on a dedicated server. To recover from loss or lack of
synchronization, resync system time and the Network Time Protocol. See Set Up Time
Synchronization in the Installation Guide. If you want to generate your own certificates and keys for
MC, see Generating Certificates and Keys for MC.
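As a quick sanity check before investigating certificate errors, you can compare clocks across hosts. In this sketch both timestamps are sampled locally so it runs anywhere; on a real cluster the second value would come from a remote node (the ssh command and hostname in the comment are hypothetical):

```shell
# Sketch: compute clock skew between two epoch timestamps. On a cluster,
# remote_epoch would come from e.g.: ssh cluster-node-01 date +%s
local_epoch=$(date +%s)
remote_epoch=$local_epoch            # local stand-in for the remote clock
skew=$((local_epoch - remote_epoch))
echo "clock skew: ${skew#-}s"        # prints: clock skew: 0s
```

A skew of more than a few seconds between any cluster host and the MC host is enough to invalidate the short-lived certificate timestamps described above.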
Operating the Database
Starting and Stopping the Database
This section describes how to start and stop the HP Vertica database using the Administration
Tools, Management Console, or from the command line.
Starting the Database
Starting a K-safe database is supported when up to K nodes are down or unavailable. See Failure
Recovery for a discussion on various scenarios encountered during database shutdown, startup
and recovery.
You can start a database using any of these methods:
l The Management Console
l The Administration Tools interface
l The command line
Starting the Database Using MC
On MC's Databases and Clusters page, click a database to select it, and click Start within the
dialog box that displays.
Starting the Database Using the Administration Tools
1. Open the Administration Tools and select View Database Cluster State to make sure that all
nodes are down and that no other database is running. If all nodes are not down, see Shutdown
Problems.
2. Open the Administration Tools. See Using the Administration Tools for information about
accessing the Administration Tools.
3. On the Main Menu, select Start Database, and then select OK.
4. Select the database to start, and then click OK.
Caution: HP strongly recommends that you start only one database at a time. If you start
more than one database at any time, the results are unpredictable. Users could encounter
resource conflicts or perform operations in the wrong database.
5. Enter the database password, and then click OK.
6. When prompted that the database started successfully, click OK.
7. Check the log files to make sure that no startup problems occurred.
If the database does not start successfully, see Startup Problems.
Starting the Database At the Command Line
If you use the admintools command-line option start_db() to start a database, the -p password
argument is required only during database creation, when you install a new license.
As long as the license is valid, the -p argument is not required to start the database and is silently
ignored, even if you introduce a typo or prematurely press the enter key. This is by design, as the
database can only be started by the user who (as part of the verticadba UNIX user group) initially
created the database or who has root or su privileges.
If the license were to become invalid, HP Vertica would use the -p password argument to attempt
to upgrade the license with the license file stored in /opt/vertica/config/share/license.key.
Following is an example of using start_db on a standalone node:
[dbadmin@localhost ~]$ /opt/vertica/bin/admintools -t start_db -d VMart
Info: no password specified, using none
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (UP)
Database VMart started successfully
Stopping the Database
Stopping a K-safe database is supported when up to K nodes are down or unavailable. See Failure
Recovery for a discussion on various scenarios encountered during database shutdown, startup
and recovery.
You can stop a running database using either of these methods:
l The Management Console
l The Administration Tools interface
Note: You cannot stop a running database if any users are connected or Database Designer is
building or deploying a database design.
Stopping a Running Database Using MC
1. Log in to MC as an MC administrator and navigate to the Manage page to make sure all nodes
are up. If a node is down, click that node and select Start node in the Node list dialog box.
2. Inform all users that have open connections that the database is going to shut down and
instruct them to close their sessions.
Tip: To check for open sessions, query the V_MONITOR.SESSIONS table. The client_label
column returns a value of MC for users who are connected to MC.
3. Still on the Manage page, click Stop in the toolbar.
Stopping a Running Database Using the Administration
Tools
1. Use View Database Cluster State to make sure that all nodes are up. If all nodes are not up,
see Restarting a Node.
2. Inform all users that have open connections that the database is going to shut down and
instruct them to close their sessions.
Tip: A simple way to prevent new client sessions from being opened while you are shutting
down the database is to set the MaxClientSessions configuration parameter to 0. Be sure to
restore the parameter to its original setting once you've restarted the database.
=> SELECT SET_CONFIG_PARAMETER ('MaxClientSessions', 0);
3. Close any remaining user sessions. (Use the CLOSE_SESSION and CLOSE_ALL_SESSIONS
functions.)
4. Open the Administration Tools. See Using the Administration Tools for information about
accessing the Administration Tools.
5. On the Main Menu, select Stop Database, and then click OK.
6. Select the database you want to stop, and click OK.
7. Enter the password if asked, and click OK.
8. When prompted that the database has been successfully stopped, click OK.
Stopping a Running Database At the Command Line
Use the admintools command-line option stop_db() to stop a database, as follows:
[dbadmin@localhost ~]$ /opt/vertica/bin/admintools -t stop_db -d VMart
Info: no password specified, using none
Issuing shutdown command to database
Database VMart stopped successfully
As long as the license is valid, the -p argument is not required to stop the database and is silently
ignored, even if you introduce a typo or press the enter key prematurely. This is by design, as the
database can only be stopped by the user who (as part of the verticadba UNIX user group) initially
created the database or who has root or su privileges.
If the license were to become invalid, HP Vertica would use the -p password argument to attempt
to upgrade the license with the license file stored in /opt/vertica/config/share/license.key.
Working with the HP Vertica Index Tool
Use the HP Vertica Reindex option only if you have upgraded to HP Vertica 6.0 from an earlier
version.
As of HP Vertica 6.0, there are three Index tool options:
l Reindex
l CheckCRC
l Checksort
Note: The Checksort option is available as of Version 6.0.1.
Following an upgrade to 6.0 or later, any new ROSes (including those that the TM generates) use
the new index format, and new installations use the improved index and maintain CRC
automatically. If you choose not to run the Reindex option, the Mergeout function rewrites your
ROS containers in the new format as they pass through Mergeout. However, older ROS containers
might never go through Mergeout, and as a result will not use the new index format unless you run
the Reindex option.
You can run each of the HP Vertica Index tool options when the database cluster is down. You can
run the CheckCRC (-v) and Checksort (-I) options with the cluster up or down, as follows:
Index Option  Cluster down (per node)      Cluster up (per node or all nodes)
Reindex       vertica -D catalog_path -i   N/A. The cluster must be down to reindex.
CheckCRC      vertica -D catalog_path -v   select run_index_tool ('checkcrc')
                                           select run_index_tool ('checkcrc', 'true')
Checksort     vertica -D catalog_path -I   select run_index_tool ('checksort')
                                           select run_index_tool ('checksort', 'true')
The HP Vertica Index tool options are accessed through the vertica binary, located in the
/opt/vertica/bin directory on most installations.
Note: Running the Index tool options from the command line outputs progress about tool
activity, but not its results. To review the detailed messages after running the Index tool
options, see the indextool.log file, located in the database catalog directory as described
below.
Syntax
/opt/vertica/bin/vertica -D catalog_path [-i | -I | -v]
Parameters
Parameter Description
-D catalog_path Specifies the catalog directory (-D) on which to run each option.
-i Reindexes the ROSes.
-I Checks the node's ROSes for correct sort order.
-v Calculates a per-block cyclic redundancy check (CRC) on existing data
storage.
Note: You must run the reindex option on each cluster node, with the cluster down. You can
run the Index tool in parallel on different nodes.
Permissions
You must be a superuser to run the Index tool with any option.
Controlling Expression Analysis
The expression analysis that occurs as part of the HP Vertica database indexing techniques
improves overall performance. You can turn off such analysis, but doing so is not recommended.
To control expression analysis:
1. Use add_vertica_options to turn off EE expression analysis:
select add_vertica_options('EE', 'DISABLE_EXPR_ANALYSIS');
2. Display the current value by selecting it as follows:
select show_current_vertica_options();
3. Use clr_vertica_options to enable the expression analysis option:
select clr_vertica_options('EE', 'DISABLE_EXPR_ANALYSIS');
Performance and CRC
HP Vertica recognizes that CRC can affect overall database performance. You can turn off the
CRC facility, but doing so is not recommended.
To control CRC:
1. Change the value of the configuration parameter to zero (0):
select set_config_parameter('CheckCRC', '0');
2. Display the value:
select * from configuration_parameters;
3. To enable CRC, set the parameter to one (1):
select set_config_parameter('CheckCRC', '1');
The following sections describe each of the HP Vertica Index tool options and how and when to use
them.
Running the Reindex Option
Run the HP Vertica Reindex option to update each ROS index after upgrading to 6.0 or higher.
Note: If you are currently running HP Vertica 6.0 or higher, you do not need to run the Reindex
option. Only run the Reindex option if you are upgrading from a version prior to 6.0. See
Working with the HP Vertica Index Tool for more information.
This option scans all local storage and reindexes the data in all ROSes on the node from which
you invoke the tool, adding several new fields to the ROS data blocks. These fields include
the data block's minimum and maximum values (min_value and max_value), and the total number
of nulls stored in the block (null_count). This option also calculates cyclic redundancy check
(CRC) values, and populates the corresponding data block field with the CRC value. The new data
block fields are required before using the CheckCRC or Checksort options. Once ROS data has
been reindexed, you can use either of the other HP Vertica Index tool options.
To reindex ROSes with the database cluster DOWN:
1. From the command line, start the Index tool with -D to specify the catalog directory path, and -
i to reindex:
[dbadmin@localhost bin]$ /opt/vertica/bin/vertica -D /home/dbadmin/VMart/v_vmart_node0001_catalog -i
2. The Index tool outputs some general data to the terminal, and writes detailed information to the
indextool.log file:
Setting up logger and sessions...
Loading catalog...
Collecting storage containers...
Scanning data on disk...
Storages 219/219, bytes 302134347/302134347 100%
Committing catalog and SAL changes...
[dbadmin@localhost bin]$
3. The indextool.log file is located in the database catalog directory:
/home/dbadmin/VMart/v_vmart_node0001_catalog/indextool.log
Running the CheckCRC Option
The CheckCRC option initially calculates a cyclic redundancy check (CRC) on each block of the
existing data storage. You can run this option only after the ROS has been reindexed. Using this
Index tool option populates the corresponding ROS data block field with a CRC value. Running
CheckCRC after its first invocation checks for data corruption.
To run CheckCRC when the database cluster is down:
1. From the command line, use the Index tool with -D to specify the catalog directory path, and -v
to specify the CheckCRC option.
2. CheckCRC outputs general data such as the following to the terminal, and writes detailed
information to the indextool.log file:
[dbadmin@localhost bin]$ /opt/vertica/bin/vertica -D /home/dbadmin/VMart/v_vmart_node0001_catalog -v
Setting up logger and sessions...
Loading catalog...
Collecting storage containers...
Scanning data on disk...
Storages 272/272, bytes 302135743/302135743 100%
[dbadmin@localhost bin]$
3. The indextool.log file is located in the database catalog directory:
/home/dbadmin/VMart/v_vmart_node0001_catalog/indextool.log
To run CheckCRC when the database is running:
1. From vsql, enter this query to run the check on the initiator node:
select run_index_tool ('checkcrc');
-or-
select run_index_tool ('checkcrc', 'false');
2. Enter this query to run the check on all nodes:
select run_index_tool ('checkcrc', 'true');
Handling CheckCRC Errors
Once CRC values exist in each ROS data block, HP Vertica calculates and compares the existing
CRC each time data is fetched from disk as part of query processing. If CRC errors occur while
fetching data, the following information is written to the vertica.log file:
CRC Check Failure Details:
File Name:
File Offset:
Compressed size in file:
Memory Address of Read Buffer:
Pointer to Compressed Data:
Memory Contents:
The Event Manager is also notified upon CRC errors, so you can use an SNMP trap to capture
CRC errors:
"CRC mismatch detected on file <file_path>. File may be corrupted. Please check hardware
and drivers."
If you are running a query from vsql, ODBC, or JDBC, the query returns a FileColumnReader
ERROR, indicating that a specific block's CRC does not match a given record, with the following hint:
hint: Data file may be corrupt. Ensure that all hardware (disk and memory) is working
properly. Possible solutions are to delete the file <pathname> while the node is down, and
then allow the node to recover, or truncate the table data.
code: ERRCODE_DATA_CORRUPTED
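When you suspect a CRC problem, the failure records above can be pulled out of vertica.log with grep. The log path and contents in this sketch are a made-up sample; on a node, point grep at the vertica.log in the catalog directory:

```shell
# Sketch: locate CRC failure records in a vertica.log (sample content).
log=$(mktemp)
cat > "$log" <<'EOF'
2012-06-14 14:07:10 <INFO> query start
CRC Check Failure Details:
File Name: /data/v_vmart_node0001_data/45035996273723545.fdb
File Offset: 1024
EOF
grep -A2 'CRC Check Failure Details:' "$log"
```

The -A2 option prints the two lines after each match, which is where the file name and offset of the corrupted block appear.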
Running the Checksort Option
If ROS data is not sorted correctly in the projection's order, queries that rely on sorted data will be
incorrect. Use the Checksort option to check the ROS sort order if you suspect or detect incorrect
queries. The Index tool Checksort option (-I) evaluates each ROS row to determine if the row is
sorted correctly. If the check locates a row that is not in order, it writes an error message to the log
file indicating the row number and contents of the unsorted row.
Note: Running Checksort from the command line does not report any defects that the tool
discovers, only the amount of scanned data.
The Checksort option checks only the ROSes of the host from which you initiate the Index tool. For
a comprehensive check of all ROSes in the HP Vertica cluster, run Checksort on each cluster node
to ensure that all ROS data is sorted.
To run Checksort when the database cluster is down:
1. From the command line, start the Index tool with -D to specify the catalog directory path, and -
I to check the sort order:
[dbadmin@localhost bin]$ /opt/vertica/bin/vertica -D /home/dbadmin/VMart/v_vmart_node0001_catalog -I
2. The Index tool outputs some general data to the terminal, and detailed information to the
indextool.log file:
Setting up logger and sessions... Loading catalog...
Collecting storage containers...
Scanning data on disk...
Storages 17/17, bytes 1739030582/1739030582 100%
3. The indextool.log file is located in the database catalog directory:
/home/dbadmin/VMart/v_vmart_node0001_catalog/indextool.log
To run Checksort when the database is running:
1. From vsql, enter this query to check the ROS sort order on the initiator node:
select run_index_tool ('checksort');
-or-
select run_index_tool ('checksort', 'false');
2. Enter this query to run the sort check on all nodes:
select run_index_tool ('checksort', 'true');
Viewing Details of Index Tool Results
When running the HP Vertica Index tool options from the command line, the tool writes minimal
output to STDOUT, and detailed information to the indextool.log file in the database catalog
directory. When running CheckCRC and Checksort from vsql, results are written to the
vertica.log file on the node from which you run the query.
To view the results in the indextool.log file:
1. From the command line, navigate to the indextool.log file, located in the database catalog
directory.
[15:07:55][vertica-s1]: cd /my_host/databases/check/v_check_node0001_catalog
2. For Checksort, all error messages include an OID number and the string 'Sort Order
Violation' as follows:
<INFO> ...on oid 45035996273723545: Sort Order Violation:
3. You can use grep on the indextool.log file to search for the Sort Order Violation string with a
command such as this, which returns the line before each string (-B1), and the four lines that
follow (-A4):
[15:07:55][vertica-s1]: grep -B1 -A4 'Sort Order Violation:' /my_host/databases/check/v_check_node0001_catalog/indextool.log
2012-06-14 14:07:13.686 unknown:0x7fe1da7a1950 [EE] <INFO> An error occurred when running index tool thread on oid 45035996273723537:
Sort Order Violation:
Row Position: 624
Column Index: 0
Last Row: 2576000
This Row: 2575000
--
2012-06-14 14:07:13.687 unknown:0x7fe1dafa2950 [EE] <INFO> An error occurred when running index tool thread on oid 45035996273723545:
Sort Order Violation:
Row Position: 3
Column Index: 0
Last Row: 4
This Row: 2
--
To find the relevant projection where the sort violation was found:
1. Query the storage_containers system table using a storage_oid equal to the OID value
listed in the indextool.log file.
2. Use a query such as this:
=> select * from storage_containers where storage_oid = 45035996273723545;
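When Checksort flags many containers, extracting the unique OIDs from indextool.log first makes the storage_containers lookup easier. The log snippet in this sketch is a made-up sample standing in for the real file in the catalog directory:

```shell
# Sketch: list unique OIDs flagged in indextool.log (sample content;
# the real file lives in the database catalog directory).
log=$(mktemp)
cat > "$log" <<'EOF'
<INFO> ... index tool thread on oid 45035996273723537:
Sort Order Violation:
<INFO> ... index tool thread on oid 45035996273723545:
Sort Order Violation:
EOF
grep -o 'oid [0-9]*' "$log" | awk '{print $2}' | sort -u
```

Each OID that this pipeline prints can then be used in the storage_containers query to identify the affected projection.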
Working with Tables
You can create two types of tables in HP Vertica, columnar and flexible. Additionally, you can
create either type as persistent or temporary. You can also create views to capture a specific set of
table columns that you query frequently.
This section describes how to:
l Create base and temporary tables
l Alter tables
l Use named sequences in tables
l Merge contents from one table into another
l Drop and truncate tables
Creating Base Tables
The CREATE TABLE statement creates a table in the HP Vertica logical schema. The example
database described in the Getting Started Guide includes sample SQL scripts that demonstrate
this procedure. For example:
CREATE TABLE vendor_dimension ( vendor_key INTEGER NOT NULL PRIMARY KEY,
vendor_name VARCHAR(64),
vendor_address VARCHAR(64),
vendor_city VARCHAR(64),
vendor_state CHAR(2),
vendor_region VARCHAR(32),
deal_size INTEGER,
last_deal_update DATE
);
Note: Each table can have a maximum of 1600 columns.
Creating Tables Using the /*+direct*/ Clause
You can use the /* +direct */ clause to create a table or temporary table, saving the table
directly to disk (ROS), bypassing memory (WOS). For example, following is an existing table called
states:
VMart=> select * from states;
State | Bird | Tree | Tax
-------+----------+-------+-----
MA | Robin | Maple | 5.7
NH | Thrush | Elm | 0
NY | Cardinal | Oak | 7.2
(3 rows)
Create a new table, StateBird, with the /*+direct*/ clause in the statement, placing the clause
directly before the query (select State, Bird from states):
VMart=> create table StateBird as /*+direct*/ select State, Bird from states;
CREATE TABLE
VMart=> select * from StateBird;
State | Bird
-------+----------
MA | Robin
NH | Thrush
NY | Cardinal
(3 rows)
The following example creates a temporary table using the /*+direct*/ clause, along with the ON
COMMIT PRESERVE ROWS directive:
VMart=> create temp table StateTax ON COMMIT PRESERVE ROWS as /*+direct*/ select State,
Tax from states;
CREATE TABLE
VMart=> select * from StateTax;
State | Tax
-------+-----
MA | 5.7
NH | 0
NY | 7.2
(3 rows)
Automatic Projection Creation
To get your database up and running quickly, HP Vertica automatically creates a default projection
for each table created through the CREATE TABLE and CREATE TEMPORARY TABLE
statements. Each projection created automatically (or manually) includes a base projection name
prefix. You must use the projection prefix when altering or dropping a projection (ALTER
PROJECTION RENAME, DROP PROJECTION).
How you use the CREATE TABLE statement determines when the projection is created:
l If you create a table without providing the projection-related clauses, HP Vertica automatically
creates a superprojection for the table when you use an INSERT INTO or COPY statement to
load data into the table for the first time. The projection is created in the same schema as the
table. Once HP Vertica has created the projection, it loads the data.
l If you use CREATE TABLE AS SELECT to create a table from the results of a query, the table
is created first and a projection is created immediately after, using some of the properties of the
underlying SELECT query.
l (Advanced users only) If you use any of the following parameters, the default projection is
created immediately upon table creation using the specified properties:
n column-definition (ENCODING encoding-type and ACCESSRANK integer)
n ORDER BY table-column
n hash-segmentation-clause
n UNSEGMENTED { NODE node | ALL NODES }
n KSAFE
Note: Before you define a superprojection in the above manner, read Creating Custom
Designs in the Administrator's Guide.
Characteristics of Default Automatic Projections
A default auto-projection has the following characteristics:
l It is a superprojection.
l It uses the default encoding-type AUTO.
l If created as a result of a CREATE TABLE AS SELECT statement, it uses the encoding specified
in the query table.
l Auto-projections use hash segmentation.
l The number of table columns used in the segmentation expression can be configured, using the
MaxAutoSegColumns configuration parameter. See General Parameters in the Administrator's
Guide. Columns are segmented in this order:
n Short (<8 bytes) data type columns first
n Larger (> 8 byte) data type columns
n Up to 32 columns (default for MaxAutoSegColumns configuration parameter)
n If segmenting more than 32 columns, HP Vertica uses a nested hash function
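The ordering just described can be sketched as a small helper. This is an illustrative model only; the function name and tie-breaking details are assumptions, not Vertica internals:

```python
# Illustrative model of auto-segmentation column ordering: short
# (< 8 byte) columns first, then larger columns, capped at the
# MaxAutoSegColumns default of 32. Not Vertica source code.
MAX_AUTO_SEG_COLUMNS = 32

def choose_segmentation_columns(columns, limit=MAX_AUTO_SEG_COLUMNS):
    """columns: list of (name, size_in_bytes) tuples in table order."""
    short = [name for name, size in columns if size < 8]
    large = [name for name, size in columns if size >= 8]
    return (short + large)[:limit]

print(choose_segmentation_columns([("a", 16), ("b", 4), ("c", 8), ("d", 2)]))
# -> ['b', 'd', 'a', 'c']
```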
Auto-projections are defined by the table properties and creation methods, as follows:
If table...                   Sort order is...                  Segmentation is...
----------------------------  --------------------------------  ----------------------------------------------
Is created from an input      Same as the input stream, if      On the PK column (if any), on all FK columns
stream (COPY or INSERT        sorted.                           (if any), and on the first 31 configurable
INTO)                                                           columns of the table

Is created from a CREATE      Same as the input stream, if      The same segmentation columns, if the query
TABLE AS SELECT query         sorted. If not sorted, sorted     output is segmented. The same as a load, if
                              using the following rules.        the query output is unsegmented or unknown

Has FK and PK constraints     FK columns first, then PK         PK columns
                              columns

Has FK constraints only       FK columns first, then the        Small data type (< 8 byte) columns first, then
(no PK)                       remaining columns                 large data type columns

Has PK constraints only       PK columns                        PK columns
(no FK)

Has no FK or PK constraints   On all columns                    Small data type (< 8 byte) columns first, then
                                                                large data type columns
Default automatic projections and segmentation get your database up and running quickly. HP
recommends that you start with these projections and then use the Database Designer to optimize
your database further. The Database Designer creates projections that optimize your database
based on the characteristics of the data and, optionally, the queries you use.
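The constraint-driven rows of the table above amount to a small decision procedure. The following is a hedged sketch of those rules; the encoding and function name are mine, not Vertica code:

```python
# Maps a table's PK/FK constraints to the default auto-projection's
# sort order and segmentation, per the rules table above. Hypothetical
# helper for illustration only.
def auto_projection_plan(has_pk, has_fk):
    by_size = "small (< 8 byte) columns first, then large columns"
    if has_pk and has_fk:
        return ("FK columns first, then PK columns", "PK columns")
    if has_fk:
        return ("FK columns first, then remaining columns", by_size)
    if has_pk:
        return ("PK columns", "PK columns")
    return ("all columns", by_size)
```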
See Also
l Creating External Tables
l Projection Concepts
l CREATE TABLE
Creating a Table Like Another
You can create a new table based on an existing table using the CREATE TABLE statement with the
LIKE existing_table clause, optionally including the projections of the existing table. Creating a
new table with the LIKE option replicates the table definition and any storage policy associated
with the existing table. The statement does not copy any data. The main purpose of this function is
to create an intermediate table into which you can move partition data, and eventually, archive the
data and drop the intermediate table.
Note: Invoking CREATE TABLE with its LIKE clause before calling the function to move
partitions for archiving requires first dropping pre-join-projections or refreshing out-of-date
projections.
You can optionally use the INCLUDING PROJECTIONS clause to create a table that has the
existing table's current, non-pre-join projection definitions whenever you populate the table.
Replicated projections are named automatically to avoid conflicts with any existing objects, and
follow the same naming conventions as auto-projections. You cannot create a new table like
another if the source table has pre-join or out-of-date projections; in that case, the statement
displays a warning message.
Note: HP Vertica does not support using CREATE TABLE new_t LIKE exist_t INCLUDING
PROJECTIONS if exist_t is a temporary table.
Epochs and Node Recovery
The checkpoint epoch (CPE) for both the source and target projections is updated as ROSes are
moved. The start and end epochs of all storage containers, such as ROSes, are modified to the
agreed move epoch. When this occurs, the epochs of all columns without an actual data file rewrite
advance the CPE to the move epoch. If any nodes are down during the TM moveout, they will
detect that there is storage to recover, and will recover from other nodes with the correct epoch
upon rejoining the cluster.
Storage Location and Policies for New Tables
When you use the CREATE TABLE...LIKE statement, any storage policy objects associated with
the table are also copied. Data added to the new table will use the same labeled storage location as
the source table, unless you change the storage policy. For more information, see Working With
Storage Locations.
Simple Example
This example shows how to use the statement for a table that already exists, and suggests a
naming convention that describes the contents of the new table:
Create a new schema in which to create an intermediate table with projections. This is the table into
which you will move partitions. Then, create a table identical to the source table from which to move
partitions:
VMART=> create schema partn_backup;
CREATE SCHEMA
VMART=> create table partn_backup.trades_200801 like prod.trades including projections;
CREATE TABLE
Once the schema and table exist, you can move one or more of the existing table partitions to the
new intermediate table.
Using CREATE TABLE LIKE
For this example, create a table, states:
VMART=> create table states (
VMART(> state char(2) not null,
VMART(> bird varchar(20),
VMART(> tree varchar(20),
VMART(> tax float,
VMART(> stateDate date)
VMART-> partition by state;
CREATE TABLE
Populate the table with some data on New England:
insert into states values ('MA', 'chickadee', 'american_elm', 5.675, '07-04-1620');
insert into states values ('VT', 'Hermit_Thrasher', 'Sugar_Maple', 6.0, '07-04-1610');
.
.
.
Select the states table to see its content:
VMART=> select * from states;
 State |        bird         |         tree         |  tax  | stateDate
-------+---------------------+----------------------+-------+------------
MA | chickadee | american_elm | 5.675 | 1620-07-04
NH | Purple_Finch | White_Birch | 0 | 1615-07-04
VT | Hermit_Thrasher | Sugar_maple | 6 | 1618-07-04
ME | Black_Cap_Chickadee | Pine_Tree | 5 | 1615-07-04
CT | American_Robin | White_Oak | 6.35 | 1618-07-04
RI | Rhode_Island_Red | Red_Maple | 5 | 1619-07-04
(6 rows)
View the projections for this table:
VMART=> \dj
                     List of projections
Schema | Name | Owner | Node | Comment
--------+-------------------+---------+------------------+---------
.
.
.
public | states_b0 | dbadmin | |
public | states_b1 | dbadmin | |
public | states_p_node0001 | dbadmin | v_vmart_node0001 |
public | states_p_node0002 | dbadmin | v_vmart_node0002 |
public | states_p_node0003 | dbadmin | v_vmart_node0003 |
Now, create a table like the states table, including projections:
VMART=> create table newstates like states including projections;
CREATE TABLE
VMART=> select * from newstates;
State | bird | tree | tax | stateDate
-------+------+------+-----+-----------
(0 rows)
See Also
l Creating Base Tables
l Creating Temporary Tables
l Creating External Tables
l Moving Partitions
l CREATE TABLE
Creating Temporary Tables
You create temporary tables using the CREATE TEMPORARY TABLE statement, specifying the
table as either local or global. You cannot create temporary external tables.
A common use case for a temporary table is to divide complex query processing into multiple steps.
Typically, a reporting tool holds intermediate results while reports are generated (for example, first
get a result set, then query the result set, and so on). You can also write subqueries.
Note: The default retention when creating temporary tables is ON COMMIT DELETE ROWS,
which discards data at transaction completion. The non-default value is ON COMMIT PRESERVE
ROWS, which discards data when the current session ends.
Global Temporary Tables
HP Vertica creates global temporary tables in the public schema, with the data contents private to
the transaction or session through which data is inserted.
Global temporary table definitions are accessible to all users and sessions, so that two (or more)
users can access the same global table concurrently. However, whenever a user commits or rolls
back a transaction, or ends the session, HP Vertica removes the global temporary table data
automatically, so users see only data specific to their own transactions or session.
Global temporary table definitions persist in the database catalogs until they are removed explicitly
through a DROP TABLE statement.
Local Temporary Tables
Local temporary tables are created in the V_TEMP_SCHEMA namespace and inserted into the user's
search path transparently. Each local temporary table is visible only to the user who creates it, and
only for the duration of the session in which the table is created.
When the session ends, HP Vertica automatically drops the table definition from the database
catalogs. You cannot preserve non-empty, session-scoped temporary tables using the ON
COMMIT PRESERVE ROWS statement.
Creating local temporary tables is significantly faster than creating regular tables, so you should
make use of them whenever possible.
Note: You cannot add projections to non-empty, session-scoped temporary tables if you
specify ON COMMIT PRESERVE ROWS. Be sure that projections exist before you load data,
as described in the section Automatic Projection Creation in CREATE TABLE. Also, while you
can add projections for tables created with the ON COMMIT DELETE ROWS option, be aware
that you could save the projection but still lose all the data.
Creating a Temp Table Using the /*+direct*/ Clause
You can use the /*+direct*/ clause to create a table or temporary table, saving the table
directly to disk (ROS), bypassing memory (WOS). For example, following is an existing table called
states:
VMart=> select * from states;
State | Bird | Tree | Tax
-------+----------+-------+-----
MA | Robin | Maple | 5.7
NH | Thrush | Elm | 0
NY | Cardinal | Oak | 7.2
(3 rows)
Create a new table, StateBird, with the /*+direct*/ clause in the statement, placing the clause
directly before the query (select State, Bird from states):
VMart=> create table StateBird as /*+direct*/ select State, Bird from states;
CREATE TABLE
VMart=> select * from StateBird;
State | Bird
-------+----------
MA | Robin
NH | Thrush
NY | Cardinal
(3 rows)
The following example creates a temporary table using the /*+direct*/ clause, along with the ON
COMMIT PRESERVE ROWS directive:
VMart=> create temp table StateTax ON COMMIT PRESERVE ROWS as /*+direct*/ select State,
Tax from states;
CREATE TABLE
VMart=> select * from StateTax;
State | Tax
-------+-----
MA | 5.7
NH | 0
NY | 7.2
(3 rows)
Characteristics of Default Automatic Projections
Once a local or global temporary table exists, HP Vertica creates auto-projections for it
whenever you load or insert data.
The default auto-projection for a temporary table has the following characteristics:
l It is a superprojection.
l It uses the default encoding-type AUTO.
l It is automatically unsegmented on the initiator node, if you do not specify a hash-segmentation-
clause.
l The projection is not pinned.
l Temp tables are not recoverable, so the superprojection is not K-Safe (K-SAFE=0), and you
cannot make it so.
Auto-projections are defined by the table properties and creation methods, as follows:
If table...                   Sort order is...                  Segmentation is...
----------------------------  --------------------------------  ----------------------------------------------
Is created from an input      Same as the input stream, if      On the PK column (if any), on all FK columns
stream (COPY or INSERT        sorted.                           (if any), and on the first 31 configurable
INTO)                                                           columns of the table

Is created from a CREATE      Same as the input stream, if      The same segmentation columns, if the query
TABLE AS SELECT query         sorted. If not sorted, sorted     output is segmented. The same as a load, if
                              using the following rules.        the query output is unsegmented or unknown

Has FK and PK constraints     FK columns first, then PK         PK columns
                              columns

Has FK constraints only       FK columns first, then the        Small data type (< 8 byte) columns first, then
(no PK)                       remaining columns                 large data type columns

Has PK constraints only       PK columns                        PK columns
(no FK)

Has no FK or PK constraints   On all columns                    Small data type (< 8 byte) columns first, then
                                                                large data type columns
Advanced users can modify the default projection created through the CREATE TEMPORARY TABLE
statement by defining one or more of the following parameters:
l column-definition (temp table) (ENCODING encoding-type and ACCESSRANK integer)
l ORDER BY table-column
l hash-segmentation-clause
l UNSEGMENTED { NODE node | ALL NODES }
l NO PROJECTION
Note: Before you define the superprojection in this manner, read Creating Custom Designs in
the Administrator's Guide.
Preserving GLOBAL Temporary Table Data for a
Transaction or Session
You can preserve session-scoped rows in a GLOBAL temporary table for the entire session or for
the current transaction only.
To preserve a temporary table for the transaction, use the ON COMMIT DELETE ROWS clause:
=> CREATE GLOBAL TEMP TABLE temp_table1 (x NUMERIC, y NUMERIC )
ON COMMIT DELETE ROWS;
To preserve temporary table data until the end of the session, use the ON COMMIT PRESERVE
ROWS clause:
=> CREATE GLOBAL TEMP TABLE temp_table2 (x NUMERIC, y NUMERIC )
ON COMMIT PRESERVE ROWS;
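The difference between the two clauses can be modeled conceptually. This sketch simulates only the visibility rules described above; it is not how HP Vertica stores temporary data, and the class name is illustrative:

```python
# Conceptual model of GLOBAL temporary table retention:
#   ON COMMIT DELETE ROWS   -> data discarded at transaction end
#   ON COMMIT PRESERVE ROWS -> data kept until the session ends
class GlobalTempTable:
    def __init__(self, preserve_rows=False):
        self.preserve_rows = preserve_rows  # True = ON COMMIT PRESERVE ROWS
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def commit(self):
        # Transaction boundary: DELETE ROWS tables lose their data here.
        if not self.preserve_rows:
            self.rows.clear()

    def end_session(self):
        # Session boundary: all temporary data is discarded.
        self.rows.clear()
```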
Specifying Column Encoding
You can specify the encoding type to use per column.
The following example specifies that the superprojection created for the temp table use RLE
encoding for the y column:
=> CREATE LOCAL TEMP TABLE temp_table1 (x NUMERIC, y NUMERIC ENCODING RLE )
ON COMMIT DELETE ROWS;
The following example specifies that the superprojection created for the temp table use the sort
order specified by the ORDER BY clause, rather than the order of columns in the column list.
=> CREATE GLOBAL TEMP TABLE temp_table1 (
x NUMERIC,
y NUMERIC ENCODING RLE,
b VARCHAR(8),
z VARCHAR(8) )
ORDER BY z, x;
See Also
l Projection Concepts
l CREATE TEMPORARY TABLE
l CREATE TABLE
l TRUNCATE TABLE
l DELETE
l ANALYZE_STATISTICS
Creating External Tables
You create an external table using the CREATE EXTERNAL TABLE AS COPY statement. You cannot
create temporary external tables. For the syntax details to create an external table, see the
CREATE EXTERNAL TABLE statement in the SQL Reference Manual.
Note: Each table can have a maximum of 1600 columns.
Required Permissions for External Tables
You must be a database superuser to create external tables, unless you create a USER-accessible
storage location (see ADD_LOCATION) and grant user privileges to the location, schema, and so
on.
Note: Permission requirements for external tables differ from other tables. To gain full access
(including SELECT) to an external table that a user has privileges to create, the database
superuser must also grant READ access to the USER-accessible storage location, see
GRANT (Storage Location).
COPY Statement Definition
When you create an external table, table data is not added to the database, and no projections are
created. Instead, HP Vertica performs a syntactic check of the CREATE EXTERNAL TABLE...
statement, and stores the table name and COPY statement definition in the catalog. When a
SELECT query references an external table, the stored COPY statement is parsed to obtain the
referenced data. Successfully returning data from the external table requires that the COPY
definition is correct, and that other dependencies, such as files, nodes, and other resources are
accessible and available at query-time.
For more information about checking the validity of the external table COPY definition, see
Validating External Tables.
Developing User-Defined Load (UDL) Functions for
External Tables
You can create external tables with your own load functions. For more information about developing
user-defined load functions, see User Defined Load (UDL) and the extended COPY syntax in the
SQL Reference Manual.
Examples
Examples of external table definitions:
CREATE EXTERNAL TABLE ext1 (x integer) AS COPY FROM '/tmp/ext1.dat' DELIMITER ',';
CREATE EXTERNAL TABLE ext1 (x integer) AS COPY FROM '/tmp/ext1.dat.bz2' BZIP DELIMITER ',';
CREATE EXTERNAL TABLE ext1 (x integer, y integer) AS COPY (x as '5', y) FROM '/tmp/ext1.dat.bz2' BZIP DELIMITER ',';
See Also
l COPY
l CREATE EXTERNAL TABLE AS COPY
Validating External Tables
When you create an external table, HP Vertica validates the syntax of the CREATE EXTERNAL
TABLE AS COPY FROM statement. For instance, if you omit a required keyword in the statement
(such as FROM), creating the external table fails, as in this example:
VMart=> create external table ext (ts timestamp, d varchar) as copy '/home/dbadmin/designer.log';
ERROR 2778: COPY requires a data source; either a FROM clause or a WITH SOURCE for a user-defined source
Checking other aspects of the COPY definition (such as path statements and node availability)
does not occur until a select query references the external table.
To validate that you have successfully created an external table definition, run a select query
referencing the external table. Check that the returned query data is what you expect. If the query
does not return data correctly, check the COPY exception and rejected data log files.
Since the COPY definition determines what occurs when you query an external table, obtaining
COPY statement errors can help reveal any underlying problems. For more information about
COPY exceptions and rejections, see Capturing Load Rejections and Exceptions.
Limiting the Maximum Number of Exceptions
Querying external table data with an incorrect COPY FROM statement definition can potentially
result in many exceptions. To limit the number of saved exceptions, HP Vertica sets the maximum
number of reported rejections with the ExternalTablesExceptionsLimit configuration parameter.
The default value is 100. Setting the ExternalTablesExceptionsLimit to -1 disables the limit.
For more information about configuration parameters, see Configuration Parameters, and
specifically, General Parameters.
If COPY errors reach the maximum number of exceptions, the external table query continues, but
COPY generates a warning in the vertica.log, and does not report subsequent rejections and/or
exceptions.
Note: Using the ExternalTablesExceptionsLimit configuration parameter differs from the
COPY statement REJECTMAX clause. If COPY reaches the number of exceptions defined by
REJECTMAX, COPY aborts execution, and does not generate a vertica.log warning.
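The contrast between the two limits can be sketched as follows. This is a conceptual model of the behavior described above, not HP Vertica's loader, and the exact boundary conditions are assumptions:

```python
# Conceptual model: ExternalTablesExceptionsLimit stops *reporting*
# rejections but the query continues; REJECTMAX aborts the COPY.
# A value of -1 disables either limit. Bad rows are modeled as None.
def scan_external_table(rows, exceptions_limit=100, rejectmax=-1):
    good, reported, rejected = [], 0, 0
    for row in rows:
        if row is None:
            rejected += 1
            if rejectmax != -1 and rejected >= rejectmax:
                raise RuntimeError("COPY aborted: REJECTMAX reached")
            if exceptions_limit == -1 or rejected <= exceptions_limit:
                reported += 1  # beyond the limit, rejections go unreported
            continue
        good.append(row)
    return good, reported
```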
Working with External Tables
After creating external tables, you access them as any other table.
Managing Resources for External Tables
External tables require minimal additional resources. When you use a select query for an external
table, HP Vertica uses a small amount of memory when reading external table data, since the table
contents are not part of your database and are parsed each time the external table is used.
Backing Up and Restoring External Tables
Since the data in external tables is managed outside of HP Vertica, only the external table
definitions, not the data files, are included in database backups.
Using Sequences and Identity Columns in External
Tables
The COPY statement definition for external tables can include identity columns and sequences.
Whenever a select statement queries the external table, sequences and identity columns are re-
evaluated. This results in changing the external table column values, even if the underlying external
table data remains the same.
Viewing External Table Definitions
When you create an external table, HP Vertica stores the COPY definition statement in the table_
definition column of the v_catalog.tables system table.
1. To list all external tables, select the rows of v_catalog.tables whose table_definition is
not empty:
select * from v_catalog.tables where table_definition <> '';
2. Use a query such as the following to list the external table definitions (table_definition):
select table_name, table_definition from v_catalog.tables;
 table_name |                          table_definition
------------+----------------------------------------------------------------------
t1 | COPY FROM 'TMPDIR/external_table.dat' DELIMITER ','
t1_copy | COPY FROM 'TMPDIR/external_table.dat' DELIMITER ','
t2 | COPY FROM 'TMPDIR/external_table2.dat' DELIMITER ','
(3 rows)
External Table DML Support
Following are examples of supported queries, and others that are not:
Supported                                        Unsupported
-----------------------------------------------  ---------------------------------------------
SELECT * FROM external_table;                    DELETE FROM external_table WHERE x = 5;

SELECT * FROM external_table where col1=4;       INSERT INTO external_table SELECT * FROM ext;

DELETE FROM ext WHERE id IN
  (SELECT x FROM external_table);

INSERT INTO ext SELECT * FROM external_table;    SELECT * FROM external_table for update;
Using External Table Values
Following is a basic example of how you could use the values of an external table.
1. Create and display the contents of a file with some integer values:
[dbadmin@localhost ~]$ more ext.dat
1
2
3
4
5
6
7
8
10
11
12
2. Create an external table pointing at ext.dat:
VMart=> create external table ext (x integer) as copy from '/home/dbadmin/ext.dat';
CREATE TABLE
3. Select the table contents:
VMart=> select * from ext;
 x
----
1
2
3
4
5
6
7
8
10
11
12
(11 rows)
4. Perform evaluation on some external table contents:
VMart=> select ext.x, ext.x + ext.x as double_x from ext where x > 5;
 x  | double_x
----+----------
6 | 12
7 | 14
8 | 16
10 | 20
11 | 22
12 | 24
(6 rows)
5. Create a second table (second), also with integer values:
VMart=> create table second (y integer);
CREATE TABLE
6. Populate the table with some values:
VMart=> copy second from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 1
>> 3
>> 4
>> 5
>> .
7. Join the external table (ext) with the table created in HP Vertica, called second:
VMart=> select * from ext join second on x=y;
 x | y
---+---
1 | 1
1 | 1
3 | 3
4 | 4
5 | 5
(5 rows)
Using External Tables
External tables let you query data stored in files accessible to the HP Vertica database, but not
managed by it. Creating external tables supplies read-only access through SELECT queries. You
cannot modify external tables through DML commands, such as INSERT, UPDATE, DELETE,
and MERGE.
Using CREATE EXTERNAL TABLE AS COPY Statement
You create external tables with the CREATE EXTERNAL TABLE AS COPY... statement, shown in this
basic example:
CREATE EXTERNAL TABLE tbl(i INT) AS COPY (i) FROM 'path1' ON node1, 'path2' ON node2;
For more details on the supported options to create an external table, see the CREATE
EXTERNAL TABLE statement in the SQL Reference Manual.
The data you specify in the FROM clause of a CREATE EXTERNAL TABLE AS COPY statement can
reside in one or more files or directories, and on one or more nodes. After successfully creating an
external table, HP Vertica stores the table name and its COPY definition. Each time a select query
references the external table, HP Vertica parses the COPY statement definition again to access
the data. Here is a sample select statement:
SELECT * FROM tbl WHERE i > 10;
Storing HP Vertica Data in External Tables
While there are many reasons to use external tables, a common one is to store
infrequently accessed HP Vertica data on low-cost external media. If external storage is a goal at
your site, the process is to export the older data to a text file, create a
bzip or gzip compressed copy of the exported data, and save the compressed file on an NFS disk. You can then
create an external table to access the data any time it is required.
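Outside the database, the compress-and-archive step above is a straightforward round trip. This sketch uses Python's bz2 module to verify that a compressed export still yields the original rows; the file names and pipe-delimited row format are illustrative assumptions:

```python
# Sketch of the archival workflow: write exported rows to a text file,
# compress it with bz2, and read the compressed copy back.
import bz2
import os

def archive_export(lines, directory):
    raw = os.path.join(directory, "export.dat")
    packed = raw + ".bz2"
    with open(raw, "w") as f:
        f.write("\n".join(lines) + "\n")
    with open(raw, "rb") as src, bz2.open(packed, "wb") as dst:
        dst.write(src.read())
    return packed

def read_archive(packed):
    with bz2.open(packed, "rt") as f:
        return [line.rstrip("\n") for line in f]
```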
Using External Tables with User-Defined Load (UDL)
Functions
You can also use external tables in conjunction with the UDL functions that you create. For more
information about using UDLs, see User Defined Load (UDL) in the Programmer's Guide.
Organizing External Table Data
If the data you store in external tables changes regularly (for instance, each month in the case of
storing recent historical data), your COPY definition statement can use wildcards to make parsing
the stored COPY statement definition more dynamic. For instance, if you store monthly data on an
NFS mount, you could organize monthly files within a top-level directory for a calendar year, such
as:
/2012/monthly_archived_data/
In this case, the external table COPY statement includes a wildcard definition such as the
following:
CREATE EXTERNAL TABLE archive_data (...) AS COPY FROM 'nfs_name/2012/monthly_archived_data/*'
Whenever an HP Vertica query references the external table archive_data, and HP Vertica parses
the COPY statement, all stored data files in the top-level monthly_archived_data directory are
made accessible to the query.
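The wildcard resolution described above behaves much like filesystem globbing: the pattern is re-expanded each time the table is queried, so newly added monthly files appear automatically. A sketch using Python's glob module, which mirrors the behavior but is not Vertica's parser:

```python
# Simulates how a wildcard source such as .../monthly_archived_data/*
# resolves to the set of data files present at query time.
import glob
import os

def expand_copy_sources(pattern):
    """Return the sorted list of regular files matching the pattern."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```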
Altering Table Definitions
Using ALTER TABLE syntax, you can respond to your evolving database schema requirements.
The ability to change the definition of existing database objects facilitates ongoing maintenance.
Furthermore, most of these options are both fast and efficient for large tables, because they
consume fewer resources and less storage than having to stage data in a temporary table.
Here are some of the operations you can perform using the ALTER TABLE statement:
l Rename a table
l Add, drop, and rename columns
l Add and drop constraints
l Add table columns with a default derived expression
l Change a column's data type
l Change a table owner
l Rename a table schema
l Move a table to a new schema
l Change, reorganize, and remove table partitions
External Table Restrictions
Not all ALTER TABLE options are applicable for external tables. For instance, you cannot add a
column to an external table, but you can rename the table:
=> ALTER TABLE mytable RENAME TO mytable2;
ALTER TABLE
Exclusive ALTER TABLE Clauses
The following clauses are exclusive, which means you cannot combine them with another ALTER
TABLE clause:
l ADD COLUMN
l RENAME COLUMN
l SET SCHEMA
l PARTITION BY
l REORGANIZE
l REMOVE PARTITIONING
l RENAME [ TO ]
l OWNER TO
Note: You can use the ADD constraints and DROP constraints clauses together.
Using Consecutive ALTER TABLE Commands
With the exception of a table rename, you cannot combine the exclusive clauses listed above in a
single statement; perform such operations with consecutive ALTER TABLE statements. For example,
to add multiple columns to a table, issue consecutive ALTER TABLE ADD COLUMN statements, ending
each statement with a semicolon.
For more information about ALTER TABLE syntax and parameters, see the SQL Reference
Manual.
Adding Table Columns
When you use the ADD COLUMN syntax as part of altering a table, HP Vertica takes an O lock on the
table until the operation completes. The lock prevents DELETE, UPDATE, INSERT, and COPY
statements from accessing the table, and blocks SELECT statements issued at SERIALIZABLE
isolation level, until the operation completes. Each table can have a maximum of 1600 columns.
You cannot add columns to a temporary table or to tables that have out-of-date superprojections
with up-to-date buddies.
The following operations occur as part of adding columns:
l Inserts the default value for existing rows. For example, if the default expression is CURRENT_
TIMESTAMP, all rows have the current timestamp.
l Automatically adds the new column with a unique projection column name to the
superprojection of the table.
l Populates the column according to the column-constraint (DEFAULT, for example).
You can add a column to a table using the ALTER TABLE ADD COLUMN statement with the
CASCADE keyword. When you use CASCADE, HP Vertica also adds the new table column to all
pre-join projections that are created using that table.
If you use the CASCADE keyword and specify a nonconstant default column value, HP Vertica
does not add the column to the pre-join projections.
For detailed information about how to add columns to tables, see ALTER TABLE.
Note: Adding a column to a table does not affect the K-safety of the physical schema design,
and you can add columns when nodes are down.
Updating Associated Table Views
Adding new columns to a table that has an associated view does not update the view's result set,
even if the view uses a wildcard (*) to represent all table columns. To incorporate new columns, you
must recreate the view. See CREATE VIEW in the SQL Reference Manual.
Administrator's Guide
Working with Tables
HP Vertica Analytic Database (7.0.x) Page 405 of 997
Specifying Default Expressions
When you add a new column to a table using ALTER TABLE ADD COLUMN, the default expression for
the new column can evaluate to a user-defined scalar function, a constant, or a derived expression
involving other columns of the same table.
The default expression of an ADD COLUMN statement cannot contain nested queries or aggregate
functions. Instead, use the ALTER COLUMN option, described in Altering Table Columns.
About Using Volatile Functions
You cannot use a volatile function in the following two scenarios. Attempting to do so causes a
rollback.
l As the default expression for an ALTER TABLE ADD COLUMN statement:
ALTER TABLE t ADD COLUMN a2 INT DEFAULT my_sequence.nextval;
ROLLBACK: VOLATILE functions are not supported in a default expression
ALTER TABLE t ADD COLUMN n2 INT DEFAULT my_sequence.currval;
ROLLBACK: VOLATILE functions are not supported in a default expression
ALTER TABLE t ADD COLUMN c2 INT DEFAULT RANDOM() + 1;
ROLLBACK: VOLATILE functions are not supported in a default expression
l As the default expression for an ALTER TABLE ADD COLUMN or ALTER COLUMN statement on an external table:
ALTER TABLE mytable ADD COLUMN a2 FLOAT DEFAULT RANDOM();
ROLLBACK 5241: Unsupported access to external table
ALTER TABLE mytable ALTER COLUMN x SET DEFAULT RANDOM();
ROLLBACK 5241: Unsupported access to external table
You can specify a volatile function as a column default expression using the ALTER TABLE ALTER
COLUMN statement:
ALTER TABLE t ALTER COLUMN a2 SET DEFAULT my_sequence.nextval;
Altering Table Columns
Use the ALTER COLUMN syntax to change or establish a default expression for an existing table
column, or use DROP DEFAULT to remove the column's default expression.
Any new data that you load after altering a column will conform to the modified table definition. For
example:
l After a DROP COLUMN operation completes, data backed up from the current epoch onward
will recover without the column. Data recovered from a backup prior to the current epoch will re-
add the table column. Because drop operations physically purge object storage and catalog
definitions (table history) from the table, AT EPOCH (historical) queries return nothing for the
dropped column.
l If you change a column's data type from CHAR(8) to CHAR(16) in epoch 10 and you restore the
database from epoch 5, the column will be CHAR(8) again.
Adding Columns with a Default Derived Expression
You can add one or more columns to a table and set the default value as an expression. The
expression can reference another column in the same table, or be calculated with a user-defined
function (see Types of UDFs in the Programmer's Guide). You can alter unstructured tables to use
a derived expression, as described in Altering Unstructured Tables. HP Vertica computes the
default value within each row, using values from other columns in the same row. This flexibility is
useful for adding a column to a large fact table that presents another view of the data, without
having to INSERT ... SELECT a large data set.
Adding columns requires an O lock on the table until the add operation completes. This lock
prevents DELETE, UPDATE, INSERT, and COPY statements from affecting the table, and blocks
SELECT statements issued at SERIALIZABLE isolation level, until the operation completes. Only
the new data you load after the alter operation completes is derived from the expression.
You cannot include a nested query or an aggregate function as a default expression. The column
must use a specific expression that involves other elements in the same row.
You cannot specify a default expression that derives data from another derived column. This means
that if you already have a column with a default derived value expression, you cannot add another
column whose default references the existing column.
Note: You can add a column when nodes are down.
Add a Default Column Value Derived From Another Column
1. Create a sample table called t with timestamp, integer and varchar(10) columns:
=> CREATE TABLE t (a TIMESTAMP, b INT, c VARCHAR(10));
CREATE TABLE
=> INSERT INTO t VALUES ('2012-05-14 10:39:25', 2, 'MA');
OUTPUT
--------
1
(1 row)
2. Use the vsql \d t meta-command to describe the table:
=> \d t
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key
| Foreign Key
--------+-------+--------+-------------+------+---------+----------+-------------
+-------------
public | t | a | timestamp | 8 | | f | f
|
public | t | b | int | 8 | | f | f
|
public | t | c | varchar(10) | 10 | | f | f
|
(3 rows)
3. Use ALTER TABLE to add a fourth column that extracts the month from the timestamp value in
column a:
=> ALTER TABLE t ADD COLUMN d INTEGER DEFAULT EXTRACT(MONTH FROM a);
ALTER TABLE
4. Query table t:
=> select * from t;
a | b | c | d
---------------------+---+----+---
2012-05-14 10:39:25 | 2 | MA | 5
(1 row)
Column d returns integer 5 (representing the 5th month).
5. View the table again to see the new column (d) and its default derived value.
=> \d t
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null |
Primary Key | Foreign Key
--------+-------+--------+-------------+------+-------------------------+----------+-
------------+-------------
public | t | a | timestamp | 8 | | f |
f |
public | t | b | int | 8 | | f |
f |
public | t | c | varchar(10) | 10 | | f |
f |
public | t | d | int | 8 | date_part('month', t.a) | f |
f |
(4 rows)
6. Drop the sample table t:
=> DROP TABLE t;
DROP TABLE
Add a Default Column Value Derived From a UDSF
This example shows a user-defined scalar function that adds two integer values. The function is
called add2ints and takes two arguments.
1. Develop and deploy the function, as described in Developing and Using User Defined
Functions.
2. Create a sample table, t1, with two integer columns:
=> CREATE TABLE t1 ( x int, y int );
CREATE TABLE
3. Insert some values into t1:
=> insert into t1 values (1,2);
OUTPUT
--------
1
(1 row)
=> insert into t1 values (3,4);
OUTPUT
--------
1
(1 row)
4. Use ALTER TABLE to add a column to t1 with the default column value derived from the
UDSF, add2ints:
alter table t1 add column z int default add2ints(x,y);
ALTER TABLE
5. List the new column:
select z from t1;
z
----
3
7
(2 rows)
Changing a Column's Data Type
You can change a table column's data type for any type whose conversion does not require
storage reorganization. HP Vertica supports the following conversions:
l Binary types—expansion and contraction but cannot convert between BINARY and
VARBINARY types.
l Character types—all conversions allowed, even between CHAR and VARCHAR
l Exact numeric types—INTEGER, INT, BIGINT, TINYINT, INT8, SMALLINT, and all
NUMERIC values of scale <=18 and precision 0 are interchangeable. For NUMERIC data
types, you cannot alter precision, but you can change the scale in the ranges (0-18), (19-37), and
so on.
HP Vertica does not allow data type conversion on types that require storage reorganization:
l Boolean type conversion to other types
l DATE/TIME type conversion
l Approximate numeric type conversions
l Between BINARY and VARBINARY types
You can expand (and shrink) columns within the same class of data type, which is useful if you
want to store longer strings in a column. HP Vertica validates the data before it performs the
conversion.
For example, if you try to convert a column from varchar(25) to varchar(10) and that column holds a
string with 20 characters, the conversion fails. HP Vertica allows the conversion as long as that
column does not contain a string longer than 10 characters.
Examples
The following example expands an existing column from CHAR(5) to CHAR(10):
=> CREATE TABLE t (x CHAR, y CHAR(5));
CREATE TABLE
=> ALTER TABLE t ALTER COLUMN y SET DATA TYPE CHAR(10);
ALTER TABLE
=> DROP TABLE t;
DROP TABLE
This example illustrates the behavior of a changed column's type. First, set column y's type to
VARCHAR(5), and then insert strings whose lengths equal and exceed five characters:
=> CREATE TABLE t (x VARCHAR, y VARCHAR);
CREATE TABLE
=> ALTER TABLE t ALTER COLUMN y SET DATA TYPE VARCHAR(5);
ALTER TABLE
=> INSERT INTO t VALUES ('1232344455','hello');
OUTPUT
--------
1
(1 row)
=> INSERT INTO t VALUES ('1232344455','hello1');
ERROR 4797: String of 6 octets is too long for type Varchar(5)
=> DROP TABLE t;
DROP TABLE
You can also contract the data type's size, as long as the altered column contains no strings
longer than five characters:
=> CREATE TABLE t (x CHAR, y CHAR(10));
CREATE TABLE
=> ALTER TABLE t ALTER COLUMN y SET DATA TYPE CHAR(5);
ALTER TABLE
=> DROP TABLE t;
DROP TABLE
You cannot convert types between binary and varbinary. For example, the table definition below
contains two binary columns, so when you try to convert column y to a varbinary type, HP Vertica
returns a ROLLBACK message:
=> CREATE TABLE t (x BINARY, y BINARY);
CREATE TABLE
=> ALTER TABLE t ALTER COLUMN y SET DATA TYPE VARBINARY;
ROLLBACK 2377: Cannot convert column "y" from "binary(1)" to type "varbinary(80)"
=> DROP TABLE t;
DROP TABLE
How to Perform an Illegitimate Column Conversion
The SQL standard disallows an illegitimate column conversion, but you can work around this
restriction if you need to convert data from a non-SQL database. The following example takes you
through the process step by step, where you'll manage your own epochs.
Given a sales table with columns id (INT) and price (VARCHAR), assume you want to convert
the VARCHAR column to a NUMERIC field. You do this by adding a temporary column whose
default value is derived from the existing price column, dropping the original column, and then
renaming the temporary column to the original name.
1. Create the sample table with INTEGER and VARCHAR columns and insert two rows.
=> CREATE TABLE sales(id INT, price VARCHAR) UNSEGMENTED ALL NODES;
CREATE TABLE
=> INSERT INTO sales VALUES (1, '$50.00');
=> INSERT INTO sales VALUES (2, '$100.00');
2. Commit the transaction:
=> COMMIT;
COMMIT
3. Query the sales table:
=> SELECT * FROM SALES;
 id | price
----+---------
1 | $50.00
2 | $100.00
(2 rows)
4. Add column temp_price. This is your temporary column.
=> ALTER TABLE sales ADD COLUMN temp_price NUMERIC DEFAULT SUBSTR(sales.price, 2)::NUMERIC;
ALTER TABLE
5. Query the sales table, and you'll see the new temp_price column with its derived NUMERIC
values:
=> SELECT * FROM SALES;
 id | price | temp_price
----+---------+---------------------
1 | $50.00 | 50.000000000000000
2 | $100.00 | 100.000000000000000
(2 rows)
6. Drop the default expression from that column.
=> ALTER TABLE sales ALTER COLUMN temp_price DROP DEFAULT;
ALTER TABLE
7. Advance the AHM:
SELECT advance_epoch(1);
 advance_epoch
---------------
New Epoch: 83
(1 row)
8. Manage epochs:
SELECT manage_epoch();
 manage_epoch
--------------------------------
Current Epoch=83, AHM Epoch=82
(1 row)
9. Drop the original price column.
=> ALTER TABLE sales DROP COLUMN price CASCADE;
ALTER COLUMN
10. Rename the new (temporary) temp_price column back to its original name, price:
=> ALTER TABLE sales RENAME COLUMN temp_price to price;
ALTER COLUMN
11. Query the sales table one last time:
=> SELECT * FROM SALES;
 id | price
----+---------------------
1 | 50.000000000000000
2 | 100.000000000000000
(2 rows)
12. Clean up (drop table sales):
=> DROP TABLE sales;
DROP TABLE
See ALTER TABLE in the SQL Reference Manual for details.
Adding Constraints on Columns
To add constraints on a new column:
1. Use the ALTER TABLE ADD COLUMN clause to add a new table column.
2. Use ALTER TABLE ADD CONSTRAINT to define constraints for the new column.
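The two steps above can be sketched as follows; the table, column, and constraint names are hypothetical:

=> ALTER TABLE t ADD COLUMN sku INT;
=> ALTER TABLE t ADD CONSTRAINT t_sku_unique UNIQUE (sku);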
Adding and Removing NOT NULL Constraints
Use the [SET | DROP] NOT NULL clause to add (SET) or remove (DROP) a NOT NULL constraint
on the column.
When a column is a primary key and you drop the primary key constraint, the column retains the
NOT NULL constraint. If you want to allow that column to contain NULL values, use DROP NOT
NULL to remove the NOT NULL constraint.
Examples
ALTER TABLE T1 ALTER COLUMN x SET NOT NULL;
ALTER TABLE T1 ALTER COLUMN x DROP NOT NULL;
Note: Using the [SET | DROP] NOT NULL clause does not validate whether the column data
conforms to the NOT NULL constraint. Use ANALYZE_CONSTRAINTS to check for
constraint violations in a table.
See Also
l About Constraints
Dropping a Table Column
When you use the ALTER TABLE ... DROP COLUMN statement to drop a column, HP Vertica drops
both the specified column from the table and the ROS containers that correspond to the dropped
column.
The syntax looks like this:
ALTER TABLE [[db-name.]schema.]table-name ... | DROP [ COLUMN ] column-name [ CASCADE | RESTRICT ]
Because drop operations physically purge object storage and catalog definitions (table history) from
the table, AT EPOCH (historical) queries return nothing for the dropped column.
The altered table has the same object ID.
Note: Drop column operations can be fast because these catalog-level changes do not require
data reorganization, letting you quickly reclaim disk storage.
Restrictions
l At the table level, you cannot drop or alter a primary key column or a column participating in the
table's partitioning clause.
l At the projection level, you cannot drop the first column in a projection's sort order or columns
that participate in the segmentation expression of a projection.
l All nodes must be up for the drop operation to succeed.
Using CASCADE to Force a Drop
You can work around some of the restrictions by using the CASCADE keyword, which enforces
minimal reorganization of the table's definition in order to drop the column. You can use CASCADE
to drop a column in any of the following scenarios. Note that in all cases that use CASCADE, HP
Vertica tries to drop the affected projection(s) and rolls back if K-safety is compromised:
l The column has a constraint of any kind on it. HP Vertica drops the column with CASCADE
specified when a FOREIGN KEY constraint depends on a UNIQUE or PRIMARY KEY constraint
on the referenced columns.
l The column participates in the projection's sort order. When you drop the column using the
CASCADE keyword, HP Vertica truncates the projection's sort order up to and including the
dropped column, without impact on physical storage for other columns, and then drops the
specified column. For example, if a projection's sort order is (a,b,c), dropping column b
truncates the sort order to just (a), omitting column (c).
l The column is part of a pre-join projection, or participates in the projection's segmentation
expression. In these scenarios, HP Vertica drops any non-critical dependent projections,
maintains the superprojection that contains the data, and drops the specified column. When a
pre-join projection contains a column to be dropped with CASCADE, HP Vertica tries to drop the
projection. Assume, for example, that you have a table with multiple projections, and the column
you are trying to drop is part of one projection's segmentation clause. When you specify
CASCADE, the DROP COLUMN statement tries to implicitly drop the projection that has this
column in its segmentation clause. If the drop succeeds, the transaction completes; if it violates
K-safety, the transaction rolls back. Although this is a DROP COLUMN ... CASCADE operation
(not DROP PROJECTION), HP Vertica can encounter cases where it is not possible to drop a
projection's column without reorganizing the projection. In these cases, CASCADE tries to drop
the projection itself to maintain data integrity. If K-safety is compromised, the operation rolls
back.
Examples
The following series of commands successfully drops BYTEA data type columns:
=> CREATE TABLE t (x BYTEA(65000), y BYTEA, z BYTEA(1));
CREATE TABLE
=> ALTER TABLE t DROP COLUMN y;
ALTER TABLE
=> SELECT y FROM t;
ERROR 2624: Column "y" does not exist
=> ALTER TABLE t DROP COLUMN x RESTRICT;
ALTER TABLE
=> SELECT x FROM t;
ERROR 2624: Column "x" does not exist
=> SELECT * FROM t;
z
---
(0 rows)
=> DROP TABLE t CASCADE;
DROP TABLE
The following series of commands tries to drop a FLOAT(8) column and fails because there are not
enough projections to maintain k-safety.
=> CREATE TABLE t (x FLOAT(8), y FLOAT(08));
CREATE TABLE
=> ALTER TABLE t DROP COLUMN y RESTRICT;
ALTER TABLE
=> SELECT y FROM t;
ERROR 2624: Column "y" does not exist
=> ALTER TABLE t DROP x CASCADE;
ROLLBACK 2409: Cannot drop any more columns in t
=> DROP TABLE t CASCADE;
Moving a Table to Another Schema
The ALTER TABLE SET SCHEMA statement moves a table from one schema to another. Moving a
table requires that you have CREATE privileges for the destination schema. You can move only one
table between schemas at a time. You cannot move temporary tables between schemas.
SET SCHEMA has two options, CASCADE and RESTRICT. CASCADE, which is the default, automatically
moves all projections that are anchored on the source table to the destination schema, regardless of
the schema in which the projections reside. The RESTRICT option moves only projections that are
anchored on the source table and which also reside in the same schema.
If a table of the same name or any of the projections that you want to move already exist in the new
schema, the statement rolls back and does not move either the table or any projections. To work
around name conflicts:
1. Rename any conflicting table or projections that you want to move.
2. Run the ALTER TABLE SET SCHEMA statement again.
Note: HP Vertica lets you move system tables to system schemas. Moving system tables
could be necessary to support designs created through the Database Designer.
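The statement described above can be sketched as follows, with the default CASCADE behavior and then with RESTRICT; the table and schema names are hypothetical:

=> ALTER TABLE public.t1 SET SCHEMA s1;
=> ALTER TABLE public.t2 SET SCHEMA s1 RESTRICT;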
Changing a Table Owner
The ability to change table ownership is useful when moving a table from one schema to another.
Ownership reassignment is also useful when a table owner leaves the company or changes job
responsibilities. Because you can change the table owner, the tables do not have to be completely
rewritten, and you avoid a loss in productivity.
The syntax looks like this:
ALTER TABLE [[db-name.]schema.]table-name OWNER TO new-owner-name
In order to alter table ownership, you must be either the table owner or a superuser.
A change in table ownership transfers just the owner and not privileges; grants made by the original
owner are dropped and all existing privileges on the table are revoked from the previous owner.
However, altering the table owner transfers ownership of dependent sequence objects (associated
IDENTITY/AUTO-INCREMENT sequences) but does not transfer ownership of other referenced
sequences. See ALTER SEQUENCE for details on transferring sequence ownership.
Notes
l Table privileges are separate from schema privileges; therefore, a table privilege change or table
owner change does not result in any schema privilege change.
l Because projections define the physical representation of the table, HP Vertica does not require
separate projection owners. The ability to create or drop projections is based on the table
privileges on which the projection is anchored.
l During the alter operation HP Vertica updates projections anchored on the table owned by the
old owner to reflect the new owner. For pre-join projection operations, HP Vertica checks for
privileges on the referenced table.
Example
In this example, user Bob connects to the database, looks up the tables, and transfers ownership of
table t33 from himself to user Alice.
=> \c - Bob
You are now connected as user "Bob".
=> \d
Schema | Name | Kind | Owner | Comment
--------+--------+-------+---------+---------
public | applog | table | dbadmin |
public | t33 | table | Bob |
(2 rows)
=> ALTER TABLE t33 OWNER TO Alice;
ALTER TABLE
Notice that when Bob looks up database tables again, he no longer sees table t33.
=> \d
List of tables
Schema | Name | Kind | Owner | Comment
--------+--------+-------+---------+---------
public | applog | table | dbadmin |
(1 row)
When user Alice connects to the database and looks up tables, she sees she is the owner of table
t33.
=> \c - Alice
You are now connected as user "Alice".
=> \d
List of tables
Schema | Name | Kind | Owner | Comment
--------+------+-------+-------+---------
public | t33 | table | Alice |
(2 rows)
Either Alice or a superuser can transfer table ownership back to Bob. In the following case a
superuser performs the transfer.
=> \c - dbadmin
You are now connected as user "dbadmin".
=> ALTER TABLE t33 OWNER TO Bob;
ALTER TABLE
=> \d
List of tables
Schema | Name | Kind | Owner | Comment
--------+----------+-------+---------+---------
public | applog | table | dbadmin |
public | comments | table | dbadmin |
public | t33 | table | Bob |
s1 | t1 | table | User1 |
(4 rows)
You can also query the V_CATALOG.TABLES system table to view table and owner information.
Note that a change in ownership does not change the table ID.
In the below series of commands, the superuser changes table ownership back to Alice and queries
the TABLES system table.
=> ALTER TABLE t33 OWNER TO Alice;
ALTER TABLE
=> SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
 table_schema_id | table_schema | table_id | table_name | owner_id | owner_name
-------------------+--------------+-------------------+------------+-------------------+------------
 45035996273704968 | public | 45035996273713634 | applog | 45035996273704962 | dbadmin
 45035996273704968 | public | 45035996273724496 | comments | 45035996273704962 | dbadmin
 45035996273730528 | s1 | 45035996273730548 | t1 | 45035996273730516 | User1
 45035996273704968 | public | 45035996273795846 | t33 | 45035996273724576 | Alice
(5 rows)
Now the superuser changes table ownership back to Bob and queries the TABLES table again.
Nothing changes except the owner_name value, which changes from Alice to Bob.
=> ALTER TABLE t33 OWNER TO Bob;
ALTER TABLE
=> SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
 table_schema_id | table_schema | table_id | table_name | owner_id | owner_name
-------------------+--------------+-------------------+------------+-------------------+------------
 45035996273704968 | public | 45035996273713634 | applog | 45035996273704962 | dbadmin
 45035996273704968 | public | 45035996273724496 | comments | 45035996273704962 | dbadmin
 45035996273730528 | s1 | 45035996273730548 | t1 | 45035996273730516 | User1
 45035996273704968 | public | 45035996273793876 | foo | 45035996273724576 | Alice
 45035996273704968 | public | 45035996273795846 | t33 | 45035996273714428 | Bob
(5 rows)
Table Reassignment with Sequences
Altering the table owner transfers ownership only of associated IDENTITY/AUTO-INCREMENT
sequences, not of other referenced sequences. For example, in the following series of commands,
ownership of sequence s1 does not change:
=> CREATE USER u1;
CREATE USER
=> CREATE USER u2;
CREATE USER
=> CREATE SEQUENCE s1 MINVALUE 10 INCREMENT BY 2;
CREATE SEQUENCE
=> CREATE TABLE t1 (a INT, id INT DEFAULT NEXTVAL('s1'));
CREATE TABLE
=> CREATE TABLE t2 (a INT, id INT DEFAULT NEXTVAL('s1'));
CREATE TABLE
=> SELECT sequence_name, owner_name FROM sequences;
sequence_name | owner_name
---------------+------------
s1 | dbadmin
(1 row)
=> ALTER TABLE t1 OWNER TO u1;
ALTER TABLE
=> SELECT sequence_name, owner_name FROM sequences;
sequence_name | owner_name
---------------+------------
s1 | dbadmin
(1 row)
=> ALTER TABLE t2 OWNER TO u2;
ALTER TABLE
=> SELECT sequence_name, owner_name FROM sequences;
sequence_name | owner_name
---------------+------------
s1 | dbadmin
(1 row)
See Also
l Changing a Sequence Owner
Changing a Sequence Owner
The ALTER SEQUENCE command lets you change the attributes of an existing sequence. All
changes take effect immediately, within the same session. Any parameters not set during an ALTER
SEQUENCE statement retain their prior settings.
If you need to change sequence ownership, such as if an employee who owns a sequence leaves
the company, you can do so with the following ALTER SEQUENCE syntax:
ALTER SEQUENCE sequence-name OWNER TO new-owner-name;
This operation immediately reassigns the sequence from the current owner to the specified new
owner.
Only the sequence owner or a superuser can change ownership, and reassignment does not
transfer grants from the original owner to the new owner; grants made by the original owner are
dropped.
Note: Changing a table's owner transfers ownership of dependent sequence objects
(associated IDENTITY/AUTO-INCREMENT sequences) but does not transfer ownership of other
referenced sequences. See Changing a Table Owner.
Example
The following example reassigns sequence ownership from the current owner to user Bob:
=> ALTER SEQUENCE sequential OWNER TO Bob;
See ALTER SEQUENCE in the SQL Reference Manual for details.
Renaming Tables
The ALTER TABLE RENAME TO statement lets you rename one or more tables. The new table names
must not exist already.
Renaming tables does not affect existing pre-join projections because pre-join projections refer to
tables by their unique numeric object IDs (OIDs). Renaming tables also does not change the table
OID.
To rename two or more tables:
1. List the tables to rename in a comma-delimited list, specifying a schema-name as part of
the table specification only before the RENAME TO clause:
=> ALTER TABLE S1.T1, S1.T2 RENAME TO U1, U2;
The statement renames the listed tables to their new table names from left to right, matching
them sequentially, in a one-to-one correspondence.
The RENAME TO parameter is applied atomically so that all tables are renamed, or none of the
tables is renamed. For example, if the number of tables to rename does not match the number
of new names, none of the tables is renamed.
2. Do not specify a schema-name as part of the table specification after the RENAME TO clause,
since the statement applies to only one schema. The following example generates a syntax
error:
=> ALTER TABLE S1.T1, S1.T2 RENAME TO S1.U1, S1.U2;
Note: Renaming a table referenced by a view causes the view to fail, unless you create
another table with the previous name to replace the renamed table.
Using Rename to Swap Tables Within a Schema
You can use the ALTER TABLE RENAME TO statement to swap tables within a schema without
actually moving data. You cannot swap tables across schemas.
To swap tables within a schema (example statement is split to explain steps):
1. Enter the names of the tables to swap, followed by a new temporary table placeholder (temps):
=> ALTER TABLE T1, T2, temps
2. Use the RENAME TO clause to swap the tables: T1 to temps, T2 to T1, and temps to T2:
RENAME TO temps, T1, T2;
Using Named Sequences
Named sequences are database objects that generate unique numbers in ascending or descending
sequential order. They are most often used when an application requires a unique identifier in a table
or an expression. Once a named sequence returns a value, it never returns that same value again.
Named sequences are independent objects, and while you can use their values in tables, they are
not subordinate to them.
Types of Incrementing Value Objects
In addition to named sequences, HP Vertica supports two other kinds of sequence objects, which
also increment values:
l Auto-increment column value: a sequence available only for numeric column types. Auto-
increment sequences automatically assign the next incremental sequence value for that column
when a new row is added to the table.
l Identity column: a sequence available only for numeric column types.
Auto-increment and Identity sequences are defined through column constraints in the CREATE
TABLE statement and are incremented each time a row is added to the table. Both of these object
types are table-dependent and do not persist independently. The identity value is never rolled back
even if the transaction that tries to insert a value into the table is not committed. The LAST_
INSERT_ID function returns the last value generated for an auto-increment or identity column.
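The LAST_INSERT_ID behavior described above can be sketched as follows; the table name is hypothetical:

=> CREATE TABLE orders (id IDENTITY, amount INT);
=> INSERT INTO orders (amount) VALUES (100);
=> SELECT LAST_INSERT_ID();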
Each type of sequence object has a set of properties and controls. A named sequence has the most
controls, and an Auto-increment sequence the least. The following table lists the major differences
between the three sequence objects:
Behavior Named Sequence Identity Auto-increment
Default cache value 250K X X X
Set initial cache X X
Define start value X X
Specify increment unit X X
Create as standalone object X
Create as column constraint X X
Exists only as part of table X X
Requires name X
Use in expressions X
Unique across tables X
Change parameters X
Move to different schema X
Set to increment or decrement X
Grant privileges to object X
Specify minimum value X
Specify maximum value X
Always starts at 1 X
While sequence object values are guaranteed to be unique, they are not guaranteed to be
contiguous. Because the values are not necessarily contiguous, returned values can appear to
have gaps. For example, two nodes can increment a sequence at different rates. The node with a
heavier processing load increments the sequence, but its values are not contiguous with those
being incremented on the other node.
Using a Sequence with an Auto_Increment or Identity
Column
Each table can contain only one auto_increment or identity column. A table can, however, contain
both an auto_increment or identity column and a named sequence, as the following example
illustrates:
VMart=> CREATE TABLE test2 (id INTEGER NOT NULL UNIQUE,
middle INTEGER DEFAULT NEXTVAL('my_seq'),
next INT, last auto_increment);
CREATE TABLE
Named Sequence Functions
When you create a named sequence object, you can also specify the increment or decrement
value. The default is 1. Use these functions with named sequences:
l NEXTVAL — Advances the sequence and returns the next sequence value. This value is
incremented for ascending sequences and decremented for descending sequences. The first
time you call NEXTVAL after creating a sequence, the function sets up the cache in which to
store the sequence values, and returns either the default sequence value, or the start number
you specified with CREATE SEQUENCE.
l CURRVAL — Returns the LAST value across all nodes returned by a previous invocation of
NEXTVAL in the same session. If there were no calls to NEXTVAL after creating a sequence,
the CURRVAL function returns an error:
dbt=> create sequence seq2;
CREATE SEQUENCE
dbt=> select currval('seq2');
ERROR 4700: Sequence seq2 has not been accessed in the session
You can use the NEXTVAL and CURRVAL functions in INSERT and COPY expressions.
Using DDL Commands and Functions With Named
Sequences
For details, see the following related statements and functions in the SQL Reference Manual:
Use this
statement... To...
CREATE
SEQUENCE
Create a named sequence object.
ALTER
SEQUENCE
Alter named sequence parameters, rename a sequence within a schema, or
move a named sequence between schemas.
DROP
SEQUENCE
Remove a named sequence object.
GRANT
SEQUENCE
Grant user privileges to a named sequence object. See also Sequence
Privileges.
Creating Sequences
Create a sequence using the CREATE SEQUENCE statement. All of the parameters (besides a
sequence name) are optional.
The following example creates an ascending sequence called my_seq, starting at 100:
dbt=> create sequence my_seq START 100;
CREATE SEQUENCE
After creating a sequence, you must call the NEXTVAL function at least once in a session to create
a cache for the sequence and its initial value. Subsequently, use NEXTVAL to increment the
sequence. Use the CURRVAL function to get the current value.
The following NEXTVAL function instantiates the newly-created my_seq sequence and sets its first
number:
=> SELECT NEXTVAL('my_seq');
nextval
---------
100
(1 row)
If you call CURRVAL before NEXTVAL, the system returns an error:
dbt=> SELECT CURRVAL('my_seq');
ERROR 4700: Sequence my_seq has not been accessed in the session
The following command returns the current value of this sequence. Since no other operations have
been performed on the newly-created sequence, the function returns the expected value of 100:
=> SELECT CURRVAL('my_seq');
currval
---------
100
(1 row)
The following command increments the sequence value:
=> SELECT NEXTVAL('my_seq');
nextval
---------
101
(1 row)
Calling the CURRVAL function again returns only the current value:
=> SELECT CURRVAL('my_seq');
currval
---------
101
(1 row)
The following example shows how to use the my_seq sequence in an INSERT statement.
=> CREATE TABLE customer (
lname VARCHAR(25),
fname VARCHAR(25),
membership_card INTEGER,
id INTEGER
);
=> INSERT INTO customer VALUES ('Hawkins' ,'John', 072753, NEXTVAL('my_seq'));
Now query the table you just created to confirm that the ID column has been incremented to 102:
=> SELECT * FROM customer;
lname | fname | membership_card | id
---------+-------+-----------------+-----
Hawkins | John | 72753 | 102
(1 row)
The following example shows how to use a sequence as the default value for an INSERT
command:
=> CREATE TABLE customer2(
id INTEGER DEFAULT NEXTVAL('my_seq'),
lname VARCHAR(25),
fname VARCHAR(25),
membership_card INTEGER
);
=> INSERT INTO customer2 VALUES (default,'Carr', 'Mary', 87432);
Now query the table you just created. The ID column has been incremented again to 103:
=> SELECT * FROM customer2;
id | lname | fname | membership_card
-----+-------+-------+-----------------
103 | Carr | Mary | 87432
(1 row)
The following example shows how to use NEXTVAL in a SELECT statement:
=> SELECT NEXTVAL('my_seq'), lname FROM customer2;
NEXTVAL | lname
---------+-------
104 | Carr
(1 row)
As you can see, each time you call NEXTVAL(), the value increments.
The CURRVAL() function returns the current value.
Altering Sequences
The ALTER SEQUENCE statement lets you change the attributes of a previously-defined
sequence. Changes take effect in the next database session. Any parameters not specifically set
in the ALTER SEQUENCE command retain their previous settings.
The ALTER SEQUENCE statement lets you rename an existing sequence, or the schema of a
sequence, but you cannot combine either of these changes with any other optional parameters.
Note: Using ALTER SEQUENCE to set a START value below the CURRVAL can result in
duplicate keys.
Examples
The following example modifies an ascending sequence called my_seq to start at 105:
ALTER SEQUENCE my_seq RESTART WITH 105;
The following example moves a sequence from one schema to another:
ALTER SEQUENCE public.my_seq SET SCHEMA vmart;
The following example renames a sequence in the Vmart schema:
ALTER SEQUENCE vmart.my_seq RENAME TO serial;
Remember that changes occur only after you start a new database session. For example, if you
create a sequence named my_sequence and start the value at 10, each time you call the NEXTVAL
function, the value increments by 1, as in the following series of commands:
CREATE SEQUENCE my_sequence START 10;
SELECT NEXTVAL('my_sequence');
nextval
---------
10
(1 row)
SELECT NEXTVAL('my_sequence');
nextval
---------
11
(1 row)
Now issue the ALTER SEQUENCE statement to assign a new value starting at 50:
ALTER SEQUENCE my_sequence RESTART WITH 50;
When you call the NEXTVAL function, the sequence is incremented again by 1 value:
SELECT NEXTVAL('my_sequence');
 nextval
---------
12
(1 row)
The sequence starts at 50 only after restarting the database session:
SELECT NEXTVAL('my_sequence');
 nextval
---------
50
(1 row)
Distributed Sequences
When you create a sequence object, the CACHE parameter controls sequence efficiency by
determining the number of sequence values each node maintains during a session. The default
cache value is 250K, meaning that each node reserves 250,000 values per sequence per
session.
HP Vertica distributes a session across all nodes. After you create a sequence, the first time a
node executes a NEXTVAL() function as part of a SQL statement, the node reserves its own cache
of sequence values. The node then maintains that set of values for the current session. Other
nodes executing a NEXTVAL() function also create and maintain their own cache of sequence
values.
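The per-node reservation behavior described above can be sketched with a small model. The following Python sketch is illustrative only; the class name, block-allocation logic, and cache size are assumptions for the sake of the example, not Vertica internals. Each node draws a private block of values from a global counter, so values handed out on different nodes come from disjoint ranges.

```python
# Hypothetical model of per-node sequence caching (illustrative only;
# this is not Vertica's implementation).

class DistributedSequence:
    def __init__(self, cache=10):
        self.cache = cache
        self.next_block_start = 1   # global counter, guarded by the catalog lock
        self.node_ranges = {}       # node -> [next_value, last_reserved_value]

    def nextval(self, node):
        rng = self.node_ranges.get(node)
        if rng is None or rng[0] > rng[1]:
            # The node has no cache (or has exhausted it): take the catalog
            # lock and reserve a fresh block of `cache` values.
            start = self.next_block_start
            self.next_block_start += self.cache
            rng = [start, start + self.cache - 1]
            self.node_ranges[node] = rng
        value = rng[0]
        rng[0] += 1
        return value

seq = DistributedSequence(cache=10)
# node01 serves the first statement and reserves values 1-10:
print([seq.nextval("node01") for _ in range(3)])  # [1, 2, 3]
# node02 reserves the next block (11-20), so its values are not
# contiguous with node01's:
print(seq.nextval("node02"))  # 11
```

This is why, in the examples later in this section, values such as 101 and 201 appear as soon as node02 and node03 begin drawing from their own caches.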
Note: If any node consumes all of its sequence values, HP Vertica must perform a catalog
lock to obtain a new set of values. A catalog lock can be costly in terms of database
performance, since certain activities, such as data inserts, cannot occur until HP Vertica
releases the lock.
During a session, one node can use its allocated set of sequence values slowly, while another node
uses its values more quickly. Therefore, the value returned from NEXTVAL in one statement can
differ from the values returned in another statement executed on another node.
Regardless of the number of calls to NEXTVAL and CURRVAL, HP Vertica increments a
sequence only once per row. If multiple calls to NEXTVAL occur within the same row, the function
returns the same value. If sequences are used in join statements, HP Vertica increments a
sequence once for each composite row output by the join.
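The once-per-row rule can be modeled in a few lines. This Python sketch captures only the assumed semantics (one fresh value drawn per output row, shared by every NEXTVAL reference in that row); it is not how Vertica implements it.

```python
# Toy model of the once-per-row rule (assumed semantics, not Vertica code):
# each output row draws exactly one new sequence value, and every NEXTVAL
# reference within that row resolves to it.

class RowScopedSequence:
    def __init__(self):
        self.value = 0
        self.row_value = None

    def start_row(self):
        # A new output row draws exactly one fresh value.
        self.value += 1
        self.row_value = self.value

    def nextval(self):
        # Repeated NEXTVAL calls in the same row return the same value.
        return self.row_value

seq = RowScopedSequence()
seq.start_row()
print(seq.nextval(), seq.nextval())  # 1 1
seq.start_row()
print(seq.nextval(), seq.nextval())  # 2 2
```

This mirrors Example 2 later in this section, where two NEXTVAL references in the same SELECT return identical values on every row.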
The current value of a sequence is calculated as follows:
l At the end of every statement, the state of all sequences used in the session is returned to the
initiator node.
l The initiator node calculates the maximum CURRVAL of each sequence across all states on all
nodes.
l This maximum value is used as CURRVAL in subsequent statements until another NEXTVAL
is invoked.
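As a minimal sketch of that reconciliation (assumed behavior inferred from the description above, not Vertica source), the initiator simply takes the maximum of the per-node current values; the node states here mirror those tabulated in the examples that follow.

```python
# The initiator reconciles CURRVAL as the maximum current value across
# all node states returned at the end of a statement (assumed behavior).
node_states = {"node01": 8, "node02": 102, "node03": 202}
session_currval = max(node_states.values())
print(session_currval)  # 202
```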
Sequence values in cache can be lost in the following situations:
l If a statement fails after NEXTVAL is called (thereby consuming a sequence value from the
cache), the value is lost.
l If a disconnect occurs (for example, dropped session), any remaining values in the cache that
have not been returned through NEXTVAL (unused) are lost.
To recover lost sequence values, you can run an ALTER SEQUENCE command to define a new
sequence number generator, which resets the counter to the correct value in the next session.
Note: An elastic projection (a segmented projection created when Elastic Cluster is enabled)
created with a modularhash segmentation expression uses hash instead.
The behavior of sequences across HP Vertica nodes is explained in the following examples.
Note: IDENTITY and AUTO_INCREMENT columns behave in a similar manner.
Example 1: The following example, which illustrates sequence distribution, assumes a 3-node
cluster with node01 as the initiator node.
First create a simple table called dist:
CREATE TABLE dist (i INT, j VARCHAR);
Create a projection called oneNode and segment by column i on node01:
CREATE PROJECTION oneNode AS SELECT * FROM dist SEGMENTED BY i NODES node01;
Create a second projection called twoNodes and segment column i by modularhash on node02 and
node03:
CREATE PROJECTION twoNodes AS SELECT * FROM dist SEGMENTED BY MODULARHASH(i) NODES node02, node03;
Create a third projection called threeNodes and segment column i by modularhash on all nodes (1-3):
CREATE PROJECTION threeNodes as SELECT * FROM dist SEGMENTED BY MODULARHASH(i) ALL NODES;
Insert some values:
COPY dist FROM STDIN;
1|ONE
2|TWO
3|THREE
4|FOUR
5|FIVE
6|SIX
.
Query the STORAGE_CONTAINERS table to return the projections on each node:
SELECT node_name, projection_name, total_row_count FROM storage_containers;
 node_name | projection_name | total_row_count
-----------+-----------------+-----------------
node0001 | oneNode | 6 --Contains rows with i=(1,2,3,4,5,6)
node0001 | threeNodes | 2 --Contains rows with i=(3,6)
node0002 | twoNodes | 3 --Contains rows with i=(2,4,6)
node0002 | threeNodes | 2 --Contains rows with i=(1,4)
node0003 | twoNodes | 3 --Contains rows with i=(1,3,5)
node0003 | threeNodes | 2 --Contains rows with i=(2,5)
(6 rows)
The following table shows the segmentation of rows for projection oneNode:
1 ONE Node01
2 TWO Node01
3 THREE Node01
4 FOUR Node01
5 FIVE Node01
6 SIX Node01
The following table shows the segmentation of rows for projection twoNodes:
1 ONE Node03
2 TWO Node02
3 THREE Node03
4 FOUR Node02
5 FIVE Node03
6 SIX Node02
The following table shows the segmentation of rows for projection threeNodes:
1 ONE Node02
2 TWO Node03
3 THREE Node01
4 FOUR Node02
5 FIVE Node03
6 SIX Node01
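The row placement shown in the three tables above can be reproduced with a simple modulo over the node list. This Python sketch is only an illustration of modular-style segmentation; Vertica's actual MODULARHASH function is more involved, though a plain modulo happens to match the placements in these examples.

```python
# Illustrative modular segmentation (not Vertica's actual MODULARHASH):
# a row with key i lands on nodes[i % len(nodes)].
def segment_node(i, nodes):
    return nodes[i % len(nodes)]

two_nodes = ["node02", "node03"]
three_nodes = ["node01", "node02", "node03"]

# Matches the twoNodes table: 1->node03, 2->node02, 3->node03, ...
print([segment_node(i, two_nodes) for i in range(1, 7)])
# Matches the threeNodes table: 1->node02, 2->node03, 3->node01, ...
print([segment_node(i, three_nodes) for i in range(1, 7)])
```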
Create a sequence and specify a cache of 10. The sequence caches up to 10 values in memory
per node for performance. As noted in the CREATE SEQUENCE statement, the minimum cache
value is 1, meaning no cache: only one value is generated at a time.
Example 2: Create a sequence named s1 and specify a cache of 10:
CREATE SEQUENCE s1 CACHE 10;
SELECT s1.nextval, s1.currval, s1.nextval, s1.currval, j FROM oneNode;
nextval | currval | nextval | currval | j
---------+---------+---------+---------+-------
1 | 1 | 1 | 1 | ONE
2 | 2 | 2 | 2 | TWO
3 | 3 | 3 | 3 | THREE
4 | 4 | 4 | 4 | FOUR
5 | 5 | 5 | 5 | FIVE
6 | 6 | 6 | 6 | SIX
(6 rows)
The following table illustrates the current state of the sequence for that session. It holds the current
value, values remaining (the difference between the current value (6) and the cache (10)), and
cache activity. There is no cache activity on node02 or node03.
Sequence Cache State Node01 Node02 Node03
Current value 6 NO CACHE NO CACHE
Remainder 4 NO CACHE NO CACHE
Example 3: Return the current values from twoNodes:
SELECT s1.currval, j FROM twoNodes;
 currval | j
---------+-------
6 | ONE
6 | THREE
6 | FIVE
6 | TWO
6 | FOUR
6 | SIX
(6 rows)
Example 4: Now call NEXTVAL from threeNodes. The assumption is that node02 holds the cache
before node03:
SELECT s1.nextval, j FROM threeNodes;
 nextval | j
---------+-------
101 | ONE
201 | TWO
7 | THREE
102 | FOUR
202 | FIVE
8 | SIX
(6 rows)
The following table illustrates the sequence cache state with values on node01, node02, and
node03:
Sequence Cache State Node01 Node02 Node03
Current value 8 102 202
Left 2 8 8
Example 5: Return the current values from twoNodes:
SELECT s1.currval, j FROM twoNodes;
 currval | j
---------+-------
202 | ONE
202 | TWO
202 | THREE
202 | FOUR
202 | FIVE
202 | SIX
(6 rows)
The following table illustrates the sequence cache state:
Sequence Cache State Node01 Node02 Node03
Current value 6 102 202
Left 4 8 8
Example 6: The following command runs on node02 only:
SELECT s1.nextval, j FROM twoNodes WHERE i = 2;
 nextval | j
---------+-----
103 | TWO
(1 row)
The following table illustrates the sequence cache state:
Sequence Cache State Node01 Node02 Node03
Current value 6 103 202
Left 4 7 8
Example 7: The following command calls the current value from twoNodes:
SELECT s1.currval, j FROM twoNodes;
 currval | j
---------+-------
103 | ONE
103 | TWO
103 | THREE
103 | FOUR
103 | FIVE
103 | SIX
(6 rows)
Example 8: This example assumes that node02 holds the cache before node03:
SELECT s1.nextval, j FROM twoNodes;
 nextval | j
---------+-------
203 | ONE
104 | TWO
204 | THREE
105 | FOUR
205 | FIVE
106 | SIX
(6 rows)
The following table illustrates the sequence cache state:
Sequence Cache State Node01 Node02 Node03
Current value 6 106 205
Left 4 6 5
Example 9: The following command returns the current value from twoNodes:
SELECT s1.currval, j FROM twoNodes;
 currval | j
---------+-------
205 | ONE
205 | TWO
205 | THREE
205 | FOUR
205 | FIVE
205 | SIX
(6 rows)
Example 10: This example calls the NEXTVAL function on oneNode:
SELECT s1.nextval, j FROM oneNode;
 nextval | j
---------+-------
7 | ONE
8 | TWO
9 | THREE
10 | FOUR
301 | FIVE
302 | SIX
(6 rows)
The following table illustrates the sequence cache state:
Sequence Cache State Node01 Node02 Node03
Current value 302 106 205
Left 8 4 5
Example 11: In this example, twoNodes is the outer table and threeNodes is the inner table to a
merge join. threeNodes is resegmented as per twoNodes.
SELECT s1.nextval, j FROM twoNodes JOIN threeNodes ON twoNodes.i = threeNodes.i;
 nextval | j
---------+-------
206 | ONE
107 | TWO
207 | THREE
108 | FOUR
208 | FIVE
109 | SIX
(6 rows)
The following table illustrates the sequence cache state:
Sequence Cache State Node01 Node02 Node03
Current value 302 109 208
Left 8 1 2
Example 12: This next example shows how sequences work with buddy projections.
--Same session
DROP TABLE t CASCADE;
CREATE TABLE t (i INT, j varchar(20));
CREATE PROJECTION threeNodes AS SELECT * FROM t
SEGMENTED BY MODULARHASH(i) ALL NODES KSAFE 1;
COPY t FROM STDIN;
1|ONE
2|TWO
3|THREE
4|FOUR
5|FIVE
6|SIX
.
SELECT node_name, projection_name, total_row_count FROM storage_containers;
node_name | projection_name | total_row_count
-----------+-----------------+-----------------
node01 | threeNodes_b0 | 2
node03 | threeNodes_b0 | 2
node02 | threeNodes_b0 | 2
node02 | threeNodes_b1 | 2
node01 | threeNodes_b1 | 2
node03 | threeNodes_b1 | 2
(6 rows)
The following function call assumes that node02 is down. It is the same session. Node03 takes up
the work of node02:
SELECT s1.nextval, j FROM t;
 nextval | j
---------+-------
401 | ONE
402 | TWO
305 | THREE
403 | FOUR
404 | FIVE
306 | SIX
(6 rows)
The following table illustrates the sequence cache state:
Sequence Cache State Node01 Node02 Node03
Current value 306 110 404
Left 4 0 6
Example 13: This example starts a new session.
DROP TABLE t CASCADE;
CREATE TABLE t (i INT, j VARCHAR);
CREATE PROJECTION oneNode AS SELECT * FROM t SEGMENTED BY i NODES node01;
CREATE PROJECTION twoNodes AS SELECT * FROM t SEGMENTED BY MODULARHASH(i) NODES node02, n
ode03;
CREATE PROJECTION threeNodes AS SELECT * FROM t SEGMENTED BY MODULARHASH(i) ALL NODES;
INSERT INTO t values (nextval('s1'), 'ONE');
SELECT * FROM t;
i | j
-----+-------
501 | ONE
(1 row)
The following table illustrates the sequence cache state:
Sequence Cache State Node01 Node02 Node03
Current value 501 NO CACHE NO CACHE
Left 9 0 0
Example 14:
INSERT INTO t SELECT s1.nextval, 'TWO' FROM twoNodes;
SELECT * FROM t;
  i  | j
-----+-------
501 | ONE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes
601 | TWO --stored in node01 for oneNode, node03 for twoNodes, node01 for threeNodes
(2 rows)
The following table illustrates the sequence cache state:
Sequence Cache State Node01 Node02 Node03
Current value 501 601 NO CACHE
Left 9 9 0
Example 15:
INSERT INTO t SELECT s1.nextval, 'TRE' FROM threeNodes;
SELECT * FROM t;
i | j
-----+-------
501 | ONE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes
601 | TWO --stored in node01 for oneNode, node03 for twoNodes, node01 for threeNodes
502 | TRE --stored in node01 for oneNode, node03 for twoNodes, node03 for threeNodes
602 | TRE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes
(4 rows)
The following table illustrates the sequence cache state:
Sequence Cache State Node01 Node02 Node03
Current value 502 602 NO CACHE
Left 9 9 0
Example 16:
INSERT INTO t SELECT s1.currval, j FROM threeNodes WHERE i != 502;
SELECT * FROM t;
  i  | j
-----+-------
501 | ONE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes
601 | TWO --stored in node01 for oneNode, node03 for twoNodes, node01 for threeNodes
502 | TRE --stored in node01 for oneNode, node03 for twoNodes, node03 for threeNodes
602 | TRE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes
602 | ONE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes
502 | TWO --stored in node01 for oneNode, node03 for twoNodes, node03 for threeNodes
602 | TRE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes
(7 rows)
The following table illustrates the sequence cache state:
Sequence Cache State Node01 Node02 Node03
Current value 502 602 NO CACHE
Left 9 9 0
Example 17:
INSERT INTO t VALUES (s1.currval + 1, 'QUA');
SELECT * FROM t;
i | j
-----+-------
501 | ONE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes
601 | TWO --stored in node01 for oneNode, node03 for twoNodes, node01 for threeNodes
502 | TRE --stored in node01 for oneNode, node03 for twoNodes, node03 for threeNodes
602 | TRE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes
602 | ONE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes
502 | TWO --stored in node01 for oneNode, node03 for twoNodes, node03 for threeNodes
602 | TRE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes
603 | QUA
(8 rows)
The following table illustrates the sequence cache state:
Sequence Cache State Node01 Node02 Node03
Current value 502 602 NO CACHE
Left 9 9 0
See Also
l Sequence Privileges
l ALTER SEQUENCE
l CREATE TABLE
l Column-Constraint
l CURRVAL
l DROP SEQUENCE
l GRANT (Sequence)
l NEXTVAL
Loading Sequences
You can use a sequence as part of creating a table. The sequence must already exist and must
have been instantiated with the NEXTVAL function.
Creating and Instantiating a Sequence
The following example creates an ascending sequence called my_seq, starting at 100:
dbt=> create sequence my_seq START 100;
CREATE SEQUENCE
After creating a sequence, you must call the NEXTVAL function at least once in a session to create
a cache for the sequence and its initial value. Subsequently, use NEXTVAL to increment the
sequence. Use the CURRVAL function to get the current value.
The following NEXTVAL function instantiates the newly-created my_seq sequence and sets its first
number:
=> SELECT NEXTVAL('my_seq');
nextval
---------
100
(1 row)
If you call CURRVAL before NEXTVAL, the system returns an error:
dbt=> SELECT CURRVAL('my_seq');
ERROR 4700: Sequence my_seq has not been accessed in the session
Using a Sequence in an INSERT Command
Update sequence values by calling the NEXTVAL function, which increments or decrements
the current sequence and returns the next value. Use CURRVAL to return the current value. You
can also use these functions in INSERT and COPY expressions.
The following example shows how to use a sequence as the default value for an INSERT
command:
CREATE TABLE customer2 (
ID INTEGER DEFAULT NEXTVAL('my_seq'),
lname VARCHAR(25),
fname VARCHAR(25),
membership_card INTEGER
);
INSERT INTO customer2 VALUES (default,'Carr', 'Mary', 87432);
Now query the table you just created. The ID column has been incremented again, to 104:
SELECT * FROM customer2;
 ID  | lname | fname | membership_card
-----+-------+-------+-----------------
104 | Carr | Mary | 87432
(1 row)
Dropping Sequences
Use the DROP SEQUENCE statement to remove a sequence. You cannot drop a sequence:
l If other objects depend on the sequence. The CASCADE keyword is not supported.
l That is used in the default expression of a column until all references to the sequence are
removed from the default expression.
Example
The following command drops the sequence named my_sequence:
=> DROP SEQUENCE my_sequence;
Synchronizing Table Data with MERGE
The most convenient way to update, delete, and insert table data is with the MERGE statement,
which lets you combine multiple operations in a single command. When you write a MERGE
statement, you specify the following:
l A target table—The master table that contains existing rows you want to update or insert into
using rows from another (source) table.
l A source table—The table that contains the new and/or changed rows you'll use to update the
target table.
l A search condition—The merge join columns, specified in the ON clause, that HP Vertica uses to
evaluate each row in the source table to update, delete, and/or insert rows into the target table.
l Additional filters that instruct HP Vertica what to do when the search condition is or is not met.
For example, when you use:
n WHEN MATCHED THEN UPDATE clause— HP Vertica updates and/or deletes existing rows in
the target table with data from the source table.
Note: HP Vertica assumes the values in the merge join column are unique. If more than
one matching value is present in either the target or source table's join column, the MERGE
statement could fail with a run-time error. See Optimized Versus Non-optimized MERGE
for more information.
n WHEN NOT MATCHED THEN INSERT clause—HP Vertica inserts into the target table all rows
from the source table that do not match any rows in the target table.
Optimized Versus Non-Optimized MERGE
By default, HP Vertica prepares an optimized query plan to improve merge performance when the
MERGE statement and its tables meet certain criteria. If the criteria are not met, MERGE could run
without optimization or return a run-time error. This section describes scenarios for both optimized
and non-optimized MERGE.
Conditions for an Optimized MERGE
HP Vertica prepares an optimized query plan when all of the following conditions are true:
l The target table's join column has a unique or primary key constraint
l UPDATE and INSERT clauses include every column in the target table
l UPDATE and INSERT clause column attributes are identical
When the above conditions are not met, HP Vertica prepares a non-optimized query plan, and
MERGE runs with the same performance as in HP Vertica 6.1 and prior.
Note: The source table's join column does not require a unique or primary key constraint to be
eligible for an optimized query plan. Also, the source table can contain more columns than the
target table, as long as the UPDATE and INSERT clauses use the same columns and the column
attributes are the same.
How to determine if a MERGE statement is eligible for optimization
To determine whether a MERGE statement is eligible for optimization, prefix MERGE with the EXPLAIN
keyword and examine the plan's textual output. (See Viewing the MERGE Query Plan for
examples.) A Semi path indicates the statement is eligible for optimization, whereas a Right
Outer path indicates the statement is ineligible and will run with the same performance as MERGE in
previous releases unless a duplicate merge join key is encountered at query run time.
About duplicate matching values in the join column
Even if the MERGE statement and its tables meet the required criteria for optimization, MERGE could
fail with a run-time error if there are duplicate values in the join column.
When HP Vertica prepares an optimized query plan for a merge operation, it enforces strict
requirements for unique and primary key constraints in the MERGE statement's join columns. If you
haven't enforced constraints, MERGE fails under the following scenarios:
l Duplicates in the source table. If HP Vertica finds more than one matching value in the source
join column for a corresponding value in the target table, MERGE fails with a run-time error.
l Duplicates in the target table. If HP Vertica finds more than one matching value in the target join
column for a corresponding value in the source table, and the target join column has a unique or
primary key constraint, MERGE fails with a run-time error. If the target join column has no such
constraint, the statement runs without error and without optimization.
Be aware that if you run MERGE multiple times using the same target and source table, each
statement run has the potential to introduce duplicate values into the join columns, such as if you
use constants in the UPDATE/INSERT clauses. These duplicates could cause a run-time error the
next time you run MERGE.
To avoid duplicate key errors, enforce the constraints you declare to ensure unique values in the
merge join column.
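The failure mode described above can be checked mechanically. The following Python sketch (a toy model, not Vertica code) counts how many source rows match each target join key; any key matched more than once is exactly the condition that makes an optimized MERGE raise a run-time duplicate-key error.

```python
# Toy duplicate-key check for a merge join column (not Vertica code).
from collections import Counter

target_keys = {1, 2}            # join-column values in the target table
source_keys = [1, 1, 3]         # join-column values in the source table

# Count source rows that match each target key:
matches = Counter(k for k in source_keys if k in target_keys)
duplicates = [k for k, n in matches.items() if n > 1]
print(duplicates)  # [1] -> an optimized MERGE would fail at run time
```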
Examples
The examples that follow use a simple schema to illustrate some of the conditions under which HP
Vertica prepares or does not prepare an optimized query plan for MERGE:
CREATE TABLE target(a INT PRIMARY KEY, b INT, c INT) ORDER BY b,a;
CREATE TABLE source(a INT, b INT, c INT) ORDER BY b,a;
INSERT INTO target VALUES(1,2,3);
INSERT INTO target VALUES(2,4,7);
INSERT INTO source VALUES(3,4,5);
INSERT INTO source VALUES(4,6,9);
COMMIT;
Example of an optimized MERGE statement
HP Vertica can prepare an optimized query plan for the following MERGE statement because:
l The target table's join column (ON t.a=s.a) has a primary key constraint
l All columns in the target table (a,b,c) are included in the UPDATE and INSERT clauses
l Column attributes specified in the UPDATE and INSERT clauses are identical
MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED THEN UPDATE SET a=s.a, b=s.b, c=s.c
WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a,s.b,s.c);
OUTPUT
--------
2
(1 row)
The output value of 2 indicates success and denotes the number of rows updated/inserted from the
source into the target.
Example of a non-optimized MERGE statement
In the next example, the MERGE statement runs without optimization because the column attributes
in the UPDATE/INSERT clauses are not identical. Specifically, the UPDATE clause includes constants
for columns s.a and s.c and the INSERT clause does not:
MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c - 1
WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a,s.b,s.c);
To make the previous MERGE statement eligible for optimization, rewrite the statement as follows,
so the attributes in the UPDATE and INSERT clauses are identical:
MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c -1
WHEN NOT MATCHED THEN INSERT(a,b,c)
VALUES(s.a + 1, s.b, s.c - 1);
Troubleshooting the MERGE Statement
Consider the following points to help HP Vertica prepare an optimized query plan for MERGE, and
to troubleshoot run-time errors after you run the MERGE statement.
MERGE performance considerations
You can help improve the performance of MERGE operations by ensuring projections are designed for
optimal use. See Projection Design for Merge Operations.
You can also improve the chances that HP Vertica prepares an optimized query plan for a MERGE
statement by making sure the statement and its tables meet certain requirements. See the
following topics for more information:
l Optimized Versus Non-optimized MERGE
l Viewing the MERGE Plan
Duplicate values in the merge join key
HP Vertica assumes that the data you want to merge conforms with constraints you declare. To
avoid duplicate key errors, be sure to enforce declared constraints to ensure unique values in the
merge join column. If the MERGE statement fails with a duplicate key error, you must correct your
data.
Also, be aware that if you run MERGE multiple times with the same target and source tables, you
could introduce duplicate values into the join columns, such as if you use constants in the
UPDATE/INSERT clauses. These duplicates could cause a run-time error.
Using MERGE with sequences
If you are using named sequences, HP Vertica can perform a MERGE operation if you omit the
sequence from the query.
You cannot run MERGE on identity/auto-increment columns or on columns that have primary key or
foreign key referential integrity constraints, as defined in CREATE TABLE column-constraint syntax.
Dropping and Truncating Tables
HP Vertica provides two statements to manage tables: DROP TABLE and TRUNCATE TABLE.
You cannot truncate an external table.
Dropping Tables
Dropping a table removes its definition from the HP Vertica database. For the syntax details of this
statement, see DROP TABLE in the SQL Reference Manual.
To drop a table, use the statement as follows:
=> DROP TABLE IF EXISTS mytable;
DROP TABLE
=> DROP TABLE IF EXISTS mytable; -- Doesn't exist
NOTICE: Nothing was dropped
DROP TABLE
You cannot use the CASCADE option to drop an external table. Because the table is read-only,
you cannot remove any of its associated files.
Truncating Tables
Truncating a table removes all storage associated with the table, but preserves the table definitions.
Use TRUNCATE TABLE for testing purposes to remove all table data without having to recreate
projections when you reload table data. For the syntax details of this statement, see TRUNCATE
TABLE in the SQL Reference Manual. You cannot truncate an external table.
The TRUNCATE TABLE statement commits the entire transaction after statement execution, even
if truncating the table fails. You cannot roll back a TRUNCATE statement.
If the truncated table is a large single (fact) table containing pre-join projections, the projections
show zero (0) rows after the transaction completes and the table is ready for data reload.
If the table to truncate is a dimension table, drop the pre-join projections before executing the
TRUNCATE TABLE statement. Otherwise, the statement returns the following error:
Cannot truncate a dimension table with pre-joined projections
If the truncated table has out-of-date projections, those projections are cleared and marked up-to-
date after the TRUNCATE TABLE operation completes.
TRUNCATE TABLE takes an O (Owner) lock on the table until the truncation process completes,
and the savepoint is then released.
About Constraints
Constraints specify rules on data that can go into a column. Some examples of constraints are:
l Primary or foreign key
l Uniqueness
l Not NULL
l Default values
l Automatically incremented values
l Values that are generated by the database
Use constraints when you want to ensure the integrity of your data in one or more columns, but be
aware that it is your responsibility to ensure data integrity. HP Vertica can use constraints to
perform optimizations (such as the optimized MERGE) that assume the data is consistent. Do not
define constraints on columns unless you expect to keep the data consistent.
Adding Constraints
Add constraints on one or more table columns using the following SQL commands:
l CREATE TABLE—Add a constraint on one or more columns.
l ALTER TABLE—Add or drop a constraint on one or more columns.
There are two syntax definitions you can use to add or change a constraint:
l column-constraint—Use this syntax when you add a constraint on a column definition in a
CREATE TABLE statement.
l table-constraint—Use this syntax when you add a constraint after a column definition in a
CREATE TABLE statement, or when you add, alter, or drop a constraint on a column using
ALTER TABLE.
HP Vertica recommends naming constraints, but naming is optional. If you specify the
CONSTRAINT keyword, you must give the constraint a name.
The examples that follow illustrate several ways of adding constraints. For additional details, see:
l Primary Key Constraints
l Foreign Key Constraints
l Unique Constraints
l Not NULL Constraints
Adding Column Constraints with CREATE TABLE
There are several ways to add a constraint on a column using CREATE TABLE:
l On the column definition using the CONSTRAINT keyword, which requires that you assign a
constraint name, in this example, dim1PK:
CREATE TABLE dim1 (
c1 INTEGER CONSTRAINT dim1PK PRIMARY KEY,
c2 INTEGER
);
l On the column definition, omitting the CONSTRAINT keyword. When you omit the
CONSTRAINT keyword, you cannot specify a constraint name:
CREATE TABLE dim1 (
c1 INTEGER PRIMARY KEY,
c2 INTEGER
);
l After the column definition, using the CONSTRAINT keyword and assigning a name, in this
example, dim1PK:
CREATE TABLE dim1 (
c1 INTEGER,
c2 INTEGER,
CONSTRAINT dim1pk PRIMARY KEY(c1)
);
l After the column definition, omitting the CONSTRAINT keyword:
CREATE TABLE dim1 (
c1 INTEGER,
c2 INTEGER,
PRIMARY KEY(c1)
);
Adding Two Constraints on a Column
To add more than one constraint on a column, specify the constraints one after another when you
create the table column. For example, the following statement enforces both not NULL and unique
constraints on the id column, indicating that the column values cannot be NULL and
must be unique:
CREATE TABLE test1 (  id INTEGER NOT NULL UNIQUE,
...
);
Adding a Foreign Key Constraint on a Column
There are four ways to add a foreign key constraint on a column using CREATE TABLE. The
FOREIGN KEY keywords are not valid on the column definition, only after the column definition:
l On the column definition, use the CONSTRAINT and REFERENCES keywords and name the
constraint, in this example, fact1dim1FK. This example creates a column with a named foreign
key constraint referencing the table (dim1) with the primary key (c1):
CREATE TABLE fact1 (  c1 INTEGER CONSTRAINT fact1dim1FK REFERENCES dim1(c1),
c2 INTEGER
);
l On the column definition, omit the CONSTRAINT keyword and use the REFERENCES
keyword with the table name and column:
CREATE TABLE fact1 (  c1 INTEGER REFERENCES dim1(c1),
c2 INTEGER
);
l After the column definition, use the CONSTRAINT, FOREIGN KEY, and REFERENCES
keywords and name the constraint:
CREATE TABLE fact1 (  c1 INTEGER,
c2 INTEGER,
CONSTRAINT fk1 FOREIGN KEY(c1) REFERENCES dim1(c1)
);
l After the column definition, omitting the CONSTRAINT keyword:
CREATE TABLE fact1 (  c1 INTEGER,
c2 INTEGER,
FOREIGN KEY(c1) REFERENCES dim1(c1)
);
Each of the following ALTER TABLE statements adds a foreign key constraint on an existing
column, with and without using the CONSTRAINT keyword:
ALTER TABLE fact2
ADD CONSTRAINT fk1 FOREIGN KEY (c1) REFERENCES dim2(c1);
or
ALTER TABLE fact2 ADD FOREIGN KEY (c1) REFERENCES dim2(c1);
For additional details, see Foreign Key Constraints.
Adding Multicolumn Constraints
The following example defines a primary key constraint on multiple columns by first defining the
table columns (c1 and c2), and then specifying both columns in a PRIMARY KEY clause:
CREATE TABLE dim (  c1 INTEGER,
c2 INTEGER,
PRIMARY KEY (c1, c2)
);
To specify multicolumn (compound) primary keys, the following example uses CREATE TABLE to
define the columns. After creating the table, ALTER TABLE defines the compound primary key and
names it dim2PK:
CREATE TABLE dim2 (  c1 INTEGER,
c2 INTEGER,
c3 INTEGER NOT NULL,
c4 INTEGER UNIQUE
);
ALTER TABLE dim2
ADD CONSTRAINT dim2PK PRIMARY KEY (c1, c2);
In the next example, you define a compound primary key as part of the CREATE TABLE
statement. Then you specify the matching foreign key constraint to table dim2 using CREATE
TABLE and ALTER TABLE:
CREATE TABLE dim2 (  c1 INTEGER,
c2 INTEGER,
c3 INTEGER NOT NULL,
c4 INTEGER UNIQUE,
PRIMARY KEY (c1, c2)
);
CREATE TABLE fact2 (
c1 INTEGER,
c2 INTEGER,
c3 INTEGER NOT NULL,
c4 INTEGER UNIQUE
);
ALTER TABLE fact2
ADD CONSTRAINT fact2FK FOREIGN KEY (c1, c2) REFERENCES dim2(c1, c2);
Specify a foreign key constraint using a reference to the table that contains the primary key. In the
ADD CONSTRAINT clause, the REFERENCES column names are optional. The following
ALTER TABLE statement is equivalent to the previous ALTER TABLE statement:
ALTER TABLE fact2
ADD CONSTRAINT fact2FK FOREIGN KEY (c1, c2) REFERENCES dim2;
Adding Constraints on Tables with Existing Data
When you add a constraint on a column with existing data, HP Vertica does not validate the
existing values against the constraint. If your data does not conform to the declared
constraints, your queries could yield unexpected results.
Use ANALYZE_CONSTRAINTS to check for constraint violations in your column. If you find
violations, use the ALTER COLUMN SET/DROP parameters of the ALTER TABLE statement to
apply or remove a constraint on an existing column.
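The check-then-apply workflow can be sketched as follows (the table and column names are illustrative, not from this guide):

```sql
-- Add a not NULL constraint to a column that already contains data;
-- HP Vertica does not validate the existing rows at this point
ALTER TABLE sales ALTER COLUMN region SET NOT NULL;

-- Check the column explicitly for violations
SELECT ANALYZE_CONSTRAINTS('sales', 'region');

-- If violations are reported, remove the constraint until the data is fixed
ALTER TABLE sales ALTER COLUMN region DROP NOT NULL;
```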
Adding and Changing Constraints on Columns Using
ALTER TABLE
The following example uses ALTER TABLE to add a column (b) with not NULL and default 5
constraints to a table (test6):
CREATE TABLE test6 (a INT);
ALTER TABLE test6 ADD COLUMN b INT DEFAULT 5 NOT NULL;
Use ALTER TABLE with the ALTER COLUMN and SET NOT NULL clauses to add the constraint
on column a in table test6:
ALTER TABLE test6 ALTER COLUMN a SET NOT NULL;
Adding and Dropping NOT NULL Column Constraints
Use the SET NOT NULL or DROP NOT NULL clause to add or remove a not NULL column
constraint. Use these clauses to ensure that the column has the proper constraints when you have
added or removed a primary key constraint on a column, or any time you want to add or remove the
not NULL constraint.
Note: A PRIMARY KEY constraint includes a not NULL constraint, but if you drop the
PRIMARY KEY constraint on a column, the not NULL constraint remains on that column.
Examples
ALTER TABLE T1 ALTER COLUMN x SET NOT NULL;
ALTER TABLE T1 ALTER COLUMN x DROP NOT NULL;
For more information, see Altering Table Definitions.
Enforcing Constraints
To maximize query performance, HP Vertica checks for primary key and foreign key violations
when loading into the fact table of a pre-join projection. For more details, see Enforcing Primary
Key and Foreign Key Constraints.
HP Vertica checks for not NULL constraint violations when loading data, but it does not check for
unique constraint violations.
To enforce constraints, load data without committing it using the COPY with NO COMMIT option
and then perform a post-load check using the ANALYZE_CONSTRAINTS function. If constraint
violations are found, you can roll back the load because you have not committed it. For more
details, see Detecting Constraint Violations.
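A minimal sketch of this load-and-verify pattern, assuming a table named fact1 and an illustrative data file path:

```sql
-- Load without committing, so the load can still be rolled back
COPY fact1 FROM '/tmp/fact1_data.txt' NO COMMIT;

-- Post-load check for primary key, foreign key, and unique violations
SELECT ANALYZE_CONSTRAINTS('fact1');

-- If violations are reported, discard the uncommitted load:
ROLLBACK;
-- Otherwise, keep it:
-- COMMIT;
```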
See Also
l ALTER TABLE
l CREATE TABLE
l COPY
l ANALYZE_CONSTRAINTS
Primary Key Constraints
A primary key (PK) is a single column or combination of columns (called a compound key) that
uniquely identifies each row in a table. A primary key constraint contains unique, non-null values.
When you apply the primary key constraint, the not NULL and unique constraints are added
implicitly. You do not need to specify them when you create the column. However, if you remove
the primary key constraint, the not NULL constraint continues to apply to the column. To remove
the not NULL constraint after removing the primary key constraint, use the ALTER COLUMN
DROP NOT NULL parameter of the ALTER TABLE statement (see Dropping Constraints).
The following statement adds a primary key constraint on the employee_id field:
CREATE TABLE employees (  employee_id INTEGER PRIMARY KEY
);
Alternatively, you can add a primary key constraint after the column is created:
CREATE TABLE employees (  employee_id INTEGER
);
ALTER TABLE employees
ADD PRIMARY KEY (employee_id);
Note: If you specify a primary key constraint using ALTER TABLE, the system returns the
following message, which is informational only. The primary key constraint is added to the
designated column.
WARNING 2623: Column "employee_id" definition changed to NOT NULL
Primary keys can also constrain more than one column:
CREATE TABLE employees (  employee_id INTEGER,
employee_gender CHAR(1),
PRIMARY KEY (employee_id, employee_gender)
);
Foreign Key Constraints
A foreign key (FK) is a column that is used to join a table to other tables to ensure referential
integrity of the data. A foreign key constraint requires that a column contain only values from the
primary key column on a specific dimension table.
A column with a foreign key constraint can contain NULL values if it does not also have a not NULL
constraint, even though the NULL value does not appear in the PRIMARY KEY column of the
dimension table. This allows rows to be inserted into the table even if the foreign key is not yet
known.
In HP Vertica, the fact table's join columns are required to have foreign key constraints in order to
participate in pre-join projections. If the fact table join column has a foreign key constraint, outer
join queries produce the same result set as inner join queries.
You can add a FOREIGN KEY constraint solely by referencing the table that contains the primary
key. The columns in the referenced table do not need to be specified explicitly.
Examples
Create a table called inventory to store inventory data:
CREATE TABLE inventory ( date_key INTEGER NOT NULL,
product_key INTEGER NOT NULL,
warehouse_key INTEGER NOT NULL,
...
);
Create a table called warehouse to store warehouse information:
CREATE TABLE warehouse ( warehouse_key INTEGER NOT NULL PRIMARY KEY,
warehouse_name VARCHAR(20),
...
);
To ensure referential integrity between the inventory and warehouse tables, define a foreign key
constraint called fk_inventory_warehouse on the inventory table that references the warehouse
table:
ALTER TABLE inventory ADD CONSTRAINT fk_inventory_warehouse FOREIGN KEY(warehouse_key)
REFERENCES warehouse(warehouse_key);
In this example, the inventory table is the referencing table and the warehouse table is the
referenced table.
You can also create the foreign key constraint in the CREATE TABLE statement that creates the
inventory table, eliminating the need for the ALTER TABLE statement. If you do not specify one
or more columns, the PRIMARY KEY of the referenced table is used:
CREATE TABLE inventory ( date_key INTEGER NOT NULL,
product_key INTEGER NOT NULL,
warehouse_key INTEGER NOT NULL REFERENCES warehouse,
...);
A foreign key can also constrain and reference multiple columns. The following example uses
CREATE TABLE to add a foreign key constraint to a pair of columns:
CREATE TABLE t1 (  c1 INTEGER PRIMARY KEY,
c2 INTEGER,
c3 INTEGER,
FOREIGN KEY (c2, c3) REFERENCES other_table (c1, c2)
);
The following two examples use ALTER TABLE to add a foreign key constraint to a pair of
columns. When you use the CONSTRAINT keyword, you must specify a constraint name:
ALTER TABLE t
ADD FOREIGN KEY (a, b) REFERENCES other_table(c, d);
ALTER TABLE t
ADD CONSTRAINT fk_cname FOREIGN KEY (a, b) REFERENCES other_table(c, d);
Note: The FOREIGN KEY keywords are valid only after the column definition, not on the
column definition.
Unique Constraints
Unique constraints ensure that the data contained in a column or a group of columns is unique with
respect to all rows in the table.
Note: If you add a unique constraint to a column and then insert data into that column that is
not unique with respect to other values in that column, HP Vertica inserts the data anyway. If
your data does not conform to the declared constraints, your queries could yield unexpected
results. Use ANALYZE_CONSTRAINTS to check for constraint violations.
There are several ways to add a unique constraint on a column. If you use the CONSTRAINT
keyword, you must specify a constraint name. The following example adds a UNIQUE constraint
on the product_key column and names it product_key_UK:
CREATE TABLE product (  product_key INTEGER NOT NULL CONSTRAINT product_key_UK UNIQUE,
...
);
HP Vertica recommends naming constraints, but it is optional:
CREATE TABLE product (  product_key INTEGER NOT NULL UNIQUE,
...
);
You can specify the constraint after the column definition, with and without naming it:
CREATE TABLE product (  product_key INTEGER NOT NULL,
...,
CONSTRAINT product_key_uk UNIQUE (product_key)
);
CREATE TABLE product (
product_key INTEGER NOT NULL,
...,
UNIQUE (product_key)
);
You can also use ALTER TABLE to specify a unique constraint. This example names the
constraint product_key_UK:
ALTER TABLE product ADD CONSTRAINT product_key_UK UNIQUE (product_key);
You can use CREATE TABLE and ALTER TABLE to specify unique constraints on multiple
columns. If a unique constraint refers to a group of columns, separate the column names using
commas. The column listing specifies that the combination of values in the indicated columns is
unique across the whole table, though any one of the columns need not be (and ordinarily isn't)
unique:
CREATE TABLE dim1 (  c1 INTEGER,
c2 INTEGER,
c3 INTEGER,
UNIQUE (c1, c2)
);
Not NULL Constraints
A not NULL constraint specifies that a column cannot contain a null value. This means that new
rows cannot be inserted or updated unless you specify a value for this column.
You can apply the not NULL constraint when you create a column using the CREATE TABLE
statement. You can also add or drop the not NULL constraint to an existing column using,
respectively:
l ALTER TABLE t ALTER COLUMN x SET NOT NULL
l ALTER TABLE t ALTER COLUMN x DROP NOT NULL
The not NULL constraint is implicitly applied to a column when you add the PRIMARY KEY (PK)
constraint. When you designate a column as a primary key, you do not need to specify the not
NULL constraint.
However, if you remove the primary key constraint, the not NULL constraint still applies to the
column. Use the ALTER COLUMN x DROP NOT NULL parameter of the ALTER TABLE
statement to drop the not NULL constraint after dropping the primary key constraint.
The following statement enforces a not NULL constraint on the customer_key column, specifying
that the column cannot accept NULL values.
CREATE TABLE customer (  customer_key INTEGER NOT NULL,
...
);
Dropping Constraints
To drop named constraints, use the ALTER TABLE command.
The following example drops the constraint fact2fk:
=> ALTER TABLE fact2 DROP CONSTRAINT fact2fk;
To drop a constraint that you did not explicitly name, first query the system table TABLE_
CONSTRAINTS, which returns both system-generated and user-named constraint names:
=> SELECT * FROM TABLE_CONSTRAINTS;
If you do not specify a constraint name, HP Vertica assigns a constraint name that is unique to that
table. In the following output, note the system-generated constraint name C_PRIMARY and the user-
defined constraint name fk_inventory_date:
-[ RECORD 1 ]--------+--------------------------
constraint_id | 45035996273707984
constraint_name | C_PRIMARY
constraint_schema_id | 45035996273704966
constraint_key_count | 1
foreign_key_count | 0
table_id | 45035996273707982
foreign_table_id | 0
constraint_type | p
-[  ... ]---------+--------------------------
-[ RECORD 9 ]--------+--------------------------
constraint_id | 45035996273708016
constraint_name | fk_inventory_date
constraint_schema_id | 0
constraint_key_count | 1
foreign_key_count | 1
table_id | 45035996273708014
foreign_table_id | 45035996273707994
constraint_type | f
Once you know the name of the constraint, you can then drop it using the ALTER TABLE
command. (If you do not know the table name, use table_id to retrieve table_name from the ALL_
TABLES table.)
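For example, the lookup and drop might look like this (the table name is illustrative; C_PRIMARY is a typical system-generated constraint name):

```sql
-- Map each constraint to its table name via table_id
SELECT tc.constraint_name, t.table_name
  FROM table_constraints tc
  JOIN all_tables t ON tc.table_id = t.table_id;

-- Drop the constraint by its system-generated name
ALTER TABLE dim2 DROP CONSTRAINT C_PRIMARY;
```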
Notes
l Primary key constraints cannot be dropped if there is another table with a foreign key constraint
that references the primary key.
l A foreign key constraint cannot be dropped if there are any pre-join projections on the table.
l Dropping a primary or foreign key constraint does not automatically drop the not NULL constraint
on a column. You need to manually drop this constraint if you no longer want it.
See Also
l ALTER TABLE
Enforcing Primary Key and Foreign Key Constraints
Enforcing Primary Key Constraints
HP Vertica does not enforce the uniqueness of primary keys when they are loaded into a table.
However, when data is loaded into a table with a pre-joined dimension, or when the table is joined to
a dimension table during a query, a key enforcement error could result if there is not exactly one
dimension row that matches each foreign key value.
Note: Consider using sequences or auto-incrementing columns for primary key columns,
which guarantees uniqueness and avoids the constraint enforcement problem and associated
overhead. For more information, see Using Sequences.
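For example, an IDENTITY column lets HP Vertica generate unique primary key values automatically (a sketch; the table and column names are illustrative):

```sql
-- HP Vertica assigns a unique value to order_id on every insert,
-- so the primary key cannot receive duplicates from loaded data
CREATE TABLE orders (
    order_id IDENTITY PRIMARY KEY,
    amount NUMERIC(10,2)
);

INSERT INTO orders (amount) VALUES (19.99);  -- order_id is generated automatically
```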
Enforcing Foreign Key Constraints
A table's foreign key constraints are enforced during data load only if there is a pre-join projection
that has that table as its anchor table. If no such pre-join projection exists, it is possible
to load data that causes a constraint violation. Subsequently, a constraint violation error can
happen when:
l An inner join query is processed.
l An outer join is treated as an inner join due to the presence of a foreign key constraint.
l A new pre-join projection anchored on the table with the foreign key constraint is refreshed.
Detecting Constraint Violations Before You Commit
Data
To detect constraint violations, you can load data without committing it using the COPY statement
with the NO COMMIT option, and then perform a post-load check using the ANALYZE_
CONSTRAINTS function. If constraint violations exist, you can roll back the load because you
have not committed it. For more details, see Detecting Constraint Violations.
Detecting Constraint Violations
The ANALYZE_CONSTRAINTS() function analyzes and reports on constraint violations within the
current schema search path. To check for constraint violations:
l Pass an empty argument to check for violations on all tables within the current schema
l Pass a single table argument to check for violations on the specified table
l Pass two arguments, a table name and a column or list of columns, to check for violations in
those columns
Given the following inputs, HP Vertica returns one row, indicating one violation, because the same
primary key value (10) was inserted into table t1 twice:
CREATE TABLE t1(c1 INT);
ALTER TABLE t1 ADD CONSTRAINT pk_t1 PRIMARY KEY (c1);
CREATE PROJECTION t1_p (c1) AS SELECT *
FROM t1 UNSEGMENTED ALL NODES;
INSERT INTO t1 values (10);
INSERT INTO t1 values (10); --Duplicate primary key value
\x
Expanded display is on.
SELECT ANALYZE_CONSTRAINTS('t1');
-[ RECORD 1 ]---+--------
Schema Name | public
Table Name | t1
Column Names | c1
Constraint Name | pk_t1
Constraint Type | PRIMARY
Column Values | ('10')
If the second INSERT statement above had contained any different value, the result would have
been 0 rows (no violations).
In the following example, create a table that contains three integer columns, one a unique key and
one a primary key:
CREATE TABLE table_1( 
a INTEGER,
b_UK INTEGER UNIQUE,
c_PK INTEGER PRIMARY KEY
);
Issue a command that refers to a nonexistent table:
SELECT ANALYZE_CONSTRAINTS('a_BB');
ERROR: 'a_BB' is not a table name in the current search path
Issue a command that refers to a nonexistent column:
SELECT ANALYZE_CONSTRAINTS('table_1','x');
ERROR 41614: Nonexistent columns: 'x '
Insert some values into table table_1 and commit the changes:
INSERT INTO table_1 values (1, 1, 1);
COMMIT;
Run ANALYZE_CONSTRAINTS on table table_1. No constraint violations are reported:
SELECT ANALYZE_CONSTRAINTS('table_1');
(No rows)
Insert duplicate unique and primary key values and run ANALYZE_CONSTRAINTS on table
table_1 again. The system shows two violations: one against the primary key and one against the
unique key:
INSERT INTO table_1 VALUES (1, 1, 1);
COMMIT;
SELECT ANALYZE_CONSTRAINTS('table_1');
-[ RECORD 1 ]---+----------
Schema Name | public
Table Name | table_1
Column Names | b_UK
Constraint Name | C_UNIQUE
Constraint Type | UNIQUE
Column Values | ('1')
-[ RECORD 2 ]---+----------
Schema Name | public
Table Name | table_1
Column Names | c_PK
Constraint Name | C_PRIMARY
Constraint Type | PRIMARY
Column Values | ('1')
The following command looks for constraint violations on only the unique key in the table table_1,
qualified with its schema name:
=> SELECT ANALYZE_CONSTRAINTS('public.table_1', 'b_UK');
-[ RECORD 1 ]---+---------
Schema Name | public
Table Name | table_1
Column Names | b_UK
Constraint Name | C_UNIQUE
Constraint Type | UNIQUE
Column Values | ('1')
(1 row)
The following example shows that you can specify the same column more than once; ANALYZE_
CONSTRAINTS, however, returns the violation only once:
SELECT ANALYZE_CONSTRAINTS('table_1', 'c_PK, C_PK');
-[ RECORD 1 ]---+----------
Schema Name | public
Table Name | table_1
Column Names | c_PK
Constraint Name | C_PRIMARY
Constraint Type | PRIMARY
Column Values | ('1')
The following example creates a new table, table_2, with a foreign key column and different
(character) data types:
CREATE TABLE table_2 (
x VARCHAR(3),
y_PK VARCHAR(4),
z_FK INTEGER REFERENCES table_1(c_PK));
Alter the table to create a multicolumn primary key:
ALTER TABLE table_2
ADD CONSTRAINT table_2_multiuk PRIMARY KEY (x, y_PK);
WARNING 2623: Column "x" definition changed to NOT NULL
WARNING 2623: Column "y_PK" definition changed to NOT NULL
The following command inserts a foreign key value (0) that has no match in table_1 into table_2
and commits the changes:
INSERT INTO table_2 VALUES ('r1', 'Xpk1', 0);
COMMIT;
Checking for constraints on the table table_2 in the public schema detects a foreign key
violation:
=> SELECT ANALYZE_CONSTRAINTS('public.table_2');
-[ RECORD 1 ]---+----------
Schema Name | public
Table Name | table_2
Column Names | z_FK
Constraint Name | C_FOREIGN
Constraint Type | FOREIGN
Column Values | ('0')
Now insert a duplicate value into the multicolumn primary key and commit the changes:
INSERT INTO table_2 VALUES ('r2', 'Xpk1', 1);
INSERT INTO table_2 VALUES ('r1', 'Xpk1', 1);
COMMIT;
Checking for constraint violations on table table_2 detects the duplicate primary key error, along with the earlier foreign key violation:
SELECT ANALYZE_CONSTRAINTS('table_2');
-[ RECORD 1 ]---+----------------
Schema Name | public
Table Name | table_2
Column Names | z_FK
Constraint Name | C_FOREIGN
Constraint Type | FOREIGN
Column Values | ('0')
-[ RECORD 2 ]---+----------------
Schema Name | public
Table Name | table_2
Column Names | x, y_PK
Constraint Name | table_2_multiuk
Constraint Type | PRIMARY
Column Values | ('r1', 'Xpk1')
Create a table with a multicolumn foreign key:
CREATE TABLE table_3(
z_fk1 VARCHAR(3),
z_fk2 VARCHAR(4));
ALTER TABLE table_3
ADD CONSTRAINT table_3_multifk FOREIGN KEY (z_fk1, z_fk2)
REFERENCES table_2(x, y_PK);
Insert a foreign key that matches a primary key in table table_2 and commit the changes:
INSERT INTO table_3 VALUES ('r1', 'Xpk1');
COMMIT;
Checking for constraints on table table_3 detects no violations:
SELECT ANALYZE_CONSTRAINTS('table_3');
(No rows)
Add a value that does not match and commit the change:
INSERT INTO table_3 VALUES ('r1', 'NONE');
COMMIT;
Checking for constraints on table table_3 detects a foreign key violation:
SELECT ANALYZE_CONSTRAINTS('table_3');
-[ RECORD 1 ]---+----------------
Schema Name | public
Table Name | table_3
Column Names | z_fk1, z_fk2
Constraint Name | table_3_multifk
Constraint Type | FOREIGN
Column Values | ('r1', 'NONE')
Analyze all constraints on all tables:
SELECT ANALYZE_CONSTRAINTS('');
-[ RECORD 1 ]---+----------------
Schema Name | public
Table Name | table_3
Column Names | z_fk1, z_fk2
Constraint Name | table_3_multifk
Constraint Type | FOREIGN
Column Values | ('r1', 'NONE')
-[ RECORD 2 ]---+----------------
Schema Name | public
Table Name | table_2
Column Names | x, y_PK
Constraint Name | table_2_multiuk
Constraint Type | PRIMARY
Column Values | ('r1', 'Xpk1')
-[ RECORD 3 ]---+----------------
Schema Name | public
Table Name | table_2
Column Names | z_FK
Constraint Name | C_FOREIGN
Constraint Type | FOREIGN
Column Values | ('0')
-[ RECORD 4 ]---+----------------
Schema Name | public
Table Name | t1
Column Names | c1
Constraint Name | pk_t1
Constraint Type | PRIMARY
Column Values | ('10')
-[ RECORD 5 ]---+----------------
Schema Name | public
Table Name | table_1
Column Names | b_UK
Constraint Name | C_UNIQUE
Constraint Type | UNIQUE
Column Values | ('1')
-[ RECORD 6 ]---+----------------
Schema Name | public
Table Name | table_1
Column Names | c_PK
Constraint Name | C_PRIMARY
Constraint Type | PRIMARY
Column Values | ('1')
-[ RECORD 7 ]---+----------------
Schema Name | public
Table Name | target
Column Names | a
Constraint Name | C_PRIMARY
Constraint Type | PRIMARY
Column Values | ('1')
(7 rows)
To quickly clean up your database, issue the following commands:
DROP TABLE table_1 CASCADE;
DROP TABLE table_2 CASCADE;
DROP TABLE table_3 CASCADE;
Fixing Constraint Violations
When HP Vertica finds duplicate primary key or unique values at run time, use the DISABLE_
DUPLICATE_KEY_ERROR function to suppress error messaging. Queries execute as though no
constraints are defined on the schema and the effects are session scoped.
Caution: When called, DISABLE_DUPLICATE_KEY_ERROR suppresses data integrity
checking and can lead to incorrect query results. Use this function only after you insert
duplicate primary keys into a dimension table in the presence of a pre-join projection. Correct
the violations and reenable integrity checking with REENABLE_DUPLICATE_KEY_ERROR.
The following series of commands create a table named dim and the corresponding projection:
CREATE TABLE dim (pk INTEGER PRIMARY KEY, x INTEGER);
CREATE PROJECTION dim_p (pk, x) AS SELECT * FROM dim ORDER BY x UNSEGMENTED ALL NODES;
The next two statements create a table named fact and the pre-join projection that joins fact to
dim.
CREATE TABLE fact(fk INTEGER REFERENCES dim(pk));
CREATE PROJECTION prejoin_p (fk, pk, x) AS SELECT * FROM fact, dim WHERE pk=fk ORDER BY x;
The following statements load values into table dim. The last statement inserts a duplicate primary
key value of 1:
INSERT INTO dim values (1,1);
INSERT INTO dim values (2,2);
INSERT INTO dim values (1,2); --Constraint violation
COMMIT;
Table dim now contains duplicate primary key values, but you cannot delete the violating row
because of the presence of the pre-join projection. Any attempt to delete the record results in the
following error message:
ROLLBACK: Duplicate primary key detected in FK-PK join Hash-Join (x dim_p), value 1
In order to remove the constraint violation (pk=1), use the following sequence of commands, which
puts the database back into the state just before the duplicate primary key was added.
To remove the violation:
1. Save the original dim rows that match the duplicated primary key:
CREATE TEMP TABLE dim_temp(pk integer, x integer);
INSERT INTO dim_temp SELECT * FROM dim WHERE pk=1 AND x=1; -- original dim row
2. Temporarily disable error messaging on duplicate constraint values:
SELECT DISABLE_DUPLICATE_KEY_ERROR();
Caution: Remember that running the DISABLE_DUPLICATE_KEY_ERROR function
suppresses the enforcement of data integrity checking.
3. Remove the original row that contains duplicate values:
DELETE FROM dim WHERE pk=1;
4. Allow the database to resume data integrity checking:
SELECT REENABLE_DUPLICATE_KEY_ERROR();
5. Reinsert the original values back into the dimension table:
INSERT INTO dim SELECT * from dim_temp;
COMMIT;
6. Validate your dimension and fact tables.
If you receive the following error message, it means that the duplicate records you want to
delete are not identical. That is, the records contain values that differ in at least one column
that is not a primary key; for example, (1,1) and (1,2).
ROLLBACK: Delete: could not find a data row to delete (data integrity violation?)
The difference between this message and the rollback message in the previous example is that here
a fact row contains a foreign key that matches the duplicated primary key. A row combining values
from the fact and dimension tables is now in the pre-join projection. In order for the DELETE
statement (Step 4 in the following example) to complete successfully, extra predicates are required
to identify the original dimension table values (the values that are in the pre-join).
This example is nearly identical to the previous example, except that an additional INSERT
statement joins the fact table to the dimension table by a primary key value of 1:
INSERT INTO dim values (1,1);
INSERT INTO dim values (2,2);
INSERT INTO fact values (1); -- New insert statement joins fact with dim on primary
key value=1
INSERT INTO dim values (1,2); -- Duplicate primary key value=1
COMMIT;
To remove the violation:
1. Save the original dim and fact rows that match the duplicated primary key:
CREATE TEMP TABLE dim_temp(pk integer, x integer);
CREATE TEMP TABLE fact_temp(fk integer);
INSERT INTO dim_temp SELECT * FROM dim WHERE pk=1 AND x=1; -- original dim row
INSERT INTO fact_temp SELECT * FROM fact WHERE fk=1;
2. Temporarily suppress the enforcement of data integrity checking:
SELECT DISABLE_DUPLICATE_KEY_ERROR();
3. Remove the duplicate primary keys in the next two steps. These deletes also implicitly remove all
fact rows with the matching foreign key.
4. Remove the original row that contains duplicate values:
DELETE FROM dim WHERE pk=1 AND x=1;
Note: The extra predicate (x=1) specifies removal of the original (1,1) row, rather than the
newly inserted (1,2) values that caused the violation.
5. Remove all remaining rows:
DELETE FROM dim WHERE pk=1;
6. Reenable integrity checking:
SELECT REENABLE_DUPLICATE_KEY_ERROR();
7. Reinsert the original values back into the fact and dimension table:
INSERT INTO dim SELECT * from dim_temp;
INSERT INTO fact SELECT * from fact_temp;
COMMIT;
8. Validate your dimension and fact tables.
Reenabling Error Reporting
If you ran DISABLE_DUPLICATE_KEY_ERROR to suppress error reporting while fixing duplicate
key violations, query results can be incorrect while the suppression is in effect. As soon as you fix
the violations, run the REENABLE_DUPLICATE_KEY_ERROR function to restore the default
behavior of error reporting.
The effects of this function are session scoped.
Working with Table Partitions
HP Vertica supports data partitioning at the table level, which divides one large table into smaller
pieces. Partitioning is a table property that applies to all projections for a given table.
A common use for partitions is to split data by time. For instance, if a table contains decades of
data, you can partition it by year, or by month, if the table has a year of data.
Partitions can improve parallelism during query execution and enable some other optimizations.
Partitions segregate data on each node to facilitate dropping partitions. You can drop older data
partitions to make room for newer data.
Tip: When a storage container has data for a single partition, you can discard that storage
location (DROP_LOCATION) after dropping the partition using the DROP_PARTITION()
function.
Differences Between Partitioning and
Segmentation
There is a distinction between partitioning at the table level and segmenting a projection (hash or
range):
l Partitioning—defined by the table for fast data purges and query performance. Table
partitioning segregates data on each node. You can drop partitions.
l Segmentation—defined by the projection for distributed computing. Segmenting distributes
projection data across multiple nodes in a cluster. Different projections for the same table have
identical partitioning, but can have different segmentation clauses. See Projection Segmentation
in the Concepts Guide.
Both methods of storing and organizing data provide opportunities for parallelism during query
processing. See also Partitioning and Segmenting Data.
Partition Operations
The basic operations for working with partitions are as follows:
l Defining Partitions
l Bulk Loading Data, and engaging in other normal operations
l Forcing data partitioning, if needed
l Moving partitions to another table as part of archiving historical data
l Dropping Partitions to drop existing partitions
l Displaying partition metadata by querying the PARTITIONS system table, which shows one row
per partition key, per ROS container.
HP Vertica provides functions that let you manage your partitions and obtain additional
information about them. See Partition Management Functions in the SQL Reference Manual.
See Also
l Partitioning, Repartitioning, and Reorganizing Tables
Defining Partitions
The first step in defining data partitions is to establish the relationship between the data and
partitions. To illustrate, consider the following table called trade, which contains unpartitioned data
for the trade date (tdate), ticker symbol (tsymbol), and time (ttime).
Table 1: Unpartitioned data
tdate | tsymbol | ttime
------------+---------+----------
2008-01-02 | AAA | 13:00:00
2009-02-04 | BBB | 14:30:00
2010-09-18 | AAA | 09:55:00
2009-05-06 | AAA | 11:14:30
2008-12-22 | BBB | 15:30:00
(5 rows)
If you want to discard data once a year, a logical choice is to partition the table by year. The partition
expression PARTITION BY EXTRACT(year FROM tdate) creates the partitions shown in Table 2:
Table 2: Data partitioned by year

2008:
tdate    | tsymbol | ttime
---------+---------+---------
01/02/08 | AAA     | 13:00:00
12/22/08 | BBB     | 15:30:00

2009:
tdate    | tsymbol | ttime
---------+---------+---------
02/04/09 | BBB     | 14:30:00
05/06/09 | AAA     | 11:14:30

2010:
tdate    | tsymbol | ttime
---------+---------+---------
09/18/10 | AAA     | 09:55:00
Unlike some databases, which require you to explicitly define partition boundaries in the CREATE
TABLE statement, HP Vertica selects a partition for each row based on the result of a partitioning
expression provided in the CREATE TABLE statement. Partitions do not have explicit names
associated with them. Internally, HP Vertica creates a partition for each distinct value in the
PARTITION BY expression.
After you specify a partition expression, HP Vertica processes the data by applying the partition
expression to each row and then assigning partitions.
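Conceptually, this assignment works like grouping rows by the value of the partition expression. The following Python sketch is an illustrative model only (HP Vertica performs this internally during load); the row dictionaries mirror the data in Table 1:

```python
from collections import defaultdict
from datetime import date

# Hypothetical stand-in for PARTITION BY EXTRACT(year FROM tdate).
def partition_key(row):
    return row["tdate"].year

rows = [
    {"tdate": date(2008, 1, 2),   "tsymbol": "AAA", "ttime": "13:00:00"},
    {"tdate": date(2009, 2, 4),   "tsymbol": "BBB", "ttime": "14:30:00"},
    {"tdate": date(2010, 9, 18),  "tsymbol": "AAA", "ttime": "09:55:00"},
    {"tdate": date(2009, 5, 6),   "tsymbol": "AAA", "ttime": "11:14:30"},
    {"tdate": date(2008, 12, 22), "tsymbol": "BBB", "ttime": "15:30:00"},
]

# One partition per distinct value of the partition expression.
partitions = defaultdict(list)
for row in rows:
    partitions[partition_key(row)].append(row)

print(sorted(partitions))  # [2008, 2009, 2010]
```

The distinct partition-expression values (2008, 2009, 2010) correspond to the three partitions shown in Table 2.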
The following syntax generates the partitions for this example, with the results shown in Table 3. It
creates a table called trade, partitioned by year. For additional information, see CREATE TABLE
in the SQL Reference Manual.
CREATE TABLE trade (
tdate DATE NOT NULL,
tsymbol VARCHAR(8) NOT NULL,
ttime TIME)
PARTITION BY EXTRACT (year FROM tdate);
CREATE PROJECTION trade_p (tdate, tsymbol, ttime) AS
SELECT * FROM trade
ORDER BY tdate, tsymbol, ttime UNSEGMENTED ALL NODES;
INSERT INTO trade VALUES ('01/02/08' , 'AAA' , '13:00:00');
INSERT INTO trade VALUES ('02/04/09' , 'BBB' , '14:30:00');
INSERT INTO trade VALUES ('09/18/10' , 'AAA' , '09:55:00');
INSERT INTO trade VALUES ('05/06/09' , 'AAA' , '11:14:30');
INSERT INTO trade VALUES ('12/22/08' , 'BBB' , '15:30:00');
Table 3: Partitioning Expression and Results
Partitioning By Year and Month
To partition by both year and month, you need a partition expression that pads the month out to two
digits so the partition keys appear as:
201101
201102
201103
...
201111
201112
You can use the following partition expression to partition the table using the year and month:
PARTITION BY EXTRACT(year FROM tdate)*100 + EXTRACT(month FROM tdate)
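Multiplying the year by 100 shifts it two decimal places, so the month always occupies the last two digits, zero-padded. The arithmetic can be checked with a quick Python model of the expression (a sketch only, not Vertica code):

```python
from datetime import date

# Model of PARTITION BY EXTRACT(year FROM tdate)*100 + EXTRACT(month FROM tdate)
def year_month_key(d):
    return d.year * 100 + d.month

print(year_month_key(date(2011, 1, 15)))  # 201101
print(year_month_key(date(2011, 12, 3)))  # 201112
```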
Restrictions on Partitioning Expressions
l The partitioning expression can reference one or more columns from the table.
l The partitioning expression cannot evaluate to NULL for any row, so do not include columns that
allow a NULL value in the CREATE TABLE...PARTITION BY expression.
l Any SQL functions in the partitioning expression must be immutable, meaning that they return
the exact same value regardless of when they are invoked, and independently of session or
environment settings, such as LOCALE. For example, you cannot use the TO_CHAR function
in a partition expression, because it depends on locale settings, or the RANDOM function, since
it produces different values at each invocation.
l HP Vertica meta-functions cannot be used in partitioning expressions.
l All projections anchored on a table must include all columns referenced in the PARTITION BY
expression; this allows the partition to be calculated.
l You cannot modify partition expressions once a partitioned table is created. If you need a
different partition expression, create a new table with the new PARTITION BY clause, and then
use INSERT...SELECT to move the data from the old table to the new table. Once your data is
partitioned the way you want it, you can drop the old table.
Best Practices for Partitioning
l While HP Vertica supports a maximum of 1024 partitions, few, if any, organizations will need to
approach that maximum. Fewer partitions are likely to meet your business needs, while also
ensuring maximum performance. Many customers, for example, partition their data by month,
bringing their partition count to 12. HP Vertica recommends you keep the number of partitions
between 10 and 20 to achieve excellent performance.
l Do not apply partitioning to tables used as dimension tables in pre-join projections. You can
apply partitioning to tables used as large single (fact) tables in pre-join projections.
l For maximum performance, do not partition projections on LONG VARBINARY and LONG
VARCHAR columns.
Dropping Partitions
Use the DROP_PARTITION function to drop a partition. Normally, this is a fast operation that
discards all ROS containers that contain data for the partition.
Occasionally, a ROS container contains rows that belong to more than one partition. For example,
this can happen after a MERGE_PARTITIONS operation. In this case, HP Vertica performs a split
operation to avoid discarding too much data. HP Vertica tries to keep data from different partitions
segregated into different ROS containers, but there are a small number of exceptions. For instance,
the following operations can result in a ROS container with mixed partitions:
l MERGE_PARTITIONS, which merges ROS containers that have data belonging to partitions in
a specified partition key range
l Refresh and recovery operations can generate ROS containers with mixed partitions under
some conditions. See Auto Partitioning.
The number of partitions that contain data is restricted by the number of ROS containers that can
comfortably exist in the system.
In general, if a ROS container has data that belongs to n+1 partitions and you want to drop a
specific partition, the DROP_PARTITION operation:
1. Forces the partition of data into two containers where
n one container holds the data that belongs to the partition that is to be dropped
n another container holds the remaining n partitions
2. Drops the specified partition.
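This split-then-drop behavior can be sketched in Python (a simplified, illustrative model; the pk field and row dictionaries are hypothetical):

```python
def drop_partition(container, key):
    """Model DROP_PARTITION on a ROS container holding several partitions:
    split rows for the dropped partition key into one container (discarded)
    and keep the remaining rows in another container."""
    dropped = [r for r in container if r["pk"] == key]
    kept = [r for r in container if r["pk"] != key]
    return kept, dropped

# A mixed container holding data for partitions 2008-2010,
# e.g., after a MERGE_PARTITIONS operation.
mixed = [{"pk": 2008}, {"pk": 2009}, {"pk": 2009}, {"pk": 2010}]
kept, dropped = drop_partition(mixed, 2009)
print(len(kept), len(dropped))  # 2 2
```

The split ensures that only data for the dropped partition is discarded, even when a container holds rows from several partitions.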
DROP_PARTITION forces a moveout if there is data in the WOS (WOS is not partition aware).
DROP_PARTITION acquires an exclusive lock on the table to prevent DELETE | UPDATE |
INSERT | COPY statements from affecting the table, as well as any SELECT statements issued at
SERIALIZABLE isolation level.
Users must have USAGE privilege on the schema that contains the table.
DROP_PARTITION operations cannot be performed on tables with projections that are not up to
date (have not been refreshed).
DROP_PARTITION fails if you do not set the optional third parameter to true and it encounters
ROS containers that do not have partition keys.
Examples
Using the example schema in Defining Partitions, the following command explicitly drops the 2009
partition key from table trade:
SELECT DROP_PARTITION('trade', 2009);
DROP_PARTITION
-------------------
Partition dropped
(1 row)
Here, the partition key is derived from an expression:
SELECT DROP_PARTITION('trade', EXTRACT('year' FROM '2009-01-01'::date));
DROP_PARTITION
-------------------
Partition dropped
(1 row)
The following example creates a table called dates and partitions the table by year:
CREATE TABLE dates (year INTEGER NOT NULL,
month VARCHAR(8) NOT NULL)
PARTITION BY year * 12 + month;
The following statement drops the partition using a constant for Oct 2010 (2010*12 + 10 = 24130):
SELECT DROP_PARTITION('dates', '24130');
DROP_PARTITION
-------------------
Partition dropped
(1 row)
Alternatively, the expression can be placed in line:
SELECT DROP_PARTITION('dates', 2010*12 + 10);
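The arithmetic behind the constant can be verified with a Python model of the expression. Unlike year*100 + month, the year*12 + month scheme does not reserve fixed digits for the month, so recovering (year, month) from a key needs a small adjustment for December:

```python
# Model of the dates table's partition expression: year * 12 + month.
def dates_key(year, month):
    return year * 12 + month

def dates_from_key(key):
    # Inverse mapping for months 1..12; December yields a multiple of 12.
    year, month = divmod(key, 12)
    if month == 0:
        year, month = year - 1, 12
    return year, month

print(dates_key(2010, 10))    # 24130
print(dates_from_key(24130))  # (2010, 10)
```

Since months run 1 through 12, each (year, month) pair still maps to a unique key, so the scheme is valid for partitioning even without zero-padding.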
The following command first reorganizes the data if it is unpartitioned and then explicitly drops the
2009 partition key from table trade:
SELECT DROP_PARTITION('trade', 2009, false, true);
DROP_PARTITION
-------------------
Partition dropped
(1 row)
See Also
l DROP_PARTITION
l MERGE_PARTITIONS
Partitioning and Segmenting Data
Partitioning and segmentation have completely separate functions in HP Vertica, and opposite
goals regarding data localization. Since other databases often use the terms interchangeably, it is
important to know the differences.
l Segmentation defines how data is spread among cluster nodes. The goal is to distribute data
evenly across multiple database nodes so that all nodes can participate in query execution.
l Partitioning specifies how data is organized within individual nodes. Partitioning deliberately
introduces hot spots within each node by grouping related data together, providing a convenient
way to drop data and reclaim the disk space.
Note: Segmentation is defined by the CREATE PROJECTION statement, and partitioning is
defined by the CREATE TABLE statement. Logically, the partition clause is applied after the
segmentation clause. See the SQL Reference Manual for details.
To further illustrate the differences, partitioning data by year makes sense if you intend to retain and
drop data at the granularity of a year. On the other hand, segmenting the data by year would be
inefficient, because the node holding data for the current year would likely answer far more queries
than the other nodes.
The following diagram illustrates the flow of segmentation and partitioning on a four-node database
cluster:
1. Example table data
2. Data segmented by HASH(order_id)
3. Data segmented by hash across four nodes
4. Data partitioned by year on a single node
While partitioning occurs on all four nodes, the illustration shows partitioned data on one node for
simplicity.
See Also
l Reclaiming Disk Space From Deleted Records
l Avoiding Resegmentation During Joins
l Projection Segmentation
l CREATE PROJECTION
l CREATE TABLE
Partitioning and Data Storage
Partitions and ROS Containers
l Data is automatically split into partitions during load, refresh, and recovery operations.
l The Tuple Mover maintains physical separation of partitions.
l Each ROS container contains data for a single partition, though there can be multiple ROS
containers for a single partition.
Partition Pruning
When a query predicate includes one or more columns from the partitioning clause, queries look
only at the relevant ROS containers. See Partition Elimination for details.
Managing Partitions
HP Vertica provides various options to let you manage and monitor the partitions you create.
PARTITIONS system table
You can display partition metadata, one row per partition key, per ROS container, by querying the
PARTITIONS system table.
For a projection with three ROS containers, the PARTITIONS table returns three
rows:
=> SELECT PARTITION_KEY, PROJECTION_NAME, ROS_ID, ROS_SIZE_BYTES,
          ROS_ROW_COUNT, NODE_NAME
   FROM partitions;
 PARTITION_KEY | PROJECTION_NAME  |      ROS_ID       | ROS_SIZE_BYTES | ROS_ROW_COUNT | NODE_NAME
---------------+------------------+-------------------+----------------+---------------+-----------
 2008          | trade_p_node0001 | 45035996273740461 |             90 |             1 | node0001
 2007          | trade_p_node0001 | 45035996273740477 |             99 |             2 | node0001
 2006          | trade_p_node0001 | 45035996273740493 |             99 |             2 | node0001
(3 rows)
MERGE_PARTITIONS() function
The MERGE_PARTITIONS() function merges partitions between the specified values to a single
ROS container and takes the following form:
MERGE_PARTITIONS ( table_name , partition_key_from , partition_key_to )
The edge values of the partition key are included in the range, and partition_key_from must be
less than or equal to partition_key_to. Inclusion of partitions in the range is based on the
application of the less-than (<) and greater-than (>) operators of the corresponding data type. If
partition_key_from is the same as partition_key_to, all ROS containers of that partition key
are merged into one ROS.
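The inclusive-range rule can be modeled in Python (illustrative only; membership follows the ordering of the partition key's data type):

```python
# Model of MERGE_PARTITIONS range selection: edge values are included,
# and membership uses the data type's natural ordering.
def keys_in_merge_range(partition_keys, key_from, key_to):
    assert key_from <= key_to, "partition_key_from must be <= partition_key_to"
    return [k for k in sorted(partition_keys) if key_from <= k <= key_to]

print(keys_in_merge_range([100, 200, 300, 400, 500], 200, 400))  # [200, 300, 400]
print(keys_in_merge_range(["CA", "FL", "MA", "NY"], "CA", "MA"))  # ['CA', 'FL', 'MA']
```

The second call mirrors the MERGE_PARTITIONS('T1', 'CA', 'MA') example: 'FL' is merged along with the edge values because it sorts between them.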
Note: No restrictions are placed on a partition key's data type.
Users must have USAGE privilege on the schema that contains the table.
The following series of statements show how to merge partitions in a table called T1:
=> SELECT MERGE_PARTITIONS('T1', '200', '400');
=> SELECT MERGE_PARTITIONS('T1', '800', '800');
=> SELECT MERGE_PARTITIONS('T1', 'CA', 'MA');
=> SELECT MERGE_PARTITIONS('T1', 'false', 'true');
=> SELECT MERGE_PARTITIONS('T1', '06/06/2008', '06/07/2008');
=> SELECT MERGE_PARTITIONS('T1', '02:01:10', '04:20:40');
=> SELECT MERGE_PARTITIONS('T1', '06/06/2008 02:01:10', '06/07/2008 02:01:10');
=> SELECT MERGE_PARTITIONS('T1', '8 hours', '1 day 4 hours 20 seconds');
PARTITION_TABLE() function
The PARTITION_TABLE() function physically separates partitions into separate containers. Only
ROS containers with more than one distinct partition key value participate in the split.
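This behavior can be sketched in Python (an illustrative model, not HP Vertica's implementation; containers are modeled as lists of rows with a hypothetical pk field):

```python
from collections import defaultdict

def partition_containers(containers):
    """Model of PARTITION_TABLE(): split each ROS container that holds more
    than one distinct partition key into one container per key; containers
    that already hold a single key are left untouched."""
    result = []
    for container in containers:
        keys = {row["pk"] for row in container}
        if len(keys) <= 1:
            result.append(container)  # already separated; not touched
            continue
        by_key = defaultdict(list)
        for row in container:
            by_key[row["pk"]].append(row)
        result.extend(by_key.values())
    return result

mixed = [
    [{"pk": "MA"}, {"pk": "MA"}],                # single key: untouched
    [{"pk": "NY"}, {"pk": "CA"}, {"pk": "NY"}],  # two keys: split in two
]
print(len(partition_containers(mixed)))  # 3
```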
The following example creates a simple table called states and partitions data by state.
=> CREATE TABLE states (year INTEGER NOT NULL,
state VARCHAR NOT NULL)
PARTITION BY state;
=> CREATE PROJECTION states_p (state, year) AS
SELECT * FROM states
ORDER BY state, year UNSEGMENTED ALL NODES;
Now call the PARTITION_TABLE function to partition table states:
=> SELECT PARTITION_TABLE('states');
PARTITION_TABLE
-------------------------------------------------------
partition operation for projection 'states_p_node0004'
partition operation for projection 'states_p_node0003'
partition operation for projection 'states_p_node0002'
partition operation for projection 'states_p_node0001'
(1 row)
Notes
There are just a few more things worth mentioning to help you manage your partitions:
l To prevent too many ROS containers, be aware that delete operations must open all the
containers; thus, ideally create fewer than 20 partitions and avoid creating more than 50.
You can use the MERGE_PARTITIONS() function to merge old partitions to a single ROS
container.
l You cannot use non-deterministic functions in a PARTITION BY expression. One example is an
expression of type TIMESTAMP WITH TIME ZONE, because its value depends on user settings.
l A dimension table in a pre-join projection cannot be partitioned.
Partitioning, Repartitioning, and Reorganizing
Tables
Using the ALTER TABLE statement with its PARTITION BY syntax and the optional REORGANIZE
keyword partitions or re-partitions a table according to the partition-clause that you define in the
statement. HP Vertica immediately drops any existing partition keys when you execute the
statement.
You can use the PARTITION BY and REORGANIZE keywords separately or together. However, you
cannot use these keywords with any other ALTER TABLE clauses.
Partition-clause expressions are limited in the following ways:
l Your partition-clause must calculate a single non-null value for each row. You can reference
multiple columns, but each row must return a single value.
l You can specify leaf expressions, functions, and operators in the partition clause expression.
l All leaf expressions in the partition clause must be either constants or columns of the table.
l Aggregate functions and queries are not permitted in the partition-clause expression.
l SQL functions used in the partition-clause expression must be immutable.
Partitioning or re-partitioning tables requires USAGE privilege on the schema that contains the
table.
Reorganizing Data After Partitioning
Partitioning is not complete until you reorganize the data. The optional REORGANIZE keyword
completes table partitioning by assigning partition keys. You can use REORGANIZE with
PARTITION BY, or as the only keyword in the ALTER TABLE statement for tables that were
previously altered with the PARTITION BY modifier, but were not reorganized with the
REORGANIZE keyword.
If you specify the REORGANIZE keyword, data is partitioned immediately to the new schema as a
background task.
Tip: As a best practice, HP recommends that you reorganize the data while partitioning the
table, using PARTITION BY with the REORGANIZE keyword. If you do not specify
REORGANIZE, performance for queries, DROP_PARTITION() operations, and node
recovery could be degraded until the data is reorganized. Also, without reorganizing existing
data, new data is stored according to the new partition expression, while the existing data
storage remains unchanged.
Monitoring Reorganization
When you use ALTER TABLE ... REORGANIZE, the operation reorganizes the data in the
background.
You can monitor details of the reorganization process by polling the following system tables:
l V_MONITOR.PARTITION_STATUS displays the fraction of each table that is partitioned
correctly.
l V_MONITOR.PARTITION_REORGANIZE_ERRORS logs any errors issued by the
background REORGANIZE process.
l V_MONITOR.PARTITIONS displays NULL in the partition_key column for any ROS containers
that have not been reorganized.
Note: The corresponding foreground process to ALTER TABLE ... REORGANIZE is
PARTITION_TABLE().
Auto Partitioning
HP Vertica attempts to keep data from each partition stored separately. Auto partitioning occurs
when data is written to disk, such as during COPY DIRECT or moveout operations.
Separate storage provides two benefits: partitions can be dropped quickly, and partition elimination
can omit storage that need not participate in a query plan.
Note: If you use INSERT...SELECT in a partitioned table, HP Vertica sorts the data before
writing it to disk, even if the source of the SELECT has the same sort order as the destination.
Examples
The examples that follow use this simple schema. First create a table named t1, segmented by the
c1 column and partitioned by the c2 column:
CREATE TABLE t1 (  c1 INT NOT NULL,
c2 INT NOT NULL)
SEGMENTED BY c1 ALL NODES
PARTITION BY c2;
Create two identically-segmented buddy projections:
CREATE PROJECTION t1_p AS SELECT * FROM t1 SEGMENTED BY HASH(c1) ALL NODES OFFSET 0;
CREATE PROJECTION t1_p1 AS SELECT * FROM t1 SEGMENTED BY HASH(c1) ALL NODES OFFSET 1;
Now insert some data:
INSERT INTO t1 VALUES(10,15);
INSERT INTO t1 VALUES(20,25);
INSERT INTO t1 VALUES(30,35);
INSERT INTO t1 VALUES(40,45);
Query the table to verify the inputs:
SELECT * FROM t1;
 c1 | c2
----+----
10 | 15
20 | 25
30 | 35
40 | 45
(4 rows)
Now perform a moveout operation on the projections in the table:
SELECT DO_TM_TASK('moveout','t1');
 do_tm_task
--------------------------------
moveout for projection 't1_p1'
moveout for projection 't1_p'
(1 row)
Query the PARTITIONS system table, and you'll see that the four partition keys reside on two
nodes, each in its own ROS container (see the ros_id column). Because the PARTITION BY clause
was used on column c2, HP Vertica auto partitioned the input values when the moveout operation
wrote them to disk:
SELECT partition_key, projection_name, ros_id, ros_size_bytes, ros_row_count, node_name
FROM PARTITIONS WHERE projection_name like 't1_p1';
 partition_key | projection_name |      ros_id       | ros_size_bytes | ros_row_count | node_name
---------------+-----------------+-------------------+----------------+---------------+-----------
 15            | t1_p1           | 49539595901154617 |             78 |             1 | node0002
 25            | t1_p1           | 54043195528525081 |             78 |             1 | node0003
 35            | t1_p1           | 54043195528525069 |             78 |             1 | node0003
 45            | t1_p1           | 49539595901154605 |             79 |             1 | node0002
(4 rows)
HP Vertica does not auto partition when you refresh with the same sort order. If you create a new
projection, HP Vertica returns a message telling you to refresh the projections; for example:
CREATE PROJECTION t1_p2 AS SELECT * FROM t1 SEGMENTED BY HASH(c1) ALL NODES OFFSET 2;
WARNING: Projection <public.t1_p2> is not available for query processing. Execute the
select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes
must be up before starting a refresh.
Run the START_REFRESH function:
SELECT START_REFRESH();
 start_refresh
----------------------------------------
Starting refresh background process.
(1 row)
Query the PARTITIONS system table again. The partition keys now reside in two ROS containers,
instead of four, which you can tell by looking at the values in the ros_id column. The
ros_row_count column holds the number of rows in the ROS container:
SELECT partition_key, projection_name, ros_id, ros_size_bytes, ros_row_count, node_name
FROM PARTITIONS WHERE projection_name like 't1_p2';
 partition_key | projection_name |      ros_id       | ros_size_bytes | ros_row_count | node_name
---------------+-----------------+-------------------+----------------+---------------+-----------
 15            | t1_p2           | 54043195528525121 |             80 |             2 | node0003
 25            | t1_p2           | 58546795155895541 |             77 |             2 | node0004
 35            | t1_p2           | 58546795155895541 |             77 |             2 | node0004
 45            | t1_p2           | 54043195528525121 |             80 |             2 | node0003
(4 rows)
The following query summarizes ROS information for the partitioned projections. In this example,
it counts two ROS containers on two different nodes for projection t1_p2:
SELECT ros_id, node_name, COUNT(*) FROM PARTITIONS WHERE projection_name LIKE 't1_p2'
GROUP BY ros_id, node_name;
      ros_id       | node_name | COUNT
-------------------+-----------+-------
 54043195528525121 | node0003  |     2
 58546795155895541 | node0004  |     2
(2 rows)
This query returns four ROS containers on two different nodes for projection t1_p1:
SELECT ros_id, node_name, COUNT(*) FROM PARTITIONS WHERE projection_name LIKE 't1_p1'
GROUP BY ros_id, node_name;
      ros_id       | node_name | COUNT
-------------------+-----------+-------
 49539595901154605 | node0002  |     1
 49539595901154617 | node0002  |     1
 54043195528525069 | node0003  |     1
 54043195528525081 | node0003  |     1
(4 rows)
Eliminating Partitions
HP Vertica can eliminate ROS containers of partitioned tables from query processing when they
cannot contain data that a query needs. To eliminate ROS containers, HP Vertica compares query
predicates to partition-related metadata.
For each column in the partition expression, every ROS container maintains the minimum and
maximum values of the data it stores, and HP Vertica uses those min/max values to eliminate
ROS containers from query planning. Partitions that cannot contain matching values are not
scanned. For example, if a ROS container holds no data that satisfies a given query predicate, the
optimizer eliminates (prunes) that container from the query plan. After non-participating ROS
containers have been eliminated, queries that use partitioned tables run more quickly.
Note: Partition pruning occurs at query run time and requires a query predicate on the
partitioning column.
Assume a table that is partitioned by year (2007, 2008, 2009) into three ROS containers, one for
each year. Given the following series of commands, the two ROS containers that contain data for
2007 and 2008 fall outside the boundaries of the requested year (2009) and get eliminated.
=> CREATE TABLE ... PARTITION BY EXTRACT(year FROM date);
=> SELECT ... WHERE date = '12-2-2009';
On any database that has been upgraded from version 3.5 or earlier, ROS containers are ineligible
for partition elimination, because they do not contain the required minimum/maximum partition key
values. These ROS containers need to be recreated or merged by the Tuple Mover.
Making Past Partitions Eligible for Elimination
The following procedure lets you make past partitions eligible for elimination. The easiest way to
guarantee that all ROS containers are eligible is to:
1. Create a new fact table with the same projections as the existing table.
2. Use INSERT..SELECT to populate the new table.
3. Drop the original table and rename the new table.
If there is not enough disk space for a second copy of the fact table, an alternative is to:
1. Verify that the Tuple Mover has finished all post-upgrade work; for example, when the following
command shows no mergeout activity:
=> SELECT * FROM TUPLE_MOVER_OPERATIONS;
2. Identify which partitions need to be merged to get the ROS minimum/maximum values by
running the following command:
=> SELECT DISTINCT table_schema, projection_name, partition_key FROM partitions p
LEFT OUTER JOIN vs_ros_min_max_values v
ON p.ros_id = v.rosid
WHERE v.min_value IS null;
3. Insert a record into each partition that has ineligible ROS containers and commit.
4. Delete each inserted record and commit again.
At this point, the Tuple Mover automatically merges ROS containers from past partitions.
Verifying the ROS Merge
1. Query the TUPLE_MOVER_OPERATIONS table again:
=> SELECT * FROM TUPLE_MOVER_OPERATIONS;
2. Check again for any partitions that need to be merged:
=> SELECT DISTINCT table_schema, projection_name, partition_key FROM partitions p
LEFT OUTER JOIN vs_ros_min_max_values v
ON p.ros_id = v.rosid
WHERE v.min_value IS null;
Examples
Assume a table that is partitioned by time and is queried with predicates that restrict on time.
CREATE TABLE time (  tdate DATE NOT NULL,
tnum INTEGER)
PARTITION BY EXTRACT(year FROM tdate);
CREATE PROJECTION time_p (tdate, tnum) AS
SELECT * FROM time
ORDER BY tdate, tnum UNSEGMENTED ALL NODES;
Note: Projection sort order has no effect on partition elimination.
INSERT INTO time VALUES ('03/15/04' , 1);
INSERT INTO time VALUES ('03/15/05' , 2);
INSERT INTO time VALUES ('03/15/06' , 3);
INSERT INTO time VALUES ('03/15/06' , 4);
The data inserted in the previous series of commands would be loaded into three ROS containers,
one per year, since that is how the data is partitioned:
SELECT * FROM time ORDER BY tnum;
 tdate | tnum
------------+------
2004-03-15 | 1 --ROS1 (min 03/15/04, max 03/15/04)
2005-03-15 | 2 --ROS2 (min 03/15/05, max 03/15/05)
2006-03-15 | 3 --ROS3 (min 03/15/06, max 03/15/06)
2006-03-15 | 4 --ROS3 (min 03/15/06, max 03/15/06)
(4 rows)
Here's what happens when you query the time table:
l In the first query, HP Vertica can eliminate ROS2 and ROS3, because it is only looking for year 2004:
=> SELECT COUNT(*) FROM time WHERE tdate = '05/07/2004';
l In the next query, HP Vertica can eliminate both ROS1 and ROS3:
=> SELECT COUNT(*) FROM time WHERE tdate = '10/07/2005';
l The following query has an additional predicate on the tnum column for which no
minimum/maximum values are maintained. In addition, the use of logical operator OR is not
supported, so no ROS elimination occurs:
=> SELECT COUNT(*) FROM time WHERE tdate = '05/07/2004' OR tnum = 7;
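These outcomes can be modeled in Python, assuming (as a simplification) that each container tracks the min/max of its partition key year and that the predicate's year is compared against those bounds:

```python
from datetime import date

# Per-container min/max of the partition key (year), as in the example above.
ros_bounds = {"ROS1": (2004, 2004), "ROS2": (2005, 2005), "ROS3": (2006, 2006)}

def containers_to_scan(predicate_date):
    """Apply the partition expression (EXTRACT year) to the predicate value,
    then keep only containers whose [min, max] range could contain it;
    all other containers are pruned from the plan."""
    year = predicate_date.year
    return [name for name, (lo, hi) in ros_bounds.items() if lo <= year <= hi]

print(containers_to_scan(date(2004, 5, 7)))   # ['ROS1'] -- ROS2, ROS3 pruned
print(containers_to_scan(date(2005, 10, 7)))  # ['ROS2'] -- ROS1, ROS3 pruned
# With "OR tnum = 7", no bounds exist for tnum, so no container can be pruned.
```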
Moving Partitions
You can move partitions from one table to another using the MOVE_PARTITIONS_TO_TABLE
function. Use this function as part of creating offline archives of older partitions. By moving
partitions from one table to an intermediate table, you can then create a backup of the new table,
and drop the partition. If you need the historical data later, you can restore the archived partitions,
described in Restoring Archived Partitions.
If the target table does not exist, the MOVE_PARTITIONS_TO_TABLE function creates a table
definition using the CREATE TABLE statement with its LIKE clause. Creating a table with the LIKE
clause is performed as a DDL operation. HP Vertica does not copy any data from the source table,
and the new table is not connected to its source in any way. The CREATE TABLE statement with
the LIKE clause does not copy contraints, automatic values (such as a sequences and identity
values), or a default values. Corresponding columns will exist in the new table with the same type
as the source table, but the columns will not have constraints or automatic values.
Archiving Steps
These are the steps required to archive partitions:
1. Prepare and move the partitions with the MOVE_PARTITIONS_TO_TABLE function
2. Create an object-level snapshot of the intermediate table
3. Drop the intermediate table
The next sections describe the archiving steps.
Preparing and Moving Partitions
Before moving partitions to another table, be sure to:
l Create a separate schema for the intermediate table
l Check that the name you plan to use does not conflict with an existing table name
l Use a name that represents the partition values you are moving
l Keep each partition in a different backup table
When you have created a separate schema for the intermediate table, call the MOVE_
PARTITIONS_TO_TABLE function.
If you call MOVE_PARTITIONS_TO_TABLE and the destination table does not exist, the function will
create the table automatically:
VMART=> SELECT MOVE_PARTITIONS_TO_TABLE (
'prod_trades',
'200801',
'200801',
'partn_backup.trades_200801');
MOVE_PARTITIONS_TO_TABLE
---------------------------------------------------------------------------
1 distinct partition values moved at epoch 15. Effective move epoch: 14.
(1 row)
Creating a Snapshot of the Intermediate Table
Creating an object-level snapshot of the intermediate table containing the partitions you want to
archive requires a vbr.py configuration file.
These are the two steps to create an object-level snapshot of an intermediate table so you can then
drop the table:
1. As a best practice, HP Vertica recommends that you create a full database snapshot first,
since you can only restore object-level snapshots into the original database. However, creating
a full snapshot is not a requirement.
2. Create an object-level snapshot of the intermediate table.
For details of setting up backup hosts, creating a configuration file, and taking a snapshot, see
Backing Up and Restoring the Database.
Copying the Config File to the Storage Location
When vbr.py creates the partition snapshot, it copies it to the archive storage location
automatically.
HP Vertica recommends that you also copy the configuration file for the partition snapshot to the
storage location. You can do this automatically by entering y to the Backup vertica
configurations? question when creating the configuration file for the snapshot.
Dropping the Intermediate Table
After archiving, you can drop the intermediate table into which you moved the partitions, as described in Dropping and Truncating Tables. Dropping the intermediate table maintains database K-safety, keeping a minimum of K+1 copies of the data, and more if additional projections exist.
Restoring Archived Partitions
You can restore partitions that you previously moved to an intermediate table, archived as an
object-level snapshot, and then dropped.
Note: Restoring an archived partition requires that the original table definition has not changed
since the partition was archived and dropped. If you have changed the table definition, you can
only restore an archived partition using INSERT/SELECT statements, which are not described
here.
These are the steps to restoring archived partitions:
1. Restore the snapshot of the intermediate table you saved when you moved one or more
partitions to archive (see Moving Partitions).
2. Move the restored partitions from the intermediate table to the original table.
3. Drop the intermediate table.
Bulk Loading Data
This section describes different methods for bulk loading data into an HP Vertica database using
the COPY statement. In its basic form, use COPY as follows:
COPY to_table FROM data_source
The COPY statement loads data from a file stored on the host or client (or in a data stream) into a
database table. You can pass the COPY statement many different parameters to define various
options such as:
l The format of the incoming data
l Metadata about the data load
l Which parser COPY should use
l Whether to load data over parallel load streams
l How to transform data as it is loaded
l How to handle errors
HP Vertica's hybrid storage model provides a great deal of flexibility for loading and managing data.
See the remaining sections here for other options, and the COPY statement in the SQL Reference
Manual for syntax details.
Checking Data Format Before or After Loading
HP Vertica expects all data files being loaded to be in the Unicode UTF-8 format. You can load
ASCII data, which is UTF-8 compatible. Character sets like ISO 8859-1 (Latin1), are incompatible
with UTF-8 and are not supported.
Before loading data from text files, you can use several UNIX tools to ensure that your data is in
UTF-8 format. The file command reports the encoding of any text files.
To check the type of a data file, use the file command. For example:
$ file Date_Dimension.tbl
Date_Dimension.tbl: ASCII text
The file command could indicate ASCII TEXT even though the file contains multibyte characters.
To check for multibyte characters in an ASCII file, use the wc command. For example:
$ wc Date_Dimension.tbl
1828 5484 221822 Date_Dimension.tbl
If the wc command returns an error such as Invalid or incomplete multibyte or wide
character, the data file is using an incompatible character set.
This example describes files that are not UTF-8 data files. Two text files have filenames starting
with the string data. To check their format, use the file command as follows:
$ file data*
data1.txt: Little-endian UTF-16 Unicode text
data2.txt: ISO-8859 text
The results indicate that neither of the files is in UTF-8 format.
Converting Files Before Loading Data
To convert files before loading them into HP Vertica, use the iconv UNIX command. For example,
to convert the data2.txt file from the previous example, use the iconv command as follows:
iconv -f ISO88599 -t utf-8 data2.txt > data2-utf8.txt
See the man pages for file and iconv for more information.
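The checks and the conversion above can be combined into a single pre-load sketch. This is a hypothetical helper, not part of the product: running iconv from UTF-8 to UTF-8 acts as a validator, because it exits non-zero on the first invalid byte sequence. The sample file names and contents are illustrative.

```shell
# Validate files as UTF-8 with iconv in pass-through mode, then convert
# an incompatible (Latin-1) file, as done for data2.txt above.
printf '%b' 'caf\0303\0251|1\n' > utf8_ok.txt   # "café" in UTF-8 bytes
printf '%b' 'caf\0351|1\n'      > latin1.txt    # é as the Latin-1 byte 0xE9

for f in utf8_ok.txt latin1.txt; do
  if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
    echo "$f: valid UTF-8"
  else
    echo "$f: not UTF-8"
  fi
done

# convert the incompatible file before loading it
iconv -f ISO-8859-1 -t UTF-8 latin1.txt > latin1-utf8.txt
iconv -f UTF-8 -t UTF-8 latin1-utf8.txt > /dev/null && echo "converted file is valid UTF-8"
```

Unlike the file command, which only reports its best guess at an encoding, this check fails deterministically on any byte sequence that COPY would reject.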
Checking UTF-8 Compliance After Loading Data
After loading data, use the ISUTF8 function to verify that all of the string-based data in the table is in
UTF-8 format. For example, if you loaded data into a table named nametable that has a VARCHAR
column named name, you can use this statement to verify that all of the strings are UTF-8 encoded:
=> SELECT name FROM nametable WHERE ISUTF8(name) = FALSE;
If all of the strings are in UTF-8 format, the query should not return any rows.
Performing the Initial Database Load
To perform the initial database load, use COPY with its DIRECT parameter from vsql.
Tip: HP Vertica supports multiple schema types. If you have a star schema, load the smaller
tables before you load the largest tables.
Only a superuser can use the COPY statement to bulk load data. Two exceptions to the superuser
requirement are to:
1. Run COPY to load from a stream on the host (such as STDIN) rather than a file (see Streaming
Data via JDBC).
2. Use the COPY statement with the FROM LOCAL option.
A non-superuser can also perform a standard batch insert using a prepared statement, which
invokes COPY to load data as a background task.
Extracting Data From an Existing Database
If possible, export the data in text form to a local file or attached disk. When working with large
amounts of load data (> 500GB), HP recommends that you test the load process using smaller load
files as described in Configuration Procedure to avoid compatibility or file formatting issues.
ETL products typically use ODBC or JDBC to extract data, which gives them program-level access
to modify load file column values, as needed.
Database systems typically provide a variety of export methods.
Tip: To export data from an Oracle database, run a SELECT query in Oracle’s SQL*Plus
command line query tool using a specified column delimiter, suppressed headers, and so forth.
Redirect the output to a local file.
Smaller tables generally fit into a single load file. Split any large tables into 250-500GB load files.
For example, a 10 TB fact table will require 20-40 load files to maintain performance.
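The split command can produce such line-aligned load files. The following is a minimal sketch, scaled down to a tiny generated sample; for real extracts you would use a chunk size such as -C 250G to stay within the 250-500GB guideline above.

```shell
# Generate a small '|'-delimited sample standing in for a large extract.
seq 1 1000 | awk '{print $1 "|item" $1}' > big_table.tbl

# -C splits into chunks of at most the given byte size WITHOUT breaking
# rows across chunk boundaries; -d uses numeric suffixes (part.00, ...).
split -C 4096 -d big_table.tbl big_table.part.

# Because chunks are line-aligned, concatenating them reproduces every row.
cat big_table.part.* | wc -l
```

Each resulting chunk can then be handed to its own COPY stream.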
Checking for Delimiter Characters in Load Data
The default delimiter for the COPY statement is a vertical bar (|). Before loading your data, make
sure that no CHAR(N) or VARCHAR(N) data values include the delimiter character.
To test for the existence of a specific character in a column, use a query such as this:
SELECT COUNT(*) FROM T WHERE X LIKE '%|%'
If only a few rows contain |, you can eliminate them from the load file using a WHERE clause and
load them separately using a different delimiter.
Tip: For loading data from an Oracle database, use a WHERE clause to exclude problem rows from the main load file, and load the problem rows separately using the negated WHERE clause with REGEXP_REPLACE.
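The same check can be run on the raw extract file itself before loading. This sketch, with illustrative data, counts the rows containing the default '|' delimiter and splits them out so the clean rows can load with '|' and the problem rows with another delimiter:

```shell
# Sample extract: one row contains the '|' delimiter inside its data.
printf 'a,1\nb|2\nc,3\n' > extract.txt

grep -c '|' extract.txt                  # rows that would break a '|' load
grep -v '|' extract.txt > clean_rows.txt # safe to load with DELIMITER '|'
grep    '|' extract.txt > problem_rows.txt
wc -l < clean_rows.txt
```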
Moving Data From an Existing Database to HP Vertica
Nodes
To move data from an existing database to HP Vertica, consider using:
l USB 2.0 (or possibly SATA) disks
l A fast local network connection
Deliver chunks of data to the different HP Vertica nodes by connecting the transport disk or by
writing files from network copy.
Loading From a Local Hard Disk
USB 2.0 disks can deliver data at about 30 MB per second, or 108 GB per hour. USB 2.0 disks are
easy to use for transporting data from Linux to Linux. Set up an ext3 filesystem on the disk and
write large files there. Linux 2.6 has USB plug-and-play support, so a USB 2.0 disk is instantly
usable on various Linux systems.
For other UNIX variants, if there is no common filesystem format available, use the disk without a
filesystem to copy a single large file. For example:
$ cp bigfile /dev/sdc1
Even without a filesystem on the disk, plug-and-play support still works on Linux to provide a device
node for the disk. To find out the assigned device, plug in the disk and enter:
$ dmesg | tail -40
SATA disks are usually internal but can also be external; internal disks can be unmounted safely for transport.
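When copying a single large file to a raw device as above, it is worth verifying the copy before shipping the disk. The following is a minimal sketch; an ordinary file stands in for the /dev/sdc1 device here, and on a real device you would read back exactly the file's byte count (for example, head -c "$(stat -c %s bigfile)" /dev/sdc1).

```shell
# Verify a transport copy by comparing checksums of source and copy.
echo 'sample load data' > bigfile
cp bigfile /tmp/transport_standin

# One distinct checksum means the copies match.
md5sum bigfile /tmp/transport_standin | awk '{print $1}' | sort -u | wc -l
```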
Loading Over the Network
A 1Gbps (gigabits per second) network can deliver about 50 MB/s, or 180GB/hr. HP Vertica can
load about 30-50GB/hour/node for a 1-Ksafe projection design. Therefore, you should use a
dedicated 1Gbps LAN. Using a LAN with a performance that is < 1Gbps will be proportionally
slower. HP Vertica recommends not loading data across an external network, because the delays
over distance slow down the TCP protocol to a small fraction of its available bandwidth, even
without competing traffic.
Note: The actual load rates you obtain can be higher or lower depending on the properties of
the data, number of columns, number of projections, and hardware and network speeds. Load
speeds can be further improved by using multiple parallel streams.
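As a rough planning aid, the figures above can be combined into a back-of-envelope transfer-time estimate. The rate is the document's 180 GB/hour figure; the 540 GB extract size is a hypothetical example.

```shell
# A dedicated 1 Gbps LAN delivers about 50 MB/s, roughly 180 GB/hour.
extract_gb=540        # hypothetical extract size
rate_gb_per_hr=180
echo "$(( extract_gb / rate_gb_per_hr )) hours of network transfer time"
```

Note that end-to-end load time is usually bounded by the 30-50 GB/hour/node load rate mentioned above rather than by raw network bandwidth.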
Loading From Windows
Use NTFS for loading files directly from Windows to Linux. Although Red Hat Linux as originally
installed can read Windows FAT32 file systems, this is not recommended.
Using Load Scripts
You can write and run a load script for the COPY statement using a simple text-delimited file
format. For information about other load formats see Specifying a COPY Parser. HP Vertica
recommends that you load the smaller tables before the largest tables. To check data formats
before loading, see Checking Data Format Before or After Loading.
Using Absolute Paths in a Load Script
Unless you are using the COPY FROM LOCAL statement, using COPY on a remote client requires an
absolute path for a data file. You cannot use relative paths on a remote client. For a load script, you
can use vsql variables to specify the locations of data files relative to your Linux working directory.
To use vsql variables to specify data file locations:
1. Create a vsql variable containing your Linux current directory.
\set t_pwd `pwd`
2. Create another vsql variable that uses a path relative to the Linux current directory variable for
a specific data file.
\set input_file '\'':t_pwd'/Date_Dimension.tbl\''
3. Use the second variable in the COPY statement:
COPY Date_Dimension FROM :input_file DELIMITER '|';
4. Repeat steps 2 and 3 to load all data files.
Note: COPY FROM LOCAL does not require an absolute path for data files. You can use
paths that are relative to the client's running directory.
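The same path construction can be sketched in plain shell, which shows what the vsql variables above evaluate to. This is a hypothetical stand-in: the echoed statement is what a load script would pass to vsql (for example, via vsql -c "..."), and Date_Dimension.tbl is the sample file from earlier sections.

```shell
# Derive the absolute data-file path from the working directory, as the
# \set variables do, then print the resulting COPY statement.
t_pwd=$(pwd)
input_file="${t_pwd}/Date_Dimension.tbl"
echo "COPY Date_Dimension FROM '${input_file}' DELIMITER '|';"
```

Because the path is built from `pwd`, the statement always satisfies the absolute-path requirement regardless of where the script is run.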
Running a Load Script
You can run a load script on any host, as long as the data files are on that host.
1. Change your Linux working directory to the location of the data files.
$ cd /opt/vertica/doc/retail_example_database
2. Run the Administration Tools.
$ /opt/vertica/bin/admintools
3. Connect to the database.
4. Run the load script.
Using COPY and COPY LOCAL
The COPY statement bulk loads data into an HP Vertica database. You can initiate loading one or
more files or pipes on a cluster host. You can load directly from a client system, too, using the
COPY statement with its FROM LOCAL option.
COPY lets you load parsed or computed data. Parsed data is from a table or schema using one or
more columns, and computed data is calculated with a column expression on one or more column
values.
COPY invokes different parsers depending on the format you specify:
l Delimited text (the default format; you do not specify it explicitly)
l Native binary (NATIVE) (not supported with COPY LOCAL)
l Native varchar (NATIVE VARCHAR) (not supported with COPY LOCAL)
l Fixed-width data (FIXEDWIDTH)
COPY has many options, which you combine to make importing data flexible. For detailed syntax
for the various options see the SQL Reference Manual. For example:
l Read uncompressed data, or data compressed with GZIP or BZIP: see Specifying COPY FROM Options
l Insert data into the WOS (memory) or directly into the ROS (disk): see Choosing a Load Method
l Set parameters such as data delimiters and quote characters for the entire load operation or for specific columns: see Loading UTF-8 Format Data
l Transform data before inserting it into the database: see Transforming Data During Loads
Copying Data From an HP Vertica Client
Use COPY LOCAL to load files on a client to the HP Vertica database. For example, to copy a GZIP
file from your local client, use a command such as this:
=> COPY store.store_dimension FROM LOCAL '/usr/files/my_data/input_file' GZIP;
You can use a comma-separated list to load multiple files of the same compression type. COPY
LOCAL then concatenates the files into a single file, so you cannot combine files with different
compression types in the list. When listing multiple files, be sure to specify the type of every input
file, such as BZIP, as shown:
COPY simple_table FROM LOCAL 'input_file.bz' BZIP, 'input_file.bz' BZIP;
You can load on a client (LOCAL) from STDIN, as follows:
COPY simple_table FROM LOCAL STDIN;
Transforming Data During Loads
To promote a consistent database and reduce the need for scripts to transform data at the source,
HP Vertica lets you transform data as part of loading it into the target database. Transforming data
during loads is useful for computing values to insert into a target database column from other
columns in the source database.
To transform data during load, use the following syntax to specify the target column for which you
want to compute values, as an expression:
COPY [[database-name.]schema-name.]table
   [( [Column AS Expression] / column [FORMAT 'format'] [, ...] )]
FROM ...
Understanding Transformation Requirements
When transforming data during loads, the COPY statement must contain at least one parsed column.
The parsed column can be a FILLER column. (See Ignoring Columns and Fields in the Load File for
more information about using fillers.)
Specify only RAW data in the parsed column source data. If you specify nulls in that RAW data, the
columns are evaluated with the same rules as for SQL statement expressions.
You can intersperse parsed and computed columns in a COPY statement.
Loading FLOAT Values
HP Vertica parses floating-point values internally. COPY does not require you to cast floats
explicitly, unless you need to transform the values for another reason. For more information, see
DOUBLE PRECISION (FLOAT).
Using Expressions in COPY Statements
The expression you use in a COPY statement can be as simple as a single column or as complex as
a case expression for multiple columns. You can specify multiple columns in a COPY expression,
and have multiple COPY expressions refer to the same parsed column. You can specify COPY
expressions for columns of all supported data types.
COPY expressions can use many HP Vertica-supported SQL functions, operators, constants,
NULLs, and comments, including these functions:
l Date/time
l Formatting Functions
l String
l Null-handling
l System information
COPY expressions cannot use SQL meta functions (HP Vertica-specific), analytic functions,
aggregate functions, or computed columns.
For computed columns, all parsed columns in the expression must be listed in the COPY statement.
Do not specify FORMAT or RAW in the source data for a computed column.
Expressions used in a COPY statement can contain only constants. The return data type of the
expression must be coercible to that of the target column. Parsed column parameters are also
coerced to match the expression.
Handling Expression Errors
COPY treats errors in expressions as SQL errors and errors during parsing as parse errors. When a
parse error occurs, COPY rejects the row and adds a copy of the row to the rejected data file. COPY also
adds a message to the exceptions file describing why the row was rejected. For example, HP Vertica
does not implicitly cast data types during parsing. If a type mismatch occurs between the
data being loaded and a column type (such as loading a text value for a FLOAT column), COPY
rejects the row but continues processing.
COPY expression errors, by contrast, are SQL errors and cause the entire load to roll back. For
example, if the COPY statement has an expression with a transform function, and a syntax error
occurs in the function, the entire load is rolled back. The HP Vertica log file includes the
SQL error message, but the reason for the rollback is not obvious without researching the log.
Transformation Example
Following is a small transformation example.
1. Create a table and corresponding projection.
CREATE TABLE t (  year VARCHAR(10),
month VARCHAR(10),
day VARCHAR(10),
k timestamp
);
CREATE PROJECTION tp (
year,
month,
day,
k)
AS SELECT * from t;
2. Use COPY to copy the table, computing values for the year, month, and day columns in the
target database, based on the timestamp columns in the source table.
3. Load the parsed column, timestamp, from the source data to the target database.
COPY t(year AS TO_CHAR(k, 'YYYY'),
month AS TO_CHAR(k, 'Month'),
day AS TO_CHAR(k, 'DD'),
k FORMAT 'YYYY-MM-DD') FROM STDIN NO COMMIT;
2009-06-17
1979-06-30
2007-11-26
\.
4. Select the table contents to see the results:
SELECT * FROM t;
year | month | day | k
------+-----------+-----+---------------------
2009 | June | 17 | 2009-06-17 00:00:00
1979 | June | 30 | 1979-06-30 00:00:00
2007 | November | 26 | 2007-11-26 00:00:00
(3 rows)
Deriving Table Columns From Data File Columns
You can use COPY to derive a table column from the data file to load.
The next example illustrates how to use the year, month, and day columns from the source input to
derive and load the value for the TIMESTAMP column in the target database.
1. Create a table and corresponding projection:
=> CREATE TABLE t (k TIMESTAMP);
=> CREATE PROJECTION tp (k) AS SELECT * FROM t;
2. Use COPY with the FILLER keyword to skip the year, month, and day columns from the
source file.
=> COPY t(year FILLER VARCHAR(10),
month FILLER VARCHAR(10),
day FILLER VARCHAR(10),
k AS TO_DATE(YEAR || MONTH || DAY, 'YYYYMMDD') )
FROM STDIN NO COMMIT;
>> 2009|06|17
>> 1979|06|30
>> 2007|11|26
>> \.
3. Select from the copied table to see the results:
=> SELECT * FROM t;
k
---------------------
2009-06-17 00:00:00
1979-06-30 00:00:00
2007-11-26 00:00:00
(3 rows)
See also Using Sequences for how to generate an auto-incrementing value for columns.
See the COPY statement in the SQL Reference Manual for further information.
Specifying COPY FROM Options
Each COPY statement requires a FROM option to indicate the location of the file or files being
loaded. This syntax snippet shows the available FROM keywords, and their associated file format
options:
FROM { STDIN [ BZIP | GZIP | UNCOMPRESSED ]
     | 'pathToData' [ ON nodename | ON ANY NODE ]
       [ BZIP | GZIP | UNCOMPRESSED ] [, ...]
     | LOCAL STDIN | 'pathToData'
       [ BZIP | GZIP | UNCOMPRESSED ] [, ...]
}
Each of the FROM keywords lets you optionally specify the format of the load file as
UNCOMPRESSED, BZIP, or GZIP.
Note: When using COPY in conjunction with a CREATE EXTERNAL TABLE statement, you
cannot use the COPY FROM STDIN or LOCAL options.
Loading From STDIN
Using STDIN for the FROM option lets you load uncompressed data, bzip, or gzip files.
Loading From a Specific Path
Use the 'pathToData' option to indicate the location of the load file, optionally indicating a node
name or ON ANY NODE to indicate which node (or nodes) should parse the load file. You can load one
or more files in the supported formats: UNCOMPRESSED, BZIP, or GZIP.
Note: Using the ON ANY NODE clause indicates that the source file to load is on all of the
nodes, so COPY opens the file and parses it from any node in the cluster. Be sure that the
source file you specify is available and accessible on each cluster node.
If pathToData resolves to a storage location, and the user invoking COPY is not a superuser, these
are the required permissions:
l The storage location must have been created with the USER option (see ADD_LOCATION)
l The user must already have been granted READ access to the storage location where the file(s)
exist, as described in GRANT (Storage Location)
Further, if a non-superuser invokes COPY from a storage location to which she has privileges, HP
Vertica also checks any symbolic links (symlinks) the user has to ensure no symlink can access an
area to which the user has not been granted privileges.
Loading BZIP and GZIP Files
You can load compressed files (BZIP and GZIP). You must indicate the BZIP or GZIP format for
each file when loading multiple files. For example, this statement copies a BZIP file into the flex
table twitter, using the fjsonparser:
VMART=> copy twitter from '/server1/TWITTER/tweets1.json.bz2' BZIP parser fjsonparser() direct;
Rows Loaded
-------------
172094
(1 row)
Loading with Wildcards (glob) ON ANY NODE
COPY fully supports the ON ANY NODE clause with a wildcard (glob). You can invoke COPY for a
large number of files in a shared directory with a single statement such as this:
COPY myTable FROM '/mydirectory/ofmanyfiles/*.dat' ON ANY NODE
Using a wildcard with the ON ANY NODE clause expands the file list on the initiator node, and then
distributes the individual files among all nodes, evenly distributing the COPY workload across the
entire cluster.
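Before launching such a load, it can help to confirm on the initiator what the glob actually expands to. The following is a minimal sketch; the directory and files are illustrative.

```shell
# Pre-flight check: how many files does the pattern match, and how big
# are they, before handing the glob to COPY ... ON ANY NODE?
mkdir -p /tmp/ofmanyfiles
for i in 1 2 3; do printf 'row%s|data\n' "$i" > "/tmp/ofmanyfiles/file$i.dat"; done

ls /tmp/ofmanyfiles/*.dat | wc -l          # files the glob matches
du -ch /tmp/ofmanyfiles/*.dat | tail -n 1  # their total size
```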
Loading From a Local Client
To bulk load data from a client, and without requiring database superuser privileges, use the COPY
FROM LOCAL option. You can load from either STDIN, or a specific path, but not from a specific
node (or ON ANY NODE), since you are loading from the client. All local files are loaded and parsed
serially with each COPY statement, so you cannot perform parallel loads with the LOCAL option.
See Using Parallel Load Streams.
You can load one or more files in the supported formats: UNCOMPRESSED, BZIP, or GZIP.
For specific information about saving rejected data and exceptions files when using COPY FROM
LOCAL, see Capturing Load Rejections and Exceptions.
Choosing a Load Method
Depending on what data you are loading, the COPY statement has these load method options:
Load Method   Description and Use
AUTO          Loads data into WOS. Use the default COPY load method for smaller bulk loads.
DIRECT        Loads data directly into ROS containers. Use the DIRECT load method for large bulk loads (100MB or more).
TRICKLE       Loads only into WOS. Use for frequent incremental loads after the initial bulk load is complete.
Note: COPY ignores any load method you specify as part of creating an external table.
Loading Directly into WOS (AUTO)
This is the default load method. If you do not specify a load option, COPY uses the AUTO method
to load data into WOS (Write Optimized Store in memory). The default method is good for smaller
bulk loads (< 100MB). Once WOS is full, COPY continues loading directly to ROS (Read
Optimized Store on disk) containers.
Loading Directly to ROS (DIRECT)
Use the DIRECT keyword in the COPY statement to bypass loading data into WOS, and instead,
load data directly into ROS containers. The DIRECT option is best suited for loading large amounts
of data (100MB or more) at a time. Using DIRECT for many loads of smaller data sets results in
many ROS containers, which have to be combined later.
COPY a FROM stdin DIRECT;
COPY b FROM LOCAL STDIN DIRECT;
Note: A large initial bulk load can temporarily affect query performance while HP Vertica
organizes the data.
Loading Data Incrementally (TRICKLE)
Use the TRICKLE load option to load data incrementally after the initial bulk load is complete. Trickle
loading loads data only into the WOS. If the WOS becomes full, an error occurs and the entire data
load is rolled back. Use this option only when you have a finely-tuned load and moveout process so
that you are sure there is room in the WOS for the data you are loading. This option is more efficient
than AUTO when loading data into partitioned tables.
For other details on trickle-loading data and WOS Overflow into the ROS, see Trickle Loading.
Loading Data Without Committing Results (NO COMMIT)
Use the NO COMMIT option with COPY (unless the tables are temp tables) to perform a bulk load
transaction without automatically committing the results. This option is useful for executing multiple
COPY commands in a single transaction.
For example, the following set of COPY ... NO COMMIT statements performs several copy
statements sequentially, and then commits them all. In this way, all of the copied data is either
committed or rolled back as a single transaction.
COPY... NO COMMIT;
COPY... NO COMMIT;
COPY... NO COMMIT;
COPY X FROM LOCAL NO COMMIT;
COMMIT;
Using a single transaction for multiple COPY statements also allows HP Vertica to load the data
more efficiently since it can combine the larger amounts of data from multiple COPY statements
into fewer ROS containers.
HP recommends that you COMMIT or ROLLBACK the current transaction before you use COPY.
You can combine NO COMMIT with most other existing COPY options, but not the
REJECTED DATA AS TABLE option. The standard transaction semantics apply. If a transaction is in
progress that was initiated by a statement other than COPY (such as INSERT), using COPY with NO
COMMIT adds rows to the existing transaction, rather than starting a new one. The previous
statements are NOT committed.
Note: NO COMMIT is ignored when COPY is part of the CREATE EXTERNAL TABLE FROM
COPY statement.
Using NO COMMIT to Detect Constraint Violations
You can use the NO COMMIT option to detect constraint violations as part of the load process.
HP Vertica checks for constraint violations when running a query, but not when loading data. To
detect constraint violations, load data with the NO COMMIT keyword and then test the load using
ANALYZE_CONSTRAINTS. If you find any constraint violations, you can roll back the load
because you have not committed it.
See Detecting Constraint Violations for detailed instructions.
Using COPY Interactively
HP Vertica recommends using the COPY statement in one or more script files, as described in
Using Load Scripts. You can also use COPY interactively by piping a text file to vsql and executing
a COPY (or COPY FROM LOCAL) statement with the standard input stream as the input file. For
example:
$ cat fact_table.tbl | vsql -c "COPY FACT_TABLE FROM STDIN DELIMITER '|' DIRECT";
$ cat fact_table.tbl | vsql -c "COPY FACT_TABLE FROM LOCAL STDIN DELIMITER '|' DIRECT";
Canceling a COPY Statement
If you cancel a bulk data load, the COPY statement rolls back all rows that it attempted to load.
Specifying a COPY Parser
By default, COPY uses the DELIMITER parser to load raw data into the database. Raw input data
must be in UTF-8, delimited text format. Data is compressed and encoded for efficient storage. If
your raw data does not consist primarily of delimited text, specify the parser COPY should use to
align most closely with the load data:
l NATIVE
l NATIVE VARCHAR
l FIXEDWIDTH
Note: You do not specify the DELIMITER parser directly; absence of a specific parser
indicates the default.
Two other parsers are available specifically to load unstructured data into flex tables, as described
in the Using Flex Table Parsers section of the Flex Tables Guide: 
l FJSONPARSER
l FDELIMITEDPARSER
While these two parsers are for use with flex tables, you can also use them with columnar tables, to
make loading data more flexible. For instance, you can load JSON data into a columnar table in one
load, and delimited data into the same table in another. The Flex Tables Guide describes this use
case and presents an example.
Using a different parser for your data can improve load performance. If delimited input data includes
binary data types, COPY translates the data on input. See Using Load Scripts and Loading Binary
(Native) Data for examples. You can also load binary data, but only if it adheres to the COPY format
requirements, described in Creating Native Binary Format Files.
You cannot mix raw data types that require different parsers (such as NATIVE and FIXEDWIDTH)
in a single bulk load COPY statement. To check data formats before (or after) loading, see
Checking Data Format Before or After Loading.
Specifying Load Metadata
In addition to choosing a parser option, COPY supports other options to determine how to handle
the raw load data. These options are considered load metadata, and you can specify metadata
options at different parts of the COPY statement as follows:
Metadata Option     As a Column or       As a COLUMN   As a FROM
                    Expression Option    OPTION        Level Option
DELIMITER           X                    X             X
ENCLOSED BY         X                    X             X
ESCAPE AS           X                    X             X
NULL                X                    X             X
TRIM                X                    X
RECORD TERMINATOR                                      X
SKIP                                                   X
SKIP BYTES                                             X (Fixed-width only)
TRAILING NULLCOLS                                      X
The following precedence rules apply to all data loads:
l All column-level parameters override statement-level parameters.
l COPY uses the statement-level parameter if you do not specify a column-level parameter.
l COPY uses the default metadata values for the DELIMITER, ENCLOSED BY, ESCAPE AS,
and NULL options if you do not specify them at either the statement- or column-level.
When you specify any metadata options, COPY uses the parser to produce the best results and
stores the raw data and its corresponding metadata in the following formats:
Raw Data Format   Metadata Format   Parser
UTF-8             UTF-8             DELIMITER
Binary            Binary            NATIVE
UTF-8             Binary            NATIVE VARCHAR
UTF-8             UTF-8             FIXEDWIDTH
See Also
l COPY
Interpreting Last Column End of Row Values
When bulk-loading delimited text data using the default parser (DELIMITED), the last column end of
row value can be any of the following:
l Record terminator
l EOF designator
l Delimiter and a record terminator
Note: The FIXEDWIDTH parser always requires exactly a record terminator. No other
permutations work.
For example, given a three-column table, the following input rows for a COPY statement using a
comma (,) delimiter are each valid:
1,1,1
1,1,1,
1,1,
1,1,,
The following examples illustrate how COPY can interpret different last column end of data row
values.
Using a Single End of Row Definition
To see how COPY interprets a single end of row definition:
1. Create a two-column table two_col, specifying column b with a default value of 5:
VMart=> create table two_col (a int, b int DEFAULT 5);
CREATE TABLE
2. COPY the two_col table using a comma (,) delimiter, and enter values for only one column (as
a single, multi-line entry):
VMart=> copy two_col from stdin delimiter ',';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1,
>> 1,
>> \.
The COPY statement completes successfully.
3. Query table two_col, to display the two NULL values for column b as blank:
VMart=> select * from two_col;
 a | b
---+---
1 |
1 |
(2 rows)
Here, COPY expects a value for each of the two columns, but gets only one. Each input value is
followed by a delimiter (,) and an implicit record terminator (a newline character, \n). You supply
the record terminator with the ENTER or RETURN key; this character is not represented on the screen.
In this case, the delimiter (,) and record terminator (\n) are handled independently. COPY interprets
the delimiter (,) as the end of one value and the record terminator (\n) as the end of the row. Since
no value follows the delimiter, COPY supplies an empty string before the record terminator. By
default, the empty string signifies a NULL, which is a valid column value.
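A minimal Python sketch of this interpretation (a simplification of the DELIMITED parser with default options, not Vertica's implementation):

```python
# Simplified sketch of how the DELIMITED parser reads one input line.
# A trailing delimiter yields an empty final field, which the default
# NULL definition (the empty string) turns into NULL (None here).

def parse_row(line, ncols, delimiter=","):
    fields = line.rstrip("\n").split(delimiter)
    # An empty string matches the default NULL definition.
    return [None if f == "" else f for f in fields[:ncols]]

row = parse_row("1,\n", 2)   # one value, then delimiter + record terminator
# COPY supplies NULL for the second column: ['1', None]
```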
Using a Delimiter and Record Terminator End of Row
Definition
To use a delimiter and record terminator together as an end of row definition:
1. Copy column a (a) of the two_col table, using a comma delimiter again, and enter two values:
VMart=> copy two_col (a) from stdin delimiter ',';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 2,
>> 2,
>> \.
The COPY statement again completes successfully.
2. Query table two_col to see that column b now includes two rows with its default value (5):
VMart=> select * from two_col;
 a | b
---+---
1 |
1 |
 2 | 5
 2 | 5
(4 rows)
In this example, COPY expects values for only one column, because of the column (a) directive.
As such, COPY interprets the delimiter and record terminator together as a single, valid, last
column end of row definition. Before parsing incoming data, COPY populates column b with its
default value, because the table definition has two columns and the COPY statement supplies only
one. This example populates the second column with its default column list value, while the
previous example used the supplied input data.
Loading UTF-8 Format Data
You can specify these parameters at either a statement or column basis:
l ENCLOSED BY
l ESCAPE AS
l NULL
l DELIMITER
Loading Special Characters As Literals
The default COPY statement escape character is a backslash (\). When you precede a special
character with the escape character, COPY interprets the character that follows it literally and
copies it into the database. These are the special characters that you escape to load them as literals:
Special Character                      COPY Statement Usage
Vertical bar (|)                       Default COPY ... DELIMITER character.
Empty string ('')                      Default COPY ... NULL string.
Backslash (\)                          Default COPY ... ESCAPE AS character.
Newline and other control characters   Various
To use a special character as a literal, prefix it with an escape character. For example, to include a
literal backslash (\) in the loaded data (such as when including a file path), use two backslashes
(\\). COPY removes the escape character from the input when it loads escaped characters.
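The escaping rule can be sketched as a pre-processing step applied to raw values before writing a load file (an illustrative helper, not part of any Vertica tooling):

```python
# Escape the characters that COPY would otherwise treat specially,
# so they load as literals: the delimiter, the escape character itself,
# and the record terminator.

def escape_field(value, delimiter="|", escape="\\"):
    out = []
    for ch in value:
        if ch in (delimiter, escape, "\n"):
            out.append(escape)        # prefix special characters
        out.append(ch)
    return "".join(out)

escaped = escape_field("C:\\data|file")   # escapes the backslash and the delimiter
```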
Using a Custom Column Separator (DELIMITER)
The default COPY delimiter is a vertical bar (|). The DELIMITER is a single ASCII character used
to separate columns within each record of a file. Between two delimiters, COPY interprets all string
data in load files as characters. Do not enclose character strings in quotes, since quote characters
are also treated as literals between delimiters.
You can define a different delimiter using any ASCII value in the range E'\000' to E'\177'
inclusive. For instance, if you are loading CSV data files, and the files use a comma (,) character as
a delimiter, you can change the default delimiter to a comma. You cannot use the same character
for both the DELIMITER and NULL options.
If the delimiter character appears within a string of data values, use the ESCAPE AS character (\ by
default) to indicate that the delimiter should be treated as a literal.
The COPY statement accepts empty values (two consecutive delimiters) as valid input data for
CHAR and VARCHAR data types. COPY stores empty columns as an empty string (''). An
empty string is not equivalent to a NULL string.
To indicate a non-printing delimiter character (such as a tab), specify the character in extended
string syntax (E'...'). If your database has StandardConformingStrings enabled, use a Unicode
string literal (U&'...'). For example, use either E'\t' or U&'\0009' to specify tab as the
delimiter.
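For example, a load file intended for DELIMITER E'\t' must contain literal tab characters, which a generating script can write directly (the file name here is arbitrary):

```python
# Write a tab-delimited load file matching COPY ... DELIMITER E'\t'.
rows = [("1", "one"), ("2", "two")]
with open("load_data.tsv", "w") as f:
    for row in rows:
        f.write("\t".join(row) + "\n")   # real tab between columns
```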
Using a Custom Column Option DELIMITER
This example redefines the default delimiter for one column through the COLUMN OPTION parameter.
1. Create a simple table.
=> CREATE TABLE t (pk INT,
col1 VARCHAR(10),
col2 VARCHAR(10),
col3 VARCHAR(10),
col4 TIMESTAMP);
2. Use the COLUMN OPTION parameter to change the col1 default delimiter to a tilde (~).
=> COPY t COLUMN OPTION(col1 DELIMITER '~') FROM STDIN NO COMMIT;
>> 1|ee~gg|yy|1999-12-12
>> \.
=> SELECT * FROM t;
pk | col1 | col2 | col3 | col4
----+------+------+------+---------------------
1 | ee | gg | yy | 1999-12-12 00:00:00
(1 row)
Defining a Null Value (NULL)
The default NULL value for COPY is an empty string (''). You can specify NULL as any ASCII
value in the range E'\001' to E'\177' inclusive (any ASCII character except NUL: E'\000'). You
cannot use the same character for both the DELIMITER and NULL options.
When NULL is an empty string (''), use quotes to insert an empty string instead of a NULL. For
example, using NULL '' and ENCLOSED BY '"':
l 1||3 — Inserts a NULL in the second column.
l 1|""|3 — Inserts an empty string instead of a NULL in the second columns.
To input an empty or literal string, use quotes (ENCLOSED BY); for example:
NULL ''
NULL 'literal'
A NULL is case-insensitive and must be the only value between the data field delimiters. For
example, if the null string is NULL and the delimiter is the default vertical bar (|):
|NULL| indicates a null value.
| NULL | does not indicate a null value.
When you use the COPY command in a script, you must substitute a double backslash for each null
string that includes a backslash. For example, the scripts used to load the example database
contain:
COPY ... NULL E'\\n' ...
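The matching rule (case-insensitive, and only when the null string is the entire field) can be sketched as:

```python
# A field is NULL only if the whole field, with no surrounding spaces,
# matches the null string, case-insensitively.

def is_null_field(field, null_string="NULL"):
    return field.casefold() == null_string.casefold()

is_null_field("NULL")     # True:  '|NULL|' marks a null value
is_null_field(" NULL ")   # False: '| NULL |' does not
```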
Loading NULL Values
You can specify NULL by entering fields without content into a data file, using a field delimiter.
For example, given the default delimiter (|) and default NULL (empty string) definition, COPY
inserts the following input data:
| | 1
| 2 | 3
4 | | 5
6 | |
into the table as follows:
(null, null, 1)
(null, 2, 3)
(4, null, 5)
(6, null, null)
If NULL is set as a literal ('null'), COPY inserts the following inputs:
null | null | 1
null | 2 | 3
4 | null | 5
6 | null | null
as follows:
(null, null, 1)
(null, 2, 3)
(4, null, 5)
(6, null, null)
Filling Columns with Trailing Nulls (TRAILING
NULLCOLS)
Loading data using the TRAILING NULLCOLS option inserts NULL values into any columns without
data. Before inserting TRAILING NULLCOLS, HP Vertica verifies that the column does not have a
NOT NULL constraint.
To use the TRAILING NULLCOLS parameter to handle inserts with fewer values than data columns:
1. Create a table:
=> CREATE TABLE z (a INT,
b INT,
c INT );
2. Insert some values into the table:
=> INSERT INTO z VALUES (1, 2, 3);
3. Query table z to see the inputs:
=> SELECT * FROM z;
 a | b | c
---+---+---
 1 | 2 | 3
(1 row)
4. Insert two rows of data from STDIN, using TRAILING NULLCOLS:
=> COPY z FROM STDIN TRAILING NULLCOLS;
>> 4 | 5 | 6
>> 7 | 8
>> \.
5. Query table z again to see the results. Using TRAILING NULLCOLS, the COPY statement
correctly handled the third row of column c, which had no value:
=> SELECT * FROM z;
 a | b | c
---+---+---
1 | 2 | 3
4 | 5 | 6
7 | 8 |
(3 rows)
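The padding behavior of TRAILING NULLCOLS can be sketched as follows, with None standing in for SQL NULL (an illustration, not the server's code path):

```python
# Pad rows that have fewer fields than the table has columns,
# as TRAILING NULLCOLS does; None stands in for SQL NULL.

def pad_row(fields, ncols):
    return fields + [None] * (ncols - len(fields))

pad_row(["7", "8"], 3)   # -> ['7', '8', None]
```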
Attempting to Fill a NOT NULL Column with TRAILING
NULLCOLS
You cannot use TRAILING NULLCOLS on a column that has a NOT NULL constraint. For
instance:
1. Create a table n, declaring column b with a NOT NULL constraint:
=> CREATE TABLE n (a INT,
b INT NOT NULL,
c INT );
2. Insert some table values:
=> INSERT INTO n VALUES (1, 2, 3);
=> SELECT * FROM n;
a | b | c
---+---+---
1 | 2 | 3
(1 row)
3. Use COPY with TRAILING NULLCOLS on table n to see the COPY error due to the column
constraint:
=> COPY n FROM STDIN trailing nullcols abort on error;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 4 | 5 | 6
>> 7 | 8
>> 9
>> \.
ERROR: COPY: Input record 3 has been rejected (Cannot set trailing column to NULL as
column 2 (b) is NOT NULL)
4. Query the table to see that the COPY statement values were rejected:
=> SELECT * FROM n;
 a | b | c
---+---+---
1 | 2 | 3
(1 row)
Changing the Default Escape Character (ESCAPE AS)
The default escape character is a backslash (\). To change the default, use the ESCAPE AS option.
To use an alternative escape character:
=> COPY mytable FROM '/data/input.txt' ESCAPE AS E'\001';
You can set the escape character to any ASCII value in the range E'\001' to E'\177' inclusive.
Eliminating Escape Character Handling
If you do not want any escape character and want to prevent any characters from being interpreted
as escape sequences, use the NO ESCAPE option as part of the COPY statement.
Delimiting Characters (ENCLOSED BY)
The COPY ENCLOSED BY parameter lets you set an ASCII character to delimit characters embedded
in string values. You can use any ASCII value in the range E'\001' to E'\177' inclusive (any
ASCII character except NUL: E'\000') for the ENCLOSED BY value. The double quotation mark (")
is the most commonly used quotation character. For instance, the following parameter specifies
that input data to the COPY statement is enclosed within double quotes:
ENCLOSED BY '"'
With the following input (using the default DELIMITER (|) character), specifying:
"vertica | value"
Results in:
l Column 1 containing "vertica
l Column 2 containing value"
Notice the double quotes (") before vertica and after value.
Given the following sample input data, the columns are distributed as shown:
"1", "vertica,value", ",", "'"
 col1 | col2          | col3 | col4
------+---------------+------+-----
1 | vertica,value | , | '
(1 row)
Alternatively, write the above example using any ASCII character of your choosing:
~1~, ~vertica,value~, ~,~, ~'~
If you use a single quote ('), rather than double quotes ("), as the ENCLOSED BY character, you must
escape it using extended string syntax, a Unicode string literal (if StandardConformingStrings is
enabled), or four single quotes:
ENCLOSED BY E'\''
ENCLOSED BY U&'\0027'
ENCLOSED BY ''''
Using any of the definitions means the following input is properly parsed:
'1', 'vertica,value', ',', '\''
See String Literals (Character) for an explanation of the string literal formats you can use to specify
the ENCLOSED BY parameter.
Use the ESCAPE AS character to embed the ENCLOSED BY delimiter within character string values.
For example, using the default ESCAPE AS character (\) and a double quote as the ENCLOSED BY
character, the following input returns "vertica":
"\"vertica\""
Using ENCLOSED BY for a Single Column
The following example uses double quotes to enclose a single column (rather than the entire row).
The COPY statement also specifies a comma (,) as the delimiter.
=> COPY Retail.Dim (Dno, Dname ENCLOSED BY '"', Dstore) FROM '/home/dbadmin/dim3.txt'
DELIMITER ','
EXCEPTIONS '/home/dbadmin/exp.txt';
This example correctly loads data such as:
123,"Smith, John",9832
Specifying a Custom End of Record String (RECORD
TERMINATOR)
To specify the literal character string that indicates the end of a data file record, use the RECORD
TERMINATOR parameter, followed by the string to use. If you do not specify a value, HP Vertica
attempts to determine the correct line ending, accepting either a linefeed alone (E'\n'), common on
UNIX systems, or a carriage return and linefeed (E'\r\n'), common on Windows platforms.
For example, if your file contains comma-separated values terminated by line feeds that you want
to maintain, use the RECORD TERMINATOR option to specify an alternative value:
=> COPY mytable FROM STDIN DELIMITER ',' RECORD TERMINATOR E'\n';
To specify the RECORD TERMINATOR as non-printing characters, use either the extended string
syntax or Unicode string literals. The following table lists some common record terminator
characters. See String Literals for an explanation of the literal string formats.
Extended String Syntax   Unicode Literal String   Description       ASCII Decimal
E'\b'                    U&'\0008'                Backspace         8
E'\t'                    U&'\0009'                Horizontal tab    9
E'\n'                    U&'\000a'                Linefeed          10
E'\f'                    U&'\000c'                Formfeed          12
E'\r'                    U&'\000d'                Carriage return   13
E'\\'                    U&'\005c'                Backslash         92
If you use the RECORD TERMINATOR option to specify a custom value, be sure the input file matches
the value. Otherwise, you may get inconsistent data loads.
Note: The record terminator cannot be the same as DELIMITER, NULL, ESCAPE, or ENCLOSED
BY.
If using JDBC, HP recommends that you use the following value for the RECORD TERMINATOR:
System.getProperty("line.separator")
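The default line-ending behavior can be sketched as follows (a simplification; COPY performs this inside the parser):

```python
# With no RECORD TERMINATOR, accept either \n or \r\n;
# with an explicit terminator, split only on that exact string.

def split_records(data, terminator=None):
    if terminator is None:
        return data.replace("\r\n", "\n").split("\n")
    return data.split(terminator)

split_records("a,b\r\nc,d\n")    # -> ['a,b', 'c,d', '']
split_records("a,b;c,d", ";")    # -> ['a,b', 'c,d']
```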
Examples
The following examples use a comma (,) as the DELIMITER for readability.
,1,2,3,
,1,2,3
1,2,3,
Leading (,1) and trailing (3,) delimiters are ignored. Thus, the rows all have three columns.
123,'\n',\n,456
Using a non-default null string (\n), the row is interpreted as:
123
newline
\n
456
123,this\, that\, or the other,something else,456
The row is interpreted as:
123
this, that, or the other
something else
456
Loading Native Varchar Data
Use the NATIVE VARCHAR parser option when the raw data consists primarily of CHAR or
VARCHAR data. COPY performs the conversion to the actual table data types on the database
server. This option is not supported with COPY LOCAL.
Using NATIVE VARCHAR does not provide the same efficiency as NATIVE. However, NATIVE
VARCHAR precludes the need to use delimiters or to escape special characters, such as quotes,
which can make working with client applications easier.
Note: NATIVE VARCHAR does not support concatenated BZIP and GZIP files.
Batch data inserts performed through the HP Vertica ODBC and JDBC drivers automatically use
the NATIVE VARCHAR formats.
Loading Binary (Native) Data
You can load binary data using the NATIVE parser option, except with COPY LOCAL, which does
not support this option. Since binary-format data does not require the use and processing of
delimiters, it precludes the need to convert integers, dates, and timestamps from text to their native
storage format, and improves load performance over delimited data. All binary-format files must
adhere to the formatting specifications described in Appendix: Binary File Formats.
Native binary format data files are typically larger than their delimited text format counterparts, so
use GZIP or BZIP to compress the data before loading it. NATIVE BINARY does not support
concatenated BZIP and GZIP files. You can load native (binary) format files when developing plug-
ins to ETL applications.
There is no copy format to load binary data byte-for-byte because the column and record separators
in the data would have to be escaped. Binary data type values are padded and translated on input,
and also in the functions, operators, and casts supported.
Loading Hexadecimal, Octal, and Bitstring Data
You can use hexadecimal, octal, and bitstring formats only to load binary columns. To specify
these column formats, use the COPY statement's FORMAT options:
l Hexadecimal
l Octal
l Bitstring
The following examples illustrate how to use the FORMAT option.
1. Create a table:
=> CREATE TABLE t (oct VARBINARY(5),
hex VARBINARY(5),
bitstring VARBINARY(5) );
2. Create the projection:
=> CREATE PROJECTION t_p(oct, hex, bitstring) AS SELECT * FROM t;
3. Use the COPY command from STDIN (not a file), specifying each of the formats:
=> COPY t (oct FORMAT 'octal', hex FORMAT 'hex',
bitstring FORMAT 'bitstring')
FROM STDIN DELIMITER ',';
4. Enter the data to load, ending the statement with a backslash (\) and a period (.) on a separate
line:
>> \141\142\143\144\145,0x6162636465,0110000101100010011000110110010001100101
>> \.
5. Use a select query on table t to view the input values results:
=> SELECT * FROM t;
  oct  |  hex  | bitstring
-------+-------+-----------
abcde | abcde | abcde
(1 row)
COPY uses the same default format to load binary data as it uses to input binary data. Since the
backslash character ('\') is the default escape character, you must escape octal input values. For
example, enter the byte '\141' as '\\141'.
Note: If you enter an escape character followed by an invalid octal digit or an escape character
being escaped, COPY returns an error.
On input, COPY translates string data as follows:
l Uses the HEX_TO_BINARY function to translate from hexadecimal representation to binary.
l Uses the BITSTRING_TO_BINARY function to translate from bitstring representation to binary.
Both functions take a VARCHAR argument and return a VARBINARY value.
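Their effect can be sketched with Python's built-in conversions (HEX_TO_BINARY and BITSTRING_TO_BINARY themselves run inside the server; these helpers are illustrative):

```python
# Hexadecimal and bitstring representations of the same bytes.

def hex_to_binary(s):
    s = s[2:] if s.lower().startswith("0x") else s
    if len(s) % 2:                 # odd length: first char is the
        s = "0" + s                # low nibble of the first byte
    return bytes.fromhex(s)

def bitstring_to_binary(s):
    return int(s, 2).to_bytes((len(s) + 7) // 8, "big")

hex_to_binary("0x6162")                   # -> b'ab'
bitstring_to_binary("0110000101100010")   # -> b'ab'
```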
You can also use the escape character to represent the (decimal) byte 92 by escaping it twice; for
example, '\\'. Note that vsql inputs the escaped backslash as four backslashes. Equivalent
inputs are the hex value '0x5c' and the octal value '\134' (134 = 1 x 8^2 + 3 x 8^1 + 4 x 8^0 = 92).
You can load a delimiter value if you escape it with a backslash. For example, given delimiter '|',
'\001\|\002' is loaded as {1,124,2}, which can also be represented in octal format as
'\001\174\002'.
If you insert a value with more bytes than fit into the target column, COPY returns an error. For
example, if column c1 is VARBINARY(1):
=> INSERT INTO t (c1) values ('ab');
ERROR: 2-byte value too long for type Varbinary(1)
If you implicitly or explicitly cast a value with more bytes than fit the target data type, COPY silently
truncates the data. For example:
=> SELECT 'abcd'::binary(2);
 binary
--------
ab
(1 row)
Hexadecimal Data
The optional '0x' prefix indicates that a value is hexadecimal, not decimal, although not all
hexadecimal values use A-F; for example, 5396. COPY ignores the 0x prefix when loading the input
data.
If there are an odd number of characters in the hexadecimal value, the first character is treated as
the low nibble of the first (furthest to the left) byte.
Octal Data
Loading octal format data requires that each byte be represented by a three-digit octal code. The
first digit must be in the range [0,3] and the second and third digits must both be in the range [0,7].
If the length of an octal value is not a multiple of three, or if one of the three digits is not in the proper
range, the value is invalid and COPY rejects the row in which the value appears. If you supply an
invalid octal value, COPY returns an error. For example:
SELECT '\000\387'::binary(8);
ERROR: invalid input syntax for type binary
Rows that contain binary values with invalid octal representations are also rejected. For example,
COPY rejects '\008' because '\008' is not a valid octal number.
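The octal validation rule can be sketched as an illustrative check (not the server's implementation):

```python
import re

# Each byte is a backslash followed by exactly three octal digits,
# the first in [0,3] and the second and third in [0,7].
OCTAL_BYTE = re.compile(r"\\([0-3][0-7][0-7])")

def octal_to_binary(s):
    parts = OCTAL_BYTE.findall(s)
    if len(parts) * 4 != len(s):        # reject stray or malformed codes
        raise ValueError("invalid input syntax for type binary")
    return bytes(int(p, 8) for p in parts)

octal_to_binary(r"\141\142")   # -> b'ab'
```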
BitString Data
Loading bitstring data requires that each character must be zero (0) or one (1), in multiples of eight
characters. If the bitstring value is not a multiple of eight characters, COPY treats the first n
characters as the low bits of the first byte (furthest to the left), where n is the remainder of the
value's length, divided by eight.
Examples
The following example shows VARBINARY HEX_TO_BINARY(VARCHAR) and VARCHAR TO_HEX(VARBINARY)
usage.
1. Create table t and its projection with binary columns:
=> CREATE TABLE t (c BINARY(1));
=> CREATE PROJECTION t_p (c) AS SELECT c FROM t;
2. Insert minimum and maximum byte values, including an IP address represented as a character
string:
=> INSERT INTO t values(HEX_TO_BINARY('0x00'));
=> INSERT INTO t values(HEX_TO_BINARY('0xFF'));
=> INSERT INTO t values (V6_ATON('2001:DB8::8:800:200C:417A'));
Use the TO_HEX function to format binary values in hexadecimal on output:
=> SELECT TO_HEX(c) FROM t;
 to_hex
--------
00
ff
20
(3 rows)
See Also
l COPY
l Binary Data Types
l Formatting Functions
l ASCII
Loading Fixed-Width Format Data
Use the FIXEDWIDTH parser option to bulk load fixed-width data. You must specify the COLSIZES
option values to specify the number of bytes for each column. The definition of the table you are
loading (COPY table f (x, y, z)) determines the number of COLSIZES values to declare.
To load fixed-width data, use the COLSIZES option to specify the number of bytes for each input
column. If any records do not have values, COPY inserts one or more null characters to equal the
specified number of bytes. The last record in a fixed-width data file must include a record terminator
to determine the end of the load data.
Supported Options for Fixed-Width Data Loads
Loading fixed-width data supports the options listed in the COPY Option Summary.
These options are not supported:
l DELIMITER
l ENCLOSED BY
l ESCAPE AS
l TRAILING NULLCOLS
Using Nulls in Fixed-Width Data
The default NULL string for a fixed-width load cannot be an empty string; instead, it consists of all
spaces. The number of spaces depends on the column width declared with the COLSIZES
(integer [,...]) option.
For fixed-width loads, the NULL definition depends on whether you specify NULL at the column or
statement level, as follows:
l Statement level—NULL must be defined as a single character. The default (or custom) NULL
character is repeated for the entire width of the column.
l Column Level—NULL must be defined as a string whose length matches the column width.
For fixed-width loads, if the input data column has fewer values than the specified column size,
COPY inserts NULL characters. The number of NULLs must match the declared column width. If
you specify a NULL string at the column level, COPY matches the string with the column width.
Note: To turn off NULLs, use the NULL AS option and specify NULL AS ''.
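Fixed-width parsing with COLSIZES, treating an all-spaces field as NULL, can be sketched as:

```python
# Slice one fixed-width record into columns of the given byte sizes;
# an all-spaces field (the default fixed-width NULL) becomes None.

def parse_fixed(record, colsizes):
    fields, pos = [], 0
    for size in colsizes:
        field = record[pos:pos + size]
        fields.append(None if field.strip() == "" else field)
        pos += size
    return fields

parse_fixed("12  34", (2, 2, 2))   # -> ['12', None, '34']
```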
Defining a Null Character (Statement Level)
1. Create a two-column table (fw):
VMart=> create table fw(co int, ci int);
CREATE TABLE
2. Copy the table, specifying null as 'N', and enter some data:
VMart=> copy fw from STDIN fixedwidth colsizes(2,2) null as 'N' no commit;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> NN12
>> 23NN
>> NNNN
>> nnnn
>> \.
3. Select all (*) from the table:
VMart=> select * from fw;
co | ci
----+----
| 12
23 |
|
|
|
(5 rows)
Defining a Custom Record Terminator
To define a record terminator other than the COPY default when loading fixed-width data, take
these steps:
1. Create a table, fw, with two columns, co and ci:
VMart=> create table fw(co int, ci int);
CREATE TABLE
2. Copy table fw, specifying two 2-byte column sizes, and specifying a comma (,) as the record
terminator:
VMart=> copy fw from STDIN fixedwidth colsizes(2,2) record terminator ',';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1234,1444,6666
>> \.
3. Select all (*) from the table:
VMart=> select * from fw;
 co | ci
----+----
12 | 34
14 | 44
(2 rows)
The SELECT output shows only two values. COPY rejected the third value (6666) because it was
not followed by a comma (,) record terminator. Fixed-width data requires a trailing record
terminator only if you explicitly specify one.
Copying Fixed-Width Data
Use COPY FIXEDWIDTH COLSIZES (n [,...]) to load files into an HP Vertica database. By default,
all spaces are NULLs. For example, in the simple case:
=> create table mytest(co int, ci int);
=> create projection mytest_p1 as select * from mytest segmented by hash(co) all nodes;
=> create projection mytest_p2 as select * from mytest segmented by hash(co) all nodes offset 1;
=> copy mytest(co,ci) from STDIN fixedwidth colsizes(6,4) no commit;
=> select * from mytest order by co;
co | ci
----+----
(0 rows)
Skipping Content in Fixed-Width Data
The COPY statement has two options to skip input data. The SKIP BYTES option is only for fixed-
width data loads:
SKIP BYTES total Skips the total number (integer) of bytes from the input data.
SKIP records Skips the number (integer) of records you specify.
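The two options can be sketched as follows (assuming newline-terminated records for the SKIP case):

```python
# SKIP BYTES n: drop the first n bytes of the input stream.
# SKIP n: drop the first n records.

def skip_bytes(data, n):
    return data[n:]

def skip_records(lines, n):
    return lines[n:]

skip_bytes("2222666666\n1111999999", 11)       # drops the first record plus its newline
skip_records(["2222666666", "1111999999"], 1)  # -> ['1111999999']
```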
This example uses SKIP BYTES to skip 10 bytes when loading a fixed-width table with two columns
(4 and 6 bytes):
1. Copy a table, using SKIP BYTES to skip 10 bytes of input data:
VMart=> copy fw from stdin fixedwidth colsizes (4,6) SKIP BYTES 10;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 2222666666
>> 1111999999
>> 1632641282
>> \.
2. Select all (*) from the table:
VMart=> select * from fw order by co;
  co  |   ci
------+--------
1111 | 999999
1632 | 641282
(2 rows)
The select output indicates that COPY skipped the first 10 bytes of load data, as directed.
This example uses SKIP when loading a fixed-width (4,6) table to skip one (1) record of input data:
1. Copy a table, using SKIP to skip 1 record of input data:
VMart=> copy fw from stdin fixedwidth colsizes (4,6) SKIP 1;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 2222666666
>> 1111999999
>> 1632641282
>> \.
2. Select all (*) from the table:
VMart=> select * from fw order by co;
  co  |   ci
------+--------
1111 | 999999
1632 | 641282
(2 rows)
The SELECT output indicates that COPY skipped the first record of load data, as directed.
Trimming Characters in Fixed-Width Data Loads
Use the TRIM option to trim a character. TRIM accepts a single-byte character, which is trimmed at
the beginning and end of the data. For fixed-width data loads, when you specify a TRIM character,
COPY first checks to see if the row is NULL. If the row is not null, COPY trims the character(s).
The next example instructs COPY to trim the character A, and shows the results. Only the last two
lines entered comply with the specified (4, 6) fixed width:
1. Copy table fw, specifying the TRIM character, A:
VMart=> copy fw from stdin fixedwidth colsizes(4,6) TRIM 'A';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> A2222A444444
>> 2222A444444
>> A22A444444
>> A22AA4444A
>> \.
2. Select all (*) from the table:
VMart=> select * from fw order by co;
co | ci
----+--------
22 | 4444
22 | 444444
(2 rows)
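The TRIM behavior (strip the character from both ends of each non-NULL field) can be sketched as:

```python
# Trim a single-byte character from the beginning and end of each
# field, but leave NULL (None) fields untouched, as COPY's TRIM does.

def trim_field(field, ch):
    return field if field is None else field.strip(ch)

trim_field("A22A", "A")     # -> '22'
trim_field("A4444A", "A")   # -> '4444'
trim_field(None, "A")       # NULL fields are not trimmed
```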
Using Padding in Fixed-Width Data Loads
By default, the padding character is ' ' (a single space). The padding behavior for fixed-width data
loads is similar to how a space is treated in other formats, differing by data type as follows:
Datatype              Padding
Integer               Leading and trailing spaces
Bool                  Leading and trailing spaces
Float                 Leading and trailing spaces
[Var]Binary           None. All characters are significant.
[Var]Char             Trailing spaces if the string is too large
Date, Interval,       None. All characters are significant. The COPY statement uses an
Time, Timestamp,      internal algorithm to parse these data types.
TimestampTZ, TimeTZ
Date (formatted)      Use the COPY FORMAT option string to match the expected column length.
Numerics              Leading and trailing spaces
Ignoring Columns and Fields in the Load File
When bulk loading data, your source data may contain a column that does not exist in the
destination table. Use the FILLER option to have COPY ignore an input column and the fields it
contains when a corresponding column does not exist in the destination table. You can also use
FILLER to transform data through derivation from the source into the destination. Use FILLER for:
l Omitting columns that you do not want to transfer into a table.
l Transforming data from a source column and then loading the transformed data to the
destination table, without loading the original, untransformed source column (parsed column).
(See Transforming Data During Loads.)
Using the FILLER Parameter
Your COPY statement expressions can contain one or more filler columns. You can use any
number of filler columns in the expression. The only restriction is that at least one column must not
be a filler column. You cannot specify target table columns as filler, regardless of whether they are
in the column list.
A data file can consist entirely of filler columns, indicating that all data in a file can be loaded into
filler columns and then transformed and loaded into table columns. The filler column must be a
parsed column, not a computed column. Also, the name of the filler column must be unique within
both the source file and the destination table. You must specify the data type of the filler column as
part of the FILLER parameter.
FILLER Parameter Examples
You can specify all parser parameters for filler columns, and all statement level parser parameters
apply to filler columns.
To ignore a column, use the COPY statement FILLER parameter, followed by its data type. The
next example creates a table with one column, and then copies it using two filler parameters. Since
the second filler column is not part of any expression, it is discarded:
create table t (k timestamp);
copy t(y FILLER date FORMAT 'YYYY-MM-DD', t FILLER varchar(10), k as y) from STDIN no commit;
2009-06-17|2009-06-17
\.
The following example derives and loads the value for the timestamp column in the target database
from the year, month, and day columns in the source input. The year, month, and day columns are
not loaded because the FILLER parameter specifies to skip each of those columns:
CREATE TABLE t (k TIMESTAMP);
CREATE PROJECTION tp (k) AS SELECT * FROM t;
COPY t(year FILLER VARCHAR(10),
month FILLER VARCHAR(10),
day FILLER VARCHAR(10),
k AS TO_DATE(YEAR || MONTH || DAY, 'YYYYMMDD'))
FROM STDIN NO COMMIT;
2009|06|17
1979|06|30
2007|11|26
\.
SELECT * FROM t;
k
---------------------
2009-06-17 00:00:00
1979-06-30 00:00:00
2007-11-26 00:00:00
(3 rows)
See the COPY statement in the SQL Reference Manual for more information about syntax and
usage.
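The derivation can also be sketched outside the database, as a preview of what the FILLER columns and the k AS expression accomplish (illustrative only; COPY does this work server-side):

```python
# Combine three filler columns (year, month, day) into the one
# target column k, discarding the fillers, as the COPY expression
# k AS TO_DATE(YEAR || MONTH || DAY, 'YYYYMMDD') does.
from datetime import datetime

def derive_k(line):
    year, month, day = line.split("|")
    return datetime.strptime(year + month + day, "%Y%m%d")

derive_k("2009|06|17")   # -> datetime(2009, 6, 17, 0, 0)
```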
Loading Data into Pre-Join Projections
A pre-join projection stores rows of a fact table joined with rows of dimension tables. Storing pre-join
projections improves query performance, because the join is computed at load time rather than at
query time.
To insert a row into the fact table of a pre-join projection, the associated values of the dimension
table's columns must be looked up. Thus, an insert into a pre-join projection shares some of the
qualities of a query. The following sections describe the behaviors associated with loading data into
pre-join projections.
Foreign and Primary Key Constraints
To ensure referential integrity, foreign and primary key constraints are enforced on inserts into fact
tables of pre-join projections. If a fact row attempts to reference a row that does not exist in the
dimension table, the load is automatically rolled back. The load is also rolled back if a fact row
references more than one dimension row.
Note: Unless it also has a NOT NULL constraint, a column with a FOREIGN KEY constraint
can contain a NULL value even though the dimension table's primary key column does not
contain a NULL value. This allows for records to be inserted into the fact table even though the
foreign key in the dimension table is not yet known.
The following tables and SQL examples highlight these concepts.
l Fact Table: Employees
l Dimension Table: HealthPlans
l Pre-join Projection: Joins Employees to HealthPlans using the PlanID column
Administrator's Guide
Bulk Loading Data
HP Vertica Analytic Database (7.0.x) Page 525 of 997
CREATE PROJECTION EMP_HEALTH (EmployeeID, FirstName, LastName, Type)
AS (SELECT EmployeeID, FirstName, LastName,
Type FROM Employees, HealthPlans
WHERE Employees.HealthPlanID = HealthPlans.PlanID)
Employees (Fact Table)
EmployeeID(PK) FirstName LastName PlanID(FK)
---------------+-----------+----------+------------
1000 | David | Taylor | 01
1001 | Sunil | Ray | 02
1002 | Janet | Hildreth | 02
1003 | Pequan | Lee | 01
HealthPlans (Dimension Table)
PlanID(PK) Description Type
-----------+-------------+-------
01 | PlanOne | HMO
02 | PlanTwo | PPO
The following INSERT statement generates a missing foreign key error that results in a rollback,
because it references a non-existent dimension row (PlanID 04):
INSERT INTO Employees (EmployeeID, FirstName, LastName, PlanID) VALUES (1004, 'Ben', 'Smith', 04);
The following sequence of commands generates a foreign key error that results in a rollback,
because the insert into the Employees fact table references a duplicate row in the HealthPlans
dimension table. The error occurs when the Employees fact table references the HealthPlans
dimension table:
INSERT INTO HealthPlans VALUES(02, 'MyPlan', 'PPO');
INSERT INTO Employees VALUES(1005, 'Juan', 'Hernandez', 02);
Concurrent Loads into Pre-Join Projections
HP Vertica supports concurrent inserts where two transactions can simultaneously insert rows into
the same table. A transaction inserting records into a pre-join projection can run concurrently with
another transaction inserting records into either the fact table or a dimension table of the pre-join
projection. A load into a pre-join projection cannot run concurrently with updates or deletes on either
the fact or the dimension tables.
When concurrently loading fact and dimension tables, the state of the dimension tables is fixed at
the start of the insert or load into the fact table. Rows that are added to a dimension table after the
start of an insert or load into a fact table are not available for joining because they are not visible to
the fact table. The client is responsible for ensuring that all values in dimension tables are present
before issuing the insert or load statement.
The following examples illustrate these behaviors.
l Fact Table: Sales
l Dimension Table: Employees
l Pre-join Projection: sales join employees on sales.seller=employees.empno
Success
1. Session A: INSERT INTO EMPLOYEES (EMPNO, NAME) VALUES (1, 'Bob');
2. Session B: COPY INTO SALES (AMT, SELLER) with rows 5000|1, 3500|1, ..., 7234|1. Records
loaded by this COPY command all refer to Bob's sales (SELLER = 1).
3. Session A: INSERT INTO EMPLOYEES (EMPNO, NAME) VALUES (2, 'Frank');
4. Session B: COPY INTO SALES (AMT, SELLER) with rows 50|2, 75|2, and so on. Records loaded
by this COPY command all refer to Frank's sales (SELLER = 2).
5. Session A and Session B: COMMIT; Both transactions are successful.
Failure
1. Session A: INSERT INTO EMPLOYEES (EMPNO, NAME) with rows 1|Bob and 2|Terry (not yet
committed).
2. Session B: COPY INTO SALES (AMT, SELLER) with row 5000|1. The transaction in Session B
fails because the value inserted into the dimension table in Session A was not visible before the
COPY into the pre-join projection in Session B was initiated.
Using Parallel Load Streams
You can use COPY for multiple parallel load streams to load an HP Vertica database. COPY
LOCAL parses files serially, and does not support parallel load streams.
These are the options for parallel load streams:
l Issue multiple separate COPY statements to load different files from different nodes.
This option lets you use vsql, ODBC, ADO.net, or JDBC. You can load server-side files, or
client-side files using the COPY from LOCAL statement.
l Issue a single multi-node COPY command that loads different files from different nodes
specifying the nodename option for each file.
l Issue a single multi-node COPY command that loads different files from any node, using the ON
ANY NODE option.
l Use the COPY x WITH SOURCE PloadDelimitedSource option to parallel load using all cores
on the server node on which the file resides.
Files can be of different formats, such as BZIP, GZIP, and others. The multi-node option is not
available with the COPY from LOCAL parameter.
The single multi-node COPY options (nodename | ON ANY NODE) are possible only using the vsql
command, and not all COPY options support this behavior. However, using this method to copy
data can result in significantly higher performance and efficient resource usage.
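As an illustration of the multi-node options, a single COPY statement issued from vsql might look like the following sketch (the table, file paths, and node names are hypothetical):

```sql
-- Load two server-side files from two specific nodes in one COPY statement
-- (table, path, and node names are illustrative).
COPY sales FROM '/data/sales_part1.dat' ON v_vmart_node0001,
           '/data/sales_part2.dat' ON v_vmart_node0002
DELIMITER '|' DIRECT;

-- Alternatively, let HP Vertica decide which node reads each file:
COPY sales FROM '/data/sales_part1.dat' ON ANY NODE,
           '/data/sales_part2.dat' ON ANY NODE
DELIMITER '|' DIRECT;
```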
See COPY in the SQL Reference Manual for syntax details.
While there is no restriction to the number of files you can load, the optimal number of load streams
depends on several factors, including the number of nodes, the physical and logical schemas, host
processors, memory, disk space, and so forth. Too many load streams can cause systems to run
out of memory. See Best Practices for Managing Workload Resources for advice on configuring
load streams.
Monitoring COPY Loads and Metrics
You can check COPY loads using:
l HP Vertica functions
l LOAD_STREAMS system table
Using HP Vertica Functions
Two meta-functions return COPY metrics for the number of accepted or rejected rows from a
COPY statement:
1. To get the number of accepted rows, use the GET_NUM_ACCEPTED_ROWS function:
VMart=> select get_num_accepted_rows();
get_num_accepted_rows
-----------------------
11
(1 row)
2. To check the number of rejected rows, use the GET_NUM_REJECTED_ROWS function:
VMart=> select get_num_rejected_rows();
get_num_rejected_rows
-----------------------
0
(1 row)
Using the CURRENT_LOAD_SOURCE() Function
When you include the CURRENT_LOAD_SOURCE function as part of the COPY statement, the
input file name, or a value computed from it, can be inserted into a column.
To insert the file names into a column from multiple source files:
=> COPY t (c1, c2, c3 AS CURRENT_LOAD_SOURCE())
   FROM '/home/load_file_1' ON exampledb_node02,
        '/home/load_file_2' ON exampledb_node03 DELIMITER ',';
Using the LOAD_STREAMS System Table
HP Vertica includes a set of system tables that include monitoring information, as described in
Using System Tables. The LOAD_STREAMS system table includes information about load stream
metrics from COPY and COPY FROM VERTICA statements, so you can query table values to get
COPY metrics.
To see all table columns:
VMart=> select * from load_streams;
Using the STREAM NAME Parameter
Using the STREAM NAME parameter as part of the COPY statement labels COPY streams
explicitly so they are easier to identify in the LOAD_STREAMS system table.
To use the STREAM NAME parameter:
=> COPY mytable FROM myfile DELIMITER '|' DIRECT STREAM NAME 'My stream name';
The LOAD_STREAMS system table includes stream names for every COPY statement that takes
more than one second to run. This duration includes the time to plan and execute the
statement.
HP Vertica maintains system table metrics until they reach a designated size quota (in kilobytes).
The quota is set through internal processes and cannot be set or viewed directly.
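For example, after labeling a load with the STREAM NAME parameter, you can retrieve its metrics from LOAD_STREAMS by name (the stream name here is illustrative):

```sql
-- Query load metrics for a specific named COPY stream.
SELECT stream_name, table_name, accepted_row_count, rejected_row_count
FROM load_streams
WHERE stream_name = 'My stream name';
```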
Other LOAD_STREAMS Columns for COPY Metrics
These LOAD_STREAMS table column values depend on the load status:
l ACCEPTED_ROW_COUNT
l REJECTED_ROW_COUNT
l PARSE_COMPLETE_PERCENT
l SORT_COMPLETE_PERCENT
When a COPY statement using the DIRECT option is in progress, the ACCEPTED_ROW_COUNT field can
increase to the maximum number of rows in the input file as the rows are being parsed.
If COPY reads input data from multiple named pipes, the PARSE_COMPLETE_PERCENT field will
remain at zero (0) until all named pipes return an EOF. While COPY awaits an EOF from multiple
pipes, it may seem to be hung. Before canceling the COPY statement, however, check your
system CPU and disk accesses to see if any activity is in progress.
In a typical load, PARSE_COMPLETE_PERCENT can either increase slowly to 100%, or jump to 100%
quickly if you are loading from named pipes or STDIN, while SORT_COMPLETE_PERCENT is at 0.
Once PARSE_COMPLETE_PERCENT reaches 100%, SORT_COMPLETE_PERCENT increases to 100%.
Depending on the data sizes, a significant lag can occur between the time PARSE_COMPLETE_
PERCENT reaches 100% and the time SORT_COMPLETE_PERCENT begins to increase.
This example sets the VSQL expanded display, and then selects various columns of data from the
LOAD_STREAMS system table:
=> \pset expanded
Expanded display is on.
=> SELECT stream_name, table_name, load_start, accepted_row_count,
rejected_row_count, read_bytes, input_file_size_bytes, parse_complete_percent,
unsorted_row_count, sorted_row_count, sort_complete_percent FROM load_streams;
-[ RECORD 1 ]----------+---------------------------
stream_name | fact-13
table_name | fact
load_start | 2010-12-28 15:07:41.132053
accepted_row_count | 900
rejected_row_count | 100
read_bytes | 11975
input_file_size_bytes | 0
parse_complete_percent | 0
unsorted_row_count | 3600
sorted_row_count | 3600
sort_complete_percent | 100
See the SQL Reference Manual for other meta-function details.
See the Programmer's Guide for client-specific documentation.
Capturing Load Rejections and Exceptions
Rejected data occurs during the parsing phase of loading data. A COPY operation can encounter
other problems and failures during different load phases, but rejections consist only of parse errors.
Following are a few examples of parsing errors that cause a rejected row: 
l Unsupported parsing options
l Incorrect data types for the table into which data is being loaded
l Malformed context for the parser in use
l Missing delimiters
The COPY statement automatically saves copies of rejected rows in a rejected-data file, and an
explanation for the rejected row in an exceptions file. By default, HP Vertica saves both files in the
database catalog subdirectory, CopyErrorLogs:
v_mart_node003_catalog/CopyErrorLogs/trans-STDIN-copy-from-rejected-data.1
v_mart_node003_catalog/CopyErrorLogs/trans-STDIN-copy-from-exceptions.1
You can specify different files in which to save COPY rejections and exceptions. Use the REJECTED
DATA and EXCEPTIONS parameters to save one, or both, files to a file location of your choice.
You can also save rejected rows and their explanations in a table, using the REJECTED DATA AS
TABLE clause.
Using COPY Parameters To Handle Rejections and
Exceptions
Several optional parameters let you determine how strictly the COPY statement handles the
rejections it encounters when loading data. For example, you can have COPY fail when it
encounters a single rejection, or permit a specific number of rejected rows before the load fails.
Two COPY arguments, REJECTED DATA and EXCEPTIONS, are designed to work together for these purposes.
This section describes the parameters you use to specify the rejections file or table of your choice,
and to control load exception handling.
Specifying Where to Save Rejected Data Rows
When you use the REJECTED DATA path argument, you specify one or both of the following:
l The path and file in which to save rejected data.
The rejected data file includes only the rejected records themselves. If you want to see the
reason each record was rejected, you must also specify the EXCEPTIONS path option. The
reasons for rejection are written to a separate file.
l The name of the table into which rejected data rows are saved.
Including the AS TABLE clause creates reject_table and populates it with both the rejected
record and the reason for rejection. You can then query the table to access rejected data
information. For more information, see Saving Load Rejections.
For COPY LOCAL operations, the rejected data file must reside on the client.
If path resolves to a storage location and the user invoking COPY is not a superuser, the following
permissions are required:
l The storage location must have been created with the USER usage type (see ADD_LOCATION).
l The user must already have been granted access to the storage location where the files exist, as
described in GRANT (Storage Location).
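As a sketch (the table name and file paths are illustrative), a COPY statement that saves both rejected rows and their explanations to files of your choice might look like:

```sql
-- Save rejected rows and the reasons for rejection to non-default files
-- (table name and paths are illustrative).
COPY t FROM '/data/input.dat' DELIMITER '|'
REJECTED DATA '/tmp/input_rejected.txt'
EXCEPTIONS '/tmp/input_exceptions.txt';
```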
Saving the Reason for the Rejected Data Row
An EXCEPTIONS file stores the reason each rejected record was rejected during the parsing phase.
To save this file, include the EXCEPTIONS clause in the COPY statement and specify the file name or
absolute path for the file.
Note: You cannot specify an EXCEPTIONS file if you are using the REJECTED DATA..AS TABLE
clause. See Saving Load Exceptions (EXCEPTIONS).
If you are running COPY LOCAL operations, the file must reside on the client.
If path resolves to a storage location and the user invoking COPY is not a superuser, the following
permissions are required:
l The storage location must have been created with the USER usage type (see ADD_LOCATION).
l The user must already have been granted access to the storage location where the files exist, as
described in GRANT (Storage Location).
Enforcing Truncating or Rejecting Rows
(ENFORCELENGTH)
The ENFORCELENGTH parameter determines whether COPY truncates or rejects rows of data type
CHAR, VARCHAR, BINARY, and VARBINARY when they do not fit the target column. By default, COPY
truncates offending rows of these data types without rejecting them.
For example, if you omit the ENFORCELENGTH argument and load 'abc' into a table column
specified as VARCHAR(2), COPY truncates the value to 'ab' and loads it. If you load the same value
with the ENFORCELENGTH parameter, COPY rejects the 'abc' value, rather than truncating it.
Note: HP Vertica supports NATIVE and NATIVE VARCHAR values up to 65K. If any value
exceeds this limit, COPY rejects the row, even when ENFORCELENGTH is not in use.
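The truncation versus rejection behavior described above can be sketched as follows (the table name is illustrative):

```sql
-- Illustrative table with a VARCHAR(2) column.
CREATE TABLE t2 (v VARCHAR(2));

-- Without ENFORCELENGTH, loading 'abc' truncates the value to 'ab':
COPY t2 FROM STDIN;

-- With ENFORCELENGTH, the same value is rejected rather than truncated:
COPY t2 FROM STDIN ENFORCELENGTH;
```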
Specifying Maximum Rejections Before a Load Fails
(REJECTMAX)
The REJECTMAX parameter specifies the maximum number of logical records that can be rejected
before a load fails. A rejected row consists of the data that caused an exception and could not be
parsed into the corresponding data type during a bulk load. Rejections do not include referential
constraint violations, which cause the load to roll back.
When the number of rejected records exceeds the REJECTMAX value (that is, at REJECTMAX + 1), the
load fails. If you do not specify a value for REJECTMAX, or if the value is 0, COPY allows an unlimited
number of exceptions to occur.
Note: COPY does not accumulate rejected records across files or nodes while data is loading. If
one rejected data file exceeds the maximum reject number, the entire load fails.
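For instance (the table name and path are illustrative), the following sketch permits up to ten rejected rows; the load fails on the eleventh rejection:

```sql
-- Allow at most 10 rejected rows before the load fails
-- (table name and path are illustrative).
COPY t FROM '/data/input.dat' DELIMITER '|' REJECTMAX 10;
```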
Aborting Data Loads for Any Error (ABORT ON ERROR)
Using the ABORT ON ERROR argument is the most restrictive way to load data, because no
exceptions or rejections are allowed. A COPY operation stops if any row is rejected. No data is
loaded and HP Vertica rolls back the command.
If you use ABORT ON ERROR as part of a CREATE EXTERNAL TABLE AS COPY FROM statement,
the option applies whenever a query references the external table. The offending error is saved in
the COPY exceptions or rejected data file.
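A minimal sketch (the table name and path are illustrative): with ABORT ON ERROR, a single bad row rolls back the entire load:

```sql
-- Any rejected row causes the entire COPY to roll back
-- (table name and path are illustrative).
COPY t FROM '/data/input.dat' DELIMITER '|' ABORT ON ERROR;
```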
Understanding Row Rejections and Rollback Errors
Depending on the type of error that COPY encounters, HP Vertica does one of the following:
l Rejects the offending row and loads the other rows into the table
l Rolls back the entire COPY statement without loading any data
Note: If you specify ABORT ON ERROR with the COPY command, the load automatically rolls
back if any row causes an exception. The offending row or error is written to the applicable
exceptions or rejected data file or to a rejections table, if specified.
The following summarizes the difference between a rejected row and a rollback. See also the
example following this summary.

Rejects offending row:
When HP Vertica encounters an error parsing records in the input file, it rejects the offending
row and continues the load. Rows are rejected if they contain any of the following:
l Incompatible data types
l Missing fields
l Missing delimiters

Rolls back entire load:
When HP Vertica rolls back a COPY statement, the command fails without loading data. The
following conditions cause a load rollback:
l Server-side errors, such as lack of memory
l Violations of primary key or foreign key constraints
l Loading NULL data into a NOT NULL column
Example: Rejection versus Rollback
This example illustrates what happens when HP Vertica can't parse a row to the requested data
type. For example, in the following COPY statement, "a::INT + b::INT" is a SQL expression in
which "a" and "b" are derived values:
=> CREATE TABLE t (i INT);
=> COPY t (a FILLER VARCHAR, b FILLER VARCHAR, i AS a::INT + b::INT)
FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> cat|dog
>> \.
Vertica Analytic Database cannot parse the row to the requested data type and rejects the row:
ERROR 2827: Could not convert "cat" from column "*FILLER*".a to an int8
If "a" were 'cat' and "b" 'dog', the expression 'cat'::INT + 'dog'::INT would return an error
and the COPY statement would fail (roll back) without loading any data.
=> SELECT 'cat'::INT + 'dog'::INT;
ERROR 3681: Invalid input syntax for integer: "cat"
The following statement would also roll back because Vertica Analytic Database can't parse the
row to the requested data type:
=> COPY t (a FILLER VARCHAR, i AS a::INT) FROM STDIN;
However, in the following COPY statement, Vertica Analytic Database rejects only the offending row
without rolling back the statement. Instead of evaluating the 'cat' row as a VARCHAR type, the
parser tries to parse 'cat' directly as an INTEGER:
=> COPY t (a FILLER INT, i AS a) FROM STDIN;
In the case of the above statement, the load parser (unlike the expression evaluator) rejects the row
if it contains a field that can't be parsed to the requested data type.
Saving Load Exceptions (EXCEPTIONS)
COPY exceptions consist of informational messages describing why a row of data could not be
parsed. The optional EXCEPTIONS parameter lets you specify a file to which COPY writes exceptions.
If you do not use this parameter, COPY saves exception files to the following default location:
catalog_dir/CopyErrorLogs/tablename-filename-of-source-copy-from-exceptions
where:
l catalog_dir is the database catalog files directory
l tablename-filename-of-source are the names of the table and data file
l -copy-from-exceptions is the file suffix appended to the table and source file name
Note: Using the REJECTED DATA parameter with the AS TABLE clause is mutually exclusive
with specifying a load Exceptions file. Since the exceptions messages are included in the
rejected data table, COPY does not permit both options and displays this error if you try to use
them:
ERROR 0: Cannot specify both an exceptions file and a rejected table in the same statement
The optional EXCEPTIONS parameter lets you specify a file of your choice to which COPY writes load
exceptions. The EXCEPTIONS file indicates the input line number and the reason for each data
record exception in this format:
COPY: Input record number in <pathofinputfile> has been rejected (reason). Please see
pathtorejectfile, record recordnum for the rejected record.
If copying from STDIN, the filename-of-source is STDIN.
Note: You can use specific rejected data and exceptions files with one or more of the files you
are loading. Separate consecutive rejected data and exception file names with a comma (,) in
the COPY statement.
To load multiple input files, you must specify a filename in the path. Keep in mind that long table
names combined with long data file names can exceed the operating system's maximum length
(typically 255 characters). To work around file names exceeding the maximum length, use a path
for the exceptions file that differs from the default path; for example, /tmp/<shorter-file-name>.
For all data loads (except for COPY LOCAL), COPY behaves as follows:
If no exceptions file is specified:
l For one data source file (pathToData or STDIN), COPY stores all information as one file in the
default directory.
l For multiple data files, COPY stores all information as separate files, one for each data file, in
the default directory.

If an exceptions file is specified:
l For one data file, the path is treated as a file, and COPY stores all information in this file. If the
path is not a file, COPY returns an error.
l For multiple data files, the path is treated as a directory, and COPY stores all information in
separate files, one for each data file. If the path is not a directory, COPY returns an error.
l Exceptions files are not stored on the initiator node.
l You can specify only one path per node. If you specify more than one path per node, COPY
returns an error.
Saving Load Rejections (REJECTED DATA)
COPY rejections are the data rows from the load file that caused a parser exception and did not load.
The optional REJECTED DATA parameter lets you specify either a file or table to which COPY writes
rejected data records. If you omit this parameter, COPY saves rejected data files to the following
location without saving a table: 
catalog_dir/CopyErrorLogs/tablename-filename-of-source-copy-from-rejections
where:
l catalog_dir is the database catalog files directory
l tablename-filename-of-source are the names of the table and data file
l -copy-from-rejections is the file suffix appended to the table and source file name
Once a rejections file exists, you can review its contents to resolve any load problems and reload
the data. If you save rejected data to a table, you can query the table to see both exceptions and the
rejected data.
If copying from STDIN, the filename of source is STDIN.
Note: You can use specific rejected data and exceptions files with one or more of the files you
are loading. Separate consecutive rejected data and exception file names with a comma (,) in
the COPY statement. Do not use the ON ANY NODE option with rejected data and exceptions
files, because ON ANY NODE is applicable only to the load file.
When you load multiple input files, you must specify a file name in the path. Keep in mind that long
input file names, combined with rejected data file names, can exceed the operating system's
maximum length (typically 255 characters). To work around file names exceeding the maximum
length, use a path for the rejected data file that differs from the default path; for example,
/tmp/<shorter-file-name>.
For all data loads (except for COPY LOCAL), COPY behaves as follows:
If no rejected data file is specified:
l For one data source file (pathToData or STDIN), COPY stores all information as one file in the
default directory.
l For multiple data files, COPY stores all information as separate files, one for each data file, in
the default directory.

If a rejected data file is specified:
l For one data file, the path is interpreted as a file, and COPY stores all rejected data in this file.
If the path is not a file, COPY returns an error.
l For multiple data files, the path is treated as a directory, and COPY stores all information in
separate files, one for each data file. If the path is not a directory, COPY returns an error.
l Rejected data files are not shipped to the initiator node.
l Only one path per node is accepted. If more than one is provided, COPY returns an error.
Saving Rejected Data to a Table
Using the REJECTED DATA parameter with the AS TABLE clause lets you specify a table in which to
save the data. The capability to save rejected data is on by default. If you want to keep saving
rejected data to a file, do not use the AS TABLE clause.
When you use the AS TABLE clause, HP Vertica creates a new table if one does not exist. If no
parsing rejections occur during a load, the table exists but is empty. The next time you load data,
HP Vertica appends any rejected rows to the existing table.
The load rejection tables are a special type of table with the following capabilities and limitations:
l Support SELECT statements
l Can use DROP TABLE
l Cannot be created outside of a COPY statement
l Do not support DML and DDL activities
l Are not K-safe
To make the data in a rejected table K-safe, you can do one of the following: 
l Write a CREATE TABLE..AS statement, such as this example:
CREATE TABLE new_table AS SELECT * FROM rejected_table;
l Create a table to store rejected records, and run INSERT..SELECT operations into the new table
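The second option might be sketched as follows (the table and column names are illustrative; the rejected-data table columns are described later in this section):

```sql
-- Create a regular (K-safe) table and copy the rejection details into it
-- (table and column names are illustrative).
CREATE TABLE saved_rejects (
    rejected_data   LONG VARCHAR,
    rejected_reason VARCHAR(1000)
);
INSERT INTO saved_rejects
    SELECT rejected_data, rejected_reason FROM loader_rejects;
```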
If you use COPY NO COMMIT
If you include the NO COMMIT and REJECTED DATA AS TABLE clauses in your COPY statement and
the reject_table does not already exist, Vertica Analytic Database saves the rejected-data table
as a LOCAL TEMP table and returns a message that a LOCAL TEMP table is being created.
Rejected-data tables are especially useful for Extract-Load-Transform workflows (where you'll
likely use temporary tables more frequently), because they let you quickly load data and identify
which records failed to load due to a parsing error. If you load data into a temporary table that you
created using the ON COMMIT DELETE clause, the COPY operation does not commit.
Rejection Records for Table Files
In the current implementation, the rejected records used to populate a table are also saved to a file
in the default rejected data file directory:
catalog_dir/CopyErrorLogs/CopyRejectedRecords
The file contents differ from the rejected data files that COPY saves by default. The rejected records
for table files contain rejected data and the reason for the rejection (exceptions), along with other
data columns, described next. HP Vertica suggests that you periodically log in to each server and
drop the rejections tables that you no longer need.
Querying a Rejection Records Table
To use the AS TABLE clause:
1. Create a sample table in which to load data: 
dbs=> CREATE TABLE loader(a INT);
CREATE TABLE
2. Use the COPY statement to load values, and save any rejected rows to a table called loader_
rejects:
dbs=> COPY loader FROM STDIN REJECTED DATA AS table "loader_rejects";
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 2
>> 3
>> 4
>> a
>> b
>> c
>> \.
3. Query the loader table after loading the data. Notice that only four (4) values exist:
dbs=> SELECT * FROM loader;
Value
-------
1
2
3
4
(4 rows)
4. Query the loader_rejects table to see its contents. The following example lists one
record from the table (although more exist):
dbs=> \x
Expanded display is on.
dbs=> SELECT * FROM loader_rejects;
-[ RECORD 1 ]-------------+--------------------------------------------
node_name | v_dbs_node0001
file_name |
session_id | engvmqa2401-12617:0x1cb0
statement_id | 1
batch_number | 0
row_number | 5
rejected_data | a
rejected_data_orig_length | 1
rejected_reason | Invalid integer format 'a' for column 1 (a)
-[ RECORD 2 ]-------------+--------------------------------------------
The rejection data table has the following columns: 
Column Data Type Description
node_name VARCHAR The name of the HP Vertica node on which the input
load file was located.
file_name VARCHAR The name of the file being loaded, which applies if you
loaded a file (as opposed to using STDIN).
session_id VARCHAR The session ID number in which the COPY statement
occurred.
transaction_id INTEGER Identifier for the transaction within the session, if any;
otherwise NULL.
statement_id INTEGER The unique identification number of the statement
within the transaction that included the rejected data.
Tip: You can use the session_id,
transaction_id, and statement_id columns to
create joins with many system tables. For
example, if you join against the QUERY_REQUESTS
table using those three columns, the QUERY_
REQUESTS.REQUEST column contains the actual
COPY statement (as a string) used to load this
data.
batch_number INTEGER INTERNAL USE. Represents which batch (chunk)
the data comes from.
row_number INTEGER The rejected row number from the input file.
rejected_data LONG VARCHAR The actual load data.
rejected_data_orig_length INTEGER The length of the rejected data.
rejected_reason VARCHAR The error that caused the rejected row. This column
returns the same message that would exist in the load
exceptions file, if you saved exceptions to a file,
rather than to a table.
Exporting the Rejected Records Table
You can export the contents of the rejected_data column to a file, correct the rejected rows, and
load the corrected rows from the updated file.
To export rejected records:
1. Create a sample table:
dbs=> create table t (i int);
CREATE TABLE
2. Copy data directly into the table, using a table to store rejected data: 
dbs=> copy t from stdin rejected data as table "t_rejects";
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 2
>> 3
>> 4
>> a
>> b
>> c
>> .
3. Show only tuples and set the output format:
dbs=> \t
Showing only tuples.
dbs=> \a
Output format is unaligned.
4. Output to a file (rejected.txt):
dbs=> \o rejected.txt
dbs=> select rejected_data from t_rejects;
dbs=> \o
5. Use the cat command on the saved file:
dbs=> \! cat rejected.txt
a
b
c
dbs=>
After a file exists, you can fix load errors and use the corrected file as load input to the COPY
statement.
COPY Rejected Data and Exception Files
When executing a multi-node COPY statement, each node processes part of the load data. If the load
succeeds, all parser rejections that occur during the node's load processing are written to that
node's specific rejected data and exceptions files. If the load fails, the file contents can be
incomplete, or empty.
Both rejected data and exceptions files are saved and stored on a per-node basis. This example
uses multiple files as COPY inputs. Since the statement does not include either the REJECTED DATA
or EXCEPTIONS parameters, rejected data and exceptions files are written to the default location, the
database catalog subdirectory, CopyErrorLogs, on each node:
\set dir `pwd`/data/
\set remote_dir /vertica/test_dev/tmp_ms/
\set file1 '''':dir'C1_large_tbl.dat'''
\set file2 '''':dir'C2_large_tbl.dat'''
\set file3 '''':remote_dir'C3_large_tbl.dat'''
\set file4 '''':remote_dir'C4_large_tbl.dat'''
COPY large_tbl FROM :file1 ON site01, :file2 ON site01,
:file3 ON site02,
:file4 ON site02
DELIMITER '|';
Note: Always use the COPY statement REJECTED DATA and EXCEPTIONS parameters to save
load rejections. Using the RETURNREJECTED parameter is supported only for internal use by the
JDBC and ODBC drivers. HP Vertica's internal-use options can change without notice.
Specifying Rejected Data and Exceptions Files
The optional 'path' element of the COPY REJECTED DATA and EXCEPTIONS parameters lets you specify a
non-default path in which to store the files.
If path resolves to a storage location, and the user invoking COPY is not a superuser, these are the
required permissions:
- The storage location must have been created (or altered) with the USER option (see ADD_
LOCATION and ALTER_LOCATION_USE)
- The user must already have been granted READ access to the storage location where the file(s)
exist, as described in GRANT (Storage Location)
Both parameters also have an optional ON nodename clause that uses the specified path:
...[ EXCEPTIONS 'path' [ ON nodename ] [, ...] ]...[ REJECTED DATA 'path' [ ON nodename ]
[, ...] ]
While 'path' specifies the location of the rejected data and exceptions files (with their corresponding
parameters), the optional ON nodename clause moves any existing rejected data and exceptions
files on the node to the specified path on the same node.
Saving Rejected Data and Exceptions Files to a Single
Server
The COPY statement does not have a facility to merge exception and rejected data files after COPY
processing is complete. To see the contents of exception and rejected data files requires logging on
and viewing each node's specific files.
Note: If you want to save all exceptions and rejected data files on a network host, be sure to
give each node's files unique names, so different cluster nodes do not overwrite other nodes'
files. For instance, if you set up a server with two directories, such as /vertica/exceptions
and /vertica/rejections, be sure the corresponding file names for each HP Vertica cluster
node identify each node, such as node01_exceptions.txt and node02_exceptions.txt, and
so on. In this way, each cluster node's files will reside in one directory, and be easily
distinguishable.
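The naming scheme in the note above can be sketched as follows. The node names, directories, and catalog path are hypothetical; on a real cluster you would pull each node's files over the network (for example, with scp), which is simulated here with local copies:

```shell
# Central host: collect each node's files under unique, node-prefixed names.
mkdir -p /tmp/central/exceptions /tmp/central/rejections
# Hypothetical node list; in practice you would fetch the real files, e.g.:
#   scp node01:<catalog-path>/CopyErrorLogs/<file> /tmp/central/exceptions/node01_exceptions.txt
for node in node01 node02 node03; do
    # Simulate the per-node copy locally to show the naming scheme:
    echo "exceptions from $node" > "/tmp/central/exceptions/${node}_exceptions.txt"
    echo "rejections from $node" > "/tmp/central/rejections/${node}_rejections.txt"
done
ls /tmp/central/exceptions
```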
Using VSQL Variables for Rejected Data and Exceptions
Files
This example uses vsql variables to specify the path and file names to use with the exceptions
and rejected data parameters (except_s1 and reject_s1). The COPY statement specifies a
single input file (large_tbl) on the initiator node:
\set dir `pwd`/data/
\set file1 '''':dir'C1_large_tbl.dat'''
\set except_s1 '''':dir'exceptions'''
\set reject_s1 '''':dir'rejections'''
COPY large_tbl FROM :file1 ON site01 DELIMITER '|'
REJECTED DATA :reject_s1 ON site01
EXCEPTIONS :except_s1 ON site01;
This example uses variables to specify exception and rejected data files (except_s2 and reject_
s2) on a remote node. The COPY statement consists of a single input file on a remote node
(site02):
\set remote_dir /vertica/test_dev/tmp_ms/
\set except_s2 '''':remote_dir'exceptions'''
\set reject_s2 '''':remote_dir'rejections'''
COPY large_tbl FROM :file1 ON site02 DELIMITER '|'
REJECTED DATA :reject_s2 ON site02
EXCEPTIONS :except_s2 ON site02;
This example uses variables to specify that the exception and rejected data files are on a remote
node (indicated by :remote_dir). The inputs to the COPY statement consist of multiple data files
on two nodes (site01 and site02). The exceptions and rejected data options use the ON
nodename clause with the vsql variables to indicate where the files reside (site01 and site02):
\set dir `pwd`/data/
\set remote_dir /vertica/test_dev/tmp_ms/
\set except_s1 '''':dir''''
\set reject_s1 '''':dir''''
\set except_s2 '''':remote_dir''''
\set reject_s2 '''':remote_dir''''
COPY large_tbl FROM :file1 ON site01, :file2 ON site01,
:file3 ON site02,
:file4 ON site02
DELIMITER '|'
REJECTED DATA :reject_s1 ON site01, :reject_s2 ON site02
EXCEPTIONS :except_s1 ON site01, :except_s2 ON site02;
COPY LOCAL Rejection and Exception Files
Invoking COPY LOCAL (or COPY LOCAL FROM STDIN) does not automatically create rejected
data and exceptions files. This behavior differs from using COPY, which saves both files
automatically, regardless of whether you use the optional REJECTED DATA and EXCEPTIONS
parameters to specify either file explicitly.
Use the REJECTED DATA and EXCEPTIONS parameters with COPY LOCAL and COPY LOCAL
FROM STDIN to save the corresponding output files on the client. If you do not use these options,
rejected data parsing events (and the exceptions that describe them) are not retained, even if they
occur.
You can load multiple input files using COPY LOCAL (or COPY LOCAL FROM STDIN). If you also
use the REJECTED DATA and EXCEPTIONS options, the statement writes rejected rows and
exceptions to separate files. Each file contains all of the rejected rows or their corresponding
exceptions, respectively, regardless of how many input files were loaded.
Note: Because COPY LOCAL (and COPY LOCAL FROM STDIN) must write any rejected
rows and exceptions to the client, you cannot use the [ON nodename ] clause with either the
rejected data or exceptions options.
Specifying Rejected Data and Exceptions Files
To save any rejected data and their exceptions to files:
1. In the COPY LOCAL (and COPY LOCAL FROM STDIN) statement, use the REJECTED DATA
'path' and the EXCEPTIONS 'path' parameters, respectively.
2. Specify two different file names for the two options. You cannot use one file for both the
REJECTED DATA and the EXCEPTIONS.
3. When you invoke COPY LOCAL or COPY LOCAL FROM STDIN, the files you specify need
not pre-exist. If they do, COPY LOCAL must be able to overwrite them.
You can specify the path and file names with vsql variables:
\set rejected ../except_reject/copyLocal.rejected
\set exceptions ../except_reject/copyLocal.exceptions
Note: COPY LOCAL does not support storing rejected data in a table, as you can when using the
COPY statement.
When you use the COPY LOCAL or COPY LOCAL FROM STDIN statement, specify the variable
names for the files with their corresponding parameters:
COPY large_tbl FROM LOCAL rejected data :rejected exceptions :exceptions;
COPY large_tbl FROM LOCAL STDIN rejected data :rejected exceptions :exceptions;
Referential Integrity Load Violation
HP Vertica checks for constraint violations when queries are executed, not when loading data.
If you have a pre-joined projection defined on the table being loaded, HP Vertica checks for
constraint violations (duplicate primary keys or non-existent foreign keys) during the join operation
and reports errors. If there are no pre-joined projections, HP Vertica performs no such checks.
To avoid constraint violations, you can load data without committing it and then use the ANALYZE_
CONSTRAINTS function to perform a post-load check of your data. If the function finds constraint
violations, you can roll back the bulk load because you have not committed it.
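A minimal sketch of this post-load check (the table name and file path are illustrative):

```sql
-- Load without committing the transaction:
COPY fact_sales FROM '/data/fact_sales.dat' DELIMITER '|' NO COMMIT;
-- Check the loaded data; any rows returned describe violations:
SELECT ANALYZE_CONSTRAINTS('fact_sales');
-- Roll back if violations were found; otherwise COMMIT the load:
ROLLBACK;
```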
See Also
- Detecting Constraint Violations
- COPY
- ANALYZE_CONSTRAINTS
Trickle Loading Data
Once you have a working database and have bulk loaded your initial data, you can use trickle
loading to load additional data on an ongoing basis. By default, HP Vertica uses the transaction
isolation level of READ COMMITTED, which allows users to see the most recently committed data
without holding any locks. This allows new data to be loaded while concurrent queries are running.
See Change Transaction Isolation Levels.
Using INSERT, UPDATE, and DELETE
The SQL data manipulation language (DML) commands INSERT, UPDATE, and DELETE perform
the same functions that they do in any ACID compliant database. The INSERT statement is
typically used to load individual rows into physical memory or load a table using INSERT AS
SELECT. UPDATE and DELETE are used to modify the data.
You can intermix the INSERT, UPDATE, and DELETE commands. HP Vertica follows the SQL-92
transaction model. In other words, you do not have to explicitly start a transaction but you must use
a COMMIT or ROLLBACK command (or COPY) to end a transaction. Canceling a DML statement
causes the effect of the statement to be rolled back.
HP Vertica differs from traditional databases in two ways:
- DELETE does not actually delete data from disk storage; it marks rows as deleted so that they
can be found by historical queries.
- UPDATE writes two rows: one with new data and one marked for deletion.
Like COPY, by default, the INSERT, UPDATE, and DELETE commands write data to the WOS
and, on overflow, write to the ROS. For large INSERTs or UPDATEs, you can use the DIRECT
keyword to force HP Vertica to write rows directly to the ROS. Loading a large number of rows as
single-row inserts is not recommended for performance reasons. Use COPY instead.
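As a sketch (the table names are illustrative), a large INSERT...SELECT can bypass the WOS with the direct hint:

```sql
-- Write the inserted rows directly to the ROS:
INSERT /*+ direct */ INTO sales_history SELECT * FROM staging_sales;
COMMIT;
```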
WOS Overflow
The WOS exists to allow HP Vertica to efficiently batch small loads into larger ones for I/O
purposes. Loading to the WOS is fast because the work of sorting, encoding, and writing to disk is
deferred and performed in the background by the Tuple Mover's moveout process. Since the WOS
has a finite amount of available space, it can fill up and force HP Vertica to spill small loads directly
to disk. While no data is lost or rejected when the WOS gets full, it can result in wasted I/O
bandwidth. Thus, follow the Tuning the Tuple Mover guidelines to avoid WOS overflow.
Copying and Exporting Data
HP Vertica can easily import data from and export data to other HP Vertica databases. Importing
and exporting data is useful for common tasks such as moving data back and forth between a
development or test database and a production database, or between databases that have different
purposes but need to share data on a regular basis.
Moving Data Directly Between Databases
Two statements move data to and from another HP Vertica database:
- COPY FROM VERTICA
- EXPORT TO VERTICA
To execute either of these statements requires first creating a connection to the other HP Vertica
database.
Creating SQL Scripts to Export Data
Three functions return a SQL script you can use to export database objects to recreate elsewhere:
- EXPORT_CATALOG
- EXPORT_OBJECTS
- EXPORT_TABLES
Copying and exporting data is similar to Backing Up and Restoring the Database. However, they
have two different purposes, outlined below:
Task                                                             Backup and   COPY and EXPORT
                                                                 Restore      Statements
Back up or restore an entire database, or incremental changes    YES          NO
Manage database objects (a single table or selected table rows)  YES          YES
Use external locations to back up and restore your database      YES          NO
Use direct connections between two databases                     NO           YES
Use external shell scripts to back up and restore your database  YES          NO
Use SQL commands to incorporate copy and export tasks into
DB operations                                                    NO           YES
The following sections explain how you import and export data between HP Vertica databases.
When importing from or exporting to an HP Vertica database, you can connect only to a database
that uses trusted- (username-only) or password-based authentication. Neither LDAP nor SSL
authentication is supported. For more information, see the Implementing Security section of the
Vertica documentation.
Exporting Data
You can export data from an earlier HP Vertica release, as long as the earlier release is a version of
the last major release. For instance, for Version 6.x, you can export data from any version of 5.x,
but not from 4.x.
You can export a table, specific columns in a table, or the results of a SELECT statement to
another HP Vertica database. The table in the target database receiving the exported data must
already exist, and have columns that match (or can be coerced into) the data types of the columns
you are exporting.
Exported data is always written in AUTO mode.
When loading data using AUTO mode, HP Vertica inserts the data first into the WOS. If the WOS is
full, then HP Vertica inserts the data directly into ROS. See the COPY statement for more details.
Exporting data fails if either side of the connection is a single-node cluster installed to localhost or
you do not specify a host name or IP address.
Exporting is a three-step process:
1. Use the CONNECT SQL statement to connect to the target database that will receive your
exported data.
2. Use the EXPORT SQL statement to export the data. If you want to export multiple tables or the
results of multiple SELECT statements, you need to use multiple EXPORT statements. All
statements will use the same connection to the target database.
3. When you are finished exporting data, use the DISCONNECT SQL statement to disconnect
from the target database.
See the entries for CONNECT, EXPORT, and DISCONNECT statements in the SQL Reference
Manual for syntax details.
Exporting Identity Columns
When you use the EXPORT TO VERTICA statement, HP Vertica exports Identity (and Auto-
increment) columns as they are defined in the source data. The Identity column value does not
increment automatically, and requires that you use ALTER SEQUENCE to make updates.
Export Identity (and Auto-increment) columns as follows:
- If source and destination tables have an Identity column, you do not need to list them.
- If source has an Identity column, but not the destination, specify both the source and destination
columns.
Note: In earlier releases, Identity columns were ignored. Now, failure to list which Identity
columns to export can cause an error, because the Identity column is not ignored and will be
interpreted as missing in the destination table.
The default behavior for EXPORT TO VERTICA is to let you export Identity columns by specifying
them directly in the source table. To disable this behavior globally, set the
CopyFromVerticaWithIdentity configuration parameter, described in Configuration Parameters.
Examples of Exporting Data
The following example demonstrates using the three-step process listed above to export data.
First, open the connection to the other database, then perform a simple export of an entire table to
an identical table in the target database.
=> CONNECT TO VERTICA testdb USER dbadmin PASSWORD '' ON 'VertTest01',5433;
CONNECT
=> EXPORT TO VERTICA testdb.customer_dimension FROM customer_dimension;
Rows Exported
---------------
23416
(1 row)
The following statement demonstrates exporting a portion of a table using a simple SELECT
statement.
=> EXPORT TO VERTICA testdb.ma_customers AS SELECT customer_key, customer_name, annual_income
-> FROM customer_dimension WHERE customer_state = 'MA';
Rows Exported
---------------
3429
(1 row)
This statement exports several columns from one table to several different columns in the target
database table using column lists. Remember that when supplying both a source and destination
column list, the number of columns must match.
=> EXPORT TO VERTICA testdb.people (name, gender, age) FROM customer_dimension
-> (customer_name, customer_gender, customer_age);
Rows Exported
---------------
23416
(1 row)
You can export tables (or columns) containing Identity and Auto-increment values, but the
sequence values are not incremented automatically at their destination.
You can also use the EXPORT TO VERTICA statement with a SELECT AT EPOCH LATEST
expression to include data from the latest committed DML transaction.
Disconnect from the database when the export is complete:
=> DISCONNECT testdb;
DISCONNECT
Note: Closing your session also closes the database connection. However, it is a good
practice to explicitly close the connection to the other database, both to free up resources and
to prevent issues with other SQL scripts you may run in your session. Always closing the
connection prevents potential errors if you run a script in the same session that attempts to
open a connection to the same database, since each session can only have one connection to
a particular database at a time.
Copying Data
You can import a table or specific columns in a table from another HP Vertica database. The table
receiving the copied data must already exist, and have columns that match (or can be coerced into)
the data types of the columns you are copying from the other database. You can import data from
an earlier HP Vertica release, as long as the earlier release is a version of the last major release. For
instance, for Version 6.x, you can import data from any version of 5.x, but not from 4.x.
Importing and exporting data fails if either side of the connection is a single-node cluster installed to
localhost, or you do not specify a host name or IP address.
Importing is a three-step process:
1. Use the CONNECT SQL statement to connect to the source database containing the data you
want to import.
2. Use the COPY FROM VERTICA SQL statement to import the data. If you want to import
multiple tables, you need to use multiple COPY FROM VERTICA statements. They all use the
same connection to the source database.
3. When you are finished importing data, use the DISCONNECT SQL statement to disconnect
from the source database.
See the entries for CONNECT, COPY FROM VERTICA, and DISCONNECT statements in the
SQL Reference Manual for syntax details.
Importing Identity Columns
You can import Identity (and Auto-increment) columns as follows:
- If source and destination tables have an Identity column, you do not need to list them.
- If source has an Identity column, but not the destination, specify both the source and destination
columns.
Note: In earlier releases, Identity columns were ignored. Now, failure to list which Identity
columns to export can cause an error, because the Identity column is not ignored and will be
interpreted as missing in the destination table.
After importing the columns, the Identity column values do not increment automatically. Use
ALTER SEQUENCE to make updates.
The default behavior for this statement is to import Identity (and Auto-increment) columns by
specifying them directly in the source table. To disable this behavior globally, set the
CopyFromVerticaWithIdentity configuration parameter, described in Configuration Parameters.
Examples
This example demonstrates connecting to another database, copying the contents of an entire table
from the source database to an identically-defined table in the current database directly into ROS,
and then closing the connection.
=> CONNECT TO VERTICA vmart USER dbadmin PASSWORD '' ON 'VertTest01',5433;
CONNECT
=> COPY customer_dimension FROM VERTICA vmart.customer_dimension DIRECT;
Rows Loaded
-------------
500000
(1 row)
=> DISCONNECT vmart;
DISCONNECT
This example demonstrates copying several columns from a table in the source database into a
table in the local database.
=> CONNECT TO VERTICA vmart USER dbadmin PASSWORD '' ON 'VertTest01',5433;
CONNECT
=> COPY people (name, gender, age) FROM VERTICA
-> vmart.customer_dimension (customer_name, customer_gender,
-> customer_age);
Rows Loaded
-------------
500000
(1 row)
=> DISCONNECT vmart;
DISCONNECT
You can copy tables (or columns) containing Identity and Auto-increment values, but the sequence
values are not incremented automatically at their destination.
Using Public and Private IP Networks
In many configurations, HP Vertica cluster hosts use two network IP addresses as follows:
- A private address for communication between the cluster hosts.
- A public IP address for client connections.
By default, importing from and exporting to another HP Vertica database uses the private network.
To use the public network for copy and export activities with another HP Vertica cluster, configure
the system as follows:
- Identify the Public Network to HP Vertica
- Identify the Database or Nodes Used for Import/Export
Identify the Public Network to HP Vertica
To be able to import to or export from a public network, HP Vertica needs to be aware of the IP
addresses of the nodes or clusters on the public network that will be used for import/export
activities. Your public network might be configured in either of these ways:
- Public network IP addresses reside on the same subnet (create a subnet)
- Public network IP addresses are on multiple subnets (create a network interface)
To identify public network IP addresses residing on the same subnet:
- Use the CREATE SUBNET statement to provide your subnet with a name and to identify the
subnet routing prefix.
To identify public network IP addresses residing on multiple subnets:
- Use the CREATE NETWORK INTERFACE statement to configure import/export from specific
nodes in the HP Vertica cluster.
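The two statements might be used as follows; the subnet name, interface name, node name, and addresses are all hypothetical:

```sql
-- All public addresses on one subnet: name it and give its routing prefix.
CREATE SUBNET public_net WITH '10.10.80.0';
-- Public addresses on multiple subnets: define an interface per node.
CREATE NETWORK INTERFACE public_if_01 ON v_vmart_node0001 WITH '10.20.80.1';
```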
After you've identified the subnet or network interface to be used for import/export, you must
Identify the Database or Nodes Used for Import/Export.
See Also
- CREATE SUBNET
- ALTER SUBNET
- DROP SUBNET
- CREATE NETWORK INTERFACE
- ALTER NETWORK INTERFACE
- DROP NETWORK INTERFACE
Identify the Database or Nodes Used for Import/Export
Once you've identified the public network to HP Vertica, you can configure databases and nodes to
use the public network for import/export. You can configure by:
- Specifying a subnet for the database.
- Specifying a network interface for each node in the database.
To Configure a Database to Import/Export on the Public Network:
- Use the ALTER DATABASE statement to specify the subnet name of the public network. When
you do so, all nodes in the database automatically use the network interface on the subnet
for import/export operations.
To Configure Each Individual Node to Import/Export on a Public Network:
- Use the ALTER NODE statement to specify the network interface of the public network on each
individual node.
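A sketch of both approaches, assuming a subnet named public_net and a network interface named public_if_01 were created earlier (all names are hypothetical):

```sql
-- Database-wide: all nodes use the named subnet for import/export.
ALTER DATABASE mydb EXPORT ON public_net;
-- Per node: bind one node's import/export traffic to a network interface.
ALTER NODE v_vmart_node0001 EXPORT ON public_if_01;
```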
See Also
- ALTER DATABASE
- CREATE SUBNET
- CREATE NETWORK INTERFACE
- V_MONITOR.NETWORK_INTERFACES
Using EXPORT Functions
HP Vertica provides several EXPORT_ functions that let you recreate a database, or specific
schemas and tables, in a target database. For example, you can use the EXPORT_ functions to
transfer some or all of the designs and objects you create in a development or test environment to a
production database.
The EXPORT_ functions create SQL scripts that you can run to generate the exported database
designs or objects. These functions serve different purposes from the export statements, COPY
FROM VERTICA (pull data) and EXPORT TO VERTICA (push data). Those statements transfer
data directly from the source to the target database across a network connection between the
two. They are dynamic actions and do not generate SQL scripts.
The EXPORT_ functions appear in the following table. Depending on what you need to export, you
can use one or more of the functions. EXPORT_CATALOG creates the most comprehensive SQL
script, while EXPORT_TABLES and EXPORT_OBJECTS are subsets of that function to narrow
the export scope.
Use this function...   To recreate...
EXPORT_CATALOG         These catalog items:
                       - An existing schema design, tables, projections, constraints, and views
                       - The Database Designer-created schema design, tables, projections,
                         constraints, and views
                       - A design on a different cluster.
EXPORT_TABLES          Non-virtual objects up to, and including, the schema of one or more tables.
EXPORT_OBJECTS         Catalog objects in order dependency for replication.
The designs and object definitions that the script creates depend on the EXPORT_ function scope
you specify. The following sections give examples of the commands and output for each function
and the scopes it supports.
Saving Scripts for Export Functions
All of the examples in this section were generated using the standard HP Vertica VMART
database, with some additional test objects and tables. One output directory was created for all
SQL scripts that the functions created:
/home/dbadmin/xtest
If you specify the destination argument as an empty string (''), the function writes the export
results to STDOUT.
Note: A superuser can export all available database output to a file with the EXPORT_
functions. For a non-superuser, the EXPORT_ functions generate a script containing only the
objects to which the user has access.
Exporting the Catalog
Exporting the catalog is useful to quickly move a database design to another cluster. The
EXPORT_CATALOG function generates a SQL script to run on a different cluster to replicate the
physical schema design of the source database. You choose what to export by specifying the
export scope:
To export...                                                 Enter this scope...
Schemas, tables, constraints, views, and projections.        DESIGN (This is the default scope.)
All design objects and system objects created in Database    DESIGN ALL
Designer, such as design contexts and their tables.
All tables, constraints, and projections.                    TABLES
Function Summary
Here is the function syntax, described in EXPORT_CATALOG in the SQL Reference Manual:
EXPORT_CATALOG ( [ 'destination' ] , [ 'scope' ] )
Exporting All Catalog Objects
Use the DESIGN scope to export all design elements of a source database in order dependency.
This scope exports all catalog objects in their OID (unique object ID) order, including schemas,
tables, constraints, views, and projections. This is the most comprehensive export scope, without
the Database Designer elements, if they exist.
Note: The result of this function yields the same SQL script as EXPORT_OBJECTS used
with an empty string ('') as its scope.
VMart=> select export_catalog('/home/dbadmin/xtest/sql_cat_design.sql','DESIGN');
export_catalog
-------------------------------------
Catalog data exported successfully
(1 row)
The SQL script includes the following types of statements, each needed to provision a new
database:
- CREATE SCHEMA
- CREATE TABLE
- CREATE VIEW
- CREATE SEQUENCE
- CREATE PROJECTION (with ORDER BY and SEGMENTED BY)
- ALTER TABLE (to add constraints)
- PARTITION BY
Projection Considerations
If a projection to export was created with no ORDER BY clause, the SQL script reflects the default
behavior for projections. HP Vertica implicitly creates projections using a sort order based on
the SELECT columns in the projection definition. The EXPORT_CATALOG script reflects this
behavior.
The EXPORT_CATALOG script is portable as long as all projections were generated using
UNSEGMENTED ALL NODES or SEGMENTED ALL NODES.
Exporting Database Designer Schema and Designs
Use the DESIGN ALL scope to generate a script to recreate all design elements of a source
database and the design and system objects that were created by the Database Designer:
VMart=> select export_catalog('/home/dbadmin/xtest/sql_cat_design_all.sql','DESIGN_ALL');
export_catalog
-------------------------------------
Catalog data exported successfully
(1 row)
Exporting Table Objects
Use the TABLES scope to generate a script to recreate all schemas, tables, constraints, and
sequences:
VMart=> select export_catalog('/home/dbadmin/xtest/sql_cat_tables.sql','TABLES');
export_catalog
-------------------------------------
Catalog data exported successfully
(1 row)
The SQL script includes the following types of statements:
- CREATE SCHEMA
- CREATE TABLE
- ALTER TABLE (to add constraints)
- CREATE SEQUENCE
See Also
- EXPORT_CATALOG
- EXPORT_OBJECTS
- EXPORT_TABLES
- Exporting Tables
- Exporting Objects
Exporting Tables
Use the EXPORT_TABLES function to recreate one or more tables, and related objects, on a
different cluster. Specify one of the following options to determine the scope:
To export...                                                   Use this scope...
All non-virtual objects to which the user has access,          An empty string ('')
including constraints.
One or more named objects, such as tables or sequences in      A comma-delimited list of items:
one or more schemas. You can optionally qualify the schema     'myschema.newtable, yourschema.oldtable'
with a database prefix: myvertica.myschema.newtable.
A named database object in the current search path. You        A single object: 'myschema'
can specify a schema, table, or sequence. If the object is
a schema, the script includes non-virtual objects to which
the user has access.
The SQL script includes only the non-virtual objects to which the current user has access.
Note: You cannot export a view with this function, even if a list includes the view relations.
Specifying a view name will not issue a warning, but the view will not exist in the SQL script.
Use EXPORT_OBJECTS.
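For example, to capture a view definition, use EXPORT_OBJECTS instead (the file path and view name below are illustrative):

```sql
VMart=> select export_objects('/home/dbadmin/xtest/my_view.sql', 'public.my_view');
```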
Function Syntax
EXPORT_TABLES ( [ 'destination' ] , [ 'scope' ] )
For more information, see EXPORT_TABLES in the SQL Reference Manual.
Exporting All Tables and Related Objects
Specify an empty string ('') for the scope to export all tables and their related objects.
VMart=> select export_tables('/home/dbadmin/xtest/sql_tables_empty.sql','');
export_tables
-------------------------------------
Catalog data exported successfully
(1 row)
The SQL script includes the following types of statements, depending on what is required to
recreate the tables and any related objects (such as sequences):
- CREATE SCHEMA
- CREATE TABLE
- ALTER TABLE (to add constraints)
- CREATE SEQUENCE
- PARTITION BY
Exporting a List of Tables
Use EXPORT_TABLES with a comma-separated list of objects, including tables, sequences, or
schemas:
VMart=> select export_tables('/home/dbadmin/xtest/sql_tables_del.sql','public.student, public.test7');
export_tables
-------------------------------------
Catalog data exported successfully
(1 row)
The SQL script includes the following types of statements, depending on what is required to create
the list of objects:
- CREATE SCHEMA
- CREATE TABLE
- ALTER TABLE (to add constraints)
- CREATE SEQUENCE
Exporting a Single Table or Object
Use the EXPORT_TABLES function to export one or more database table objects.
This example exports a named sequence, my_seq, qualifying the sequence with the schema name
(public):
VMart=> select export_tables('/home/dbadmin/xtest/export_one_sequence.sql', 'public.my_seq');
export_tables
-------------------------------------
Catalog data exported successfully
(1 row)
Following are the contents of the export_one_sequence.sql output file, displayed with the more command:
[dbadmin@node01 xtest]$ more export_one_sequence.sql
CREATE SEQUENCE public.my_seq ;
Exporting Objects
Use the EXPORT_OBJECTS function to recreate the exported objects. Specify one of the following
options to determine the scope:
l To export all non-virtual objects to which the user has access, including constraints, use an
empty string ('') as the scope.
l To export one or more named objects, such as tables or views in one or more schemas, use a
comma-delimited list of items, such as 'myschema.newtable,yourschema.oldtable'. You can
optionally qualify the schema with a database prefix, as in myvertica.myschema.newtable.
l To export a named database object in the current search path, use a single object, such as
'myschema'. You can specify a schema, table, or view. If the object is a schema, the script
includes the non-virtual objects to which the user has access.
The SQL script includes only the non-virtual objects to which the current user has access.
The EXPORT_OBJECTS function always attempts to recreate projection statements with the
KSAFE clauses that existed in the original definitions, or with OFFSET clauses if the original
definitions had none.
Function Syntax
EXPORT_OBJECTS( [ 'destination' ] , [ 'scope' ] , [ 'ksafe' ] )
For more information, see EXPORT_OBJECTS in the SQL Reference Manual.
Exporting All Objects
Specify an empty string ('') for the scope to export all non-virtual objects from the source database
in dependency order. Running the generated SQL script on another cluster creates all referenced
objects and their dependent objects.
By default, this function's KSAFE argument is true, so the script includes the MARK_DESIGN_
KSAFE statement. This is useful when you run the generated SQL script in a new database,
because the new database then inherits the K-safety value of the original database.
Note: The result of this function yields the same SQL script as EXPORT_CATALOG with a
DESIGN scope.
VMart=> select export_objects('/home/dbadmin/xtest/sql_objects_all.sql','', 'true');
export_objects
-------------------------------------
Catalog data exported successfully
(1 row)
The SQL script includes the following types of statements:
l CREATE SCHEMA
l CREATE TABLE
l CREATE VIEW
l CREATE SEQUENCE
l CREATE PROJECTION (with ORDER BY and SEGMENTED BY)
l ALTER TABLE (to add constraints)
l PARTITION BY
Here is a snippet from the start of the output SQL file, and from the end, which shows the KSAFE statement:
CREATE SCHEMA store;CREATE SCHEMA online_sales;
CREATE SEQUENCE public.my_seq ;
CREATE TABLE public.customer_dimension
(
customer_key int NOT NULL,
customer_type varchar(16),
customer_name varchar(256),
customer_gender varchar(8),
title varchar(8),
household_id int,
.
.
.
);
.
.
.
SELECT MARK_DESIGN_KSAFE(0);
Exporting a List of Objects
Use a comma-separated list of objects as the function scope. The list can include one or more
tables, sequences, and views in the same or different schemas, depending on how you qualify the
object name. For instance, specify a table from one schema and a view from another
(schema2.view1).
The SQL script includes the following types of statements, depending on what objects you include
in the list:
l CREATE SCHEMA
l CREATE TABLE
l ALTER TABLE (to add constraints)
l CREATE VIEW
l CREATE SEQUENCE
If you specify a view without its dependencies, the function displays a WARNING. The SQL script
includes a CREATE statement for the dependent object, but will be unable to create it without the
necessary relations:
VMart=> select export_objects('nameObjectsList', 'test2, tt, my_seq, v2' );
WARNING 0: View public.v2 depends on other relations
export_objects
-------------------------------------
Catalog data exported successfully
(1 row)
This example includes the KSAFE argument explicitly:
VMart=> select export_objects('/home/dbadmin/xtest/sql_objects_table_view_KSAFE.sql','v1,
test7', 'true');
export_objects
-------------------------------------
Catalog data exported successfully
(1 row)
Here are the contents of the output file of the example, showing the sample table test7 and the v1
view:
CREATE TABLE public.test7
(
a int,
c int NOT NULL DEFAULT 4,
bb int
);
CREATE VIEW public.v1 AS
SELECT tt.a
FROM public.tt;
SELECT MARK_DESIGN_KSAFE(0);
Exporting a Single Object
Specify a single database object as the function scope. The object can be a schema, table,
sequence, or view. The function exports all non-virtual objects associated with the one you specify.
VMart=> select export_objects('/home/dbadmin/xtest/sql_objects_viewobject_KSAFE.sql','v1'
, 'true');
export_objects
-------------------------------------
Catalog data exported successfully
(1 row)
The output file contains the v1 view:
CREATE VIEW public.v1 AS
SELECT tt.a
FROM public.tt;
SELECT MARK_DESIGN_KSAFE(0);
Bulk Deleting and Purging Data
HP Vertica provides multiple techniques to remove data from the database in bulk.
DROP TABLE
    Permanently removes a table and its definition. Optionally removes associated
    views and projections as well.
DELETE FROM TABLE
    Marks rows with delete vectors and stores them so data can be rolled back to a
    previous epoch. The data must eventually be purged before the database can
    reclaim disk space. See Purging Deleted Data.
TRUNCATE TABLE
    Removes all storage and history associated with a table. The table structure is
    preserved for future use. The results of this command cannot be rolled back.
DROP_PARTITION
    Removes one partition from a partitioned table. Each partition contains a related
    subset of data in the table. Partitioned data can be dropped efficiently, and
    provides query performance benefits. See Working with Table Partitions.
The following table provides a quick reference for the different delete operations you can use. The
"Saves History" column indicates whether data can be rolled back to an earlier epoch and queried at
a later time.
Syntax                                           Performance  Commits Tx  Saves History
DELETE FROM base_table                           Normal       No          Yes
DELETE FROM temp_table                           High         No          No
DELETE FROM base_table WHERE                     Normal       No          Yes
DELETE FROM temp_table WHERE                     Normal       No          Yes
DELETE FROM temp_table WHERE
  (temp_table ON COMMIT PRESERVE ROWS)           Normal       No          Yes
DELETE FROM temp_table WHERE
  (temp_table ON COMMIT DELETE ROWS)             High         Yes         No
DROP base_table                                  High         Yes         No
TRUNCATE base_table                              High         Yes         No
TRUNCATE temp_table                              High         Yes         No
DROP PARTITION                                   High         Yes         No
Choosing the Right Technique for Deleting Data
l To delete both table data and definitions and start from scratch, use the DROP TABLE
[CASCADE] command.
l To drop data, while preserving table definitions so that you can quickly and easily reload data,
use TRUNCATE TABLE. Note that unlike DELETE, TRUNCATE does not have to mark each
row with delete vectors, so it runs much more quickly.
l To perform bulk delete operations on a regular basis, HP Vertica recommends using Partitioning.
l To perform occasional small deletes or updates with the option to roll back or review history, use
DELETE FROM TABLE. See Best Practices for DELETE and UPDATE.
For details on syntax and usage, see DELETE, DROP TABLE, TRUNCATE TABLE, CREATE
TABLE and DROP_PARTITION in the SQL Reference Manual.
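As a rough sketch of the four techniques above, using a hypothetical sales table (the table, column, and partition value are illustrative, not part of the sample database):

```sql
DELETE FROM sales WHERE order_id = 100;  -- occasional small delete; history is retained
SELECT DROP_PARTITION('sales', 2006);    -- bulk-drop one partition of a partitioned table
TRUNCATE TABLE sales;                    -- remove all data and history; keep the definition
DROP TABLE sales CASCADE;                -- remove the data and the definition entirely
```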
Best Practices for DELETE and UPDATE
HP Vertica is optimized for query-intensive workloads, so DELETE and UPDATE queries might not
achieve the same level of performance as other queries. DELETE and UPDATE operations go to the
WOS by default, but if the data is sufficiently large and would not fit in memory, HP Vertica
automatically switches to using the ROS. See Using INSERT, UPDATE, and DELETE.
The topics that follow discuss best practices when using DELETE and UPDATE operations in HP
Vertica.
Performance Considerations for DELETE and UPDATE
Queries
To improve the performance of your DELETE and UPDATE queries, consider the following issues:
l Query performance after large deletes—A large number of (unpurged) deleted rows can
negatively affect query performance.
To eliminate rows that have been deleted from the result, a query must do extra processing. If
10% or more of the total rows in a table have been deleted, the performance of a query on the
table degrades. However, your experience may vary depending on the size of the table, the table
definition, and the query. If a table has a large number of deleted rows, consider purging those
rows to improve performance. For more information on purging, see Purging Deleted Data.
l Recovery performance—Recovery is the action required for a cluster to restore K-safety after
a crash. Large numbers of deleted records can degrade the performance of a recovery. To
improve recovery performance, purge the deleted rows. For more information on purging, see
Purging Deleted Data.
l Concurrency—DELETE and UPDATE take exclusive locks on the table. Only one DELETE or
UPDATE transaction on a table can be in progress at a time and only when no loads (or INSERTs)
are in progress. DELETEs and UPDATEs on different tables can be run concurrently.
l Pre-join projections—Avoid pre-joining dimension tables that are frequently updated. DELETE
and UPDATE operations on pre-join projections cascade to the fact table, causing large DELETE or
UPDATE operations.
For detailed tips about improving DELETE and UPDATE performance, see Optimizing DELETEs and
UPDATEs for Performance.
Caution: HP Vertica does not remove deleted data immediately but keeps it as history for the
purposes of historical query. A large amount of history can result in slower query
performance. For information about how to configure the appropriate amount of history to
retain, see Purging Deleted Data.
Optimizing DELETEs and UPDATEs for Performance
The process of optimizing DELETE and UPDATE queries is the same for both operations. Some
simple steps can increase the query performance by tens to hundreds of times. The following
sections describe several ways to improve projection design and improve DELETE and UPDATE
queries to significantly increase DELETE and UPDATE performance.
Note: For large bulk deletion, HP Vertica recommends using Partitioned Tables where
possible because they provide the best DELETE performance and improve query performance.
Projection Column Requirements for Optimized Deletes
When all columns required by the DELETE or UPDATE predicate are present in a projection, the
projection is optimized for DELETEs and UPDATEs. DELETE and UPDATE operations on such
projections are significantly faster than on non-optimized projections. Both simple and pre-join
projections can be optimized.
For example, consider the following table and projections:
CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
CREATE PROJECTION p1 (a, b, c) AS SELECT * FROM t ORDER BY a;
CREATE PROJECTION p2 (a, c) AS SELECT a, c FROM t ORDER BY c, a;
In the following query, both p1 and p2 are eligible for DELETE and UPDATE optimization because
column a is available:
DELETE from t WHERE a = 1;
In the following example, only projection p1 is eligible for DELETE and UPDATE optimization because
the b column is not available in p2:
DELETE from t WHERE b = 1;
Optimized Deletes in Subqueries
To be eligible for DELETE optimization, all target table columns referenced in a DELETE or UPDATE
statement's WHERE clause must be in the projection definition.
For example, the following simple schema has two tables and three projections:
CREATE TABLE tb1 (a INT, b INT, c INT, d INT);
CREATE TABLE tb2 (g INT, h INT, i INT, j INT);
The first projection references all columns in tb1 and sorts on column a:
CREATE PROJECTION tb1_p AS SELECT a, b, c, d FROM tb1 ORDER BY a;
The buddy projection references and sorts on column a in tb1:
CREATE PROJECTION tb1_p_2 AS SELECT a FROM tb1 ORDER BY a;
This projection references all columns in tb2 and sorts on column i:
CREATE PROJECTION tb2_p AS SELECT g, h, i, j FROM tb2 ORDER BY i;
Consider the following DML statement, which references tb1.a in its WHERE clause. Since both
projections on tb1 contain column a, both are eligible for the optimized DELETE:
DELETE FROM tb1 WHERE tb1.a IN (SELECT tb2.i FROM tb2);
Restrictions
Optimized DELETEs are not supported under the following conditions:
l With pre-join projections on nodes that are down
l With replicated and pre-join projections if subqueries reference the target table. For example, the
following syntax is not supported:
DELETE FROM tb1 WHERE tb1.a IN (SELECT g FROM tb2 WHERE tb2.g = tb1.a);
l With subqueries that do not return multiple rows. For example, the following syntax is not
supported:
DELETE FROM tb1 WHERE tb1.a = (SELECT g FROM tb2);
Projection Sort Order for Optimizing Deletes
Design your projections so that frequently-used DELETE or UPDATE predicate columns appear in the
sort order of all projections for large DELETEs and UPDATEs.
For example, suppose most of the DELETE queries you perform on a projection look like the
following:
DELETE from t where time_key < '1-1-2007'
To optimize the DELETEs, make time_key appear in the ORDER BY clause of all your projections.
This schema design results in better performance of the DELETE operation.
In addition, add sort columns to the sort order such that each combination of sort key values
uniquely identifies a row or a small set of rows. For more information, see Choosing Sort
Order: Best Practices. To analyze projections for sort order issues, use the
EVALUATE_DELETE_PERFORMANCE function.
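For example, a projection designed around the time_key predicate above might look like the following sketch; the table and columns are illustrative, and EVALUATE_DELETE_PERFORMANCE then checks the projection for sort order issues:

```sql
CREATE TABLE t3 (time_key DATE NOT NULL, cust_id INT, amount FLOAT);
CREATE PROJECTION t3_p (time_key, cust_id, amount) AS
   SELECT time_key, cust_id, amount FROM t3
   ORDER BY time_key, cust_id;  -- predicate column first, then a near-unique key
SELECT EVALUATE_DELETE_PERFORMANCE('t3_p');
```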
Purging Deleted Data
In HP Vertica, delete operations do not remove rows from physical storage. Unlike most
databases, the DELETE command in HP Vertica marks rows as deleted so that they remain
available to historical queries. These deleted rows are called historical data. Retention of
historical data also applies to the UPDATE command, which is actually a combined DELETE and
INSERT operation.
The cost of retaining deleted data in physical storage can be measured in terms of:
l Disk space for the deleted rows and delete markers
l A performance penalty for reading and skipping over deleted data
A purge operation permanently removes deleted data from physical storage so that the disk space
can be reused. HP Vertica gives you the ability to control how much deleted data is retained in the
physical storage used by your database by performing a purge operation using one of the following
techniques:
l Setting a Purge Policy
l Manually Purging Data
Both methods set the Ancient History Mark (AHM), which is an epoch that represents the time
until which history is retained. History older than the AHM is eligible for purge.
Note: Large delete and purge operations in HP Vertica could take a long time to complete, so
use them sparingly. If your application requires deleting data on a regular basis, such as by
month or year, HP recommends that you design tables that take advantage of table
partitioning. If partitioning tables is not suitable, consider the procedure described in Rebuilding
a Table. The ALTER TABLE..RENAME command lets you build a new table from the old table,
drop the old table, and rename the new table in its place.
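As a sketch of the partitioning approach the note recommends, a table partitioned by year lets each year's data be dropped as a unit (the table and columns are illustrative):

```sql
CREATE TABLE web_events (
   event_time TIMESTAMP NOT NULL,
   session_id INT,
   url VARCHAR(512)
)
PARTITION BY EXTRACT(year FROM event_time);
-- Later, remove an entire year's data in one fast operation:
SELECT DROP_PARTITION('web_events', 2007);
```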
Setting a Purge Policy
The preferred method for purging data is to establish a policy that determines which deleted data is
eligible to be purged. Eligible data is automatically purged when the Tuple Mover performs
mergeout operations.
HP Vertica provides two methods for determining when deleted data is eligible to be purged:
l Specifying the time for which delete data is saved
l Specifying the number of epochs that are saved
Specifying the Time for Which Delete Data Is Saved
Specifying the time for which delete data is saved is the preferred method for determining which
deleted data can be purged. By default, HP Vertica saves historical data only when nodes are
down.
To change the specified time for saving deleted data, use the HistoryRetentionTime
configuration parameter:
=> SELECT SET_CONFIG_PARAMETER('HistoryRetentionTime', '{ <seconds> | -1 }' );
In the above syntax:
l seconds is the amount of time (in seconds) for which to save deleted data.
l -1 indicates that you do not want to use the HistoryRetentionTime configuration parameter to
determine which deleted data is eligible to be purged. Use this setting if you prefer to use the
other method (HistoryRetentionEpochs) for determining which deleted data can be purged.
The following example sets the time for which deleted data is saved to 240 seconds:
=> SELECT SET_CONFIG_PARAMETER('HistoryRetentionTime', '240');
Specifying the Number of Epochs That Are Saved
Unless you have a reason to limit the number of epochs, HP recommends that you specify the time
over which delete data is saved.
To specify the number of historical epochs to save through the HistoryRetentionEpochs
configuration parameter:
1. Turn off the HistoryRetentionTime configuration parameter:
=> SELECT SET_CONFIG_PARAMETER('HistoryRetentionTime', '-1');
2. Set the history epoch retention level through the HistoryRetentionEpochs configuration
parameter:
=> SELECT SET_CONFIG_PARAMETER('HistoryRetentionEpochs', '{<num_epochs>|-1}');
n num_epochs is the number of historical epochs to save.
n -1 indicates that you do not want to use the HistoryRetentionEpochs configuration
parameter to trim historical epochs from the epoch map. By default,
HistoryRetentionEpochs is set to -1.
The following example sets the number of historical epochs to save to 40:
=> SELECT SET_CONFIG_PARAMETER('HistoryRetentionEpochs', '40');
Modifications are immediately implemented across all nodes within the database cluster. You do
not need to restart the database.
Note: If both HistoryRetentionTime and HistoryRetentionEpochs are specified,
HistoryRetentionTime takes precedence.
See Epoch Management Parameters for additional details.
Disabling Purge
If you want to preserve all historical data, set the value of both historical epoch retention parameters
to -1, as follows:
=> SELECT SET_CONFIG_PARAMETER('HistoryRetentionTime', '-1');
=> SELECT SET_CONFIG_PARAMETER('HistoryRetentionEpochs', '-1');
Manually Purging Data
Manually purging deleted data consists of the following series of steps:
1. Determine the point in time to which you want to purge deleted data.
2. Set the Ancient History Mark (AHM) to this point in time using one of the following SQL
functions (described in the SQL Reference Manual):
n SET_AHM_TIME() sets the AHM to the epoch that includes the specified TIMESTAMP
value on the initiator node.
n SET_AHM_EPOCH() sets the AHM to the specified epoch.
n GET_AHM_TIME() returns a TIMESTAMP value representing the AHM.
n GET_AHM_EPOCH() returns the number of the epoch in which the AHM is located.
n MAKE_AHM_NOW() sets the AHM to the greatest allowable value (now), and lets you drop
pre-existing projections. This purges all deleted data.
When you use SET_AHM_TIME or GET_AHM_TIME, keep in mind that the timestamp you
specify is mapped to an epoch, which has (by default) a three-minute granularity. Thus, if you
specify an AHM time of '2008-01-01 00:00:00.00' the resulting purge could permanently
remove as much as the first three minutes of 2008, or could fail to remove the last three
minutes of 2007.
Note: The system prevents you from setting the AHM beyond the point at which it would
prevent recovery in the event of a node failure.
3. Manually initiate a purge using one of the following SQL functions (described in the SQL
Reference Manual):
n PURGE_PROJECTION() purges a specified projection.
n PURGE_TABLE() purges all projections on the specified table.
n PURGE() purges all projections in the physical schema.
4. The Tuple Mover performs a mergeout operation to purge the data.
Note: Manual purge operations can take a long time.
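Putting the steps together, a manual purge session might look like the following sketch (the table name is illustrative):

```sql
-- Step 2: advance the AHM, either to a point in time...
SELECT SET_AHM_TIME('2008-02-01 00:00:00');
-- ...or to the greatest allowable value:
SELECT MAKE_AHM_NOW();
-- Step 3: purge a single table's projections, or all projections:
SELECT PURGE_TABLE('public.my_table');
SELECT PURGE();
```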
Managing the Database
This section describes how to manage the HP Vertica database, including:
l Connection Load Balancing
l Managing Nodes
l Adding Disk Space to a Node
l Managing Tuple Mover Operations
l Managing Workloads
Connection Load Balancing
Each client connection to a host in the HP Vertica cluster requires a small overhead in memory and
processor time. If many clients connect to a single host, this overhead can begin to affect the
performance of the database. You can attempt to spread the overhead of client connections by
dictating that certain clients connect to specific hosts in the cluster. However, this manual
balancing becomes difficult as new clients and hosts are added to your environment.
Connection load balancing helps automatically spread the overhead caused by client connections
across the cluster by having hosts redirect client connections to other hosts. By redirecting
connections, the overhead from client connections is spread across the cluster without having to
manually assign particular hosts to individual clients. Clients can connect to a small handful of
hosts, and they are naturally redirected to other hosts in the cluster.
There are two ways you can implement load balancing on your HP Vertica cluster:
l Native connection load balancing is a feature built into the HP Vertica server and client libraries
that redirects client connections at the application level.
l Internet Protocol Virtual Server (IPVS) is software that can be installed on several hosts in the
HP Vertica cluster and provides connection load balancing at the network level.
In most situations, you should use native connection load balancing instead of IPVS. The following
sections explain each option in detail and explain their benefits.
Native Connection Load Balancing Overview
Native connection load balancing is a feature built into the Vertica Analytic Database server and
client libraries as well as vsql. Both the server and the client need to enable load balancing for it to
function. If connection load balancing is enabled, a host in the database cluster can redirect a
client's connection attempt to another currently active host in the cluster. This redirection is
based on a load balancing policy, and can take place only once, so a client is not bounced from
one host to another.
Since native connection load balancing is incorporated into the HP Vertica client libraries, any client
application that connects to HP Vertica transparently takes advantage of it simply by setting a
connection parameter.
For more about native connection load balancing, see About Native Connection Load Balancing.
IPVS Overview
IPVS is a feature of the Linux kernel which lets a single host act as a gateway to an entire cluster of
hosts. The load balancing host creates a virtual IP address on the network. When a client connects
to the virtual IP address, the IPVS load balancer transparently redirects the connection to one of the
hosts in the cluster. For more on IPVS, see Connection Load Balancing Using IPVS.
Choosing Whether to Use Native Connection Load
Balancing or IPVS
Native connection load balancing has several advantages over IPVS. Native connection load
balancing is:
l Easy to set up. All that is required is that the client and server enable connection load balancing.
IPVS requires the configuration of several additional software packages and the network.
l Easily overridden in cases where you need to connect to a specific host (for example, when
using the COPY statement to load a file from a specific host's filesystem).
l Less at risk of host failures affecting connection load balancing. IPVS is limited to a master and
a backup server. All hosts in HP Vertica are capable of redirecting connections for connection
load balancing. As long as HP Vertica is running, it can perform connection load balancing.
l Less memory and CPU intensive than running a separate IPVS process on several database
hosts.
l Free of the load performance penalty that IPVS imposes through an additional network hop from
the IPVS server to a host in the database. With native connection load balancing, clients connect
directly to hosts in the Vertica Analytic Database cluster.
l Supported by HP. IPVS is currently not actively supported.
l Supported on all Linux platforms supported by HP Vertica. HP supplies IPVS installation
packages only on Red Hat, and IPVS has known compatibility issues with Debian-based Linux
distributions.
There are a few cases where IPVS may be more suitable:
l Restrictive firewalls between the clients and the hosts in the database cluster can interfere with
native connection load balancing. In order for native connection load balancing to work, clients
must be able to access every host in the HP Vertica cluster. Any firewall between the hosts and
the clients must be configured to allow connections to the HP Vertica database. Otherwise, a
client could be redirected to a host that it cannot access. IPVS can be configured so that the
virtual IP address clients use to connect is outside of the firewall while the hosts can be behind
the firewall.
l Since native connection load balancing works at the application level, clients must opt in to have
their connections load balanced. Because it works at the network protocol level, IPVS can force
connections to be balanced.
How you choose to implement connection load balancing depends on your network environment.
Since native connection load balancing is easier to implement, you should use it unless your
network configuration requires that clients be separated from the hosts in the HP Vertica database
by a firewall.
About Native Connection Load Balancing
Native connection load balancing is a feature built into the HP Vertica server and client libraries that
helps spread the CPU and memory overhead caused by client connections across the hosts in the
database. It can prevent hosts from becoming burdened by having many client connections while
other hosts in the cluster do not.
Native connection load balancing only has an effect when it is enabled by both the server and the
client. When both have enabled native connection load balancing, the following process takes place
whenever the client attempts to open a connection to HP Vertica:
1. The client connects to a host in the database cluster, with a connection parameter indicating
that it is requesting a load-balanced connection.
2. The host chooses a host from the list of currently up hosts in the database based on the current
load balancing scheme (see below).
3. The host tells the client which host it has selected to handle the client's connection.
4. If the host chose another host in the database to handle the client connection, the client
disconnects from the initial host. Otherwise, the client jumps to step 6.
5. The client establishes a connection to the host that will handle its connection. The client sets
this second connection request so that the second host does not interpret the connection as a
request for load balancing.
6. The client connection proceeds as usual (negotiating encryption if the connection has
SSL enabled, and then authenticating the user).
This entire process is transparent to the client application. The client driver automatically
disconnects from the initial host and reconnects to the host selected for load balancing.
Notes
l Native connection load balancing works with the ADO.NET driver's connection pooling. The
connection the client makes to the initial host and the final connection to the load-balanced host
use pooled connections if they are available.
l For client applications using the JDBC and ODBC drivers in conjunction with third-party
connection pooling solutions, the initial connection is not pooled since it is not a full client
connection. The final connection is pooled, since it is a standard client connection.
l The client libraries include a failover feature that allows them to connect to backup hosts if the
host specified in the connection properties is unreachable. When using native connection load
balancing, this failover feature is only used for the initial connection to the database. If the host
to which the client was redirected does not respond to the client's connection request, the client
does not attempt to connect to a backup host and instead returns a connection error to the user.
Since clients are only redirected to hosts that are known to be up, this sort of connection failure
should only occur if the targeted host happened to go down at the same moment the client is
redirected to it. See ADO.NET Connection Failover, JDBC Connection Failover, and ODBC
Connection Failover in the Programmer's Guide for more information.
Load Balancing Schemes
The load balancing scheme controls how a host selects which host to handle a client connection.
There are three available schemes:
l NONE: Disables native connection load balancing. This is the default setting.
l ROUNDROBIN: Chooses the next host from a circular list of currently up hosts in the database
(i.e. node #1, node #2, node #3, etc. until it wraps back to node #1 again). Each host in the
cluster maintains its own pointer to the next host in the circular list, rather than there being a
single cluster-wide state.
l RANDOM: Chooses a host at random from the list of currently up hosts in the cluster.
You set the native connection load balancing scheme using the SET_LOAD_BALANCE_POLICY
function. See Enabling and Disabling Native Connection Load Balancing for instructions.
Related Tasks
Enabling and Disabling Native Connection Load Balancing 580
Monitoring Native Connection Load Balancing 581
Enabling and Disabling Native Connection Load
Balancing
Only a database superuser can enable or disable native connection load balancing. To enable or
disable load balancing, use the SET_LOAD_BALANCE_POLICY function to set the load balance
policy. Setting the load balance policy to anything other than 'NONE' enables load balancing on the
server. The following example enables native connection load balancing by setting the load
balancing policy to ROUNDROBIN.
=> SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN');
SET_LOAD_BALANCE_POLICY
--------------------------------------------------------------------------------
Successfully changed the client initiator load balancing policy to: roundrobin
(1 row)
To disable native connection load balancing, use SET_LOAD_BALANCE_POLICY to set the
policy to 'NONE':
=> SELECT SET_LOAD_BALANCE_POLICY('NONE');
SET_LOAD_BALANCE_POLICY
--------------------------------------------------------------------------
Successfully changed the client initiator load balancing policy to: none
(1 row)
Note: By default, client connections are not load balanced, even when connection load
balancing is enabled on the server. Clients must set a connection parameter to indicate that they
are willing to have their connection request load balanced. See Enabling Native Connection
Load Balancing in ADO.NET, Enabling Native Connection Load Balancing in JDBC, and
Enabling Native Connection Load Balancing in ODBC in the Programmer's Guide for more
information.
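As a concrete illustration of that client-side opt-in, an ODBC connection string that requests load balancing might resemble the following. The ConnectionLoadBalance parameter name is taken from the Programmer's Guide chapters cited above; the server, database, and user values are placeholders for your environment.

```
Driver=Vertica;Servername=10.10.51.55;Port=5433;Database=vmart;UID=dbadmin;ConnectionLoadBalance=1
```

With this parameter set, the initially contacted host may redirect the client to another host chosen by the server's load balancing policy.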
Resetting the Load Balancing State
When the load balancing policy is ROUNDROBIN, each host in the HP Vertica cluster maintains
its own record of the host it will select to handle the next client connection. You can reset this
state to its initial value (usually, the host with the lowest node ID) using the RESET_LOAD_
BALANCE_POLICY function:
=> SELECT RESET_LOAD_BALANCE_POLICY();
RESET_LOAD_BALANCE_POLICY
-------------------------------------------------------------------------
Successfully reset stateful client load balance policies: "roundrobin".
(1 row)
Related Information
About Native Connection Load Balancing 579
Related Tasks
Monitoring Native Connection Load Balancing 581
Monitoring Native Connection Load Balancing
Query the LOAD_BALANCE_POLICY column of the V_CATALOG.DATABASES system table to determine the
state of native connection load balancing on your server:
=> SELECT LOAD_BALANCE_POLICY FROM V_CATALOG.DATABASES;
LOAD_BALANCE_POLICY
---------------------
roundrobin
(1 row)
Determining to which Node a Client Has Connected
A client can determine the node to which it has connected by querying the NODE_NAME column
of the V_MONITOR.CURRENT_SESSION table:
=> SELECT NODE_NAME FROM V_MONITOR.CURRENT_SESSION;
NODE_NAME
------------------
v_vmart_node0002
(1 row)
Related Information
About Native Connection Load Balancing 579
Related Tasks
Enabling and Disabling Native Connection Load Balancing 580
Connection Load Balancing Using IPVS
The IP Virtual Server (IPVS) provides network-protocol-level connection load balancing. When
used with an HP Vertica database cluster, it is installed on two database hosts.
IPVS is made up of the following components:
l The Virtual IP (VIP): The IP address that is accessed by all client connections.
l Real server IPs (RIP): The IP addresses of client network interfaces used for connecting
database clients to the database engine.
l Cluster: A cluster of real HP Vertica servers (nodes).
l Virtual server: The single point of entry that provides access to a cluster, based on dynamic
node selection.
The IPVS load balancer supports only two-node standby redundancy. The IPVS redundancy model is
different from that of the HP Vertica Analytic Database. See Failure Recovery for details on HP
Vertica redundancy.
Client connections made through the Virtual IP (VIP) are managed by a primary (master) director
node, which is one of the real server nodes (RIP). The master director handles the routing of
requests by determining which node has the fewest connections and sending connections to that
node. If the director node fails for any reason, a failover (slave) director takes over request routing
until the primary (master) director comes back online.
For example, if a user connects to node03 in a three-node cluster and node03 fails, the current
transaction rolls back, the client connection fails, and a connection must be reestablished on
another node.
The following graphic illustrates a three-node database cluster where all nodes share a single VIP.
The cluster contains a master director (node01), a slave director (node02), and an additional host
(node03) that together provide the minimum configuration for high availability (K-safety). In this
setup (and in the configuration and examples that follow), node01 and node02 play dual roles as
IPVS directors and HP Vertica nodes.
Notes
l Load balancing on a VIP is supported on Red Hat Enterprise Linux 5 and 6, 64-bit.
l HP Vertica must be installed on each node in the cluster; the database can be installed on any
node, but only one database can be running on an HP Vertica cluster at a time.
l Although a 0 K-safety (two-node) design is supported, HP strongly recommends that you create
the load-balancing network using a minimum three-node cluster with K-safety set to 1. This way
if one node fails, the database stays up. See Designing for K-Safety for details.
l When K-safety is set to 1, locate the IPVS master and slave on HP Vertica database nodes that
comprise a buddy projection pair. This is the best way to ensure high-availability load
balancing. See High Availability Through Projections for details on buddy projections.
l If the node that is the IPVS master fails completely, the slave IPVS takes over load balancing.
However, if the master only partially fails (i.e., it loses some of its processes but the node is still
up), you may have to modify IP addresses to direct network traffic to the slave node.
Alternatively, you can try to restart the processes on the master.
l Subsequent topics in this section describe how to set up two directors (master and slave), but
you can set up more than two directors. See the Keepalived User Guide for details. See also the
Linux Virtual Server Web site.
Configuring HP Vertica Nodes
This section describes how to configure an HP Vertica cluster of nodes for load balancing. You'll
set up two directors in a master/slave configuration and include a third node for K-safety.
An HP Vertica cluster designed for load balancing uses the following configuration:
l Real IP (RIP) addresses make up the public interface and include:
n The master director/node, which handles the routing of requests. The master is collocated
with one of the database cluster nodes.
n The slave director/node, which communicates with the master and takes over routing
requests in the event of a master node failure. The slave is collocated with another database
cluster node.
n At least one additional database cluster node (a failover node), which provides the minimum
configuration for high availability (K-safety).
l Virtual IP (VIP) address (generally assigned to eth0 in Linux) is the public network interface
over which database clients connect.
Note: The VIP must be public so that clients outside the cluster can contact it.
Once you have set up an HP Vertica cluster and created a database, you can choose the nodes
that will be directors. To achieve the best high-availability load balancing when K-safety is set
to 1, ensure that the IPVS master node and the slave node are located on HP Vertica database
nodes that form a buddy projection pair. (See High Availability Through Projections for information on
buddy projections.)
The instructions in this section use the following node configuration:
Pre-configured IP   Node assignment          Public IPs      Private IPs
VIP                 shared among all nodes   10.10.51.180    (none)
RIP                 master director node01   10.10.51.55     192.168.51.1
RIP                 slave director node02    10.10.51.56     192.168.51.2
RIP                 failover node node03     10.10.51.57     192.168.51.3
Notes
l In the above table, the private IPs determine which node to send a request to. They are not the
same as the RIPs.
l The VIP must be on the same subnet as the nodes in the HP Vertica cluster.
l Both the master and slave nodes (node01 and node02 in this section) require additional
installation and configuration, as described in Configuring the Directors.
l Use the command cat /etc/hosts to display a list of all hosts in your cluster.
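For reference, an /etc/hosts file consistent with the node assignments in the table above might contain entries like the following. The host names are hypothetical examples, not values HP Vertica requires.

```
127.0.0.1     localhost
10.10.51.55   node01
10.10.51.56   node02
10.10.51.57   node03
192.168.51.1  node01-priv
192.168.51.2  node02-priv
192.168.51.3  node03-priv
```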
The following external web sites might be useful. The links worked at the last date of publication,
but HP Vertica does not manage this content.
See Also
l Linux Virtual Server Web site
l LVS-HOWTO Page
l Keepalived.conf(5) man page
l ipvsadm man page
Set Up the Loopback Interface
This procedure sets up the loopback (lo) interface with an alias on each node.
1. Log in as root on the master director (node01):
$ su - root
2. Use the text editor of your choice to open ifcfg-lo:
[root@node01]# vi /etc/sysconfig/network-scripts/ifcfg-lo
3. Set up the loopback adapter with an alias for the VIP by adding the following block to the end of
the file:
## vip device
DEVICE=lo:0
IPADDR=10.10.51.180
NETMASK=255.255.255.255
ONBOOT=yes
NAME=loopback
Note: When you add the above block to your file, be careful not to overwrite the existing
127.0.0.1 entry, which is required for proper system operation.
4. Start the device:
[root@node01]# ifup lo:0
5. Repeat steps 1-4 on each node in the HP Vertica cluster.
Disable Address Resolution Protocol (ARP)
This procedure disables ARP (Address Resolution Protocol) for the VIP.
1. On the master director (node01), log in as root:
$ su - root
2. Use the text editor of your choice to open the sysctl configuration file:
[root@node01]# vi /etc/sysctl.conf
3. Add the following block to the end of the file:
# LVS
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.eth0.arp_announce = 2
# Enables packet forwarding
net.ipv4.ip_forward = 1
Note: For additional details, refer to the LVS-HOWTO Page. You might also refer to the
Linux Virtual Server Wiki page for information on using arp_announce/arp_ignore to
disable the Address Resolution Protocol.
4. Use ifconfig to verify that the interface is on the same subnet as the VIP:
[root@node01]# /sbin/ifconfig
In the following output, the eth0 inet addr is on the same subnet (51) as the VIP, which also
matches the private RIP under the eth1 heading:
eth0 Link encap:Ethernet HWaddr 84:2B:2B:55:4B:BE
inet addr:10.10.51.55 Bcast:10.10.51.255 Mask:255.255.255.0
inet6 addr: fe80::862b:2bff:fe55:4bbe/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:91694543 errors:0 dropped:0 overruns:0 frame:0
TX packets:373212 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:49294294011 (45.9 GiB) TX bytes:66149943 (63.0 MiB)
Interrupt:15 Memory:da000000-da012800
eth1 Link encap:Ethernet HWaddr 84:2B:2B:55:4B:BF
inet addr:192.168.51.55 Bcast:192.168.51.255 Mask:255.255.255.0
inet6 addr: fe80::862b:2bff:fe55:4bbf/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:937079543 errors:0 dropped:2780 overruns:0 frame:0
TX packets:477401433 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:449050544237 (418.2 GiB) TX bytes:46302821625 (43.1 GiB)
Interrupt:14 Memory:dc000000-dc012800
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:6604 errors:0 dropped:0 overruns:0 frame:0
TX packets:6604 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:21956498 (20.9 MiB) TX bytes:21956498 (20.9 MiB)
lo:0 Link encap:Local Loopback
inet addr:10.10.51.180 Mask:255.255.255.255
UP LOOPBACK RUNNING MTU:16436 Metric:1
5. Use ifconfig to verify that the loopback interface is up:
[root@node01]# /sbin/ifconfig lo:0
You should see output similar to the following:
lo:0 Link encap:Local Loopback
inet addr:10.10.51.180 Mask:255.255.255.255
UP LOOPBACK RUNNING MTU:16436 Metric:1
If you do not see UP LOOPBACK RUNNING, bring up the loopback interface:
[root@node01]# /sbin/ifup lo
6. Issue the following command to commit changes to the kernel from the configuration file:
[root@node01]# /sbin/sysctl -p
7. Repeat steps 1-6 on all nodes in the HP Vertica cluster.
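After running sysctl -p, you can spot-check that packet forwarding actually took effect by reading the value back from /proc. This sketch assumes a Linux host and checks only the ip_forward key; the per-interface arp_* keys exist only when the interface itself does.

```shell
# Read back the forwarding flag applied from /etc/sysctl.conf (1 = enabled).
cat /proc/sys/net/ipv4/ip_forward
```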
Configuring the Directors
Now you are ready to install the HP Vertica IPVS Load Balancer package and configure the master
(node01) and slave (node02) directors.
Install the HP Vertica IPVS Load Balancer Package
The following instructions describe how to download and install the HP Vertica IPVS Load Balancer
package for Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6.
Note: For illustrative purposes only, this procedure uses node01 for the master director and
node02 for the slave director.
Before You Begin
Before installing IPVS, you must:
l Install HP Vertica
l Create a database cluster
If You Are Using Red Hat Enterprise Linux 5.x:
Make sure you have downloaded and installed the HP Vertica Analytics Database RPM and the
IPVS Load Balancer package for Red Hat Enterprise Linux 5.
1. On the master director (node01) log in as root:
$ su - root
2. Download the IPVS Load Balancer package for Red Hat Enterprise Linux 5 from the
my.vertica.com website to a location on the master server, such as to /tmp.
3. Change directory to the location of the downloaded file:
# cd /tmp
4. Install (or upgrade) the Load Balancer package using the rpm -Uvh command.
The following is an example for the Red Hat Linux package only; package names could
change between releases:
# rpm -Uvh vertica-ipvs-load-balancer-<current-version>.x86_64.RHEL5.rpm
5. Repeat steps 1-4 on the slave director (node02).
If You Are Using Red Hat Enterprise Linux 6.x:
Make sure you have downloaded and installed the HP Vertica Analytics Database RPM and the
IPVS Load Balancer package for Red Hat Enterprise Linux 6.
1. On the master director (node01) log in as root:
$ su - root
2. Download the IPVS Load Balancer package for Red Hat Enterprise Linux 6 from the
my.vertica.com website to a location on the master server, such as to /tmp.
3. Change directory to the location of the downloaded file:
# cd /tmp
4. Run this command as root:
/sbin/modprobe ip_vs
5. Verify that ip_vs is loaded correctly using this command:
lsmod | grep ip_vs
6. Install (or upgrade) the Load Balancer package using the rpm -Uvh command.
The following is an example for the Red Hat Linux package only; package names could
change between releases:
# rpm -Uvh vertica-ipvs-load-balancer-<current-version>.x86_64.RHEL6.rpm
7. Repeat steps 1-6 on the slave director (node02).
Configure the HP Vertica IPVS Load Balancer
HP Vertica provides a script called configure-keepalived.pl in the IPVS Load Balancer
package. The script is located in /sbin, and if you run it with no options it prints a usage summary:
--ripips | Comma separated list of HP Vertica nodes; public IPs (e.g., 10.10.50.116, etc.)
--priv_ips | Comma separated list of HP Vertica nodes; private IPs (e.g., 192.168.51.116, etc.)
--ripport | Port on which HP Vertica runs. Default is 5433
--iface | Public ethernet interface HP Vertica is configured to use (e.g., eth0)
--emailto | Address that should get alerts (e.g., user@server.com)
--emailfrom | Address that mail should come from (e.g., user@server.com)
--mailserver | E-mail server IP or hostname (e.g., mail.server.com)
--master | If this director is the master (default), specify --master
--slave | If this director is the slave, specify --slave
--authpass | Password for keepalived
--vip | Virtual IP address (e.g., 10.10.51.180)
--delayloop | Seconds keepalived waits between healthchecks. Default is 2
--algo | Sets the algorithm to use: rr, wrr, lc (default), wlc, lblc, lblcr, dh, sh, sed, nq
--kind | Sets the routing method to use. Default is DR.
--priority | By default, master has priority of 100 and the backup (slave) has priority of 50
For details about each of these parameters, refer to the ipvsadm(8) - Linux man page.
Public and Private IPs
If your cluster uses private interfaces for spread cluster communication, use the --priv_ips
switch to enter the private IP addresses that correspond to the public IP addresses (or RIPs).
The IPVS keepalived daemon uses these private IPs to determine when a node has left the cluster.
The IP host ID of the RIPs must correspond to the IP host ID of the private interfaces. For example,
given the following IP address mappings:
Public Private (for spread)
10.10.50.116 192.168.51.116
10.10.50.117 192.168.51.117
10.10.50.118 192.168.51.118
You need to enter the IP addresses in the following order:
--ripips 10.10.50.116,10.10.50.117,10.10.50.118
--priv_ips 192.168.51.116,192.168.51.117,192.168.51.118
You must use IP addresses, not node names, or the spread.pl script could fail.
If you do not specify private interfaces, HP Vertica uses the public RIPs for the MISC check, as
shown in step 3 below.
Set up the HP Vertica IPVS Load Balancer Configuration File
1. On the master director (node01) log in as root:
$ su - root
2. Run the HP-supplied configuration script with the appropriate switches; for example:
# /sbin/configure-keepalived.pl --ripips 10.10.50.116,10.10.50.117,10.10.50.118
--priv_ips 192.168.51.116,192.168.51.117,192.168.51.118
--ripport 5433
--iface eth0
--emailto dbadmin@companyname.com
--emailfrom dbadmin@companyname.com
--mailserver mail.server.com
--master
--authpass password
--vip 10.10.51.180
--delayloop 2
--algo lc
--kind DR
--priority 100
Caution: The --authpass (password) switch must be the same on both the master and
slave directors.
3. Check the keepalived.conf file to verify private and public IP settings for the --ripips and
--priv_ips switches, and make sure the real_server IP address is public.
# cat /etc/keepalived/keepalived.conf
An entry in the keepalived.conf file would resemble the following:
real_server 10.10.50.116 5433 {
MISC_CHECK {
misc_path "/etc/keepalived/check.pl 192.168.51.116"
}
}
4. Start spread:
# /etc/init.d/spread.pl start
The spread.pl script writes to the check.txt file, which is rewritten to include only the
remaining nodes in the event of a node failure. Thus, the virtual server knows to stop sending
vsql requests to the failed node.
5. Start keepalived on node01:
# /etc/init.d/keepalived start
6. If not already started, start sendmail to allow mail messages to be sent by the directors:
# /etc/init.d/sendmail start
7. Repeat steps 1-6 on the slave director (node02), using the same switches, except
(IMPORTANT) replace the --master switch with the --slave switch.
Tip: Use a lower priority for the slave --priority switch. HP currently suggests 50.
# /sbin/configure-keepalived.pl
--ripips 10.10.50.116,10.10.50.117,10.10.50.118
--priv_ips 192.168.51.116,192.168.51.117,192.168.51.118
--ripport 5433
--iface eth0
--emailto dbadmin@companyname.com
--emailfrom dbadmin@companyname.com
--mailserver mail.server.com
--slave
--authpass password
--vip 10.10.51.180
--delayloop 2
--algo lc
--kind DR
--priority 50
See Also
l Keepalived.conf(5) - Linux man page
Connecting to the Virtual IP (VIP)
To connect to the Virtual IP address using vsql, issue a command similar to the following. The IP
address, which could also be a DNS address, is the VIP that is shared among all nodes in the HP
Vertica cluster.
$ /opt/vertica/bin/vsql -h 10.10.51.180 -U dbadmin
To verify that connections are distributed over multiple nodes, repeat the following statement
multiple times and observe how connections are distributed in an lc (least connections) fashion.
$ vsql -h <VIP> -c "SELECT node_name FROM sessions"
Replace <VIP> in the above command with the IP address of your virtual server; for example:
$ vsql -h 10.10.51.180 -c "SELECT node_name FROM sessions"
node_name
-----------------
v_ipvs_node01
v_ipvs_node02
v_ipvs_node03
(3 rows)
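To repeat the statement several times from a script, a small helper like the following can tally which node answered each connection. This is a sketch: tally_nodes is a hypothetical helper name, and the usage line assumes vsql is on your PATH and that 10.10.51.180 is your VIP.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command N times and tally its one-line outputs.
tally_nodes() {
  local runs=$1; shift
  for _ in $(seq 1 "$runs"); do
    "$@"                      # each run prints the node that answered
  done | sort | uniq -c | sort -rn
}

# Real usage (requires a running cluster; 10.10.51.180 is the example VIP):
#   tally_nodes 9 vsql -h 10.10.51.180 -t -A -c "SELECT node_name FROM sessions"
```

With the lc scheduler, the counts should stay roughly balanced across the nodes as connections accumulate.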
Monitoring Shared Node Connections
If you want to monitor which nodes are sharing connections, view the check.txt file by issuing the
following command at a shell prompt:
# watch cat /etc/keepalived/check.txt
Every 2.0s: cat /etc/keepalived/check.txt    Wed Nov 3 10:02:20 2010
N192168051057
N192168051056
N192168051055
check.txt is a file located in the /etc/keepalived/ directory; it is updated when you
submit changes to the kernel using sysctl -p, as described in Disable Address Resolution
Protocol (ARP). For example, the spread.pl script (see Configuring the Directors) writes to the
check.txt file, which is then modified to include only the remaining nodes in the event of a node
failure. Thus, the virtual server knows to stop sending vsql requests to the failed node.
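Each check.txt entry appears to encode a node's private IP as N followed by four zero-padded octets (for example, N192168051057 corresponds to 192.168.51.57). If you want readable output, a short awk sketch can decode the entries; the format assumption here is inferred from the sample above, and decode_check_entry is a hypothetical helper name.

```shell
# Decode a check.txt entry of the form N<12 zero-padded digits> into a dotted IP.
decode_check_entry() {
  echo "$1" | awk '{
    s = substr($0, 2)                       # strip the leading "N"
    for (i = 0; i < 4; i++) {
      oct[i] = substr(s, i*3 + 1, 3) + 0    # 3 digits per octet; +0 drops padding
    }
    printf "%d.%d.%d.%d\n", oct[0], oct[1], oct[2], oct[3]
  }'
}

decode_check_entry N192168051057   # prints 192.168.51.57
```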
You can also look for messages by issuing the following command at a shell prompt:
# tail -f /var/log/messages
Nov 3 09:21:00 p6 Keepalived: Starting Keepalived v1.1.17 (05/17,2010)
Nov 3 09:21:00 p6 Keepalived: Starting Healthcheck child process, pid=32468
Nov 3 09:21:00 p6 Keepalived: Starting VRRP child process, pid=32469
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Using LinkWatch kernel netlink reflector...
Nov 3 09:21:00 p6 Keepalived_vrrp: Using LinkWatch kernel netlink reflector...
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Netlink reflector reports IP 10.10.51.55 added
Nov 3 09:21:00 p6 Keepalived_vrrp: Netlink reflector reports IP 10.10.51.55 added
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Netlink reflector reports IP 192.168.51.55 added
Nov 3 09:21:00 p6 Keepalived_vrrp: Netlink reflector reports IP 192.168.51.55 added
Nov 3 09:21:00 p6 Keepalived_vrrp: Registering Kernel netlink reflector
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Registering Kernel netlink reflector
Nov 3 09:21:00 p6 Keepalived_vrrp: Registering Kernel netlink command channel
Nov 3 09:21:00 p6 Keepalived_vrrp: Registering gratuitous ARP shared channel
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Registering Kernel netlink command channel
Nov 3 09:21:00 p6 Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'.
Nov 3 09:21:00 p6 Keepalived_vrrp: Configuration is using : 63730 Bytes
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Configuration is using : 16211 Bytes
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Activating healthcheckers for service [10.10.51.55:5433]
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Activating healthcheckers for service [10.10.51.56:5433]
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Activating healthcheckers for service [10.10.51.57:5433]
Nov 3 09:21:00 p6 Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(10,11)]
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Misc check to [10.10.51.56] for [/etc/keepalived/check.pl 192.168.51.56] failed.
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Removing service [10.10.51.56:5433] from VS [10.10.51.180:5433]
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Remote SMTP server [127.0.0.1:25] connected.
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Misc check to [10.10.51.55] for [/etc/keepalived/check.pl 192.168.51.55] failed.
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Removing service [10.10.51.55:5433] from VS [10.10.51.180:5433]
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Remote SMTP server [127.0.0.1:25] connected.
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Misc check to [10.10.51.57] for [/etc/keepalived/check.pl 192.168.51.57] failed.
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Removing service [10.10.51.57:5433] from VS [10.10.51.180:5433]
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Remote SMTP server [127.0.0.1:25] connected.
Nov 3 09:21:01 p6 Keepalived_healthcheckers: SMTP alert successfully sent.
Nov 3 09:21:10 p6 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Nov 3 09:21:20 p6 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Nov 3 09:21:20 p6 Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Nov 3 09:21:20 p6 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.10.51.180
Nov 3 09:21:20 p6 Keepalived_healthcheckers: Netlink reflector reports IP 10.10.51.180 added
Nov 3 09:21:20 p6 Keepalived_vrrp: Remote SMTP server [127.0.0.1:25] connected.
Nov 3 09:21:20 p6 Keepalived_vrrp: Netlink reflector reports IP 10.10.51.180 added
Nov 3 09:21:20 p6 Keepalived_vrrp: SMTP alert successfully sent.
Nov 3 09:21:25 p6 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.10.51.1
Determining Where Connections Are Going
ipvsadm is the user-space interface to the IP Virtual Server. It is used to set up, maintain, or
inspect the virtual server table in the Linux kernel.
If you want to identify where user connections are going, install ipvsadm.
1. Log in to the master director (node01) as root:
$ su - root
2. Install ipvsadm:
[root@node01]# yum install ipvsadm
Loading "installonlyn" plugin
Setting up Install Process
Setting up repositories
Reading repository metadata in from local files
Parsing package install arguments
Resolving Dependencies
--> Populating transaction set with selected packages. Please wait.
---> Downloading header for ipvsadm to pack into transaction set.
ipvsadm-1.24-10.x86_64.rp 100% |=========================| 6.6 kB 00:00
---> Package ipvsadm.x86_64 0:1.24-10 set to be updated
--> Running transaction check
Dependencies Resolved
=============================================================================
Package Arch Version Repository Size
=============================================================================
Installing:
ipvsadm x86_64 1.24-10 base 32 k
Transaction Summary
=============================================================================
Install 1 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Total download size: 32 k
Is this ok [y/N]: y
Downloading Packages:
(1/1): ipvsadm-1.24-10.x8 100% |=========================| 32 kB 00:00
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Installing: ipvsadm ######################### [1/1]
Installed: ipvsadm.x86_64 0:1.24-10
Complete!
3. Run ipvsadm:
[root@node01 ~]# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP vs-wks1.verticacorp.com:pyrr lc
-> node03.verticacorp.com:pyr Route 1 1 8
-> node02.verticacorp.com:pyr Route 1 0 8
-> node01.verticacorp.com:pyr Local 1 0 8
See Also
l ipvsadm man page
Virtual IP Connection Problems
Issue
Users cannot connect to the database.
Resolution
Try to telnet to the VIP and port:
# telnet 10.10.51.180 5433
If telnet reports no route to host, recheck your /etc/keepalived/keepalived.conf file to make
sure you entered the correct VIP and RIPs.
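If telnet is not installed, bash's built-in /dev/tcp pseudo-device can stand in for it. This is a sketch: port_open is a hypothetical helper name, and the VIP and port in the commented usage line are the example values used throughout this section.

```shell
# Minimal TCP reachability probe using bash's /dev/tcp (no telnet required).
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# port_open 10.10.51.180 5433 && echo "VIP reachable" || echo "unreachable"
```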
Errors and informational messages from the keepalived daemon are written to the
/var/log/messages file, so check the messages file first:
# tail -f /var/log/messages
May 18 09:04:32 dell02 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.10.10.100
May 18 09:04:32 dell02 avahi-daemon[3191]: Registering new address record for 10.10.10.100 on eth0.
May 18 09:04:32 dell02 Keepalived_healthcheckers: Netlink reflector reports IP 10.10.10.100 added
Expected E-mail Messages From the Keepalived Daemon
l Upon startup:
Subject: [node01] VRRP Instance VI_1 - Entering MASTER state
=> VRRP Instance is now owning VRRP VIPs <=
l When a node fails:
Subject: [node01] Realserver 10.10.10.1:5433 - DOWN
=> MISC CHECK failed on service <=
l When a node comes back up:
Subject: [node02] Realserver 10.10.10.1:5433 - UP
=> MISC CHECK succeed on service <=
Troubleshooting Keepalived Issues
If there are connection or other issues related to the Virtual IP server and Keepalived, try some of
the following tips:
l Set KEEPALIVED_OPTIONS="-D -d" in the /etc/sysconfig/keepalived file to enable both
debug mode and dump configuration.
l Monitor the system log in /var/log/messages. If keepalived.conf is incorrect, the only
indication is in the messages log file. For example:
$ tail /var/log/messages
Errors and informational messages from the keepalived daemon are also written to the
/var/log/messages files.
l Run ip addr list to see the configured VIP addresses. For example:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet 10.10.51.180/32 brd 127.255.255.255 scope global lo:0
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 84:2b:2b:55:4b:be brd ff:ff:ff:ff:ff:ff
inet 10.10.51.55/24 brd 10.10.51.255 scope global eth0
inet6 fe80::862b:2bff:fe55:4bbe/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 84:2b:2b:55:4b:bf brd ff:ff:ff:ff:ff:ff
inet 192.168.51.55/24 brd 192.168.51.255 scope global eth1
inet6 fe80::862b:2bff:fe55:4bbf/64 scope link
valid_lft forever preferred_lft forever
4: sit0: <NOARP> mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
l Check iptables and notice the PREROUTING rule on the BACKUP (slave) director. Even
though ipvsadm has a complete list of real servers to manage, it does not route anything,
because the PREROUTING rule redirects packets to the loopback interface.
# /sbin/iptables -t nat -n -L
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
REDIRECT tcp -- 0.0.0.0/0 10.10.51.180
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Note: The master and the third node do not have the REDIRECT entry when you run the
same command on those nodes; only the slave does.
On some kernels, the nat table does not display by default without the -t parameter, and -n
is used to avoid long DNS lookups. See the iptables(8) - Linux man page for details.
l During failover, it is normal to see a delay in establishing new connections until the slave node
takes control. The delay can be several minutes, depending on the load on the cluster. If you
cannot connect to the database, try to telnet to the VIP and port:
# telnet 10.10.51.180 5433
If telnet reports no route to host, recheck the keepalived configuration file
(/etc/keepalived/keepalived.conf) to make sure you entered the correct VIP and RIPs.
Managing Nodes
HP Vertica provides the ability to add, remove, and replace nodes on a live cluster that is actively
processing queries. This ability lets you scale the database without interrupting users.
Stop HP Vertica on a Node
In some cases, you need to take down a node to address one of the following scenarios:
l Perform maintenance
l Upgrade hardware
Note: Before taking a node down make sure to back up your database. See Backing Up and
Restoring the Database.
1. Check the K-safety level of your cluster. In vsql enter the following query:
SELECT current_fault_tolerance FROM system;
The query returns the K-Safety level of the cluster.
Important: HP Vertica does not recommend a K-safety level of 0. In this case, back up the
database before shutting down a node. See Lowering the K-Safety Level to
Allow for Node Removal for more information.
2. Run Administration Tools, select Advanced Menu, and click OK.
3. Select Stop Vertica on Host and click OK.
4. Choose the host that you want to stop and click OK.
5. Return to the Main Menu, select View Database Cluster State, and click OK. The host you
previously stopped now appears as DOWN.
You can now perform maintenance. See Restart HP Vertica on a Node for details about
restarting HP Vertica on a node.
Restart HP Vertica on a Node
After stopping a node to perform maintenance, upgrade hardware, or complete a similar task, you
can bring the node back up. This process reconnects the node with the database.
Restarting HP Vertica on a node
1. Run Administration Tools. From the Main Menu select Restart Vertica on Host and click OK.
2. Select the database and click OK.
3. Select the host that you want to restart and click OK.
Note: This process may take a few moments.
4. Return to the Main Menu, select View Database Cluster State, and click OK. The host you
restarted now appears as UP.
Fault Groups
Fault groups let you configure HP Vertica for your physical cluster layout to minimize the risk of
correlated failures inherent in your environment, usually caused by shared resources. HP Vertica
automatically creates fault groups around control nodes (servers that run spread) in large cluster
arrangements, placing nodes that share a control node in the same fault group.
Consider defining your own fault groups specific to your cluster's physical layout if you want to:
l Reduce the risk of correlated failures. For example, by defining your rack layout, HP Vertica
could tolerate a rack failure.
l Influence the placement of control nodes in the cluster.
HP Vertica supports complex, hierarchical fault groups of different shapes and sizes and provides a
fault group script (DDL generator), SQL statements, system tables, and other monitoring tools.
See High Availability With Fault Groups in the Concepts Guide for an overview of fault groups with
a cluster topology example.
About the Fault Group Script
To help you define fault groups on your cluster, HP Vertica provides a script called fault_group_
ddl_generator.py in the /opt/vertica/scripts directory. This script generates the
SQL statements you run to create fault groups.
Note: The fault_group_ddl_generator.py script does not create fault groups for you.
However, you can copy the output to a file and then use the vsql \i meta-command or the
vsql -f option to pass the cluster layout to HP Vertica after you run the helper script.
Create a fault group input file
Use any text editor to create a fault group input file for the targeted cluster. Then pass the input file
to the fault_group_ddl_generator.py script, which the script uses to return a list of commands
you run.
The fault group input file must adhere to the following format:
First line in the template
The first line is for the parent (top-level) fault groups only, delimited by spaces; for example:
rack1 rack2 rack3 rack4
Remaining lines in template
The format for each subsequent line includes a parent fault group, followed by an equals sign (=)
and then includes any combination of one or more nodes, fault groups, virtual machine servers, or
virtual machine hosts. The objects that go into a parent fault group are delimited by spaces; for
example:
<parent> = <child_1> <child_2> <child_n...>
After the first row of parent fault groups, the order in which you write the group descriptions does not
matter. All fault groups that you define in this file must refer back to a parent fault group, either
directly or by being the child of a fault group that is the child of a parent fault group.
For example, to place B into A and both C and D into B, the format is as follows:
A = B
B = C D
In the above example, B is a child of A and a parent of both C and D.
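To make the format concrete, the transformation from input file to DDL can be sketched in a few lines of Python. This is an illustration of the mapping only, not the actual fault_group_ddl_generator.py implementation, which also validates the layout and handles command-line options:

```python
# Illustrative sketch of the input-file-to-DDL mapping; NOT the real
# fault_group_ddl_generator.py, which performs validation this sketch omits.

def generate_fault_group_ddl(dbname, input_text):
    lines = [ln.strip() for ln in input_text.strip().splitlines() if ln.strip()]
    group_names = set(lines[0].split())        # first line declares fault groups
    members = {}                               # group -> child objects, in file order
    for line in lines[1:]:
        group, _, rhs = line.partition("=")
        members[group.strip()] = rhs.split()

    ddl = ["ALTER DATABASE {} DROP ALL FAULT GROUP;".format(dbname)]
    created = set()

    def create(group):
        if group not in created:
            created.add(group)
            ddl.append("CREATE FAULT GROUP {};".format(group))

    for group, children in members.items():
        create(group)
        for child in children:
            if child in group_names:           # child is a nested fault group
                create(child)
                ddl.append("ALTER FAULT GROUP {} ADD FAULT GROUP {};".format(group, child))
            else:                              # child is a node name
                ddl.append("ALTER FAULT GROUP {} ADD NODE {};".format(group, child))
    return ddl
```

Running the sketch on an input file like the rack examples in this topic yields statements of the same shape as the script output, though not necessarily in the same order.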
Example input file
The following input file is an example only. In your file, the node* names must represent nodes in
the cluster, such as v_vmartdb_node0001, v_vmartdb_node0002, and so on.
rack1 rack2 rack3 vm3_1 vm4_1 vm4_2 rack4
rack1 = node1 node2 node3 node4
rack2 = node5 node6 node7 node8
rack3 = node9 vm3_1
vm3_1 = node10 node11 node12
vm4_1 = node13 node14
vm4_2 = node15 node16
rack4 = vm4_1 vm4_2
The fault_group_ddl_generator.py script returns output like the following. Consider piping the
output to a file so you can edit and reuse the DDL statements later, such as if you want to add
nodes to an existing fault group.
ALTER DATABASE vmartd DROP ALL FAULT GROUP;
CREATE FAULT GROUP rack1;
ALTER FAULT GROUP rack1 ADD NODE node1;
ALTER FAULT GROUP rack1 ADD NODE node2;
ALTER FAULT GROUP rack1 ADD NODE node3;
ALTER FAULT GROUP rack1 ADD NODE node4;
CREATE FAULT GROUP rack2;
ALTER FAULT GROUP rack2 ADD NODE node5;
ALTER FAULT GROUP rack2 ADD NODE node6;
ALTER FAULT GROUP rack2 ADD NODE node7;
ALTER FAULT GROUP rack2 ADD NODE node8;
CREATE FAULT GROUP rack3;
ALTER FAULT GROUP rack3 ADD NODE node9;
CREATE FAULT GROUP vm3_1;
ALTER FAULT GROUP vm3_1 ADD NODE node10;
ALTER FAULT GROUP vm3_1 ADD NODE node11;
ALTER FAULT GROUP vm3_1 ADD NODE node12;
ALTER FAULT GROUP rack3 ADD FAULT GROUP vm3_1;
CREATE FAULT GROUP rack4;
CREATE FAULT GROUP vm4_1;
ALTER FAULT GROUP vm4_1 ADD NODE node13;
ALTER FAULT GROUP vm4_1 ADD NODE node14;
ALTER FAULT GROUP rack4 ADD FAULT GROUP vm4_1;
CREATE FAULT GROUP vm4_2;
ALTER FAULT GROUP vm4_2 ADD NODE node15;
ALTER FAULT GROUP vm4_2 ADD NODE node16;
ALTER FAULT GROUP rack4 ADD FAULT GROUP vm4_2;
For details, see Creating Fault Groups.
About Automatic Fault Groups
If you do not define your own fault groups, HP Vertica creates automatic fault groups in order to
create the correct control node assignments for the cluster layout. In this scenario, all nodes that
share the same control node reside in the same automatic fault group.
Ephemeral nodes are not included in automatic or user-defined fault groups because they hold no
data.
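Conceptually, an automatic fault group is simply the set of nodes that share a control node. The following sketch (hypothetical code, not part of HP Vertica) shows that partitioning:

```python
from collections import defaultdict

# Partition nodes into automatic fault groups keyed by their control node.
# 'assignments' maps node -> control node; ephemeral nodes are skipped
# because they hold no data and belong to no fault group.
def automatic_fault_groups(assignments, ephemeral=frozenset()):
    groups = defaultdict(list)
    for node, control_node in assignments.items():
        if node not in ephemeral:
            groups[control_node].append(node)
    return dict(groups)
```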
Creating Fault Groups
When you define fault groups, HP Vertica distributes data segments across the cluster so the
cluster can tolerate correlated failures inherent in your environment, such as a rack failure. For an
overview, see High Availability With Fault Groups in the Concepts Guide.
Fault groups prerequisites
l Defining fault groups requires careful and thorough network planning. You must have a solid
understanding of your network topology.
l HP Vertica must first be installed or upgraded to the latest version.
l The user who creates fault groups must be a superuser.
l A database must already exist.
How to create a fault group
The following procedure assumes that someone in your organization has planned your fault group
hierarchy and created an input file to pass to the fault_group_ddl_generator.py script. See
About the Fault Group Script for details.
Tip: Pipe the output from the script (in step 2) to a <filename>.sql file so you can run a single
SQL script instead of multiple DDL statements. Also consider saving your input file so you can
more easily modify fault groups later, such as after you expand the cluster or change the
distribution of control nodes.
1. As the database administrator, or a user with sudo privileges, log in to one of the target hosts in
the cluster.
2. Run the fault_group_ddl_generator.py script and include the following arguments:
n The database name
n The fault group input file name
n [Optional] The file name of the <filename>.sql script into which you want to write the
DDL statements.
Example:
$ python /opt/vertica/scripts/fault_group_ddl_generator.py vmartdb faultGroupDescription.txt > faultgroupddl.sql
The above command writes SQL statements to the file called faultgroupddl.sql.
3. Run vsql and log in to the target database as the database administrator (dbadmin by default).
4. Run the .sql output script you created in Step 2; for example:
=> \i faultgroupddl.sql
5. If you do not have large cluster enabled (fewer than 120 nodes), skip this step; otherwise,
you must realign the control nodes by calling the following function:
=> SELECT realign_control_nodes();
6. Save cluster changes to the spread configuration file:
=> SELECT reload_spread(true);
7. Use the Administration Tools to restart the database.
8. Save changes to the cluster's data layout by calling the REBALANCE_CLUSTER() function:
=> SELECT rebalance_cluster();
See Also
For syntax on each of the fault group statements and functions, see the following topics in the SQL
Reference Manual:
l Cluster Management Functions
l CREATE FAULT GROUP
l ALTER FAULT GROUP
l DROP FAULT GROUP
l ALTER DATABASE
Modifying Fault Groups
Modify fault groups when you need to:
l Add a fault group to another fault group
l Remove a fault group from another fault group
l Add one or more nodes to a fault group
l Remove one or more nodes from a fault group (in which case the node is left without a parent
fault group)
l Rename a fault group
How to modify a fault group
Before you modify existing fault groups, carefully plan the new layout and modify the input template
file you created for the targeted cluster. See About the Fault Group Script and Creating Fault
Groups.
1. As the database administrator or a user with sudo privileges, log in to any host in the cluster.
2. Run the fault groups python script and supply the following:
n The database name
n The fault group input file name
n [Optional] The file name of the <filename>.sql script you want to create
Example:
$ python /opt/vertica/scripts/fault_group_ddl_generator.py vmartdb ModifiedFaultGroupTemp.txt > faultgroupddl.sql
The above command writes the SQL statements to the file faultgroupddl.sql.
3. Run vsql and log in to the specified database as the database administrator (default dbadmin).
4. Run the <filename>.sql script you created in Step 2; for example:
=> \i faultgroupddl.sql
5. If you do not have large cluster enabled, skip this step; otherwise, you must realign the control
nodes by calling the following function:
=> SELECT realign_control_nodes();
6. Restart the spread process to write changes to the spread configuration file:
=> SELECT reload_spread(true);
7. Use the Administration Tools to restart the database.
8. Save changes to the cluster's data layout by calling the REBALANCE_CLUSTER() function:
=> SELECT rebalance_cluster();
See Also
See the following topics in the SQL Reference Manual:
l ALTER FAULT GROUP
l ALTER DATABASE
l Cluster Management Functions
Dropping Fault Groups
When you remove a fault group from the cluster, the drop operation removes the specified fault
group and its child fault groups, placing all nodes under the parent of the dropped fault group. To see
the current fault group hierarchy in the cluster, query the FAULT_GROUPS system table.
How to drop a fault group
Use the DROP FAULT GROUP statement to remove a fault group from the cluster. The following
example drops the group2 fault group:
vmartdb=> DROP FAULT GROUP group2;
DROP FAULT GROUP
How to remove all fault groups
Use the ALTER DATABASE statement to drop all fault groups, along with any child fault groups, from
the specified database cluster.
The following command drops all fault groups from the vmartdb database. Note that the command
is singular: DROP ALL FAULT GROUP.
vmartdb=> ALTER DATABASE vmartdb DROP ALL FAULT GROUP;
ALTER DATABASE
How to add nodes back to a fault group
To add a node back to a fault group, you must manually reassign it to a new or existing fault group
using the CREATE FAULT GROUP and ALTER FAULT GROUP...ADD NODE statements.
See the following topics in the SQL Reference Manual for details:
l DROP FAULT GROUP
l CREATE FAULT GROUP
l ALTER FAULT GROUP...ADD NODE
Monitoring Fault Groups
You can monitor fault groups by querying HP Vertica system tables or by logging in to the
Management Console (MC) interface.
Monitoring fault groups through system tables
Use the following system tables to observe information about fault groups and cluster
vulnerabilities, such as the nodes the cluster cannot lose without the database going down:
l V_CATALOG.FAULT_GROUPS—View the hierarchy of all fault groups in the cluster.
l V_CATALOG.CLUSTER_LAYOUT—Observe the actual arrangement of the nodes participating
in the database cluster and the fault groups that affect them. Ephemeral nodes are not shown
in the cluster layout ring because they hold no data.
Monitoring fault groups through Management Console
An MC administrator can monitor and highlight fault groups of interest by following these steps:
1. Click the running database you want to monitor and click Manage in the task bar.
2. Open the Fault Group View menu and select the fault groups you want to view.
3. Optionally hide nodes that are not in the selected fault group to focus on fault groups of interest.
Nodes assigned to a fault group have a colored bubble attached to the upper left corner of the node
icon. Each fault group has a unique color, until the number of fault groups exceeds the number of
colors available, in which case MC recycles previously-used colors.
Because HP Vertica supports complex, hierarchical fault groups of different shapes and sizes,
MC displays multiple fault group participation as a stack of different-colored bubbles, where the
higher bubbles represent fault groups at a lower tier, meaning those bubbles are closer to the
parent fault group than to a child or grandchild fault group. See High Availability With Fault
Groups in the Concepts Guide for information about fault group hierarchy.
Example: Simple fault group hierarchy
The following image shows two tiers of nodes in three fault groups with no nodes hidden. Although
fault groups are designed for large clusters, the cluster shown below is intentionally small to make it
easier to view details.
While monitoring cluster details on the MC interface, if information in the node details box is hard to
read, increase the zoom level.
Large Cluster
To support scaling of existing clusters into large clusters and improve control message
performance, HP Vertica delegates control message responsibilities to a subset of HP Vertica
nodes, called control nodes. Control nodes communicate with each other. Other cluster nodes are
assigned to a control node, which they use for control message communications.
Control nodes on large clusters
On clusters of 120 or more nodes, a large cluster layout is necessary and enabled by default. HP
Vertica makes automatic control node assignments unless you use one of the following options:
If you want to ... Do this
l Install a new cluster before you create a database: run the HP Vertica installation script with
the --large-cluster <integer> argument. See Installing HP Vertica with the install_vertica
Script in the Installation Guide and Installing a Large Cluster in this guide.
l Expand an existing cluster for pre-existing databases, or change control nodes on an existing
cluster: use the cluster management functions described in Defining and Realigning Control
Nodes on an Existing Cluster.
l Influence the placement of control nodes in your cluster's physical layout: define fault groups
to configure HP Vertica for your cluster. See Fault Groups for details.
Control nodes on small clusters
If your cluster has fewer than 120 nodes, large cluster is neither necessary nor automatically
applied. As a result, all nodes are control nodes. However, HP Vertica lets you define control nodes
on any sized cluster. Some environments, such as cloud deployments that might have higher
network latency, could benefit from a smaller number of control nodes.
For details, see Planning a Large Cluster Arrangement and Installing a Large Cluster.
Planning a Large Cluster
In a large cluster layout of 120 nodes or more, nodes form a correlated failure group, governed by
their control node—the node that runs control messaging (spread). If a control node fails, all nodes
in its host group also fail.
This topic provides tips on how to plan for a large cluster arrangement. See Installing a Large
Cluster and Large Cluster Best Practices for more information.
Planning the number of control nodes
Configuring a large cluster requires careful and thorough network planning. You must have a solid
understanding of your network topology before you configure the cluster.
To assess how many cluster nodes should be control nodes, start with the square root of the total
number of nodes expected in the database cluster; this helps satisfy both data K-safety and rack
fault tolerance for the cluster. Depending on the result, you might need to adjust the number of
control nodes to account for your physical hardware/rack count. For example, if you have 121
nodes (with a result of 11), and your nodes will be distributed across 8 racks, you might want to
increase the number of control nodes to 16 so you have two control nodes per rack.
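As a quick sanity check, the guideline can be written out as arithmetic. This sketch simply encodes the rule of thumb described above; it is not an HP Vertica utility:

```python
import math

# Rule-of-thumb sketch: start from the square root of the expected node
# count, then round up to a whole number of control nodes per rack.
def suggested_control_nodes(total_nodes, racks):
    base = math.ceil(math.sqrt(total_nodes))   # square-root starting point
    per_rack = -(-base // racks)               # ceiling division: control nodes per rack
    return per_rack * racks

# 121 nodes on 8 racks: sqrt(121) = 11, rounded up to 2 per rack, giving 16.
```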
Specifying the number of control nodes
HP Vertica provides different tools to help you define the number of control nodes, depending on
your current configuration. Consider the following scenarios, in which cluster nodes are distributed
among three racks in different configurations:
If your cluster fits this scenario ... Consider this setup
l Three control nodes, with all other nodes evenly distributed among the three racks: specify one
control node per rack.
l Five control nodes and three racks: specify two control nodes on each of two racks and one
control node on the final rack.
l Four control nodes, with one rack holding twice as many nodes as the other racks: specify two
control nodes on the larger rack and one control node on each of the other two racks.
Installing a Large Cluster
Whether you are forming a new large cluster (adding all nodes for the first time) or expanding an
existing cluster to a large cluster, HP Vertica provides two methods that let you specify the number
of control nodes (the nodes that run control messaging). See the following sections for details:
l If you want to install a new large cluster
l If you want to expand an existing cluster
If you want to install a new large cluster
To configure HP Vertica for a new, large cluster, pass the install_vertica script the
--large-cluster <integer> argument. HP Vertica selects the first <integer> hosts from the
comma-separated --hosts host_list as control nodes and assigns all other hosts you specify in
the --hosts argument to a control node based on a round-robin model.
Note: The number of hosts you include in the --hosts argument determines the large cluster
layout, not the number of nodes you later include in the database. If you specify 120 or more
hosts in the --hosts list but do not explicitly enable large cluster with the --large-cluster
argument, HP Vertica automatically enables large cluster and configures control nodes for you.
To configure control nodes and the nodes assigned to them for the highest possible fault
tolerance, you must specify hosts in the --hosts host_list in a specific order. For example, if
you have four sets of hosts on four racks, the first four entries in the --hosts host_list must
be one host from each rack, so that the cluster gets one control node per rack. The next four
entries must again be one host from each rack, in the same rack order, and so on for all targeted
hosts. See Sample rack-based cluster hosts topology below for examples.
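This ordering can be generated rather than written by hand. The sketch below interleaves per-rack host lists so that the first entry from each rack lands at the front of --hosts; the host names are hypothetical, and this is not an HP Vertica tool:

```python
from itertools import chain, zip_longest

# Build a --hosts ordering from per-rack host lists. Interleaving the racks
# puts one host from each rack first, so with --large-cluster <rack count>
# each control node lands on a distinct rack.
def hosts_argument(racks):
    interleaved = chain.from_iterable(zip_longest(*racks))
    return ",".join(host for host in interleaved if host is not None)

racks = [
    ["Host-1_1", "Host-2_1"],   # Rack-1 (hypothetical host names)
    ["Host-1_2", "Host-2_2"],   # Rack-2
    ["Host-1_3", "Host-2_3"],   # Rack-3
]
# hosts_argument(racks) -> "Host-1_1,Host-1_2,Host-1_3,Host-2_1,Host-2_2,Host-2_3"
```

The zip_longest call also handles racks of unequal sizes, which simple zip would silently truncate.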
Tip: If you pass the --large-cluster argument a DEFAULT value instead of an <integer>
value, HP Vertica calculates a number of control nodes based on the total number of nodes
specified in the --hosts host_list argument. If you want a specific number of control nodes
on the cluster, you must use the <integer> value.
For more information, see the following topics:
l Planning a Large Cluster Arrangement
l Installing HP Vertica with the install_vertica Script in the Installation Guide
Sample rack-based cluster hosts topology
This example shows a simple, multi-rack cluster layout, in which cluster nodes are evenly
distributed across three racks. Each rack has one control node.
In the rack-based example:
l Rack-1, Rack-2, and Rack-3 are managed by a single network switch
l Host-1_1, Host-1_2, and Host-1_3 are control nodes
l All hosts on Rack-1 are assigned to control node Host-1_1
l All hosts on Rack-2 are assigned to control node Host-1_2
l All hosts on Rack-3 are assigned to control node Host-1_3
In this scenario, if control node Host-1_1 or Rack-1 fails, the nodes in Rack-1 can communicate
with control node Host-2_1, and the database stays up.
In the following install_vertica script fragment, note the order of the hosts in the --hosts list
argument. The final arguments specifically enable large cluster and provide the number of control
nodes (3):
... install_vertica --hosts Host-1_1,Host-1_2,Host-1_3,Host-2_1,Host-2_2,Host-2_3,Host-3_1,
Host-3_2,Host-3_3,Host-4_1,Host-4_2,Host-4_3,Host-5_1,Host-5_2,Host-5_3
--rpm <vertica-package-name> <other required options> --large-cluster 3
After the installation process completes, use the Administration Tools to create a database. This
operation generates an HP Vertica cluster environment with three control nodes and their
associated hosts, which reside on the same racks as their control node.
If you want to expand an existing cluster
When you add a node to an existing cluster, HP Vertica places the new node in an appropriate
location within the cluster ring. HP Vertica then assigns the newly-added node to a control node,
based on the cluster's current allocations.
To give you more flexibility and control over which nodes run spread, you can use the
SET_CONTROL_SET_SIZE(integer) function. This function works like the installation script's
--large-cluster <integer> option. See Defining and Realigning Control Nodes on an Existing
Cluster for details.
Important: The HP Vertica installation script cannot alter the database cluster.
Defining and Realigning Control Nodes on an Existing
Cluster
This topic describes how to set up or change control node assignments on an existing cluster using
a series of cluster management functions. It assumes you already know how many control nodes
the cluster needs for failover safety. See Planning a Large Cluster Arrangement for more
information.
Note: If you are adding nodes for the first time, run the HP Vertica installation script using the
--large-cluster <integer> argument. See Installing HP Vertica with the install_vertica
Script in the Installation Guide.
Setting up control nodes on an existing cluster makes the following changes to the cluster:
l Configures the number of nodes that run spread.
l Assigns each non-control cluster node to a control node.
l Saves the new layout to the spread configuration file.
l Redistributes data across the cluster to improve fault tolerance.
How to set up control nodes on an existing cluster
After you add, remove, or swap nodes on an existing cluster, perform the following steps. This
procedure helps the cluster maintain adequate control messaging distribution for failover safety. For
more details, see "Control node assignment/realignment" in Large Cluster Best Practices.
1. As the database administrator, log in to the Administration Tools and connect to the
database.
2. Call the SET_CONTROL_SET_SIZE(integer) function with an integer argument that specifies
the number of control nodes you want. For example, 4:
=> SELECT SET_CONTROL_SET_SIZE(4);
3. Call the REALIGN_CONTROL_NODES() function without arguments:
=> SELECT REALIGN_CONTROL_NODES();
4. Call the RELOAD_SPREAD(true) function to save changes to the spread configuration file:
=> SELECT RELOAD_SPREAD(true);
5. After the RELOAD_SPREAD() operation finishes, log back in to the Administration Tools, and
restart the database.
6. Call the REBALANCE_CLUSTER() function to distribute data across the cluster:
=> SELECT REBALANCE_CLUSTER();
Important: You must run REBALANCE_CLUSTER() for fault tolerance to be realized. See
also Rebalancing Large Clusters.
For more details about the functions used in this procedure, see Cluster Management Functions in
the SQL Reference Manual.
Rebalancing Large Clusters
A rebalance operation performs the following tasks:
l Distributes data based on user-defined fault groups, if specified, or based on large cluster
automatic fault groups
l Redistributes the database projections' data across all nodes
l Refreshes projections
l Sets the Ancient History Mark
l Drops projections that are no longer needed
When to rebalance the cluster
Rebalancing is useful (or necessary) after you:
l Mark one or more nodes as ephemeral in preparation for removing them from the cluster
l Add one or more nodes to the cluster so HP Vertica can populate the empty nodes with data
l Remove one or more nodes from the cluster so HP Vertica can redistribute the data among the
remaining nodes
l Change the scaling factor of an elastic cluster, which determines the number of storage
containers used to store a projection across the database
l Set the control node size or realign control nodes on a large cluster layout
l Specify more than 120 nodes in your initial HP Vertica cluster configuration
l Add nodes to or remove nodes from a fault group
You must be a database administrator to rebalance data across the cluster.
How to rebalance the cluster
You rebalance the cluster using SQL functions, such as REBALANCE_CLUSTER(). Call
REBALANCE_CLUSTER() after you have completed the last add/remove-node operation.
How long will rebalance take?
A rebalance operation can take some time, depending on the number of projections and the amount
of data they contain. HP recommends that you allow the process to complete uninterrupted. If you
must cancel the operation, call the CANCEL_REBALANCE_CLUSTER() function.
See Also
l Rebalancing Data Using SQL Functions
l Rebalancing Data Across Nodes
l Rebalancing Data Using the Administration Tools UI
Expanding the Database to a Large Cluster
If you have an existing database cluster that you want to expand to a large cluster (more than 120
nodes), follow these steps:
1. Log in to the Administration Tools as the database administrator and stop the database.
2. As root or a user with sudo privileges, open a BASH shell and run the install_vertica script with
the --add-hosts argument, providing a comma-separated list of hosts you want to add to an
existing HP Vertica cluster. See Installing HP Vertica with the install_vertica Script in the
Installation Guide.
3. Exit the shell and re-establish a vsql connection as the database administrator.
4. Log in to the Administration Tools and start the database.
5. Use the Administration Tools Advanced Menu > Cluster Management > Add Hosts option
to add the hosts you added to the cluster in Step 2.
6. Run SET_CONTROL_SET_SIZE(integer) to specify the number of control nodes you want on
the cluster. See Defining and Realigning Control Nodes on an Existing Cluster.
7. Optionally create fault groups to further define the layout of the control nodes within the
physical cluster. See Fault Groups.
Monitoring Large Clusters
Monitor large cluster traits by querying the following system tables:
l V_CATALOG.LARGE_CLUSTER_CONFIGURATION_STATUS—Shows the current spread
hosts and the control designations in the catalog so you can see if they match.
l V_MONITOR.CRITICAL_HOSTS—Lists the hosts whose failure would cause the database to
become unsafe and force a shutdown.
Tip: The CRITICAL_HOSTS view is especially useful for large cluster arrangements. For
non-large clusters, query the CRITICAL_NODES table.
You might also want to query the following system tables:
l V_CATALOG.FAULT_GROUPS—Shows fault groups and their hierarchy in the cluster.
l V_CATALOG.CLUSTER_LAYOUT—Shows the relative position of the actual arrangement of
the nodes participating in the database cluster and the fault groups that affect them.
Large Cluster Best Practices
Keep the following best practices in mind when you are planning and managing a large cluster
implementation.
Planning the number of control nodes
To assess how many cluster nodes should be control nodes, start with the square root of the total
number of nodes expected in the database cluster; this helps satisfy both data K-safety and rack
fault tolerance for the cluster. Depending on the result, you might need to adjust the number of
control nodes to account for your physical hardware/rack count. For example, if you have 121
nodes (with a result of 11), and your nodes will be distributed across 8 racks, you might want to
increase the number of control nodes to 16 so you have two control nodes per rack.
See Planning a Large Cluster Arrangement.
Control node assignment/realignment
After you specify the number of control nodes, you must update the control host's (spread)
configuration files to reflect the catalog change. Certain cluster management functions might
require that you run other functions or restart the database or both.
If, for example, you drop a control node, cluster nodes that point to it are reassigned to another
control node. If that node fails, all the nodes assigned to it also fail, so you need to use the
Administration Tools to restart the database. In this scenario, you'd call the REALIGN_CONTROL_
NODES() and RELOAD_SPREAD(true) functions, which notify nodes of the changes and realign fault
groups. Calling RELOAD_SPREAD(true) connects an existing cluster node to a newly-assigned
control node.
On the other hand, if you run REALIGN_CONTROL_NODES() multiple times in a row, the layout does
not change beyond the initial setup, so you don't need to restart the database. But if you add or drop
a node and then run REALIGN_CONTROL_NODES(), the function call could change many node
assignments.
Here's what happens with control node assignments when you add or drop nodes, whether those
nodes are control nodes or non-control nodes:
l If you add a cluster node—HP Vertica assigns a control node to the newly-added node based
on the current cluster configuration. If the new node joins a fault group, it is assigned to a control
node from that fault group and requires a database restart to reconnect to that control node. See
Fault Groups for more information.
l If you drop a non-control node—HP Vertica quietly drops the cluster node. This operation
could change the cluster and spread layout, so you must call REBALANCE_CLUSTER() after you
drop a node.
l If you drop a control node—All nodes assigned to the control node go down. In large cluster
implementations, however, the database remains up because the down nodes are not buddies
with other cluster nodes.
Dropping a control node results in (n-1) control nodes. You must call REALIGN_CONTROL_NODES
() to reset the cluster so it has n control nodes, which might or might not be the same number as
before you dropped the control node. Remaining nodes are assigned new control nodes. In this
operation, HP Vertica makes control node assignments based on the cluster layout. When it
makes the new assignments, it respects user-defined fault groups, if any, which you can view
by querying the V_CATALOG.CLUSTER_LAYOUT system table, a view that also lets you see the
proposed new layout for nodes in the cluster. If you want to influence the layout of control nodes
in the cluster, you should define fault groups.
For more information, see Defining and Realigning Control Nodes on an Existing Cluster and
Rebalancing Large Clusters.
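The drop-and-realign sequence described above can be sketched in vsql. This is a sketch only; it assumes a large-cluster database in which a control node was just dropped, and a database restart through the Administration Tools may still be required, as noted above:

```sql
-- Recompute control node assignments; user-defined fault groups are respected:
SELECT REALIGN_CONTROL_NODES();

-- Inspect the current and proposed cluster layout:
SELECT * FROM V_CATALOG.CLUSTER_LAYOUT;

-- Update the spread configuration files and reconnect existing nodes
-- to their newly assigned control nodes:
SELECT RELOAD_SPREAD(true);
```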
Allocate standby nodes
Have as many standby nodes available as you can, ideally on racks you are already using in the
cluster. If a node suffers a non-transient failure, use the Administration Tools "Replace
Host" utility to swap in a standby node.
Standby node availability is especially important for control nodes. If you are swapping a node
that's a control node, all nodes assigned to the control node's host grouping will need to be taken
offline while you swap in the standby node. For details on node replacement, see Replacing Nodes.
Plan for cluster growth
If you plan to expand an existing cluster to 120 or more nodes, you can configure the number of
control nodes for the cluster after you add the new nodes. See Defining and Realigning Control
Nodes.
Write custom fault groups
When you deploy a large cluster, HP Vertica automatically creates fault groups around control
nodes, placing nodes that share a control node into the same fault group. Alternatively, you can
specify which cluster nodes should reside in a particular correlated failure group and share a control
node. See High Availability With Fault Groups in the Concepts Guide.
Use segmented projections
On large-cluster setups, minimize the use of unsegmented projections in favor of segmented
projections. When you use segmented projections, HP Vertica creates buddy projections and
distributes copies of segmented projections across database nodes. If a node fails, data remains
available on the other cluster nodes.
Use the Database Designer
HP recommends that you use the Database Designer to create your physical schema. If you
choose to design projections manually, you should segment large tables across all database nodes
and replicate (unsegment) small table projections on all database nodes.
Elastic Cluster
You can scale your cluster up or down to meet the needs of your database. The most common case
is to add nodes to your database cluster to accommodate more data and provide better query
performance. However, you can scale down your cluster if you find that it is overprovisioned or if
you need to divert hardware for other uses.
You scale your cluster by adding or removing nodes. Nodes can be added or removed without
having to shut down or restart the database. After adding a node or before removing a node, HP
Vertica begins a rebalancing process that moves data around the cluster to populate the new nodes
or move data off of nodes about to be removed from the database. During this process, data may
also be exchanged between nodes that are not being added or removed in order to maintain
K-safety. If HP Vertica determines that the data cannot be rebalanced in a single iteration
due to a lack of disk space, the rebalance is done in multiple iterations.
To help make data rebalancing due to cluster scaling more efficient, HP Vertica locally segments
data storage on each node so it can be easily moved to other nodes in the cluster. When a new
node is added to the cluster, existing nodes in the cluster give up some of their data segments to
populate the new node and exchange segments to keep the number of nodes that any one node
depends upon to a minimum. This strategy keeps to a minimum the number of nodes that may
become critical when a node fails (see Critical Nodes/K-safety). When a node is being removed
from the cluster, all of its storage containers are moved to other nodes in the cluster (which also
relocates data segments to minimize nodes that may become critical when a node fails). This
method of breaking data into portable segments is referred to as elastic cluster, since it makes
enlarging or shrinking the cluster easier.
The alternative to elastic cluster is to resegment all of the data in the projection and redistribute it to
all of the nodes in the database evenly any time a node is added or removed. This method requires
more processing and more disk space, since it requires all of the data in all projections to
essentially be dumped and reloaded.
The Elastic Cluster Scaling Factor
In new installs, each node has a "scaling factor" number of local segments. Rebalance efficiently
redistributes data by relocating local segments provided that, after nodes are added or removed,
there are sufficient local segments in the cluster to redistribute the data evenly (determined by
MAXIMUM_SKEW_PERCENT). For example, if the scaling factor = 8, and there are initially 5
nodes, then there are a total of 40 local segments cluster wide. If two additional nodes are added to
bring the total to 7 nodes, relocating local segments would place 5 such segments on 2 nodes and 6
such segments on 5 nodes, which is roughly a 16.7% skew. Rebalance chooses this course of
action only if the resulting skew is less than the allowed threshold, as determined by MAXIMUM_
SKEW_PERCENT. Otherwise, segmentation space (and hence data, if uniformly distributed over
this space) is evenly distributed among the 7 nodes and new local segment boundaries are drawn
for each node, such that each node again has 8 local segments.
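The 16.7% figure above can be reproduced with simple arithmetic. The query below is purely illustrative; it assumes skew is measured as the relative difference between the most and least loaded nodes:

```sql
-- 5 nodes x scaling factor 8 = 40 local segments cluster-wide.
-- Over 7 nodes: 2 nodes hold 5 segments each, 5 nodes hold 6 each.
SELECT 1 - (5 / 6.0) AS skew;  -- roughly 0.167, i.e. 16.7%
```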
Note: By default, the scaling factor only has an effect while HP Vertica rebalances the
database. While rebalancing, each node breaks the projection segments it contains into
storage containers, which it then moves to other nodes if necessary. After rebalancing, the
data is recombined into ROS containers. It is possible to have HP Vertica always group data
into storage containers. See Local Data Segmentation for more information.
Enabling and Disabling Elastic Cluster
You enable and disable elastic cluster using functions. See the entries for the ENABLE_ELASTIC_
CLUSTER and DISABLE_ELASTIC_CLUSTER functions in the SQL Reference Manual.
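For example, both functions are called through SELECT, following the pattern used elsewhere in this guide:

```sql
-- Enable elastic cluster; segmented projections created afterwards are elastic:
SELECT ENABLE_ELASTIC_CLUSTER();

-- Disable elastic cluster:
SELECT DISABLE_ELASTIC_CLUSTER();
```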
Note: An elastic projection (a segmented projection created when Elastic Cluster is enabled)
created with a modularhash segmentation expression uses hash instead.
Query the ELASTIC_CLUSTER system table to determine if elastic cluster is enabled:
=> select is_enabled from ELASTIC_CLUSTER;
is_enabled
------------
t
(1 row)
Scaling Factor Defaults
The default scaling factor is 4 for new installations of HP Vertica and for upgraded installations
that had local segments disabled. Versions of HP Vertica prior to 6.0 had local segments disabled
by default. For databases upgraded to version 6.0 that already had local segments enabled, the
scaling factor is not changed during the upgrade.
Note: Databases created with versions of HP Vertica earlier than version 5.0 have a scaling
factor of 0, which disables elastic cluster. This ensures that HP Vertica handles projection
segmentation the way it did prior to version 5.0. If you want your older database to have better
scaling performance, you need to manually set a scaling factor to enable the new storage
segmenting behavior.
Viewing Scaling Factor Settings
To view the scaling factor, query the ELASTIC_CLUSTER table:
=> SELECT scaling_factor FROM ELASTIC_CLUSTER;
scaling_factor
---------------
4
(1 row)
=> SELECT SET_SCALING_FACTOR(6);
SET_SCALING_FACTOR
--------------------
SET
(1 row)
=> SELECT scaling_factor FROM ELASTIC_CLUSTER;
scaling_factor
---------------
6
(1 row)
Setting the Scaling Factor
The scaling factor determines the number of storage containers used to store a projection across
the database. Use the SET_SCALING_FACTOR function to change your database's scaling
factor. The scaling factor can be an integer between 1 and 32.
Note: Setting the scaling factor value too high can cause nodes to create too many small
container files, greatly reducing efficiency and potentially causing a "Too many
ROS containers" error (also known as "ROS pushback"). The scaling factor should be set high
enough so that rebalance can transfer local segments to satisfy the skew threshold, but small
enough that the number of storage containers does not exceed ROS pushback. The number of
storage containers should be greater than or equal to the number of partitions multiplied by the
number of local segments (# storage containers >= # partitions * # local segments).
=> SELECT SET_SCALING_FACTOR(12);
SET_SCALING_FACTOR
--------------------
SET
(1 row)
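As a rough sanity check before raising the scaling factor, you can estimate the container count from the inequality in the note above. The figures here are hypothetical: a table with 365 daily partitions and a scaling factor (local segment count) of 12:

```sql
-- # storage containers >= # partitions * # local segments
SELECT 365 * 12 AS estimated_storage_containers;  -- 4380
```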
Local Data Segmentation
By default, the scaling factor only has an effect when HP Vertica rebalances the database. During
rebalancing, nodes break the projection segments they contain into storage containers which they
can quickly move to other nodes.
This process is more efficient than re-segmenting the entire projection (in particular, less free disk
space is required), but it still has significant overhead, since storage containers have to be
separated into local segments, some of which are then transferred to other nodes. This overhead is
not a problem if you rarely add or remove nodes from your database.
However, if your database is growing rapidly and is constantly busy, you may find the process of
adding nodes becomes disruptive. In this case, you can enable local segmentation, which tells HP
Vertica to always segment its data based on the scaling factor, so the data is always broken into
containers that are easily moved. Having the data segmented in this way dramatically speeds up
the process of adding or removing nodes, since the data is always in a state that can be quickly
relocated to another node. The rebalancing process that HP Vertica performs after adding or
removing a node just has to decide which storage containers to relocate, instead of first having to
break the data into storage containers.
Local data segmentation increases the number of storage containers stored on each node. This is
not an issue unless a table contains many partitions: for example, a table partitioned by day
that contains one or more years of data. If local data segmentation is enabled, then each of these table
partitions is broken into multiple local storage segments, which potentially results in a huge number
of files which can lead to ROS "pushback." Consider your table partitions and the effect enabling
local data segmentation may have before enabling the feature.
Enabling and Disabling Local Segmentation
To enable local segmentation, use the ENABLE_LOCAL_SEGMENTS function. To disable local
segmentation, use the DISABLE_LOCAL_SEGMENTS function:
=> SELECT ENABLE_LOCAL_SEGMENTS();
ENABLE_LOCAL_SEGMENTS
-----------------------
ENABLED
(1 row)
=> SELECT is_local_segment_enabled FROM elastic_cluster;
is_local_segment_enabled
--------------------------
t
(1 row)
=> SELECT DISABLE_LOCAL_SEGMENTS();
DISABLE_LOCAL_SEGMENTS
------------------------
DISABLED
(1 row)
=> SELECT is_local_segment_enabled FROM ELASTIC_CLUSTER;
is_local_segment_enabled
--------------------------
f
(1 row)
Elastic Cluster Best Practices
The following are some best practices with regard to local segmentation and upgrading pre-5.0
databases.
Note: You should always perform a database backup before and after performing any of the
operations discussed in this topic. You need to back up before changing any elastic cluster or
local segmentation settings to guard against a hardware failure causing the rebalance process
to leave the database in an unusable state. You should perform a full backup of the database
after the rebalance procedure to avoid having to rebalance the database again if you need to
restore from a backup.
When to Enable Local Data Segmentation
Local Data Segmentation can significantly speed up the process of resizing your cluster. You
should enable local data segmentation if
l your database does not contain tables with hundreds of partitions.
l the number of nodes in the database cluster is a power of two.
l you plan to expand or contract the size of your cluster.
Local segmentation can result in an excessive number of storage containers with tables that have
hundreds of partitions, or in clusters with a non-power-of-two number of nodes. If your database has
either of these characteristics, take care when enabling local segmentation.
Upgraded Database Consideration
Databases created using a version of HP Vertica earlier than version 5.0 do not have elastic cluster
enabled by default. If you expect to expand or contract the database in the future, you may benefit
from enabling elastic cluster by setting a scaling factor. There are two strategies you can follow:
l Enable elastic cluster now, and rebalance the database. This may take a significant amount of
time to complete, and may consume up to 50% of the free disk space on the nodes in the
database, since all of the segmented projections are rewritten. However, afterwards, adding
and removing nodes will take less time.
l Wait until you need to resize the cluster, then enable elastic cluster just before adding or
removing nodes. Enabling the setting at that point does not make the initial resize any faster,
but later resize operations will be.
Which method you choose depends on your specific circumstances. If you might resize your
database on short notice (for example, you may need to load a very large amount of data at once),
you can schedule the downtime needed to enable elastic cluster and rebalance the database
ahead of time, so the actual add or remove node process will occur faster.
If you choose to enable elastic cluster for your database, you should consider whether you want to
enable local data segmentation at the same time. If you choose to enable local data segmentation
at a later time, you will need to rebalance the database again, which is a lengthy process.
Monitoring Elastic Cluster Rebalancing
HP Vertica includes system tables that can be used to monitor the rebalance status of an elastic
cluster and gain general insight into the status of elastic cluster on your nodes.
l The REBALANCE_TABLE_STATUS table provides general information about a rebalance. It
shows, for each table, the amount of data that has been separated, the amount that is currently
being separated, and the amount to be separated. It also shows the amount of data transferred,
the amount that is currently being transferred, and the remaining amount to be transferred (or an
estimate if storage is not separated).
Note: If multiple rebalance methods were used for a single table (for example, the table has
unsegmented and segmented projections), the table may appear multiple times, once for
each rebalance method.
l REBALANCE_PROJECTION_STATUS can be used to gain more insight into the details for a
particular projection that is being rebalanced. It provides the same type of information as above,
but in terms of a projection instead of a table.
In each table, separated_percent and transferred_percent can be used to determine overall
progress.
Historical Rebalance Information
Historical information about work completed is retained, so use the predicate "where is_latest" to
restrict the output to only the most recent or current rebalance activity. The historical data may
include information about dropped projections or tables. If a table or projection has been dropped
and information about the anchor table is not available, then NULL is displayed for the table_id and
"<unknown>" is displayed for the table_name. Information on dropped tables is still useful, for
example, in providing justification for the duration of a task.
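For example, assuming the column names mentioned above, the following query restricts output to the most recent or current rebalance activity:

```sql
SELECT table_name, separated_percent, transferred_percent
FROM rebalance_table_status
WHERE is_latest;
```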
Adding Nodes
There are many reasons for adding one or more nodes to an installation of HP Vertica:
l Increase system performance. Add nodes to handle a high query load or high load latency, or
to increase disk space without adding storage locations to existing nodes.
Note: The database response time depends on factors such as type and size of the application
query, database design, data size and data types stored, available computational power, and
network bandwidth. Adding nodes to a database cluster does not necessarily improve the
system response time for every query, especially if the response time is already short, e.g.,
less than 10 seconds, or the response time is not hardware bound.
l Make the database K-safe (K-safety=1) or increase K-safety to 2. See Failure Recovery for
details.
l Swap a node for maintenance. Use a spare machine to temporarily take over the activities of
an existing node that needs maintenance. The node that requires maintenance is known ahead
of time so that when it is temporarily removed from service, the cluster is not vulnerable to
additional node failures.
l Replace a node. Permanently add a node to remove obsolete or malfunctioning hardware.
Important: If you installed HP Vertica on a single node without specifying the IP address or
hostname (or you used localhost), you cannot expand the cluster. You must reinstall HP
Vertica and specify an IP address or hostname that is not localhost/127.0.0.1.
Adding nodes consists of the following general tasks:
1. Back up the database.
HP strongly recommends that you back up the database before you perform this significant
operation because it entails creating new projections, refreshing them, and then deleting the old
projections. See Backing Up and Restoring the Database for more information.
The process of migrating the projection design to include the additional nodes could take a
while; however, during this time, all user activity on the database can proceed normally, using
the old projections.
2. Configure the hosts you want to add to the cluster.
See Before you Install HP Vertica in the Installation Guide. You will also need to edit the hosts
configuration file on all of the existing nodes in the cluster to ensure they can resolve the new
host.
3. Add one or more hosts to the cluster.
4. Add the hosts you added to the cluster (in step 3) to the database.
Note: When you add a "host" to the database, it becomes a "node." You can add nodes to
your database using either the Administration Tools or the Management Console (See
Monitoring HP Vertica Using Management Console.)
After you add one or more nodes to the database, HP Vertica automatically distributes updated
configuration files to the rest of the nodes in the cluster and starts the process of rebalancing data in
the cluster. See Rebalancing Data Across Nodes for details.
Adding Hosts to a Cluster
After you have backed up the database and configured the hosts you want to add to the cluster, you
can now add hosts to the cluster using the update_vertica script.
You can use MC to add standby nodes to a database, but you cannot add hosts to a cluster using
MC.
Prerequisites and Restrictions
l If you installed HP Vertica on a single node without specifying the IP address or hostname (you
used localhost), it is not possible to expand the cluster. You must reinstall HP Vertica and
specify an IP address or hostname.
l If your database has more than one node already, you can add a node without stopping the
server. However, if you are adding a node to a single-node installation, then you must shut down
both the database and spread. If you do not, the system returns an error like the following:
$ sudo /opt/vertica/sbin/update_vertica --add-hosts node05 --rpm vertica_7.0.x.x86_64.RHEL5.rpm
Vertica 7.0.x Installation Tool
Starting installation tasks...
Getting system information for cluster (this may take a while)....
Spread is running on ['node01']. HP Vertica and spread must be stopped before adding nodes to a 1 node cluster.
Use the admin tools to stop the database, if running, then use the following command to stop spread:
/etc/init.d/spread stop (as root or with sudo)
Installation completed with errors.
Installation failed.
Procedure to Add Hosts
From one of the existing cluster hosts, run the update_vertica script with a minimum of the --add-
hosts host(s) parameter (where host(s) is the hostname or IP address of the system(s) that you
are adding to the cluster) and the --rpm or --deb parameter:
# /opt/vertica/sbin/update_vertica --add-hosts host(s) --rpm package
Note: See Installing with the Script for the full list of parameters. You must also provide the
same options you used when originally installing the cluster.
The update_vertica script uses all the same options as install_vertica and:
l Installs the HP Vertica RPM on the new host.
l Performs post-installation checks, including RPM version and N-way network connectivity
checks.
l Modifies spread to encompass the larger cluster.
l Configures the Administration Tools to work with the larger cluster.
Important Tips:
l A host can be specified by the hostname or IP address of the system you are adding to the
cluster. However, internally HP Vertica stores all host addresses as IP addresses.
l Do not include spaces in the hostname/IP address list provided with --add-hosts if you
specified more than one host.
l If a package is specified with --rpm/--deb, and that package is newer than the one currently
installed on the existing cluster, then HP Vertica first installs the new package on the existing
cluster hosts before installing it on the newly added hosts.
l Use the same command line parameters for the database administrator username, password,
and directory path you used when you installed the cluster originally. Alternatively, you can
create a properties file to save the parameters during install and then reuse it on subsequent
install and update operations. See Installing HP Vertica Silently.
l If you are installing using sudo, the database administrator user (dbadmin) must already exist on
the hosts you are adding and must be configured with passwords and home directory paths
identical to the existing hosts. HP Vertica sets up passwordless ssh from existing hosts to the
new hosts, if needed.
l If you initially used the --point-to-point option to configure spread to use direct, point-to-point
communication between nodes on the subnet, then use the --point-to-point option whenever
you run install_vertica or update_vertica. Otherwise, your cluster's configuration is
reverted to the default (broadcast), which may impact future databases.
Examples:
--add-hosts host01 --rpm
--add-hosts 192.168.233.101
--add-hosts host02,host03
Adding Nodes to a Database
Once you have added one or more hosts to the cluster, you can add them as nodes to the database.
You can add nodes to a database using either of these methods:
l The Management Console interface
l The Administration Tools interface
To Add Nodes to a Database Using MC
Only nodes in STANDBY state are eligible for addition. STANDBY nodes are nodes included in the
cluster but not yet assigned to the database.
You add nodes to a database on MC's Manage page. Click the node you want to act upon, and then
click Add node in the Node List.
When you add a node, the node icon in the cluster view changes color from gray (empty) to green as
the node comes online. Additionally, a task list displays detailed progress of the node addition
process.
To Add Nodes to a Database Using the Administration Tools:
1. Open the Administration Tools. (See Using the Administration Tools.)
2. On the Main Menu, select View Database Cluster State to verify that the database is
running. If it is not, start it.
3. From the Main Menu, select Advanced Tools Menu and click OK.
4. In the Advanced Menu, select Cluster Management and click OK.
5. In the Cluster Management menu, select Add Host(s) and click OK.
6. Select the database to which you want to add one or more hosts, and then select OK.
A list of unused hosts is displayed.
7. Select the hosts you want to add to the database and click OK.
8. When prompted, click Yes to confirm that you want to add the hosts.
9. When prompted, enter the password for the database, and then select OK.
10. When prompted that the hosts were successfully added, select OK.
11. HP Vertica now automatically starts the rebalancing process to populate the new node with
data. When prompted, enter the path to a temporary directory that the Database Designer can
use to rebalance the data in the database and select OK.
12. Either press enter to accept the default K-Safety value, or enter a new higher value for the
database and select OK.
13. Select whether HP Vertica should immediately start rebalancing the database, or whether it
should create a script to rebalance the database later. You should select the option to
automatically start rebalancing unless you want to delay rebalancing until a time when the
database has a lower load. If you choose to automatically rebalance the database, the script is
still created and saved so that you can use it later.
14. Review the summary of the rebalancing process and select Proceed.
15. If you chose to automatically rebalance, the rebalance process runs. If you chose to create a
script, the script is generated and saved. In either case, you are shown a success screen, and
prompted to select OK to end the Add Node process.
Removing Nodes
Although less common than adding a node, permanently removing a node is useful if the host
system is obsolete or over-provisioned.
Note: You cannot remove nodes if your cluster would not have the minimum number of nodes
required to maintain your database's current K-safety level (3 nodes for a database with a K-
safety level of 1, and 5 nodes for a K-safety level of 2). To remove the node or
nodes from the database, you must first lower the K-safety level of your database.
Removing one or more nodes consists of the following general steps:
1. Back up the database.
HP recommends that you back up the database before performing this significant operation
because it entails creating new projections, deleting old projections, and reloading data.
2. Lower the K-safety of your database if the cluster will not be large enough to support its current
level of K-safety after you remove nodes.
3. Remove the hosts from the database.
4. Remove the nodes from the cluster if they are not used by any other databases.
Lowering the K-Safety Level to Allow for Node Removal
A database with a K-Safety level of 1 requires at least three nodes to operate, and a database with a
K-Safety level of 2 requires at least five nodes to operate. To remove a node from a cluster that is at the
minimum number of nodes for its database's K-Safety level, you must first lower the K-Safety level
using the MARK_DESIGN_KSAFE function.
Note: HP does not recommend lowering the K-safety level of a database to 0, since doing so
eliminates HP Vertica's fault tolerance features. You should only use this procedure to move
from a K-safety level of 2 to 1.
To lower the K-Safety level of the database:
1. Connect to the database, either through the Administration Tools or via vsql.
2. Enter the command: SELECT MARK_DESIGN_KSAFE(n); where n is the new K-Safety level for
the database (0 if you are reducing the cluster to below 3 nodes, 1 if you are reducing the
cluster to 3 or 4 nodes).
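For example, to lower the K-safety level from 2 to 1 before removing a node from a five-node cluster:

```sql
-- Lower the K-safety level to 1 so the cluster can shrink below 5 nodes:
SELECT MARK_DESIGN_KSAFE(1);
```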
Removing Nodes From a Database
You can remove nodes from a database using either of these methods:
l The Management Console interface
l The Administration Tools interface
Prerequisites
l The node must be empty; in other words, there must be no projections referring to the node.
Ensure you have followed the steps listed in Removing Nodes to modify your database design.
l The database must be UP.
l You cannot drop nodes that are critical for K-safety. See Lowering the K-Safety Level to Allow
for Node Removal.
Remove Unused Hosts From the Database Using MC
You remove nodes from a database cluster on MC's Manage page. Click the node you want to act
upon, and then click Remove node in the Node List.
Using MC, you can remove only nodes that are part of the database cluster and which show a state
of DOWN (red). When you remove a node, its color changes from red to clear and MC updates its
state to STANDBY. You can add STANDBY nodes back to the database later.
Remove Unused Hosts From the Database Using the
Administration Tools
To remove unused hosts from the database using the Administration Tools:
1. Open the Administration Tools. See Using the Administration Tools for information about
accessing the Administration Tools.
2. On the Main Menu, select View Database Cluster State to verify that the database is
running. If the database isn't running, start it.
3. From the Main Menu, select Advanced Tools Menu, and then select OK.
4. In the Advanced menu, select Cluster Management, and then select OK.
5. In the Cluster Management menu, select Remove Host(s) from Database, and then select
OK.
6. When warned that you must redesign your database and create projections that exclude the
hosts you are going to drop, select Yes.
7. Select the database from which you want to remove the hosts, and then select OK.
A list of all the hosts that are currently being used is displayed.
8. Select the hosts you want to remove from the database, and then select OK.
9. When prompted, select OK to confirm that you want to remove the hosts. HP Vertica begins
the process of rebalancing the database and removing the node or nodes.
10. When informed that the hosts were successfully removed, select OK.
11. If you removed a host from a Large Cluster configuration, open a vsql session and run the
following command:
SELECT realign_control_nodes();
For more details, see REALIGN_CONTROL_NODES.
Removing Hosts From a Cluster
If a host that you removed from the database is not used by any other database, you can remove it
from the cluster using the update_vertica script. You can leave the database running (UP) during
this operation.
You can remove hosts from a database on the MC interface, but you cannot remove those hosts
from a cluster.
Prerequisites
The host must not be used by any database.
Procedure to Remove Hosts
From one of the hosts in the cluster, run update_vertica with the --remove-hosts switch and
provide a comma-separated list of hosts to remove from an existing HP Vertica cluster. A host can
be specified by the hostname or IP address of the system:
# /opt/vertica/sbin/update_vertica --remove-hosts host
For example:
# /opt/vertica/sbin/update_vertica --remove-hosts host01
Note: See Installing with the Script for the full list of parameters.
The update_vertica script uses all the same options as install_vertica and:
l Modifies the spread to match the smaller cluster.
l Configures the Administration Tools to work with the smaller cluster.
Important Tips:
l Do not include spaces in the hostname list provided with --remove-hosts if you specified more
than one host.
l If a new RPM is specified with --rpm, then HP Vertica will first install it on the existing cluster
hosts before proceeding.
l Use the same command line parameters as those used when you installed the original cluster.
Specifically, if you used non-default values for the database administrator username, password,
or directory path, provide the same values when you remove hosts; otherwise, the procedure fails.
Consider creating a properties file in which you save the parameters during the installation,
which you can reuse on subsequent install and update operations. See Installing HP Vertica
Silently.
Examples:
--remove-hosts host01
--remove-hosts 192.168.233.101
-R host01
Replacing Nodes
If you have a K-Safe database, you can replace nodes, as necessary, without bringing the system
down. For example, you might want to replace an existing node if you:
l Need to repair an existing host system that no longer functions and restore it to the cluster
l Want to exchange an existing host system for another more powerful system
Note: HP Vertica does not support replacing a node on a K-safe=0 database. Use the
procedures to add and remove nodes instead.
The process you use to replace a node depends on whether you are replacing the node with:
l A host that uses the same name and IP address
l A host that uses a different name and IP address
Prerequisites
l Configure the replacement hosts for HP Vertica. See Before you Install HP Vertica in the
Installation Guide.
l Read the Important Tips sections under Adding Hosts to a Cluster and Removing Hosts From a
Cluster.
l Ensure that the database administrator user exists on the new host and is configured identically
to the existing hosts. HP Vertica will set up passwordless SSH as needed.
l Ensure that the directories for the Catalog Path, Data Path, and any storage locations were
added to the database when you created it, are mounted correctly on the new host, and have read
and write permissions for the database administrator user. Also ensure that there is sufficient
disk space.
l Follow the best practice procedure below for introducing the failed hardware back into the cluster
to avoid spurious full-node rebuilds.
Best Practice for Restoring Failed Hardware
Following this procedure will prevent HP Vertica from misdiagnosing missing disk or bad mounts as
data corruptions, which would result in a time-consuming, full-node recovery.
If a server fails due to hardware issues, for example a bad disk or a failed controller, upon repairing
the hardware:
1. Reboot the machine into runlevel 1, which is a root and console-only mode.
Runlevel 1 prevents network connectivity and keeps HP Vertica from attempting to reconnect
to the cluster.
2. In runlevel 1, validate that the hardware has been repaired, the controllers are online, and any
RAID recovery is able to proceed.
Note: You do not need to initialize RAID recovery in runlevel 1; simply validate that it can
proceed.
3. Once the hardware is confirmed consistent, only then reboot to runlevel 3 or higher.
At this point, the network activates, and HP Vertica rejoins the cluster and automatically recovers
any missing data. Note that, on a single-node database, if any files that were associated with a
projection have been deleted or corrupted, HP Vertica will delete all files associated with that
projection, which could result in data loss.
Replacing a Node Using the Same Name and IP Address
To replace a node with a host system that has the same IP address and host name as the original:
1. Back up the database. See Backing Up and Restoring the Database.
2. From a functioning node in the cluster, run the install_vertica script with the -s and -r
parameters. Additionally, use the same additional install parameters that were used when the
cluster was originally installed.
# /opt/vertica/sbin/install_vertica --hosts host --rpm rpm_package
Where host is the hostname or IP address of the system you are restoring to the cluster; for
example:
--hosts host01
--hosts 192.168.233.101
--rpm is the name of the RPM package; for example, --rpm vertica_7.0.x.x86_64.RHEL5.rpm
The installation script verifies system configuration and that HP Vertica, spread, and the
Administration Tools metadata are installed on the host.
3. On the new node, create the catalog and data directories (if both reside under the same top-
level directory, you only need to create that one directory). These are the same top-level
directories you specified when creating the database.
Note: You can find the directories used for catalog and data storage by querying the V_
MONITOR.DISK_STORAGE system table. You need to create the directories up to the
v_database_node00xx portion of the data and catalog path. For example, if the catalog
storage location is /home/dbadmin/vmart/v_vmart_node0001_catalog/Catalog, you
would need to create the /home/dbadmin/vmart directory to store the catalog.
4. Use the Administration Tools to restart the host you just replaced.
The node automatically joins the database and recovers its data by querying the other nodes
within the database. It then transitions to an UP state.
Note: Do not connect two hosts with the same name and IP address to the same network.
If this occurs, traffic is unlikely to be routed properly.
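As the note in step 3 mentions, you can confirm the catalog and data paths in use on existing nodes by querying the DISK_STORAGE system table. A minimal sketch (column names assumed from the V_MONITOR schema):

```sql
=> SELECT node_name, storage_path, storage_usage
   FROM v_monitor.disk_storage;
```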
Replacing a Failed Node Using a Node with a Different IP
Address
Replacing a failed node with a host system that has a different IP address from the original consists
of the following steps:
1. Back up the database.
HP recommends that you back up the database before you perform this significant operation
because it entails creating new projections, deleting old projections, and reloading data.
2. Add the new host to the cluster. See Adding Hosts to a Cluster.
3. If HP Vertica is still running in the node being replaced, then use the Administration Tools to
Stop Vertica on Host on the host being replaced.
4. Use the Administration Tools to replace the original host with the new host. If you are using
more than one database, replace the original host in all the databases in which it is used. See
Replacing Hosts.
5. Use the procedure in Distributing Configuration Files to the New Host to transfer metadata to
the new host.
6. Remove the host from the cluster.
7. Use the Administration Tools to restart HP Vertica on the host. On the Main Menu, select
Restart Vertica on Host, and click OK. See Starting the Database for more information.
Once you have completed this process, the replacement node automatically recovers the data that
was stored in the original node by querying other nodes within the database.
Replacing a Functioning Node Using a Different Name
and IP Address
Replacing a node with a host system that has a different IP address and host name from the original
consists of the following general steps:
1. Back up the database.
HP recommends that you back up the database before you perform this significant operation
because it entails creating new projections, deleting old projections, and reloading data.
2. Add the replacement hosts to the cluster.
At this point, both the original host that you want to remove and the new replacement host are
members of the cluster.
3. Use the Administration Tools to Stop Vertica on Host on the host being replaced.
4. Use the Administration Tools to replace the original host with the new host. If you are using
more than one database, replace the original host in all the databases in which it is used. See
Replacing Hosts.
5. Remove the host from the cluster.
6. Restart HP Vertica on the host.
Once you have completed this process, the replacement node automatically recovers the data that
was stored in the original node by querying the other nodes within the database. It then transitions
to an UP state.
Note: If you do not remove the original host from the cluster and you attempt to restart the
database, the host is not invited to join the database because its node address does not match
the new address stored in the database catalog. Therefore, it remains in the INITIALIZING
state.
Using the Administration Tools to Replace Nodes
If you are replacing a node with a host that uses a different name and IP address, use the
Administration Tools to replace the original host with the new host. Alternatively, you can use the
Management Console to replace a node.
Replace the Original Host with a New Host Using the
Administration Tools
To replace the original host with a new host using the Administration Tools:
1. Back up the database. See Backing Up and Restoring the Database.
2. From a node that is up, and is not going to be replaced, open the Administration Tools.
3. On the Main Menu, select View Database Cluster State to verify that the database is
running. If it’s not running, use the Start Database command on the Main Menu to restart it.
4. On the Main Menu, select Advanced Menu.
5. In the Advanced Menu, select Stop HP Vertica on Host.
6. Select the host you want to replace, and then click OK to stop the node.
7. When prompted if you want to stop the host, select Yes.
8. In the Advanced Menu, select Cluster Management, and then click OK.
9. In the Cluster Management menu, select Replace Host, and then click OK.
10. Select the database that contains the host you want to replace, and then click OK.
A list of all the hosts that are currently being used is displayed.
11. Select the host you want to replace, and then click OK.
12. Select the host you want to use as the replacement, and then click OK.
13. When prompted, enter the password for the database, and then click OK.
14. When prompted, click Yes to confirm that you want to replace the host.
15. When prompted that the host was successfully replaced, click OK.
16. In the Main Menu, select View Database Cluster State to verify that all the hosts are running.
You might need to start HP Vertica on the host you just replaced. Use Restart Vertica on
Host.
The node enters a RECOVERING state.
Caution: If you are using a K-Safe database, keep in mind that the recovering node counts as
one node down even though it might not yet contain a complete copy of the data. This means
that if you have a database in which K-safety=1, the current fault tolerance for your database is
at a critical level. If you lose one more node, the database shuts down. Be sure that you do not
stop any other nodes.
Using the Management Console to Replace Nodes
On the MC Manage page, you can quickly replace a DOWN node in the database by selecting one
of the STANDBY nodes in the cluster.
A DOWN node shows up as a red node in the cluster. When you click the DOWN node, the Replace
node button in the Node List becomes active, as long as there is at least one node in the cluster
that is not participating in the database. The STANDBY node will be your replacement node for the
node you want to retire; it appears gray (empty) until it has been added to the database, when it
turns green.
Tip: You can resize the Node List by clicking its margins and dragging to the size you want.
When you highlight a node and click Replace, MC provides a list of possible STANDBY nodes to
use as a replacement. After you select the replacement node, the process begins. A node
replacement could be a long-running task.
MC transitions the DOWN node to a STANDBY state, while the node you selected as the
replacement will assume the identity of the original node, using the same node name, and will be
started.
Assuming a successful startup, the new node will appear orange with a status of RECOVERING
until the recovery procedure is complete. When the recovery process completes, the replacement
node will turn green and show a state of UP.
Rebalancing Data Across Nodes
HP Vertica automatically rebalances your database when adding or removing nodes. You can also
manually trigger a rebalance using the Administration Tools or SQL functions. Users can also
rebalance data across nodes through the Management Console interface (see Rebalancing Data
Using Management Console for details).
Whether you start the rebalance process manually or automatically, the process occurs in the
following steps:
l For segmented projections, HP Vertica creates new (renamed), segmented projections that are
identical in structure to the existing projections, but which have their data distributed across all
nodes. The rebalance process then refreshes all new projections, sets the Ancient History Mark
(AHM) to the greatest allowable epoch (now), and drops all of the old segmented projections. All
new buddy projections have the same base name so they can be identified as a group.
Note: HP Vertica does not maintain custom projection segmentations defined with a
specific node list. Node rebalancing distributes data across all nodes, regardless of any
custom definitions.
l For unsegmented projections, HP Vertica leaves the existing projections unmodified, creates
new projections on the new nodes, and refreshes them.
l After the data has been rebalanced, HP Vertica drops:
n Duplicate buddy projections with the same offset
n Duplicate replicated projections on the same node
K-safety and Rebalancing
Before data rebalancing completes, HP Vertica operates with the existing K-safe value. After
rebalancing completes, HP Vertica operates with the K-safe value specified during the rebalance
operation.
You can maintain existing K-safety or specify a new value (0 to 2) for the modified database cluster.
HP Vertica does not support downgrading K-safety and returns a warning if you attempt to reduce it
from its current value: Design k-safety cannot be less than system k-safety level. For
more information, see Lowering the K-Safety Level to Allow for Node Removal.
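To verify the K-safety level before and after a rebalance, you can query the SYSTEM table; a minimal sketch, assuming the designed_fault_tolerance and current_fault_tolerance columns in V_MONITOR.SYSTEM:

```sql
=> SELECT designed_fault_tolerance, current_fault_tolerance
   FROM v_monitor.system;
```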
Rebalancing Failure and Projections
If a failure occurs while rebalancing the database, you can rebalance again. If the cause of the
failure has been resolved, the rebalance operation continues from where it failed. However, a failed
data rebalance can result in projections becoming out of date, so that they cannot be removed
automatically.
To locate any such projections, query the V_CATALOG.PROJECTIONS system table as follows:
=> SELECT projection_name, anchor_table_name, is_prejoin,
is_up_to_date
FROM projections
WHERE is_up_to_date = false;
To remove out-of-date projections, use the DROP PROJECTION statement.
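For example, dropping one of the projections returned by the query above might look like this (the projection name is hypothetical):

```sql
=> DROP PROJECTION store.store_sales_p_b1;
```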
Permissions
Only the superuser has permissions to rebalance data.
Rebalancing Data Using the Administration Tools UI
To rebalance the data in your database:
1. Open the Administration Tools. (See Using the Administration Tools.)
2. On the Main Menu, select View Database Cluster State to verify that the database is
running. If it is not, start it.
3. From the Main Menu, select Advanced Tools Menu and click OK.
4. In the Advanced Menu, select Cluster Management and click OK.
5. In the Cluster Management menu, select Re-balance Data and click OK.
6. Select the database you want to rebalance, and then select OK.
7. Enter the directory for the Database Designer outputs (for example /tmp) and click OK.
8. Accept the proposed K-safety value or provide a new value. Valid values are 0 to 2.
9. Review the message and click Proceed to begin rebalancing data.
The Database Designer modifies existing projections to rebalance data across all database
nodes with the K-safety you provided. A script to rebalance data, which you can run manually
at a later time, is also generated and resides in the path you specified; for example
/tmp/extend_catalog_rebalance.sql.
Important: Rebalancing data can take some time, depending on the number of projections
and the amount of data they contain. HP recommends that you allow the process to
complete. If you must cancel the operation, use Ctrl+C.
The terminal window notifies you when the rebalancing operation is complete.
10. Press Enter to return to the Administration Tools.
Rebalancing Data Using Management Console
HP Vertica automatically rebalances the database after you add or remove nodes. If, however, you
notice data skew where one node shows more activity than another (for example, most queries
processing data on a single node), you can manually rebalance the database using MC if that
database is imported into the MC interface.
On the Manage page, click Rebalance in the toolbar to initiate the rebalance operation.
During a rebalance, you cannot perform any other activities on the database cluster, such as start,
stop, add, or remove nodes.
Rebalancing Data Using SQL Functions
There are three SQL functions that let you manually control the data rebalancing process. You can
use these functions to run a rebalance from a script scheduled to run at an off-peak time, rather than
having to manually trigger a rebalance through the Administration Tools.
These functions are:
l REBALANCE_CLUSTER()
l START_REBALANCE_CLUSTER()
l CANCEL_REBALANCE_CLUSTER()
For more information and examples of using these functions, see their entries in the SQL Reference
Manual.
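For instance, a script scheduled for off-peak hours might start a background rebalance and cancel it if the maintenance window closes; a minimal sketch:

```sql
-- Start rebalancing in the background; the call returns immediately.
=> SELECT START_REBALANCE_CLUSTER();
-- If the maintenance window closes before the rebalance finishes:
=> SELECT CANCEL_REBALANCE_CLUSTER();
```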
Redistributing Configuration Files to Nodes
The add and remove node processes automatically redistribute the HP Vertica configuration files.
You may rarely need to redistribute the configuration files to help resolve configuration issues.
To distribute configuration files to a host:
1. Log on to a host that contains these files and start the Administration Tools.
See Using the Administration Tools for information about accessing the Administration Tools.
2. On the Main Menu in the Administration Tools, select Configuration Menu and click OK.
3. On the Configuration Menu, select Distribute Config Files and click OK.
4. Select Database Configuration.
5. Select the database in which you want to distribute the files and click OK.
The vertica.conf file is distributed to all the other hosts in the database. If it previously existed
on a host, it is overwritten.
6. On the Configuration Menu, select Distribute Config Files and click OK.
7. Select SSL Keys.
The certifications and keys for the host are distributed to all the other hosts in the database. If
they previously existed on a host, they are overwritten.
8. On the Configuration Menu, select Distribute Config Files and click OK.
9. Select AdminTools Meta-Data.
The Administration Tools metadata is distributed to every host in the cluster.
10. Restart the database.
Stopping and Starting Nodes on MC
You can start and stop one or more database nodes through the Manage page by clicking a specific
node to select it and then clicking the Start or Stop button in the Node List.
Note: The Stop and Start buttons in the toolbar start and stop the database, not individual
nodes.
On the Databases and Clusters page, you must click a database first to select it. To stop or start a
node on that database, click the View button. You'll be directed to the Overview page. Click
Manage in the applet panel at the bottom of the page and you'll be directed to the database node
view.
The Start and Stop database buttons are always active, but the node Start and Stop buttons are
active only when one or more nodes of the same status are selected; for example, all nodes are UP
or DOWN.
After you click a Start or Stop button, Management Console updates the status and message icons
for the nodes or databases you are starting or stopping.
Managing Disk Space
HP Vertica detects and reports low disk space conditions in the log file so that the issue can be
addressed before serious problems occur. It also detects and reports low disk space conditions via
SNMP traps if enabled.
Critical disk space issues are reported sooner than other issues. For example, running out of
catalog space is fatal; therefore, HP Vertica reports the condition earlier than less critical
conditions. To avoid database corruption, when disk space falls below a certain threshold, HP
Vertica begins to reject transactions that update the catalog or data.
Caution: A low disk space report indicates one or more hosts are running low on disk space or
have a failing disk. It is imperative to add more disk space (or replace a failing disk) as soon as
possible.
When HP Vertica reports a low disk space condition, use the DISK_RESOURCE_REJECTIONS
system table to determine the types of disk space requests that are being rejected and the hosts on
which they are being rejected.
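A query along the following lines shows which request types are being rejected and on which hosts (column names assumed from the V_MONITOR schema):

```sql
=> SELECT node_name, resource_type, rejection_count
   FROM v_monitor.disk_resource_rejections
   ORDER BY rejection_count DESC;
```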
These and the other system tables are described in Using System Tables.
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide
HPE Vertica_7.0.x Administrators Guide

HPE Vertica_7.0.x Administrators Guide

Contents
Contents 3
Administration Overview 51
Managing Licenses 52
Copying Enterprise, Evaluation, and Flex Zone License Files 52
Obtaining a License Key File 52
Understanding HP Vertica Licenses 52
License Types 53
Installing or Upgrading a License Key 54
New HP Vertica License Installations 54
HP Vertica License Renewals or Upgrades 54
Uploading or Upgrading a License Key Using Administration Tools 55
Uploading or Upgrading a License Key Using Management Console 55
Flex Table License Installations 56
Installing a Flex Table license using vsql 56
Installing a Flex Table license using Management Console 56
Viewing Your License Status 56
Examining Your License Key 57
Viewing Your License Status 57
Viewing Your License Status Through MC 58
Calculating the Database Size 58
How HP Vertica Estimates Raw Data Size 58
Excluding Data From Raw Data Size Estimate 58
Evaluating Data Type Footprint Size 59
Using AUDIT to Estimate Database Size 59
Monitoring Database Size for License Compliance 60
Viewing Your License Compliance Status 60
Manually Auditing Columnar Data Usage 60
Manually Auditing Flex Table Data Usage 61
Targeted Auditing 61
Using Management Console to Monitor License Compliance 62
Managing License Warnings and Limits 62
Term License Warnings and Expiration 62
Data Size License Warnings and Remedies 62
If Your HP Vertica Enterprise Edition Database Size Exceeds Your Licensed Limits 63
If Your HP Vertica Community Edition Database Size Exceeds Your Licensed Limits 63
Configuring the Database 65
Configuration Procedure 66
IMPORTANT NOTES 66
Prepare Disk Storage Locations 67
Specifying Disk Storage Location During Installation 67
To Specify the Disk Storage Location When You Install: 68
Notes 68
Specifying Disk Storage Location During Database Creation 68
Notes 69
Specifying Disk Storage Location on MC 69
Configuring Disk Usage to Optimize Performance 69
Using Shared Storage With HP Vertica 70
Viewing Database Storage Information 70
Disk Space Requirements for HP Vertica 70
Disk Space Requirements for Management Console 70
Prepare the Logical Schema Script 70
Prepare Data Files 71
How to Name Data Files 71
Prepare Load Scripts 71
Create an Optional Sample Query Script 72
Create an Empty Database 73
Creating a Database Name and Password 73
Database Passwords 73
Create an Empty Database Using MC 75
How to Create an Empty Database on an MC-managed Cluster 75
Notes 76
Create a Database Using Administration Tools 77
Create the Logical Schema 78
Perform a Partial Data Load 79
Test the Database 79
Optimize Query Performance 80
Complete the Data Load 80
Test the Optimized Database 80
Set Up Incremental (Trickle) Loads 81
Implement Locales for International Data Sets 83
ICU Locale Support 83
Changing DB Locale for a Session 83
Specify the Default Locale for the Database 84
Override the Default Locale for a Session 85
Best Practices for Working with Locales 85
Server Locale 86
vsql Client 86
ODBC Clients 86
JDBC and ADO.NET Clients 87
Notes and Restrictions 87
Change Transaction Isolation Levels 89
Notes 90
Configuration Parameters 91
Configuring HP Vertica Settings Using MC 91
Configuring HP Vertica At the Command Line 93
General Parameters 93
Tuple Mover Parameters 96
Epoch Management Parameters 98
Monitoring Parameters 99
Profiling Parameters 101
Security Parameters 102
Database Designer Parameters 102
Internationalization Parameters 102
Data Collector Parameters 103
Kerberos Authentication Parameters 104
HCatalog Connector Parameters 105
Designing a Logical Schema 107
Using Multiple Schemas 108
Multiple Schema Examples 108
Using Multiple Private Schemas 108
Using Combinations of Private and Shared Schemas 110
Creating Schemas 110
Specifying Objects in Multiple Schemas 111
Setting Search Paths 111
Creating Objects That Span Multiple Schemas 113
Tables in Schemas 114
About Base Tables 114
Automatic Projection Creation 114
About Temporary Tables 115
Local Temporary Tables 116
Automatic Projection Creation and Characteristics 116
Implementing Views 118
Creating Views 118
Using Views 118
Notes 120
Creating a Database Design 121
What Is a Design? 121
How Database Designer Creates a Design 122
Who Can Run Database Designer 123
Granting and Enabling the DBDUSER Role 123
Allowing the DBDUSER to Run Database Designer Using Management Console 124
Allowing the DBDUSER to Run Database Designer Programmatically 125
DBDUSER Capabilities and Limitations 126
Workflow for Running Database Designer 127
Specifying Parameters for Database Designer 129
Design Name 129
Design Types 129
Comprehensive Design 129
Incremental Design 130
Optimization Objectives 130
Design Tables with Sample Data 130
Design Queries 131
Query Repository 131
K-Safety for Design 131
Replicated and Unsegmented Projections 132
Replicated Projections 132
Unsegmented Projections 133
Statistics Analysis 133
Building a Design 133
Resetting a Design 134
Deploying a Design 136
Deploying Designs Using Database Designer 136
Deploying Designs Manually 137
How to Create a Design 137
Using Management Console to Create a Design 138
Using the Wizard to Create a Design 139
Creating a Design Manually 141
Using Administration Tools to Create a Design 144
Creating Custom Designs 146
The Design Process 146
Planning Your Design 147
Design Requirements 147
Determining the Number of Projections to Use 147
Designing for K-Safety 148
Requirements for a K-Safe Physical Schema Design 148
Requirements for a Physical Schema Design with No K-Safety 149
Designing Replicated Projections for K-Safety 149
Designing Segmented Projections for K-Safety 150
Segmenting Projections 150
Creating Buddy Projections 151
Designing for Segmentation 151
Design Fundamentals 152
Writing and Deploying Custom Projections 152
Anatomy of a Projection 152
Column List and Encoding 153
Base Query 153
Sort Order 153
Segmentation 154
Designing Superprojections 154
Minimizing Storage Requirements 154
Maximizing Query Performance 155
Projection Design for Merge Operations 155
Maximizing Projection Performance 157
Choosing Sort Order: Best Practices 157
Combine RLE and Sort Order 157
Maximize the Advantages of RLE 158
Put Lower Cardinality Column First for Functional Dependencies 158
Sort for Merge Joins 159
Sort on Columns in Important Queries 160
Sort Columns of Equal Cardinality By Size 160
Sort Foreign Key Columns First, From Low to High Distinct Cardinality 160
Prioritizing Column Access Speed 160
Projection Examples 162
New K-Safe=2 Database 162
Creating Segmented Projections Example 162
Creating Unsegmented Projections Example 164
Adding Node to a Database 164
Creating Segmented Projections Example 165
Creating Unsegmented Projections Example 166
Implementing Security 168
Client Authentication 168
Connection Encryption 168
Client Authorization 169
Implementing Client Authentication 170
Supported Client Authentication Types 170
If You Want Communication Layer Authentication 171
Password Authentication 172
About Password Creation and Modification 172
Default Password Authentication 172
Profiles 172
How You Create and Modify Profiles 173
Password Expiration 174
Account Locking 174
How to Unlock a Locked Account 174
Password Guidelines 175
What to Use 175
What to Avoid 176
About External Authentication 177
Setting up Your Environment to Create Authentication Records 177
About Local Password Authentication 178
How to Create Authentication Records 178
If You Do Not Specify a Client Authentication Method 179
Authentication Record Format and Rules 179
Formatting Rules 182
Configuring LDAP Authentication 183
What You Need to Know to Configure LDAP Authentication 183
Prerequisites for LDAP Authentication 184
Terminology for LDAP Authentication 184
DBADMIN Authentication Access and LDAP 185
Bind vs. Bind and Search 185
LDAP Anonymous Binding 186
Using LDAP over SSL and TLS 186
LDAP Configuration Considerations 186
Workflow for Configuring LDAP Bind 187
Workflow for Configuring LDAP Bind and Search 188
Configuring Multiple LDAP Servers 189
Configuring Ident Authentication 190
ClientAuthentication Records for Ident Authentication 190
Installing and Configuring an Ident Server 191
Example Authentication Records 192
Using an IP Range and Trust Authentication Method 192
Using Multiple Authentication Records 192
Record Order 193
How to Modify Authentication Records 193
Using the Administration Tools 193
Using the ClientAuthentication Configuration Parameter 193
Examples 194
Implementing Kerberos Authentication 194
Kerberos Prerequisites 195
Configure HP Vertica for Kerberos Authentication 196
Point machines at the KDC and configure realms 200
Configure Clients for Kerberos Authentication 201
Configure ODBC and vsql Clients on Linux, HP-UX, AIX, MAC OSX, and Solaris 202
Configure ADO.NET, ODBC, and vsql Clients on Windows 204
Windows KDC on Active Directory with Windows built-in Kerberos client and HP Vertica 205
Linux KDC with Windows built-in Kerberos client and HP Vertica 205
Configuring Windows clients for Kerberos authentication 205
Authenticate and connect clients 205
Configure JDBC Clients on All Platforms 206
Determining the Client Authentication Method 209
Troubleshooting Kerberos Authentication 209
Server's principal name doesn't match the host name 209
JDBC client authentication 211
Working Domain Name Service (DNS) 211
Clock synchronization 212
Encryption algorithm choices 212
Kerberos passwords 213
Using the ODBC Data Source Configuration utility 213
Implementing SSL 214
Certificate Authority 214
Public/private Keys 214
SSL Prerequisites 215
Prerequisites for SSL Server Authentication and SSL Encryption 215
Optional Prerequisites for SSL Server and Client Mutual Authentication 216
Generating SSL Certificates and Keys 216
Create a CA Private Key and Public Certificate 217
Creating the Server Private Key and Certificate 218
Create the Client Private Key and Certificate 219
Summary Illustration (Generating Certificates and Keys) 220
Set Server and Client Key and Certificate Permissions 220
JDBC Certificates 221
Summary Illustration (JDBC Certificates) 222
Generating Certificates and Keys for MC 222
Signed Certificates 223
Self-Signed Certificates 223
Importing a New Certificate to MC 224
To Import a New Certificate 224
Distributing Certificates and Keys 225
Configuring SSL 225
To Enable SSL: 225
Configuring SSL for ODBC Clients 226
SSLMode Parameter 226
SSLKeyFile Parameter 227
SSLCertFile Parameter 227
Configuring SSL for JDBC Clients 227
Setting Required Properties 227
Troubleshooting 227
Requiring SSL for Client Connections 228
Managing Users and Privileges 229
About Database Users 230
Types of Database Users 231
DBADMIN User 231
Object Owner 231
PUBLIC User 232
Creating a Database User 232
Notes 232
Example 232
Locking/unlocking a user's Database Access 233
Changing a user's Password 234
Changing a user's MC Password 234
About MC Users 235
Permission Group Types 235
MC User Types 235
Creating Users and Choosing an Authentication Method 236
Default MC Users 236
Creating an MC User 236
Prerequisites 237
Create a New MC-authenticated User 237
Create a New LDAP-authenticated User 238
How MC Validates New Users 239
Managing MC Users 239
Who Manages Users 239
What Kind of User Information You Can Manage 240
About User Names 240
About Database Privileges 241
Default Privileges for All Users 241
Default Privileges for MC Users 242
Privileges Required for Common Database Operations 242
Schemas 242
Tables 242
Views 244
Projections 245
External Procedures 245
Libraries 245
User-Defined Functions 246
Sequences 246
Resource Pools 247
Users/Profiles/Roles 248
Object Visibility 248
I/O Operations 249
Comments 251
Transactions 251
Sessions 252
Tuning Operations 252
Privileges That Can Be Granted on Objects 253
Database Privileges 253
Schema Privileges 254
Schema Privileges and the Search Path 254
Table Privileges 255
Projection Privileges 256
Explicit Projection Creation and Privileges 256
Implicit Projection Creation and Privileges 257
Selecting From Projections 257
Dropping Projections 257
View Privileges 257
Sequence Privileges 258
External Procedure Privileges 259
User-Defined Function Privileges 259
Library Privileges 260
Resource Pool Privileges 260
Storage Location Privileges 260
Role, profile, and User Privileges 261
Metadata Privileges 262
I/O Privileges 263
Comment Privileges 264
Transaction Privileges 264
Session Privileges 265
Tuning Privileges 265
Granting and Revoking Privileges 265
About Superuser Privileges 265
About Schema Owner Privileges 266
About Object Owner Privileges 266
How to Grant Privileges 267
How to Revoke Privileges 267
Privilege Ownership Chains 269
Modifying Privileges 271
Changing a Table Owner 271
Notes 271
Example 271
Table Reassignment with Sequences 273
Changing a Sequence Owner 274
Example 274
Viewing Privileges Granted on Objects 275
About Database Roles 278
Role Hierarchies 278
Creating and Using a Role 278
Roles on Management Console 279
Types of Database Roles 280
DBADMIN Role 280
View a List of Database Superusers 281
DBDUSER Role 281
PSEUDOSUPERUSER Role 282
PUBLIC Role 282
Example 283
Default Roles for Database Users 283
Notes 284
Using Database Roles 284
Role Hierarchy 285
Example 285
Creating Database Roles 287
Deleting Database Roles 287
Granting Privileges to Roles 287
Example 287
Revoking Privileges From Roles 288
Granting Access to Database Roles 288
Example 289
Revoking Access From Database Roles 290
Example 290
Granting Administrative Access to a Role 291
Example 291
Revoking Administrative Access From a Role 292
Example 292
Enabling Roles 292
Disabling Roles 293
Viewing Enabled and Available Roles 293
Viewing Named Roles 294
Viewing a User's Role 294
How to View a User's Role 294
Users 295
Roles 296
Grants 296
Viewing User Roles on Management Console 296
About MC Privileges and Roles 297
MC Permission Groups 297
MC's Configuration Privileges and Database Access 297
MC Configuration Privileges 298
MC Configuration Privileges By User Role 299
SUPER Role (mc) 300
ADMIN Role (mc) 301
About the MC Database Administrator Role 302
IT Role (mc) 303
About the MC IT (database) Role 303
NONE Role (mc) 303
MC Database Privileges 304
MC Database Privileges By Role 305
ADMIN Role (db) 306
About the ADMIN (MC configuration) Role 307
IT Role (db) 307
About the IT (MC configuration) Role 307
USER Role (db) 308
Granting Database Access to MC Users 308
Prerequisites 308
Grant a Database-Level Role to an MC user: 309
How MC Validates New Users 309
Mapping an MC User to a Database user's Privileges 310
How to Map an MC User to a Database User 310
What If You Map the Wrong Permissions 313
Adding Multiple MC Users to a Database 313
How to Find Out an MC user's Database Role 314
Adding Multiple Users to MC-managed Databases 315
Before You Start 315
How to Add Multiple Users to a Database 316
MC Mapping Matrix 316
Using the Administration Tools 319
Running the Administration Tools 319
First Time Only 320
Between Dialogs 320
Using the Administration Tools Interface 321
Enter [Return] 321
OK - Cancel - Help 321
Menu Dialogs 322
List Dialogs 322
Form Dialogs 322
Help Buttons 323
K-Safety Support in Administration Tools 323
Notes for Remote Terminal Users 324
Using the Administration Tools Help 325
In a Menu Dialog 325
In a Dialog Box 326
Scrolling 326
Password Authentication 326
Distributing Changes Made to the Administration Tools Metadata 327
Administration Tools and Management Console 327
Administration Tools Reference 330
Viewing Database Cluster State 330
Connecting to the Database 331
Starting the Database 332
Starting the Database Using MC 332
Starting the Database Using the Administration Tools 332
Starting the Database At the Command Line 333
Stopping a Database 333
Error 333
Description 334
Resolution 334
Controlling Sessions 336
Notes 337
Restarting HP Vertica on Host 338
Configuration Menu Item 339
Creating a Database 339
Dropping a Database 341
Notes 341
Viewing a Database 342
Setting the Restart Policy 342
Best Practice for Restoring Failed Hardware 343
Installing External Procedure Executable Files 344
Advanced Menu Options 345
Rolling Back Database to the Last Good Epoch 345
Important note: 345
Stopping HP Vertica on Host 346
Killing the HP Vertica Process on Host 347
Upgrading an Enterprise or Evaluation License Key 348
Managing Clusters 349
Using Cluster Management 349
Using the Administration Tools 349
Administration Tools Metadata 349
Writing Administration Tools Scripts 350
Syntax 350
Parameters 350
Tools 351
Using Management Console 361
Connecting to MC 361
Managing Client Connections on MC 362
Managing Database Clusters on MC 363
Create an Empty Database Using MC 364
How to Create an Empty Database on an MC-managed Cluster 364
Notes 365
Import an Existing Database Into MC 366
How to Import an Existing Database on the Cluster 366
Using MC on an AWS Cluster 367
Managing MC Settings 367
Modifying Database-Specific Settings 367
Changing MC or Agent Ports 368
If You Need to Change the MC Default Ports 368
How to Change the Agent Port 368
Change the Agent Port in config.py 368
Change the Agent Port on MC 369
How to Change the MC Port 369
Backing Up MC 369
Troubleshooting Management Console 371
What You Can diagnose: 371
Viewing the MC Log 371
Exporting the User Audit Log 372
To Manually Export MC User Activity 372
  • 20. Restarting MC 373 How to Restart MC Through the MC Interface (using Your browser) 373 How to Restart MC At the Command Line 373 Starting over 374 Resetting MC to Pre-Configured State 374 Avoiding MC Self-Signed Certificate Expiration 374 Operating the Database 375 Starting and Stopping the Database 375 Starting the Database 375 Starting the Database Using MC 375 Starting the Database Using the Administration Tools 375 Starting the Database At the Command Line 376 Stopping the Database 376 Stopping a Running Database Using MC 377 Stopping a Running Database Using the Administration Tools 377 Stopping a Running Database At the Command Line 377 Working with the HP Vertica Index Tool 379 Syntax 379 Parameters 380 Permissions 380 Controlling Expression Analysis 380 Performance and CRC 380 Running the Reindex Option 381 Running the CheckCRC Option 382 Handling CheckCRC Errors 383 Running the Checksort Option 383 Viewing Details of Index Tool Results 384 Working with Tables 387 Creating Base Tables 387 Creating Tables Using the /*+direct*/ Clause 387 Automatic Projection Creation 388 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 20 of 997
  • 21. Characteristics of Default Automatic Projections 389 Creating a Table Like Another 390 Epochs and Node Recovery 391 Storage Location and Policies for New Tables 391 Simple Example 391 Using CREATE TABLE LIKE 391 Creating Temporary Tables 393 Global Temporary Tables 393 Local Temporary Tables 393 Creating a Temp Table Using the /*+direct*/ Clause 394 Characteristics of Default Automatic Projections 395 Preserving GLOBAL Temporary Table Data for a Transaction or Session 396 Specifying Column Encoding 396 Creating External Tables 397 Required Permissions for External Tables 397 COPY Statement Definition 397 Developing User-Defined Load (UDL) Functions for External Tables 398 Examples 398 Validating External Tables 398 Limiting the Maximum Number of Exceptions 399 Working with External Tables 399 Managing Resources for External Tables 399 Backing Up and Restoring External Tables 399 Using Sequences and Identity Columns in External Tables 399 Viewing External Table Definitions 400 External Table DML Support 400 Using External Table Values 400 Using External Tables 402 Using CREATE EXTERNAL TABLE AS COPY Statement 402 Storing HP Vertica Data in External Tables 403 Using External Tables with User-Defined Load (UDL) Functions 403 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 21 of 997
  • 22. Organizing External Table Data 403 Altering Table Definitions 403 External Table Restrictions 404 Exclusive ALTER TABLE Clauses 404 Using Consecutive ALTER TABLE Commands 405 Adding Table Columns 405 Updating Associated Table Views 405 Specifying Default Expressions 406 About Using Volatile Functions 406 Updating Associated Table Views 406 Altering Table Columns 406 Adding Columns with a Default Derived Expression 407 Add a Default Column Value Derived From Another Column 407 Add a Default Column Value Derived From a UDSF 409 Changing a column's Data Type 410 Examples 410 How to Perform an Illegitimate Column Conversion 411 Adding Constraints on Columns 413 Adding and Removing NOT NULL Constraints 413 Examples 413 Dropping a Table Column 414 Restrictions 414 Using CASCADE to Force a Drop 414 Examples 415 Moving a Table to Another Schema 416 Changing a Table Owner 416 Notes 417 Example 417 Table Reassignment with Sequences 419 Changing a Sequence Owner 419 Example 420 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 22 of 997
  • 23. Renaming Tables 420 Using Rename to Swap Tables Within a Schema 421 Using Named Sequences 422 Types of Incrementing Value Objects 422 Using a Sequence with an Auto_Increment or Identity Column 423 Named Sequence Functions 423 Using DDL Commands and Functions With Named Sequences 424 Creating Sequences 424 Altering Sequences 426 Examples 426 Distributed Sequences 427 Loading Sequences 437 Creating and Instantiating a Sequence 437 Using a Sequence in an INSERT Command 437 Dropping Sequences 438 Example 438 Synchronizing Table Data with MERGE 438 Optimized Versus Non-Optimized MERGE 439 Troubleshooting the MERGE Statement 441 Dropping and Truncating Tables 442 Dropping Tables 442 Truncating Tables 442 About Constraints 445 Adding Constraints 446 Adding Column Constraints with CREATE TABLE 446 Adding Two Constraints on a Column 447 Adding a Foreign Key Constraint on a Column 447 Adding Multicolumn Constraints 448 Adding Constraints on Tables with Existing Data 449 Adding and Changing Constraints on Columns Using ALTER TABLE 449 Adding and Dropping NOT NULL Column Constraints 450 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 23 of 997
  • 24. Enforcing Constraints 450 Primary Key Constraints 451 Foreign Key Constraints 451 Examples 452 Unique Constraints 453 Not NULL Constraints 454 Dropping Constraints 456 Notes 456 Enforcing Primary Key and Foreign Key Constraints 458 Enforcing Primary Key Constraints 458 Enforcing Foreign Key Constraints 458 Detecting Constraint Violations Before You Commit Data 458 Detecting Constraint Violations 459 Fixing Constraint Violations 464 Reenabling Error Reporting 467 Working with Table Partitions 468 Differences Between Partitioning and Segmentation 468 Partition Operations 468 Defining Partitions 469 Table 3: Partitioning Expression and Results 470 Partitioning By Year and Month 470 Restrictions on Partitioning Expressions 471 Best Practices for Partitioning 471 Dropping Partitions 471 Examples 472 Partitioning and Segmenting Data 473 Partitioning and Data Storage 475 Partitions and ROS Containers 475 Partition Pruning 475 Managing Partitions 475 Notes 477 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 24 of 997
  • 25. Partitioning, Repartitioning, and Reorganizing Tables 477 Reorganizing Data After Partitioning 478 Monitoring Reorganization 478 Auto Partitioning 479 Examples 479 Eliminating Partitions 481 Making Past Partitions Eligible for Elimination 482 Verifying the ROS Merge 483 Examples 483 Moving Partitions 484 Archiving Steps 485 Preparing and Moving Partitions 485 Creating a Snapshot of the Intermediate Table 485 Copying the Config File to the Storage Location 486 Drop the Intermediate Table 486 Restoring Archived Partitions 486 Bulk Loading Data 489 Checking Data Format Before or After Loading 490 Converting Files Before Loading Data 491 Checking UTF-8 Compliance After Loading Data 491 Performing the Initial Database Load 491 Extracting Data From an Existing Database 492 Checking for Delimiter Characters in Load Data 492 Moving Data From an Existing Database to HP Vertica Nodes 493 Loading From a Local Hard Disk 493 Loading Over the Network 493 Loading From Windows 494 Using Load Scripts 494 Using Absolute Paths in a Load Script 494 Running a Load Script 494 Using COPY and COPY LOCAL 495 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 25 of 997
  • 26. Copying Data From an HP Vertica Client 496 Transforming Data During Loads 496 Understanding Transformation Requirements 496 Loading FLOAT Values 497 Using Expressions in COPY Statements 497 Handling Expression Errors 497 Transformation Example 498 Deriving Table Columns From Data File Columns 498 Specifying COPY FROM Options 499 Loading From STDIN 500 Loading From a Specific Path 500 Loading BZIP and GZIP Files 500 Loading with Wildcards (glob) ON ANY NODE 500 Loading From a Local Client 501 Choosing a Load Method 501 Loading Directly into WOS (AUTO) 501 Loading Directly to ROS (DIRECT) 502 Loading Data Incrementally (TRICKLE) 502 Loading Data Without Committing Results (NO COMMIT) 502 Using NO COMMIT to Detect Constraint Violations 503 Using COPY Interactively 503 Canceling a COPY Statement 503 Specifying a COPY Parser 503 Specifying Load Metadata 504 Interpreting Last Column End of Row Values 505 Using a Single End of Row Definition 506 Using a Delimiter and Record Terminator End of Row Definition 506 Loading UTF-8 Format Data 507 Loading Special Characters As Literals 507 Using a Custom Column Separator (DELIMITER) 508 Using a Custom Column Option DELIMITER 508 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 26 of 997
  • 27. Defining a Null Value (NULL) 509 Loading NULL Values 509 Filling Columns with Trailing Nulls (TRAILING NULLCOLS) 510 Attempting to Fill a NOT NULL Column with TRAILING NULLCOLS 511 Changing the Default Escape Character (ESCAPE AS) 512 Eliminating Escape Character Handling 512 Delimiting Characters (ENCLOSED BY) 512 Using ENCLOSED BY for a Single Column 513 Specifying a Custom End of Record String (RECORD TERMINATOR) 514 Examples 514 Loading Native Varchar Data 515 Loading Binary (Native) Data 515 Loading Hexadecimal, Octal, and Bitstring Data 516 Hexadecimal Data 517 Octal Data 517 BitString Data 518 Examples 518 Loading Fixed-Width Format Data 519 Supported Options for Fixed-Width Data Loads 519 Using Nulls in Fixed-Width Data 519 Defining a Null Character (Statement Level) 520 Defining a Custom Record Terminator 520 Copying Fixed-Width Data 521 Skipping Content in Fixed-Width Data 521 Trimming Characters in Fixed-Width Data Loads 522 Using Padding in Fixed-Width Data Loads 523 Ignoring Columns and Fields in the Load File 524 Using the FILLER Parameter 524 FILLER Parameter Examples 524 Loading Data into Pre-Join Projections 525 Foreign and Primary Key Constraints 525 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 27 of 997
  • 28. Concurrent Loads into Pre-Join Projections 526 Using Parallel Load Streams 528 Monitoring COPY Loads and Metrics 528 Using HP Vertica Functions 528 Using the CURRENT_LOAD_SOURCE() Function 529 Using the LOAD_STREAMS System Table 529 Using the STREAM NAME Parameter 529 Other LOAD_STREAMS Columns for COPY Metrics 530 Capturing Load Rejections and Exceptions 531 Using COPY Parameters To Handle Rejections and Exceptions 531 Enforcing Truncating or Rejecting Rows (ENFORCELENGTH) 532 Specifying Maximum Rejections Before a Load Fails (REJECTMAX) 533 Aborting Data Loads for Any Error (ABORT ON ERROR) 533 Understanding Row Rejections and Rollback Errors 533 Saving Load Exceptions (EXCEPTIONS) 535 Saving Load Rejections (REJECTED DATA) 536 Saving Rejected Data to a Table 537 Rejection Records for Table Files 538 Querying a Rejection Records Table 538 Exporting the Rejected Records Table 540 COPY Rejected Data and Exception Files 541 Specifying Rejected Data and Exceptions Files 542 Saving Rejected Data and Exceptions Files to a Single Server 542 Using VSQL Variables for Rejected Data and Exceptions Files 543 COPY LOCAL Rejection and Exception Files 543 Specifying Rejected Data and Exceptions Files 544 Referential Integrity Load Violation 544 Trickle Loading Data 547 Using INSERT, UPDATE, and DELETE 547 WOS Overflow 547 Copying and Exporting Data 549 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 28 of 997
  • 29. Moving Data Directly Between Databases 549 Creating SQL Scripts to Export Data 549 Exporting Data 550 Exporting Identity Columns 551 Examples of Exporting Data 551 Copying Data 552 Importing Identity Columns 553 Examples 553 Using Public and Private IP Networks 555 Identify the Public Network to HP Vertica 555 Identify the Database or Nodes Used for Import/Export 556 Using EXPORT Functions 557 Saving Scripts for Export Functions 557 Exporting the Catalog 558 Function Summary 558 Exporting All Catalog Objects 558 Projection Considerations 559 Exporting Database Designer Schema and Designs 559 Exporting Table Objects 559 Exporting Tables 560 Function Syntax 561 Exporting All Tables and Related Objects 561 Exporting a List Tables 561 Exporting a Single Table or Object 562 Exporting Objects 562 Function Syntax 563 Exporting All Objects 563 Exporting a List of Objects 564 Exporting a Single Object 565 Bulk Deleting and Purging Data 567 Choosing the Right Technique for Deleting Data 568 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 29 of 997
  • 30. Best Practices for DELETE and UPDATE 569 Performance Considerations for DELETE and UPDATE Queries 569 Optimizing DELETEs and UPDATEs for Performance 570 Projection Column Requirements for Optimized Deletes 570 Optimized Deletes in Subqueries 570 Projection Sort Order for Optimizing Deletes 571 Purging Deleted Data 573 Setting a Purge Policy 573 Specifying the Time for Which Delete Data Is Saved 574 Specifying the Number of Epochs That Are Saved 574 Disabling Purge 575 Manually Purging Data 575 Managing the Database 577 Connection Load Balancing 577 Native Connection Load Balancing Overview 577 IPVS Overview 578 Choosing Whether to Use Native Connection Load Balancing or IPVS 578 About Native Connection Load Balancing 579 Load Balancing Schemes 580 Enabling and Disabling Native Connection Load Balancing 580 Resetting the Load Balancing State 581 Monitoring Native Connection Load Balancing 581 Determining to which Node a Client Has Connected 582 Connection Load Balancing Using IPVS 583 Configuring HP Vertica Nodes 585 Notes 585 Set Up the Loopback Interface 586 Disable Address Resolution Protocol (ARP) 587 Configuring the Directors 589 Install the HP Vertica IPVS Load Balancer Package 589 Before You Begin 589 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 30 of 997
  • 31. If You Are Using Red Hat Enterprise Linux 5.x: 589 If You Are Using Red Hat Enterprise Linux 6.x: 590 Configure the HP Vertica IPVS Load Balancer 590 Public and Private IPs 591 Set up the HP Vertica IPVS Load Balancer Configuration File 592 Connecting to the Virtual IP (VIP) 593 Monitoring Shared Node Connections 594 Determining Where Connections Are Going 595 Virtual IP Connection Problems 597 Issue 597 Resolution 597 Expected E-mail Messages From the Keepalived Daemon 597 Troubleshooting Keepalived Issues 598 Managing Nodes 600 Stop HP Vertica on a Node 600 Restart HP Vertica on a Node 601 Restarting HP Vertica on a node 601 Fault Groups 601 About the Fault Group Script 602 Creating Fault Groups 604 Modifying Fault Groups 605 How to modify a fault group 606 Dropping Fault Groups 607 How to drop a fault group 607 How to remove all fault groups 607 How to add nodes back to a fault group 608 Monitoring Fault Groups 608 Monitoring fault groups through system tables 608 Monitoring fault groups through Management Console 608 Large Cluster 609 Control nodes on large clusters 610 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 31 of 997
  • 32. Control nodes on small clusters 610 Planning a Large Cluster 610 Installing a Large Cluster 611 If you want to install a new large cluster 611 Sample rack-based cluster hosts topology 612 If you want to expand an existing cluster 614 Defining and Realigning Control Nodes on an Existing Cluster 614 Rebalancing Large Clusters 615 How to rebalance the cluster 616 How long will rebalance take? 616 Expanding the Database to a Large Cluster 616 Monitoring Large Clusters 617 Large Cluster Best Practices 617 Planning the number of control nodes 618 Allocate standby nodes 619 Plan for cluster growth 619 Write custom fault groups 619 Use segmented projections 619 Use the Database Designer 619 Elastic Cluster 620 The Elastic Cluster Scaling Factor 620 Enabling and Disabling Elastic Cluster 621 Scaling Factor Defaults 621 Viewing Scaling Factor Settings 621 Setting the Scaling Factor 622 Local Data Segmentation 622 Enabling and Disabling Local Segmentation 623 Elastic Cluster Best Practices 623 When to Enable Local Data Segmentation 624 Upgraded Database Consideration 624 Monitoring Elastic Cluster Rebalancing 624 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 32 of 997
  • 33. Historical Rebalance Information 625 Adding Nodes 626 Adding Hosts to a Cluster 627 Prerequisites and Restrictions 627 Procedure to Add Hosts 627 Examples: 628 Adding Nodes to a Database 629 To Add Nodes to a Database Using MC 629 To Add Nodes to a Database Using the Administration Tools: 629 Removing Nodes 631 Lowering the K-Safety Level to Allow for Node Removal 631 Removing Nodes From a Database 631 Prerequisites 632 Remove Unused Hosts From the Database Using MC 632 Remove Unused Hosts From the Database Using the Administration Tools 632 Removing Hosts From a Cluster 633 Prerequisites 633 Procedure to Remove Hosts 633 Replacing Nodes 635 Prerequisites 635 Best Practice for Restoring Failed Hardware 635 Replacing a Node Using the Same Name and IP Address 636 Replacing a Failed Node Using a node with Different IP Address 637 Replacing a Functioning Node Using a Different Name and IP Address 638 Using the Administration Tools to Replace Nodes 638 Replace the Original Host with a New Host Using the Administration Tools 638 Using the Management Console to Replace Nodes 639 Rebalancing Data Across Nodes 641 K-safety and Rebalancing 641 Rebalancing Failure and Projections 641 Permissions 642 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 33 of 997
  • 34. Rebalancing Data Using the Administration Tools UI 642 Rebalancing Data Using Management Console 643 Rebalancing Data Using SQL Functions 643 Redistributing Configuration Files to Nodes 643 Stopping and Starting Nodes on MC 644 Managing Disk Space 645 Monitoring Disk Space Usage 645 Adding Disk Space to a Node 645 Replacing Failed Disks 647 Catalog and Data Files 647 Understanding the Catalog Directory 648 Reclaiming Disk Space From Deleted Records 650 Rebuilding a Table 650 Notes 650 Managing Tuple Mover Operations 651 Understanding the Tuple Mover 652 Moveout 652 ROS Containers 653 Mergeout 653 Mergeout of Deletion Markers 654 Tuning the Tuple Mover 654 Tuple Mover Configuration Parameters 655 Resource Pool Settings 656 Loading Data 657 Using More Threads 657 Active Data Partitions 657 Managing Workloads 659 Statements 659 System Tables 660 The Resource Manager 660 Resource Manager Impact on Query Execution 661 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 34 of 997
  • 35. Resource Pool Architecture 662 Modifying and Creating Resource Pools 662 Monitoring Resource Pools and Resource Usage By Queries 662 Examples 662 User Profiles 666 Example 666 Target Memory Determination for Queries in Concurrent Environments 668 Managing Resources At Query Run Time 668 Setting Run-Time Priority for the Resource Pool 669 Prioritizing Queries Within a Resource Pool 669 How to Set Run-Time Priority and Run-Time Priority Threshold 669 Changing Run-Time Priority of a Running Query 670 How To Change the Run-Time Priority of a Running Query 670 Using CHANGE_RUNTIME_PRIORITY 671 Restoring Resource Manager Defaults 671 Best Practices for Managing Workload Resources 672 Basic Principles for Scalability and Concurrency Tuning 672 Guidelines for Setting Pool Parameters 672 Setting a Run-Time Limit for Queries 677 Example: 678 Using User-Defined Pools and User-Profiles for Workload Management 679 Scenario: Periodic Batch Loads 679 Scenario: The CEO Query 680 Scenario: Preventing Run-Away Queries 681 Scenario: Restricting Resource Usage of Ad Hoc Query Application 682 Scenario: Setting a Hard Limit on Concurrency For An Application 683 Scenario: Handling Mixed Workloads (Batch vs. Interactive) 684 Scenario: Setting Priorities on Queries Issued By Different Users 685 Scenario: Continuous Load and Query 686 Scenario: Prioritizing Short Queries At Run Time 687 Scenario: Dropping the Runtime Priority of Long Queries 687 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 35 of 997
  • 36. Tuning the Built-In Pools 689 Scenario: Restricting HP Vertica to Take Only 60% of Memory 689 Scenario: Tuning for Recovery 689 Scenario: Tuning for Refresh 689 Scenario: Tuning Tuple Mover Pool Settings 690 Reducing Query Run Time 690 Real-Time Profiling 691 Managing System Resource Usage 692 Managing Sessions 692 Viewing Sessions 693 Interrupting and Closing Sessions 693 Controlling Sessions 694 Managing Load Streams 695 Working With Storage Locations 697 How HP Vertica Uses Storage Locations 697 Viewing Storage Locations and Policies 698 Viewing Disk Storage Information 698 Viewing Location Labels 698 Viewing Storage Tiers 699 Viewing Storage Policies 700 Adding Storage Locations 700 Planning Storage Locations 700 Adding the Location 701 Storage Location Subdirectories 702 Adding Labeled Storage Locations 702 Adding a Storage Location for USER Access 703 Altering Storage Location Use 704 USER Storage Location Restrictions 704 Effects of Altering Storage Location Use 704 Altering Location Labels 705 Adding a Location Label 705 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 36 of 997
  • 37. Removing a Location Label 706 Effects of Altering a Location Label 706 Creating Storage Policies 707 Creating Policies Based on Storage Performance 707 Storage Levels and Priorities 708 Using the SET_OBJECT_STORAGE_POLICY Function 708 Effects of Creating Storage Policies 709 Moving Data Storage Locations 710 Moving Data Storage While Setting a Storage Policy 710 Effects of Moving a Storage Location 711 Clearing Storage Policies 711 Effects on Same-Name Storage Policies 712 Measuring Storage Performance 713 Measuring Performance on a Running HP Vertica Database 714 Measuring Performance Before a Cluster Is Set Up 714 Setting Storage Performance 714 How HP Vertica Uses Location Performance Settings 715 Using Location Performance Settings With Storage Policies 715 Dropping Storage Locations 716 Altering Storage Locations Before Dropping Them 716 Dropping USER Storage Locations 716 Retiring Storage Locations 716 Restoring Retired Storage Locations 717 Backing Up and Restoring the Database 719 Compatibility Requirements for Using vbr.py 719 Automating Regular Backups 719 Types of Backups 719 Full Backups 720 Object-Level Backups 720 Hard Link Local Backups 721 When to Back up the Database 721 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 37 of 997
  • 38. Configuring Backup Hosts 721 Configuring Single-Node Database Hosts for Backup 722 Creating Configuration Files for Backup Hosts 722 Estimating Backup Host Disk Requirements 723 Estimating Log File Disk Requirements 723 Making Backup Hosts Accessible 723 Setting Up Passwordless SSH Access 724 Testing SSH Access 724 Changing the Default SSH Port on Backup Hosts 725 Increasing the SSH Maximum Connection Settings for a Backup Host 725 Copying Rsync and Python to the Backup Hosts 726 Configuring Hard Link Local Backup Hosts 726 Listing Host Names 726 Creating vbr.py Configuration Files 728 Specifying a Backup Name 728 Backing Up the Vertica Configuration File 729 Saving Multiple Restore Points 729 Specifying Full or Object-Level Backups 729 Entering the User Name 730 Saving the Account Password 730 Specifying the Backup Host and Directory 730 Saving the Configuration File 731 Continuing to Advanced Settings 731 Sample Configuration File 731 Changing the Overwrite Parameter Value 732 Configuring Required VBR Parameters 732 Sample Session Configuring Required Parameters 733 Configuring Advanced VBR Parameters 733 Example of Configuring Advanced Parameters 734 Configuring the Hard Link Local Parameter 734 Restrictions for Backup Encryption Option 735 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 38 of 997
  • 39. Example Backup Configuration File 735 Using Hard File Link Local Backups 737 Planning Hard Link Local Backups 737 Specifying Backup Directory Locations 737 Understanding Hard Link Local Backups and Disaster Recovery 738 Creating Full and Incremental Backups 738 Running Vbr Without Optional Commands 739 Best Practices for Creating Backups 739 Object-Level Backups 740 Backup Locations and Storage 740 Saving Incremental Backups 740 When vbr.py Deletes Older Backups 741 Backup Directory Structure and Contents 741 Directory Tree 742 Multiple Restore Points 742 Creating Object-Level Backups 744 Invoking vbr.py Backup 744 Backup Locations and Naming 744 Best Practices for Object-Level Backups 745 Naming Conventions 745 Creating Backups Concurrently 746 Determining Backup Frequency 746 Understanding Object-Level Backup Contents 746 Making Changes After an Object-Level Backup 747 Understanding the Overwrite Parameter 747 Changing Principal and Dependent Objects 748 Considering Contraint References 748 Configuration Files for Object-Level Backups 748 Backup Epochs 749 Maximum Number of Backups 749 Creating Hard Link Local Backups 749 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 39 of 997
  • 40. Specifying the Hard Link Local Backup Location 750 Creating Hard Link Local Backups for Tape Storage 750 Interrupting the Backup Utility 751 Viewing Backups 751 List Backups With vbr.py 752 Monitor database_snapshots 752 Query database_backups 752 Restoring Full Database Backups 753 Restoring the Most Recent Backup 754 Restoring an Archive 754 Attempting to Restore a Node That Is UP 755 Attempting to Restore to an Alternate Cluster 755 Restoring Object-Level Backups 755 Backup Locations 755 Cluster Requirements for Object-Level Restore 756 Restoring Objects to a Changed Cluster Topology 756 Projection Epoch After Restore 756 Catalog Locks During Backup Restore 757 Catalog Restore Events 757 Restoring Hard Link Local Backups 758 Restoring Full- and Object-Level Hard Link Local Backups 758 Avoiding OID and Epoch Conflicts 758 Transferring Backups to and From Remote Storage 759 Restoring to the Same Cluster 760 Removing Backups 761 Deleting Backup Directories 761 Copying the Database to Another Cluster 762 Identifying Node Names for Target Cluster 763 Configuring the Target Cluster 763 Creating a Configuration File for CopyCluster 764 Copying the Database 765 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 40 of 997
  • 41. Backup and Restore Utility Reference 766 VBR Utility Reference 766 Syntax 766 Parameters 767 VBR Configuration File Reference 767 [Misc] Miscellaneous Settings 767 [Database] Database Access Settings 769 [Transmission] Data Transmission During Backup Process 770 [Mapping] 771 Recovering the Database 773 Failure Recovery 773 Recovery Scenarios 774 Notes 775 Restarting HP Vertica on a Host 775 Restarting HP Vertica on a Host Using the Administration Tools 776 Restarting HP Vertica on a Host Using the Management Console 776 Restarting the Database 776 Recovering the Cluster From a Backup 779 Monitoring Recovery 779 Viewing Log Files on Each Node 779 Viewing the Cluster State and Recover Status 779 Using System Tables to Monitor Recovery 780 Monitoring Cluster Status After Recovery 780 Exporting a Catalog 781 Best Practices for Disaster Recovery 781 Monitoring HP Vertica 785 Monitoring Log Files 785 When a Database Is Running 785 When the Database / Node Is Starting up 785 Rotating Log Files 786 Using Administration Tools Logrotate Utility 786 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 41 of 997
  • 42. Manually Rotating Logs 786 Manually Creating Logrotate Scripts 787 Monitoring Process Status (ps) 789 Monitoring Linux Resource Usage 790 Monitoring Disk Space Usage 791 Monitoring Database Size for License Compliance 791 Viewing Your License Compliance Status 792 Manually Auditing Columnar Data Usage 792 Manually Auditing Flex Table Data Usage 793 Targeted Auditing 793 Using Management Console to Monitor License Compliance 793 Monitoring Shared Node Connections 794 Monitoring Elastic Cluster Rebalancing 795 Historical Rebalance Information 796 Monitoring Parameters 796 Monitoring Events 798 Event Logging Mechanisms 798 Event Severity Types 798 Event Data 802 Configuring Event Reporting 805 Configuring Reporting for Syslog 805 Enabling HP Vertica to Trap Events for Syslog 805 Defining Events to Trap for Syslog 806 Defining the SyslogFacility to Use for Reporting 807 Configuring Reporting for SNMP 808 Configuring Event Trapping for SNMP 809 To Configure HP Vertica to Trap Events for SNMP 809 To Enable Event Trapping for SNMP 810 To Define Where HP Vertica Send Traps 810 To Define Which Events HP Vertica Traps 810 Verifying SNMP Configuration 811 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 42 of 997
  • 43. Event Reporting Examples 812 Vertica.log 812 SNMP 812 Syslog 812 Using System Tables 814 Where System Tables Reside 814 How System Tables Are Organized 814 Querying Case-Sensitive Data in System Tables 815 Examples 816 Retaining Monitoring Information 818 Data Collector 818 Where Is DC Information retained? 818 DC Tables 819 Enabling and Disabling Data Collector 819 Viewing Current Data Retention Policy 819 Configuring Data Retention Policies 820 Working with Data Collection Logs 821 Clearing the Data Collector 822 Flushing Data Collector Logs 823 Monitoring Data Collection Components 823 Related Topics 824 Querying Data Collector Tables 824 Clearing PROJECTION_REFRESHES History 825 Monitoring Query Plan Profiles 826 Monitoring Partition Reorganization 826 Monitoring Resource Pools and Resource Usage By Queries 827 Examples 827 Monitoring Recovery 830 Viewing Log Files on Each Node 830 Viewing the Cluster State and Recover Status 831 Using System Tables to Monitor Recovery 831 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 43 of 997
  • 44. Monitoring Cluster Status After Recovery 832 Monitoring HP Vertica Using MC 833 About Chart Updates 833 Viewing MC Home Page 834 Tasks 834 Recent Databases 835 Monitoring Same-Name Databases on MC 835 Monitoring Cluster Resources 836 Database 836 Messages 836 Performance 837 CPU/Memory Usage 837 User Query Type Distribution 837 Monitoring Cluster Nodes 838 Filtering What You See 838 If You don't See What You Expect 838 Monitoring Cluster CPU/Memory 839 Investigating Areas of Concern 839 Monitoring Cluster Performance 839 How to Get Metrics on Your Cluster 839 Node Colors and What They Mean 840 Filtering Nodes From the View 840 Monitoring System Resources 841 How up to Date Is the information? 841 Monitoring Query Activity 841 Monitoring Key Events 842 Filtering Chart Results 843 Viewing More Detail 843 Monitoring Internal Sessions 844 Filtering Chart Results 844 Monitoring User Sessions 844 Administrator's Guide Contents HP Vertica Analytic Database (7.0.x) Page 44 of 997
What Chart Colors Mean 844
Chart Results 845
Monitoring System Memory Usage 845
Types of System Memory 845
Monitoring System Bottlenecks 846
How MC Gathers System Bottleneck Data 846
The Components MC Reports on 846
How MC Handles Conflicts in Resources 846
Example 847
Monitoring User Query Phases 847
Filtering Chart Results 848
Viewing More Detail 848
Monitoring Table Utilization 849
Viewing More Detail 850
Monitoring Node Activity 850
Monitoring MC-managed Database Messages 853
Message Severity 854
Viewing Message Details 854
Search and Export Messages 854
Searching MC-managed Database Messages 854
Changing Message Search Criteria 855
Specifying Date Range Searches 855
Filtering Messages Client Side 856
Exporting MC-managed Database Messages and Logs 856
Monitoring MC User Activity 859
Background Cleanup of Audit Records 860
Filter and Export Results 861
If You Perform a Factory Reset 861
Analyzing Workloads 862
About the Workload Analyzer 862
Getting Tuning Recommendations Through an API 862
What and When 862
Record the Events 863
Observation Count and Time 864
Knowing What to Tune 864
The Tuning Description (recommended action) and Command 864
What a Tuning Operation Costs 864
Examples 864
Getting Recommendations From System Tables 866
Understanding WLA's Triggering Events 866
Getting Tuning Recommendations Through MC 866
Understanding WLA Triggering Conditions 867
Collecting Database Statistics 875
Statistics Used By the Query Optimizer 876
How Statistics Are Collected 876
Using the ANALYZE ROW COUNT Operation 877
Using ANALYZE_STATISTICS 877
Using ANALYZE_HISTOGRAM 877
Examples 878
How Statistics Are Computed 879
How Statistics Are Reported 879
Determining When Statistics Were Last Updated 880
Reacting to Stale Statistics 884
Example 885
Canceling Statistics Collection 886
Best Practices for Statistics Collection 886
When to Gather Full Statistics 887
Save Statistics 888
Using Diagnostic Tools 889
Determining Your Version of HP Vertica 889
Collecting Diagnostics (scrutinize Command) 889
How to Run Scrutinize 894
How Scrutinize Collects and Packages Diagnostics 894
How to Upload Scrutinize Results to Support 896
Examples for the Scrutinize Command 898
Get Help with scrutinize Options 898
Collect Defaults in Your Cluster 898
Collect Information for a Database 898
Collect Information from a Specific Node 899
Use a Staging Area Other Than /tmp 899
Include Gzipped Log Files 899
Include a Message in Your File 899
Send Results to Support 900
Collecting Diagnostics (diagnostics Command) 900
Syntax 900
Arguments 900
Using the Diagnostics Utility 901
Examples 901
Exporting a Catalog 902
Exporting Profiling Data 902
Syntax 902
Parameters 903
Example 903
Understanding Query Plans 904
How to Get Query Plan Information 905
How to save query plan information 906
Viewing EXPLAIN Output in Management Console 907
About the Query Plan in Management Console 908
Expanding and Collapsing Query Paths in EXPLAIN Output 909
Clearing Query Data 909
Viewing Projection and Column Metadata in EXPLAIN output 909
Viewing EXPLAIN Output in vsql 910
About EXPLAIN Output 911
Textual output of query plans 911
Viewing the Statistics Query Plan Output 912
Viewing the Cost and Rows Path 914
Viewing the Projection Path 915
Viewing the Join Path 916
Outer versus inner join 917
Hash and merge joins 917
Inequality joins 919
Event series joins 920
Viewing the Path ID 920
Viewing the Filter Path 921
Viewing the GROUP BY Paths 922
GROUPBY HASH Query Plan Example 922
GROUPBY PIPELINED Query Plan Example 923
Partially Sorted GROUPBY Query Plan Example 924
Viewing the Sort Path 925
Viewing the Limit Path 926
Viewing the Data Redistribution Path 926
Viewing the Analytic Function Path 928
Viewing Node Down Information 929
Viewing the MERGE Path 930
Linking EXPLAIN Output to Error Messages and Profiling Information 931
Using the QUERY_PLAN_PROFILES table 933
Profiling Database Performance 935
How to Determine If Profiling Is Enabled 936
How to Enable Profiling 936
How to Disable Profiling 937
About Real-Time Profiling 938
About profiling counters 938
About query plan profiles 938
System tables with profile data 939
What to look for in query profiles 939
Viewing Profile Data in Management Console 940
Monitoring Profiling Progress 941
Viewing Updated Profile Metrics 941
Expanding and collapsing query path profile data 942
About Profile Data in Management Console 942
Projection metadata 943
Query phase duration 944
Profile metrics 944
Execution events 945
Optimizer events 946
Clearing Query Data 947
Viewing Profile Data in vsql 947
How to profile a single statement 948
Real-Time Profiling Example 948
How to Use the Linux watch Command 948
How to Find Out Which Counters are Available 949
Sample views for counter information 950
Running scripts to create the sample views 950
Viewing counter values using the sample views 950
Combining sample views 950
Viewing real-time profile data 951
How to label queries for profiling 951
Label syntax 952
Profiling query plans 953
What you need for query plan profiling 953
How to get query plan status for small queries 954
How to get query plan status for large queries 955
Improving the readability of QUERY_PLAN_PROFILES output 957
Managing query profile data 958
Configuring data retention policies 958
Reacting to suboptimal query plans 958
About Locales 961
Unicode Character Encoding: UTF-8 (8-bit UCS/Unicode Transformation Format) 961
Locales 961
Notes 962
Locale Aware String Functions 962
UTF-8 String Functions 963
Locale Specification 965
Long Form 965
Syntax 965
Parameters 965
Collation Keyword Parameters 969
Notes 971
Examples 971
Short Form 972
Determining the Short Form of a Locale 972
Specifying a Short Form Locale 972
Supported Locales 973
Locale Restrictions and Workarounds 984
Appendix: Binary File Formats 987
Creating Native Binary Format Files 987
File Signature 987
Column Definitions 987
Row Data 990
Example 991
We appreciate your feedback! 997
Administration Overview

This document describes the functions performed by an HP Vertica database administrator (DBA). Perform these tasks using only the dedicated database administrator account that was created when you installed HP Vertica. The examples in this documentation set assume that the administrative account name is dbadmin.

- To perform certain cluster configuration and administration tasks, the DBA (users of the administrative account) must be able to supply the root password for those hosts. If this requirement conflicts with your organization's security policies, these functions must be performed by your IT staff.
- If you perform administrative functions using a different account from the account provided during installation, HP Vertica encounters file ownership problems.
- If you share the administrative account password, make sure that only one user runs the Administration Tools at any time. Otherwise, automatic configuration propagation does not work correctly.
- The Administration Tools require that the calling user's shell be /bin/bash. Other shells give unexpected results and are not supported.
Managing Licenses

You must license HP Vertica in order to use it. Hewlett-Packard supplies your license information to you in the form of one or more license files, which encode the terms of your license. Two licenses are available:

- vlicense.dat, for columnar tables
- flextables.key, for Flex Zone flexible tables

To prevent introducing special characters (such as line endings or file terminators) into the license key file, do not open the file in an editor or e-mail client. Though special characters are not always visible in an editor, their presence invalidates the license.

Copying Enterprise, Evaluation, and Flex Zone License Files

For ease of HP Vertica Enterprise Edition installation, HP recommends that you copy the license file to /tmp/vlicense.dat on the Administration host. If you have a license for Flex Zone, HP recommends that you copy the license file to /opt/vertica/config/share/license.com.vertica.flextables.key.

Be careful not to change the license key file in any way when copying the file between Windows and Linux, or to any other location. To help prevent applications from trying to alter the file, enclose the license file in an archive file (such as a .zip or .tar file). After copying the license file from one location to another, check that the copied file size is identical to that of the one you received from HP Vertica.

Obtaining a License Key File

To obtain an Enterprise Edition, Evaluation, or Flex Zone flex table license key, contact HP Vertica at: http://www.vertica.com/about/contact-us/

Your HP Vertica Community Edition download package includes the Community Edition license, which allows three nodes and 1TB of data. The HP Vertica Community Edition license does not expire.

Understanding HP Vertica Licenses

HP Vertica has flexible licensing terms.
It can be licensed on the following bases:

- Term-based (valid until a specific date)
- Raw data size based (valid to store up to some amount of raw data)
- Both term-based and data-size-based
- Unlimited duration and data storage
- Raw data size based and a limit of 3 nodes (HP Vertica Community Edition)

Your license key has your licensing bases encoded into it. If you are unsure of your current license, you can view your license information from within HP Vertica.

License Types

HP Vertica Community Edition. You can download and start using Community Edition for free. The Community Edition license allows customers the following:

- 3 node limit
- 1 terabyte columnar table data limit
- 1 terabyte Flex table data limit

HP Vertica Enterprise Edition. You can purchase the Enterprise Edition license. The Enterprise Edition license entitles customers to:

- No node limit
- Columnar data, amount specified by the license
- 1 terabyte Flex table data

Flex Zone. Flex Zone is a license for the flex tables technology, available in version 7.0. Customers can separately purchase and apply a Flex Zone license to their installation. The Flex Zone license entitles customers to the licensed amount of Flex table data and removes the 3 node restriction imposed by the Community Edition. Customers whose primary goal is to work with flex tables can purchase a Flex Zone license. When they purchase Flex Zone, customers receive a complimentary Enterprise License, which entitles them to one terabyte of columnar data and imposes no node limit.

Note: Customers who purchase a Flex Zone license must apply two licenses: their Enterprise Edition license and their Flex Zone license.

Allowances    | Community Edition | Enterprise Edition | Enterprise Edition + Flex Zone | Flex Zone
Node Limit    | 3 nodes           | Unlimited          | Unlimited                      | Unlimited
Columnar Data | 1 terabyte        | Per license        | Per license                    | 1 terabyte
Flex Data     | 1 terabyte        | 1 terabyte         | Per license                    | Per license
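The allowance matrix above can be expressed as a small compliance check. The following Python sketch is purely illustrative: the dictionary layout and function are ours, not part of any Vertica API, and sizes are measured in terabytes.

```python
# Illustrative model of the license allowance matrix above.
# The structure and names are hypothetical, not Vertica's API.
LICENSES = {
    "community":           {"nodes": 3,    "columnar_tb": 1,     "flex_tb": 1},
    "enterprise":          {"nodes": None, "columnar_tb": "per", "flex_tb": 1},
    "enterprise+flexzone": {"nodes": None, "columnar_tb": "per", "flex_tb": "per"},
    "flexzone":            {"nodes": None, "columnar_tb": 1,     "flex_tb": "per"},
}

def check(license_type, nodes, columnar_tb, flex_tb,
          licensed_columnar=None, licensed_flex=None):
    """Return a list of violations; an empty list means compliant."""
    lim = LICENSES[license_type]
    problems = []
    # None means "no node limit"; "per" means "as specified by the license".
    if lim["nodes"] is not None and nodes > lim["nodes"]:
        problems.append("node limit exceeded")
    col_lim = licensed_columnar if lim["columnar_tb"] == "per" else lim["columnar_tb"]
    if col_lim is not None and columnar_tb > col_lim:
        problems.append("columnar data limit exceeded")
    flex_lim = licensed_flex if lim["flex_tb"] == "per" else lim["flex_tb"]
    if flex_lim is not None and flex_tb > flex_lim:
        problems.append("flex data limit exceeded")
    return problems
```

For example, a 4-node Community Edition cluster is out of compliance on the node limit alone, while an Enterprise Edition cluster has no node limit but is still bound to 1 TB of flex data unless a Flex Zone license is applied.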
Installing or Upgrading a License Key

The steps you follow to apply your HP Vertica license key vary, depending on the type of license you are applying and whether you are upgrading your license. This section describes the following:

- New HP Vertica License Installations
- HP Vertica License Renewals or Upgrades
- Flex Zone License Installations

New HP Vertica License Installations

1. Copy the license key file to your Administration Host.
2. Ensure the license key's file permissions are set to at least 666 (read and write permissions for all users).
3. Install HP Vertica as described in the Installation Guide if you have not already done so. The interface prompts you for the license key file.
4. To install Community Edition, leave the default path blank and press OK. To apply your evaluation or Enterprise Edition license, enter the absolute path of the license key file you downloaded to your Administration Host and press OK. The first time you log in as the Database Administrator and run the Administration Tools, the interface prompts you to accept the End-User License Agreement (EULA).
   Note: If you installed Management Console, the MC administrator can point to the location of the license key during Management Console configuration.
5. Choose View EULA to review the EULA.
6. Exit the EULA and choose Accept EULA to officially accept the EULA and continue installing the license, or choose Reject EULA to reject the EULA and return to the Advanced Menu.

HP Vertica License Renewals or Upgrades

If your license is expiring or you want your database to grow beyond your licensed data size, you must renew or upgrade your license. Once you have obtained your renewal or upgraded license key file, you can install it using Administration Tools or Management Console.
Uploading or Upgrading a License Key Using Administration Tools

1. Copy the license key file to your Administration Host.
2. Ensure the license key's file permissions are set to at least 666 (read and write permissions for all users).
3. Start your database, if it is not already running.
4. In the Administration Tools, select Advanced > Upgrade License Key and click OK.
5. Enter the path to your new license key file and click OK. The interface prompts you to accept the End-User License Agreement (EULA).
6. Choose View EULA to review the EULA.
7. Exit the EULA and choose Accept EULA to officially accept the EULA and continue installing the license, or choose Reject EULA to reject the EULA and return to the Advanced Tools menu.

Uploading or Upgrading a License Key Using Management Console

1. From your database's Overview page in Management Console, click the License tab. The License page displays. You can view your installed licenses on this page.
2. Click the Install New License button at the top of the License page.
3. Browse to the location of the license key from your local computer (where the web browser is installed) and upload the file.
4. Click the Apply button at the top of the page. The interface prompts you to accept the End-User License Agreement (EULA).
5. Select the check box to officially accept the EULA and continue installing the license, or click Cancel to exit.

Note: As soon as you renew or upgrade your license key from either your Administration Host or Management Console, HP Vertica applies the license update. No further warnings appear.
Flex Table License Installations

Installing a Flex Table license using vsql

1. Install HP Vertica as described in the Installation Guide if you have not already done so.
2. Copy the Flex Zone flex tables license key file to your Administration Host. HP recommends that you copy the license file to /opt/vertica/config/share/license.com.vertica.flextables.key.
3. Start your database, if it is not already running.
4. In the Administration Tools, connect to your database.
5. At the vsql prompt, call INSTALL_LICENSE as described in the SQL Reference Manual:

   => SELECT INSTALL_LICENSE('/opt/vertica/config/share/license.com.vertica.flextables.key');

Installing a Flex Table license using Management Console

1. Start Management Console.
2. From your database's Overview page in Management Console, click the License tab. The License page displays. You can view your installed licenses on this page.
3. Click the Install New License button at the top of the License page.
4. Browse to the location of the license key from your local computer (where the web browser is installed) and upload the file.
5. Click the Apply button at the top of the page. The interface prompts you to accept the End-User License Agreement (EULA).
6. Select the check box to officially accept the EULA and continue installing the license, or click Cancel to exit.

Viewing Your License Status

HP Vertica has several functions to display your license terms and current status.
Examining Your License Key

Use the DISPLAY_LICENSE SQL function described in the SQL Reference Manual to display the license information. This function displays the dates for which your license is valid (or Perpetual if your license does not expire) and any raw data allowance. For example:

=> SELECT DISPLAY_LICENSE();
                  DISPLAY_LICENSE
----------------------------------------------------
 HP Vertica Systems, Inc.
 1/1/2011
 12/31/2011
 30
 50TB
(1 row)

Or, use the LICENSES table described in the SQL Reference Manual to view information about all your installed licenses. This table displays your license types, the dates for which your licenses are valid, and the size and node limits your licenses impose. In the example below, the licenses table displays the Community Edition license and the default license that controls HP Vertica's flex data capacity.

=> \x
=> SELECT * FROM licenses;
-[ RECORD 1 ]--------+----------------------------------------
license_id           | 45035996273704986
name                 | vertica
licensee             | Vertica Community Edition
start_date           | 2011-11-22
end_date             | Perpetual
size                 | 1TB
is_community_edition | t
node_restriction     | 3
-[ RECORD 2 ]--------+----------------------------------------
license_id           | 45035996274085644
name                 | com.vertica.flextable
licensee             | Vertica Community Edition, Flextable
start_date           | 2013-10-29
end_date             | Perpetual
size                 | 1TB
is_community_edition | t
node_restriction     |

You can also view the LICENSES table in Management Console. On your database's Overview page in Management Console, click the License tab. The License page displays information about your installed licenses.

Viewing Your License Status

If your license includes a raw data size allowance, HP Vertica periodically audits your database's size to ensure it remains compliant with the license agreement. If your license has an end date, HP Vertica also periodically checks to see if the license has expired. You can see the result of the latest audits using the GET_COMPLIANCE_STATUS function.

=> SELECT GET_COMPLIANCE_STATUS();
                              GET_COMPLIANCE_STATUS
---------------------------------------------------------------------------------
 Raw Data Size: 2.00GB +/- 0.003GB
 License Size : 4.000GB
 Utilization  : 50%
 Audit Time   : 2011-03-09 09:54:09.538704+00
 Compliance Status : The database is in compliance with respect to raw data size.
 License End Date: 04/06/2011
 Days Remaining: 28.59
(1 row)

Viewing Your License Status Through MC

Information about license usage is on the Settings page. See Monitoring Database Size for License Compliance.

Calculating the Database Size

You can use your HP Vertica software until your columnar data reaches the maximum raw data size that the license agreement provides. This section describes when data is monitored, what data is included in the estimate, and the general methodology used to produce an estimate. For more information about monitoring for data size, see Monitoring Database Size for License Compliance.

How HP Vertica Estimates Raw Data Size

HP Vertica uses statistical sampling to calculate an accurate estimate of the raw data size of the database. In this context, raw data means the uncompressed, unfederated data stored in a single HP Vertica database. For the purpose of license size audit and enforcement, HP Vertica evaluates the raw data size as if the data had been exported from the database in text format, rather than as compressed data.

HP Vertica conducts your database size audit using statistical sampling. This method allows HP Vertica to estimate the size of the database without significantly impacting database performance. The trade-off between accuracy and impact on performance is a small margin of error, inherent in statistical sampling. Reports on your database size include the margin of error, so you can assess the accuracy of the estimate.
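To illustrate the trade-off described above, here is a hedged Python sketch (ours, not Vertica's audit implementation): it estimates a total data size from a simple random sample of per-row sizes and reports an approximate 95% margin of error, which shrinks as the sample grows.

```python
import random
import statistics

def estimate_total_size(row_sizes, sample_fraction=0.05, seed=7):
    """Estimate the total of `row_sizes` from a simple random sample.

    Returns (estimate, margin_of_error), where the margin is a ~95%
    confidence half-width scaled from the sample standard deviation.
    """
    n = len(row_sizes)
    k = max(2, int(n * sample_fraction))
    rng = random.Random(seed)
    sample = rng.sample(row_sizes, k)
    mean = statistics.fmean(sample)
    # Standard error of the sample mean, scaled up to the population total.
    sem = statistics.stdev(sample) / k ** 0.5
    return n * mean, 1.96 * n * sem
```

With a 5% sample of 100,000 rows, the estimate typically lands within a fraction of a percent of the true total — the accuracy/performance trade-off the audit makes.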
To learn more about simple random sampling, see Simple Random Sampling.

Excluding Data From Raw Data Size Estimate

Not all data in the HP Vertica database is evaluated as part of the raw data size. Specifically, HP Vertica excludes the following data:
- Multiple projections (underlying physical copies) of data from a logical database entity (table). Data appearing in multiple projections of the same table is counted only once.
- Data stored in temporary tables.
- Data accessible through external table definitions.
- Data that has been deleted, but which remains in the database. To understand more about deleting and purging data, see Purging Deleted Data.
- Data stored in the WOS.
- Data stored in system and work tables such as monitoring tables, Data Collector tables, and Database Designer tables.

Evaluating Data Type Footprint Size

The data sampled for the estimate is treated as if it had been exported from the database in text format (such as printed from vsql). This means that HP Vertica evaluates the data type footprint sizes as follows:

- Strings and binary types (CHAR, VARCHAR, BINARY, VARBINARY) are counted as their actual size in bytes using UTF-8 encoding. NULL values are counted as 1-byte values (zero bytes for the NULL, and 1 byte for the delimiter).
- Numeric data types are counted as if they had been printed. Each digit counts as a byte, as does any decimal point, sign, or scientific notation. For example, -123.456 counts as eight bytes (six digits plus the decimal point and minus sign).
- Date/time data types are counted as if they had been converted to text, including any hyphens or other separators. For example, a timestamp column containing the value for noon on July 4th, 2011 would be 19 bytes. As text, vsql would print the value as 2011-07-04 12:00:00, which is 19 characters, including the space between the date and the time.

Note: Each column has an additional byte for the column delimiter.

Using AUDIT to Estimate Database Size

To supply a more accurate database size estimate than statistical sampling can provide, use the AUDIT function to perform a full audit. This function has parameters to set both the error_tolerance and confidence_level. Using one or both of these parameters increases or decreases the function's performance impact. For instance, lowering the error_tolerance to zero (0) and raising the confidence_level to 100 provides the most accurate size estimate, but increases the performance impact of calling the AUDIT function. During a detailed, low error-tolerant audit, all of the data in the database is dumped to a raw format to calculate its size. Because performing a stringent audit can significantly impact database performance, never perform a full audit of a production database. See AUDIT for details.
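The data type footprint rules described above are simple to apply by hand. The following Python sketch (ours, for illustration only — not Vertica's implementation) shows how a row's raw text-format size would be tallied under those rules:

```python
from datetime import datetime

def raw_footprint(row):
    """Tally a row's raw text-format size per the footprint rules:
    strings as UTF-8 bytes, NULLs as zero bytes, numerics and
    date/times as printed text, plus one delimiter byte per column."""
    total = 0
    for value in row:
        if value is None:
            size = 0                      # NULL contributes only its delimiter
        elif isinstance(value, str):
            size = len(value.encode("utf-8"))
        elif isinstance(value, datetime):
            size = len(value.strftime("%Y-%m-%d %H:%M:%S"))
        else:                             # numerics: count printed characters
            size = len(str(value))
        total += size + 1                 # +1 byte for the column delimiter
    return total
```

For example, a single-column row holding -123.456 tallies 8 bytes plus 1 delimiter byte, and the noon-on-July-4th-2011 timestamp tallies 19 plus 1.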
Note: Unlike estimating raw data size using statistical sampling, a full audit performs SQL queries on the full database contents, including the contents of the WOS.

Monitoring Database Size for License Compliance

Your HP Vertica license can include a data storage allowance. The allowance can be for columnar data, for flex table data, or for both types of data (two separate licenses). The audit() function estimates the columnar table data size, while the audit_flex() function calculates the amount of flex table data storage. Monitoring data sizes for columnar and flex tables lets you plan either to schedule deleting old data to keep your database in compliance with your license, or to budget for a license upgrade for additional data storage.

Note: An audit of columnar data includes any materialized columns in flex tables.

Viewing Your License Compliance Status

HP Vertica periodically runs an audit of the columnar data size to verify that your database remains compliant with your license. You can view the results of the most recent audit by calling the GET_COMPLIANCE_STATUS function.

=> SELECT GET_COMPLIANCE_STATUS();
                              GET_COMPLIANCE_STATUS
---------------------------------------------------------------------------------
 Raw Data Size: 2.00GB +/- 0.003GB
 License Size : 4.000GB
 Utilization  : 50%
 Audit Time   : 2011-03-09 09:54:09.538704+00
 Compliance Status : The database is in compliance with respect to raw data size.
 License End Date: 04/06/2011
 Days Remaining: 28.59
(1 row)

Periodically running GET_COMPLIANCE_STATUS to monitor your database's license status is usually enough to ensure that your database remains compliant with your license. If your database begins to near its columnar data allowance, you can use the other auditing functions described below to determine where your database is growing and how recent deletes have affected the size of your database.
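The derived fields in that report follow directly from the audited size and the license terms. A hedged Python sketch of the arithmetic (ours, not Vertica's code — the field names are chosen to mirror the report):

```python
from datetime import datetime

def compliance_summary(raw_gb, license_gb, end_date, now):
    """Derive utilization, days-remaining, and a compliance flag
    from an audited raw size and the license size/end date."""
    utilization = round(100 * raw_gb / license_gb)
    days_remaining = round((end_date - now).total_seconds() / 86400, 2)
    status = "in compliance" if raw_gb <= license_gb else "NOT in compliance"
    return {"utilization_pct": utilization,
            "days_remaining": days_remaining,
            "status": status}
```

Plugging in the sample report's numbers (2.00 GB used of 4.000 GB, license ending 04/06/2011) reproduces the 50% utilization shown above.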
Manually Auditing Columnar Data Usage

You can manually check license compliance for all columnar data in your database using the AUDIT_LICENSE_SIZE SQL function. This function performs the same audit that HP Vertica periodically performs automatically. The AUDIT_LICENSE_SIZE check runs in the background, so the function returns immediately. You can then view the audit results using GET_COMPLIANCE_STATUS.
Note: When you audit columnar data, the results include any flexible table virtual columns that you have materialized. Materialized columns include columns that you specify when creating a flex table, and any that promote from virtual columns to real columns.

An alternative to AUDIT_LICENSE_SIZE is to use the AUDIT SQL function to audit the size of the columnar tables in your entire database by passing an empty string to the function. Unlike AUDIT_LICENSE_SIZE, this function operates synchronously, returning when it has estimated the size of the database.

=> SELECT AUDIT('');
  AUDIT
----------
 76376696
(1 row)

The size of the database is reported in bytes. The AUDIT function also allows you to control the accuracy of the estimated database size using additional parameters. See the entry for the AUDIT function in the SQL Reference Manual for full details. HP Vertica does not count the AUDIT function results as an official audit. It takes no license compliance actions based on the results.

Note: The results of the AUDIT function do not include flexible table data. Use the AUDIT_FLEX function to monitor data usage for your Flex Tables license.

Manually Auditing Flex Table Data Usage

You can use the AUDIT_FLEX function to manually audit data usage for one or more flexible tables. The function measures encoded, compressed data stored in ROS containers for the __raw__ column of one or more flexible tables. The audit results include only virtual columns in flex tables, not data included in materialized columns. Temporary flex tables are not included in the audit.

Targeted Auditing

If audits determine that the columnar table estimates are unexpectedly large, consider which schemas, tables, or partitions are using the most storage. You can use the AUDIT function to perform targeted audits of schemas, tables, or partitions by supplying the name of the entity whose size you want to find.
For example, to find the size of the online_sales schema in the VMart example database, run the following command:

VMart=> SELECT AUDIT('online_sales');
  AUDIT
----------
 35716504
(1 row)

You can also change the granularity of an audit to report the size of each entity in a larger entity (for example, each table in a schema) by using the granularity argument of the AUDIT function. See the AUDIT function's entry in the SQL Reference Manual.
Using Management Console to Monitor License Compliance

You can also get information about data storage of columnar data (for columnar tables and for materialized columns in flex tables) through the Management Console. This information is available in the database Overview page, which displays a grid view of the database's overall health.

- The needle in the license meter adjusts to reflect the amount used in megabytes.
- The grace period represents the term portion of the license.
- The Audit button returns the same information as the AUDIT() function in a graphical representation.
- The Details link within the License grid (next to the Audit button) provides historical information about license usage. This page also shows a progress meter of percent used toward your license limit.

Managing License Warnings and Limits

Term License Warnings and Expiration

The term portion of an HP Vertica license is easy to manage: you are licensed to use HP Vertica until a specific date. If the term of your license expires, HP Vertica alerts you with messages appearing in the Administration Tools and vsql. For example:

=> CREATE TABLE T (A INT);
NOTICE: Vertica license is in its grace period
HINT: Renew at http://www.vertica.com/
CREATE TABLE

Contact HP Vertica at http://www.vertica.com/about/contact-us/ as soon as possible to renew your license, and then install the new license. After the grace period expires, HP Vertica stops processing queries.

Data Size License Warnings and Remedies

If your HP Vertica license includes a raw data size allowance, HP Vertica periodically audits the size of your database to ensure it remains compliant with the license agreement. For details of this audit, see Calculating the Database Size. You should also monitor your database size to know when it will approach licensed usage.
Monitoring the database size helps you plan to either upgrade your license to allow for continued database growth or delete data from the database so you remain compliant with your license. See Monitoring Database Size for License Compliance for details.

If your database's size approaches your licensed usage allowance, you will see warnings in the Administration Tools and vsql. You have two options to eliminate these warnings:
- Upgrade your license to a larger data size allowance.
- Delete data from your database to remain under your licensed raw data size allowance.

The warnings disappear after HP Vertica's next audit of the database size shows that it is no longer close to or over the licensed amount. You can also manually run a database audit (see Monitoring Database Size for License Compliance for details).

If your database continues to grow after you receive warnings that its size is approaching your licensed size allowance, HP Vertica displays additional warnings in more parts of the system after a grace period passes.

If Your HP Vertica Enterprise Edition Database Size Exceeds Your Licensed Limits

If your Enterprise Edition database size exceeds your licensed data allowance, all successful queries from ODBC and JDBC clients return with a status of SUCCESS_WITH_INFO instead of the usual SUCCESS. The message sent with the results contains a warning about the database size. Your ODBC and JDBC clients should be prepared to handle these messages instead of assuming that successful requests always return SUCCESS.

If Your HP Vertica Community Edition Database Size Exceeds Your Licensed Limits

If your Community Edition database size exceeds your licensed data allowance, you will no longer be able to load or modify data in your database. In addition, you will not be able to delete data from your database. To bring your database under compliance, you can choose to:

- Drop database tables
- Upgrade to HP Vertica Enterprise Edition (or an evaluation license)
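As noted above, client code must not assume that every successful request returns plain SUCCESS. The following Python sketch shows the general pattern; the handler function is hypothetical (only the two status names mirror the ODBC-style codes the guide mentions):

```python
import logging

# Both statuses indicate the query itself succeeded; the second
# carries extra server information, such as a database-size warning.
SUCCESS_STATUSES = {"SUCCESS", "SUCCESS_WITH_INFO"}

def handle_result(status, message=""):
    """Treat SUCCESS_WITH_INFO as success, surfacing the warning
    instead of failing the request."""
    if status not in SUCCESS_STATUSES:
        raise RuntimeError(f"query failed: {status}: {message}")
    if status == "SUCCESS_WITH_INFO" and message:
        logging.warning("server info: %s", message)
    return True
```

The point of the pattern is that the warning is logged and the application continues, rather than misreading an over-allowance notice as a query failure.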
Configuring the Database

This section provides information about:

- The Configuration Procedure
- Configuration Parameters
- Designing a logical schema
- Creating the physical schema

You'll also want to set up a security scheme. See Implementing Security. See also Implementing Locales for International Data Sets.

Note: Before you begin this section, HP strongly recommends that you follow the Tutorial in the Getting Started Guide to quickly familiarize yourself with creating and configuring a fully-functioning example database.
Configuration Procedure

This section describes the tasks required to set up an HP Vertica database. It assumes that you have obtained a valid license key file, installed the HP Vertica rpm package, and run the installation script as described in the Installation Guide.

You'll complete the configuration procedure using the:

- Administration Tools. If you are unfamiliar with dialog-based user interfaces, read Using the Administration Tools Interface before you begin. See also the Administration Tools Reference for details.
- vsql interactive interface
- Database Designer, described fully in Creating a Database Design

Note: Users can also perform certain tasks using the Management Console. Those tasks will point to the appropriate topic.

IMPORTANT NOTES

- Follow the configuration procedure in the order presented in this book.
- HP strongly recommends that you first use the Tutorial in the Getting Started Guide to experiment with creating and configuring a database.
- Although you may create more than one database (for example, one for production and one for testing), you may create only one active database for each installation of Vertica Analytic Database.
- The generic configuration procedure described here can be used several times during the development process and modified each time to fit changing goals. You can omit steps such as preparing actual data files and sample queries, and run the Database Designer without optimizing for queries. For example, you can create, load, and query a database several times for development and testing purposes, then one final time to create and load the production database.
Prepare Disk Storage Locations

You must create and specify directories in which to store your catalog and data files (physical schema). You can specify these locations when you install or configure the database, or later during database operations.

The directory you specify for your catalog files (the catalog path) is used across all nodes. That is, if you specify /home/catalog for your catalog files, HP Vertica uses /home/catalog as the catalog path on all nodes. The catalog directory should always be separate from any data files.

Note: Do not use a shared directory for more than one node. Data and catalog directories must be distinct for each node. Multiple nodes must not be allowed to write to the same data or catalog directory.

The same is true for your data path. If you specify that your data should be stored in /home/data, HP Vertica ensures this is the data path used on all database nodes.

Do not use a single directory to contain both catalog and data files. You can store the catalog and data directories on different drives, which can be either local to the host (recommended for the catalog directory) or on a shared storage location, such as an external disk enclosure or a SAN.

Both the catalog and data directories must be owned by the database administrator. Before you specify a catalog or data path, be sure the parent directory exists on all nodes of your database. The database creation process in admintools creates the actual directories, but the parent directory must exist on all nodes.

You do not need to specify a disk storage location during installation. However, you can do so by using the --data-dir parameter to the install_vertica script.
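As a sketch only, an install invocation that sets the storage location might look like the following. The host names, rpm file name, and data path here are hypothetical placeholders, not values from this guide; verify the exact options against the Installation Guide for your release.

```shell
# Sketch only: host names, rpm version, and data path are hypothetical.
# Run as root on one node; install_vertica pushes the install to all hosts.
/opt/vertica/sbin/install_vertica \
    --hosts host01,host02,host03 \
    --rpm /tmp/vertica-7.0.x.x86_64.RHEL5.rpm \
    --data-dir /vertica/data
```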
See Specifying Disk Storage Location During Installation.

See Also

- Specifying Disk Storage Location on MC
- Specifying Disk Storage Location During Database Creation
- Configuring Disk Usage to Optimize Performance
- Using Shared Storage With HP Vertica

Specifying Disk Storage Location During Installation

There are three ways to specify the disk storage location. You can specify the location when you:
- Install HP Vertica
- Create a database using the Administration Tools
- Install and configure Management Console

To Specify the Disk Storage Location When You Install

When you install HP Vertica, the --data-dir parameter in the install_vertica script (see Installing with the Script) lets you specify a directory to contain database data and catalog files. The script defaults to the database administrator's default home directory: /home/dbadmin. You should replace this default with a directory that has adequate space to hold your data and catalog files.

Before you create a database, verify that the data and catalog directory exists on each node in the cluster. Also verify that the directory on each node is owned by the database administrator.

Notes

- Catalog and data path names must contain only alphanumeric characters and cannot have leading space characters. Failure to comply with these restrictions results in database creation failure.
- HP Vertica refuses to overwrite a directory if it appears to be in use by another database. Therefore, if you created a database for evaluation purposes, dropped the database, and want to reuse the database name, make sure that the disk storage location previously used has been completely cleaned up. See Working With Storage Locations for details.

Specifying Disk Storage Location During Database Creation

When you invoke the Create Database command in the Administration Tools, a dialog box allows you to specify the catalog and data locations. These locations must exist on each host in the cluster and must be owned by the database administrator. When you click OK, HP Vertica automatically creates the following subdirectories:
catalog-pathname/database-name/node-name_catalog/
data-pathname/database-name/node-name_data/

For example, if you use the default value (the database administrator's home directory) of /home/dbadmin for the Stock Exchange example database, the catalog and data directories are created on each node in the cluster as follows:

/home/dbadmin/Stock_Schema/stock_schema_node1_host01_catalog
/home/dbadmin/Stock_Schema/stock_schema_node1_host01_data

Notes

- Catalog and data path names must contain only alphanumeric characters and cannot have leading space characters. Failure to comply with these restrictions results in database creation failure.
- HP Vertica refuses to overwrite a directory if it appears to be in use by another database. Therefore, if you created a database for evaluation purposes, dropped the database, and want to reuse the database name, make sure that the disk storage location previously used has been completely cleaned up. See Working With Storage Locations for details.

Specifying Disk Storage Location on MC

You can use the MC interface to specify where you want to store database metadata on the cluster in the following ways:

- When you configure MC the first time
- When you create new databases using MC

See Configuring Management Console.

Configuring Disk Usage to Optimize Performance

Once you have created your initial storage location, you can add additional storage locations to the database later. Not only does this provide additional space, it lets you control disk usage and increase I/O performance by isolating files that have different I/O or access patterns. For example, consider:

- Isolating execution engine temporary files from data files by creating a separate storage location for temp space.
- Creating labeled storage locations and storage policies, in which selected database objects are stored on different storage locations based on measured performance statistics or predicted access patterns.
See Working With Storage Locations for details.
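As a sketch of these two techniques, the statements below show how a temp location, a labeled location, and a storage policy might be created. The paths, label, and table name are hypothetical; confirm the exact CREATE LOCATION and SET_OBJECT_STORAGE_POLICY syntax in the SQL Reference Manual before use.

```sql
-- Hypothetical paths and names; run as the database superuser.
-- Separate execution engine temp files from data files:
CREATE LOCATION '/vertica/temp' ALL NODES USAGE 'TEMP';

-- A labeled location on faster storage, plus a policy pinning a table to it:
CREATE LOCATION '/vertica/ssd' ALL NODES USAGE 'DATA' LABEL 'ssd';
SELECT SET_OBJECT_STORAGE_POLICY('stock_dimension', 'ssd');
```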
Using Shared Storage With HP Vertica

If using shared SAN storage, ensure there is no contention among the nodes for disk space or bandwidth.

- Each host must have its own catalog and data locations. Hosts cannot share catalog or data locations.
- Configure the storage so that there is enough I/O bandwidth for each node to access the storage independently.

Viewing Database Storage Information

You can view node-specific information on your HP Vertica cluster through the Management Console. See Monitoring HP Vertica Using MC for details.

Disk Space Requirements for HP Vertica

HP Vertica requires disk space for several data reorganization operations. For best results, HP recommends that disk utilization per node not exceed sixty (60) percent.

Disk Space Requirements for Management Console

You can install MC on any node in the cluster, so there are no special disk requirements for MC other than the disk space you would normally allocate for your database cluster. See Disk Space Requirements for HP Vertica.

Prepare the Logical Schema Script

Designing a logical schema for an HP Vertica database is no different from designing one for any other SQL database. Details are described more fully in Designing a Logical Schema.

To create your logical schema, prepare a SQL script (plain text file, typically with an extension of .sql) that:

1. Creates additional schemas (as necessary). See Using Multiple Schemas.
2. Creates the tables and column constraints in your database using the CREATE TABLE command.
3. Defines the necessary table constraints using the ALTER TABLE command.
4. Defines any views on the table using the CREATE VIEW command.

You can generate a script file using:
- A schema designer application.
- A schema extracted from an existing database.
- A text editor.
- One of the example database example-name_define_schema.sql scripts as a template. (See the example database directories in /opt/vertica/examples.)

In your script file, make sure that:

- Each statement ends with a semicolon.
- You use data types supported by HP Vertica, as described in the SQL Reference Manual.

Once you have created a database, you can test your schema script by executing it as described in Create the Logical Schema. If you encounter errors, drop all tables, correct the errors, and run the script again.

Prepare Data Files

Prepare two sets of data files:

- Test data files. Use test files to test the database after the partial data load. If possible, use part of the actual data files to prepare the test data files.
- Actual data files. Once the database has been tested and optimized, use your data files for your initial bulk load. See Bulk Loading Data.

How to Name Data Files

Name each data file to match the corresponding table in the logical schema. Case does not matter. Use the extension .tbl or whatever you prefer. For example, if a table is named Stock_Dimension, name the corresponding data file stock_dimension.tbl. When using multiple data files, append _nnn (where nnn is a positive integer in the range 001 to 999) to the file name. For example, stock_dimension.tbl_001, stock_dimension.tbl_002, and so on.

Prepare Load Scripts

Note: You can postpone this step if your goal is to test a logical schema design for validity.

Prepare SQL scripts to load data directly into physical storage using the COPY...DIRECT statement in vsql, or through ODBC as described in the Programmer's Guide. You need scripts that load the:
- Large tables
- Small tables

HP recommends that you load large tables using multiple files. To test the load process, use files of 10GB to 50GB in size. This size provides several advantages:

- You can use one of the data files as a sample data file for the Database Designer.
- You can load just enough data to Perform a Partial Data Load before you load the remainder.
- If a single load fails and rolls back, you do not lose an excessive amount of time.
- Once the load process is tested, for multi-terabyte tables, break up the full load into files of 250–500GB.

See Bulk Loading Data and the following additional topics for details:

- Bulk Loading Data
- Using Load Scripts
- Using Parallel Load Streams
- Loading Data into Pre-Join Projections
- Enforcing Constraints
- About Load Errors

Tip: You can use the load scripts included in the example databases in the Getting Started Guide as templates.

Create an Optional Sample Query Script

The purpose of a sample query script is to test your schema and load scripts for errors. Include a sample of queries your users are likely to run against the database. If you don't have any real queries, just write simple SQL that collects counts on each of your tables. Alternatively, you can skip this step.
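Putting the last few sections together, a minimal schema script and matching load script might look like the following sketch. All schema, table, column, and file names here are hypothetical, loosely modeled on the Stock_Dimension naming example above.

```sql
-- define_schema.sql (sketch; all names are hypothetical)
CREATE SCHEMA finance;
CREATE TABLE finance.stock_dimension (
    stock_key INTEGER NOT NULL,
    symbol    VARCHAR(10),
    company   VARCHAR(60)
);
ALTER TABLE finance.stock_dimension
    ADD CONSTRAINT pk_stock PRIMARY KEY (stock_key);
CREATE VIEW finance.stock_symbols AS
    SELECT DISTINCT symbol FROM finance.stock_dimension;

-- load_stock.sql (sketch): load multiple files directly into physical storage
COPY finance.stock_dimension
    FROM '/data/stock_dimension.tbl_001', '/data/stock_dimension.tbl_002'
    DELIMITER '|' DIRECT;
```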
Create an Empty Database

Two options are available for creating an empty database:

- Using the Management Console
- Using Administration Tools

Creating a Database Name and Password

A database name must conform to the following rules:

- Be between 1 and 30 characters long
- Begin with a letter
- Follow with any combination of letters (upper and lowercase), numbers, and/or underscores

Database names are case sensitive; however, HP strongly recommends that you do not create databases whose names differ only in case; for example, do not create a database called mydatabase and another database called MyDataBase.

Database Passwords

Database passwords may contain letters, digits, and certain special characters; however, no non-ASCII Unicode characters may be used. The following table lists the special (ASCII) characters that HP Vertica permits in database passwords. Special characters can appear anywhere within a password string; for example, mypas$word, $mypassword, and mypassword$ are all permitted.

Caution: Using special characters in database passwords that are not listed in the following table could cause database instability.

Character   Description
#           pound sign
!           exclamation point
+           plus sign
*           asterisk
?           question mark
,           comma
.           period
/           forward slash
=           equals sign
~           tilde
-           minus sign
$           dollar sign
_           underscore
:           colon
            space
"           double quote
'           single quote
%           percent sign
&           ampersand
(           parenthesis
)           parenthesis
;           semicolon
<           less than sign
>           greater than sign
@           at sign
`           back quote
[           square bracket
]           square bracket
\           backslash
^           caret
|           vertical bar
{           curly bracket
}           curly bracket
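For example, once the database exists, the superuser can set or change a password containing permitted special characters with a statement like the following sketch. The user name and password are placeholders.

```sql
-- Hypothetical user and password; $ and # are among the permitted characters.
ALTER USER dbadmin IDENTIFIED BY 'mypas$word#1';
```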
See Also

- Password Guidelines

Create an Empty Database Using MC

You can create a new database on an existing HP Vertica cluster through the Management Console interface.

Database creation can be a long-running process, lasting from minutes to hours, depending on the size of the target database. You can close the web browser during the process and sign back in to MC later; the creation process continues unless an unexpected error occurs. See the Notes section below the procedure on this page.

You currently need to use command line scripts to define the database schema and load data. Refer to the topics in Configuration Procedure. You should also run the Database Designer, which you access through the Administration Tools, to create either a comprehensive or incremental design. Consider using the Tutorial in the Getting Started Guide to create a sample database you can start monitoring immediately.

How to Create an Empty Database on an MC-managed Cluster

1. If you are already on the Databases and Clusters page, skip to the next step; otherwise:
   a. Connect to MC and sign in as an MC administrator.
   b. On the Home page, click the Databases and Clusters task.
2. If no databases exist on the cluster, continue to the next step; otherwise:
   a. If a database is running on the cluster on which you want to add a new database, select the database and click Stop.
   b. Wait for the running database to have a status of Stopped.
3. Click the cluster on which you want to create the new database and click Create Database.
4. The Create Database wizard opens. Provide the following information:
   - Database name and password. See Creating a Database Name and Password for rules.
   - Optionally click Advanced to open the advanced settings and change the port and the catalog, data, and temporary data paths. By default, the MC application/web server port is 5450, and the paths are /home/dbadmin or whatever you defined for the paths when you ran the Cluster Creation Wizard or the install_vertica script. Do not use the default agent port 5444 as a new setting for the MC port. See MC Settings > Configuration for port values.
5. Click Continue.
6. Select nodes to include in the database. The Database Configuration window opens with the options you provided, and a graphical representation of the nodes appears on the page. By default, all nodes are selected to be part of this database (denoted by a green check mark). You can optionally click each node and clear Include host in new database to exclude that node from the database. Excluded nodes are gray. If you change your mind, click the node and select the Include check box.
7. Click Create in the Database Configuration window to create the database on the nodes. The creation process takes a few moments, after which the database starts and a Success message appears on the interface.
8. Click OK to close the success message. MC's Manage page opens and displays the database nodes. Nodes not included in the database are colored gray, which means they are standby nodes you can include later.

To add nodes to or remove nodes from your HP Vertica cluster (as opposed to including standby nodes), you must run the install_vertica script.

Notes

- If warnings occur during database creation, nodes are marked on the UI with an Alert icon and a message.
  - Warnings do not prevent the database from being created, but you should address them after the database creation process completes by viewing the database Message Center from the MC Home page.
  - Failure messages display on the database Manage page with a link to more detailed information and a hint with an actionable task that you must complete before you can continue. Problem nodes are colored red for quick identification.
  - To view more detailed information about a node in the cluster, double-click the node from the Manage page, which opens the Node Details page.
- To create MC users and grant them access to an MC-managed database, see About MC Users and Creating an MC User.
See Also

- Creating a Cluster Using MC
- Troubleshooting Management Console
- Restarting MC
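If you script your environment rather than using the interactive dialogs, admintools also exposes a non-interactive tool mode. The sketch below is an assumption to verify on your system (run admintools -a to list the available tools and their exact options); host names, database name, and password are placeholders.

```shell
# Sketch only: flags and values are hypothetical; verify with 'admintools -a'.
/opt/vertica/bin/admintools -t create_db \
    -s host01,host02,host03 \
    -d mydb \
    -p 'mypas$word'
```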
Create a Database Using Administration Tools

1. Run the Administration Tools from your Administration Host as follows:

   $ /opt/vertica/bin/admintools

   If you are using a remote terminal application, such as PuTTY or a Cygwin bash shell, see Notes for Remote Terminal Users.
2. Accept the license agreement and specify the location of your license file. See Managing Licenses for more information. This step is necessary only if it is the first time you have run the Administration Tools.
3. On the Main Menu, click Configuration Menu, and click OK.
4. On the Configuration Menu, click Create Database, and click OK.
5. Enter the name of the database and an optional comment, and click OK.
6. Establish the superuser password for your database.
   - To provide a password, enter the password and click OK. Confirm the password by entering it again, and then click OK.
   - If you don't want to provide the password, leave it blank and click OK. If you don't set a password, HP Vertica prompts you to verify that you truly do not want to establish a superuser password for this database. Click Yes to create the database without a password or No to establish the password.

   Caution: If you do not enter a password at this point, the superuser password is set to empty. Unless the database is for evaluation or academic purposes, HP strongly recommends that you enter a superuser password. See Creating a Database Name and Password for guidelines.
7. Select the hosts to include in the database from the list of hosts specified when HP Vertica was installed (install_vertica -s), and click OK.
8. Specify the directories in which to store the data and catalog files, and click OK.

   Note: Do not use a shared directory for more than one node. Data and catalog directories must be distinct for each node. Multiple nodes must not be allowed to write to the same data or catalog directory.
9. Catalog and data pathnames must contain only alphanumeric characters and cannot have leading spaces. Failure to comply with these restrictions results in database creation failure. For example:

   Catalog pathname: /home/dbadmin
   Data pathname: /home/dbadmin
10. Review the Current Database Definition screen to verify that it represents the database you want to create, and then click Yes to proceed or No to modify the database definition.
11. If you click Yes, HP Vertica creates the database you defined and then displays a message to indicate that the database was successfully created.

    Note: For databases created with 3 or more nodes, HP Vertica automatically sets K-safety to 1 to ensure that the database is fault tolerant in case a node fails. For more information, see Failure Recovery in the Administrator's Guide and MARK_DESIGN_KSAFE in the SQL Reference Manual.
12. Click OK to acknowledge the message. If you receive an error message, see Startup Problems.

Create the Logical Schema

1. Connect to the database. In the Administration Tools Main Menu, click Connect to Database and click OK. See Connecting to the Database for details. The vsql welcome script appears:
   Welcome to vsql, the Vertica Analytic Database interactive terminal.

   Type: \h or \? for help with vsql commands
         \g or terminate with semicolon to execute query
         \q to quit

   =>

2. Run the logical schema script. Use the \i meta-command in vsql to run the SQL logical schema script that you prepared earlier.
3. Disconnect from the database. Use the \q meta-command in vsql to return to the Administration Tools.

Perform a Partial Data Load

HP recommends that for large tables, you perform a partial data load and then test your database before completing a full data load. This load should load a representative amount of data.

1. Load the small tables. Load the small table data files using the SQL load scripts and data files you prepared earlier.
2. Partially load the large tables. Load 10GB to 50GB of table data for each table using the SQL load scripts and data files that you prepared earlier.

For more information about projections, see Physical Schema in the Concepts Guide.

Test the Database

Test the database to verify that it is running as expected. Check queries for syntax errors and execution times.

1. Use the vsql \timing meta-command to enable the display of query execution time in milliseconds.
2. Execute the SQL sample query script that you prepared earlier.
3. Execute several ad hoc queries.
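If you have no prepared queries, a vsql session like the following sketch serves as a basic test. The table names are hypothetical.

```sql
-- In vsql: show query execution times, then collect counts per table.
\timing
SELECT COUNT(*) FROM stock_dimension;   -- hypothetical table
SELECT COUNT(*) FROM trades_fact;       -- hypothetical table
```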
Optimize Query Performance

Optimizing the database consists of optimizing for compression and tuning for queries. (See Creating a Database Design.) To optimize the database, use the Database Designer to create and deploy a design for optimizing the database. See the Tutorial in the Getting Started Guide for an example of using the Database Designer to create a Comprehensive Design.

After you have run the Database Designer, use the techniques described in Optimizing Query Performance in the Programmer's Guide to improve the performance of certain types of queries.

Note: The database response time depends on factors such as the type and size of the application query, database design, data size and data types stored, available computational power, and network bandwidth. Adding nodes to a database cluster does not necessarily improve the system response time for every query, especially if the response time is already short (for example, less than 10 seconds) or is not hardware bound.

Complete the Data Load

To complete the load:

1. Monitor system resource usage. Continue to run the top, free, and df utilities and watch them while your load scripts are running (as described in Monitoring Linux Resource Usage). You can do this on any or all nodes in the cluster. Make sure that the system is not swapping excessively (watch kswapd in top) or running out of swap space (watch for a large amount of used swap space in free).

   Note: HP Vertica requires a dedicated server. If your loader or other processes take up significant amounts of RAM, it can result in swapping.
2. Complete the large table loads. Run the remainder of the large table load scripts.

Test the Optimized Database

Check query execution times to test your optimized design:

1. Use the vsql \timing meta-command to enable the display of query execution time in milliseconds. Execute a SQL sample query script to test your schema and load scripts for errors.

   Note: Include a sample of queries your users are likely to run against the database. If you don't have any real queries, just write simple SQL that collects counts on each of your tables. Alternatively, you can skip this step.
2. Execute several ad hoc queries:
   a. Run Administration Tools and select Connect to Database.
   b. Use the \i meta-command to execute the query script; for example:

      vmartdb=> \i vmart_query_03.sql
        customer_name   | annual_income
      ------------------+---------------
       James M. McNulty |        999979
       Emily G. Vogel   |        999998
      (2 rows)
      Time: First fetch (2 rows): 58.411 ms. All rows formatted: 58.448 ms

      vmartdb=> \i vmart_query_06.sql
       store_key | order_number | date_ordered
      -----------+--------------+--------------
              45 |       202416 | 2004-01-04
             113 |        66017 | 2004-01-04
             121 |       251417 | 2004-01-04
              24 |       250295 | 2004-01-04
               9 |       188567 | 2004-01-04
             166 |        36008 | 2004-01-04
              27 |       150241 | 2004-01-04
             148 |       182207 | 2004-01-04
             198 |        75716 | 2004-01-04
      (9 rows)
      Time: First fetch (9 rows): 25.342 ms. All rows formatted: 25.383 ms

Once the database is optimized, it should run queries efficiently. If you discover queries that you want to optimize, you can modify and update the design. See Incremental Design in the Administrator's Guide.

Set Up Incremental (Trickle) Loads

Once you have a working database, you can use trickle loading to load new data while concurrent queries are running. Trickle load is accomplished by using the COPY command (without the DIRECT keyword) to load 10,000 to 100,000 rows per transaction into the WOS. This allows HP Vertica to batch multiple loads when it writes data to disk. While the COPY command defaults to loading into the WOS, it writes to ROS if the WOS is full. See Trickle Loading Data for details.
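As a sketch, a trickle-load script differs from the bulk version only in omitting the DIRECT keyword, so rows land in the WOS. The file and table names here are hypothetical.

```sql
-- Omit DIRECT so COPY writes to the WOS; batches of 10,000-100,000 rows work well.
COPY stock_dimension FROM '/data/new_rows.tbl' DELIMITER '|';
COMMIT;
```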
See Also

- COPY
- Loading Data Through ODBC
Implement Locales for International Data Sets

The locale is a parameter that defines the user's language, country, and any special variant preferences, such as collation. HP Vertica uses the locale to determine the behavior of various string functions, as well as collation for various SQL commands that require ordering and comparison; for example, GROUP BY, ORDER BY, joins, the analytic ORDER BY clause, and so forth.

By default, the locale for the database is en_US@collation=binary (English US). You can establish a new default locale that is used for all sessions on the database, as well as override individual sessions with different locales. Additionally, the locale can be set through ODBC, JDBC, and ADO.NET.

ICU Locale Support

HP Vertica uses the ICU library for locale support; thus, you must specify the locale using the ICU locale syntax. While the locale used by the database session is not derived from the operating system (through the LANG variable), HP Vertica does recommend that you set LANG appropriately for each node running vsql, as described in the next section.

While ICU library services can specify collation, currency, and calendar preferences, HP Vertica supports only the collation component. Any keywords not relating to collation are rejected. Projections are always collated using the en_US@collation=binary collation regardless of the session collation. Any locale-specific collation is applied at query time.

The SET DATESTYLE TO ... command provides some aspects of the calendar, but HP Vertica supports only dollars as currency.

Changing DB Locale for a Session

This example sets the session locale to Thai.

1. At the OS level, for each node running vsql, set the LANG variable to the locale language as follows:

   export LANG=th_TH.UTF-8

   Note: If setting LANG as noted does not work, OS support for locales may not be installed.
2. For each HP Vertica session (from ODBC/JDBC or vsql), set the language locale.

   From vsql:

   \locale th_TH

3. From ODBC/JDBC:

   "SET LOCALE TO th_TH;"

4. In PuTTY (or an ssh terminal), change the settings as follows:

   settings > window > translation > UTF-8

5. Click Apply, and Save.

All data being loaded must be in UTF-8 format, not an ISO format, as described in Loading UTF-8 Format Data. Character sets like ISO 8859-1 (Latin1), which are incompatible with UTF-8, are not supported, so functions like substring do not work correctly for multi-byte characters. Thus, ISO locale settings do not work correctly. If the translation setting ISO-8859-11:2001 (Latin/Thai) works, the data is loaded incorrectly. To convert data correctly, use a utility program such as Linux iconv (see its man page).

Note: The maximum length parameter for VARCHAR and CHAR data types refers to the number of octets (bytes) that can be stored in that field, not the number of characters. When using multi-byte UTF-8 characters, size fields to accommodate from 1 to 4 bytes per character, depending on the data.

See Also

- Supported Locales
- About Locales
- SET LOCALE
- ICU User Guide

Specify the Default Locale for the Database

The default locale configuration parameter sets the initial locale for every database session once the database has been restarted. Sessions may override this value.

To set the locale for the database, use the configuration parameter as follows:

SELECT SET_CONFIG_PARAMETER('DefaultSessionLocale' , '<ICU-locale-identifier>');

For example:
  • 85. mydb=> SELECT SET_CONFIG_PARAMETER('DefaultSessionLocale','en_GB'); SET_CONFIG_PARAMETER ---------------------------- Parameter set successfully (1 row) Override the Default Locale for a Session To override the default locale for a specific session, use one of the following commands: l The vsql command locale <ICU-locale-identifier>. For example: locale en_GBINFO: Locale: 'en_GB' INFO: English (United Kingdom) INFO: Short form: 'LEN' l The statement SET LOCALE TO <ICU-locale-identifier>. SET LOCALE TO en_GB;SET LOCALE TO en_GB; INFO: Locale: 'en_GB' INFO: English (United Kingdom) INFO: Short form: 'LEN' You can also use the Short Form of a locale in either of these commands: SET LOCALE TO LEN;INFO: Locale: 'en' INFO: English INFO: Short form: 'LEN' locale LEN INFO: Locale: 'en' INFO: English INFO: Short form: 'LEN' You can use these commands to override the locale as many times as needed within a session. The session locale setting applies to any subsequent commands issued in the session. See Also l SET LOCALE Best Practices for Working with Locales It is important to understand the distinction between the locale settings on the database server and locale settings at the client application level. The server locale settings impact only the collation behavior for server-side query processing. The client application is responsible for ensuring that the Administrator's Guide Configuring the Database HP Vertica Analytic Database (7.0.x) Page 85 of 997
  • 86. correct locale is set in order to display the characters correctly. Below are the best practices recommended by HP to ensure predictable results: Server Locale Server session locale should be set using the set as described in Specify the Default Locale for the Database. If using different locales in different session, set the server locale at the start of each session from your client. vsql Client l If there is no default session locale at database level, the server locale for the session should be set to the desired locale, as described in Override the Default Locale for a Session. l The locale setting in the terminal emulator where vsql client is run should be set to be equivalent to session locale setting on server side (ICU locale) so data is collated correctly on the server and displayed correctly on the client. l All input data for vsql should be in UTF-8 and all output data is encoded in UTF-8 l Non UTF-8 encodings and associated locale values should not be used because they are not supported. l Refer to the documentation of your terminal emulator for instructions on setting locale and encoding. ODBC Clients l ODBC applications can be either in ANSI or Unicode mode. If Unicode, the encoding used by ODBC is UCS-2. If the user application is ANSI, the data must be in single-byte ASCII, which is compatible with UTF-8 used on the database server. The ODBC driver converts UCS-2 to UTF- 8 when passing to the HP Vertica server and converts data sent by the HP Vertica server from UTF-8 to UCS-2. l If the user application is not already in UCS-2, the application is responsible for converting the input data to UCS-2, or unexpected results could occur. For example: n On non-UCS-2 data passed to ODBC APIs, when it is interpreted as UCS-2, it could result in an invalid UCS-2 symbol being passed to the APIs, resulting in errors. 
n The symbol provided in the alternate encoding could be a valid UCS-2 symbol; in this case, incorrect data is inserted into the database. l If there is no default session locale at the database level, ODBC applications should set the desired server session locale using SQLSetConnectAttr (if different from the database-wide setting) to get the expected collation and string-function behavior on the server. Administrator's Guide Configuring the Database HP Vertica Analytic Database (7.0.x) Page 86 of 997
  • 87. JDBC and ADO.NET Clients l JDBC and ADO.NET applications use a UTF-16 character set encoding and are responsible for converting any non-UTF-16 encoded data to UTF-16. The same cautions apply as for ODBC if this encoding is violated. l The JDBC and ADO.NET drivers convert UTF-16 data to UTF-8 when passing data to the HP Vertica server and convert data sent by the HP Vertica server from UTF-8 to UTF-16. l If there is no default session locale at the database level, JDBC and ADO.NET applications should set the correct server session locale by executing the SET LOCALE TO command to get the expected collation and string-function behavior on the server. See the SET LOCALE command in the SQL Reference Manual. Notes and Restrictions Session related: l The locale setting is session scoped and applies only to queries (not DML/DDL) run in that session. You cannot specify a locale for an individual query. l The default locale for new sessions can be set using the DefaultSessionLocale configuration parameter. Query related: The following restrictions apply when queries are run with a locale other than the default en_US@collation=binary: l Multicolumn NOT IN subqueries are not supported when one or more of the left-side NOT IN columns is of CHAR or VARCHAR data type. For example: => CREATE TABLE test (x VARCHAR(10), y INT); => SELECT ... FROM test WHERE (x,y) NOT IN (SELECT ...); ERROR: Multi-expression NOT IN subquery is not supported because a left hand expression could be NULL Note: An error is reported even if columns test.x and test.y have a "NOT NULL" constraint. l Correlated HAVING clause subqueries are not supported if the outer query contains a GROUP BY on a CHAR or a VARCHAR column. In the following example, the GROUP BY x in the outer query causes the error: => DROP TABLE test CASCADE; Administrator's Guide Configuring the Database HP Vertica Analytic Database (7.0.x) Page 87 of 997
  • 88. => CREATE TABLE test (x VARCHAR(10)); => SELECT COUNT(*) FROM test t GROUP BY x HAVING x IN (SELECT x FROM test WHERE t.x||'a' = test.x||'a' ); ERROR: subquery uses ungrouped column "t.x" from outer query l Subqueries that use analytic functions in the HAVING clause are not supported. For example: => DROP TABLE test CASCADE; => CREATE TABLE test (x VARCHAR(10)); => SELECT MAX(x) OVER (PARTITION BY 1 ORDER BY 1) FROM test GROUP BY x HAVING x IN (SELECT MAX(x) FROM test); ERROR: Analytics query with having clause expression that involves aggregates and subquery is not supported DML/DDL related: l SQL identifiers (such as table names, column names, and so on) can use UTF-8 Unicode characters. For example, the following CREATE TABLE statement uses the ß (German eszett) in the table name: => CREATE TABLE straße(x int, y int); CREATE TABLE l Projection sort orders are determined according to the default en_US@collation=binary collation. Thus, regardless of the session setting, issuing the following command creates a projection sorted by col1 according to the binary collation: => CREATE PROJECTION p1 AS SELECT * FROM table1 ORDER BY col1; Note that in such cases, straße and strasse would not be near each other on disk. Sorting by binary collation also means that sort optimizations do not work in locales other than binary. HP Vertica returns the following warning if you create tables or projections in a non-binary locale: WARNING: Projections are always created and persisted in the default HP Vertica locale. The current locale is de_DE l When creating pre-join projections, the projection definition query does not respect the locale or collation setting. This means that when you insert data into the fact table of a pre-join projection, referential integrity checks are not locale or collation aware. For example: Administrator's Guide Configuring the Database HP Vertica Analytic Database (7.0.x) Page 88 of 997
  • 89. locale LDE_S1 -- German => CREATE TABLE dim (col1 varchar(20) primary key); => CREATE TABLE fact (col1 varchar(20) references dim(col1)); => CREATE PROJECTION pj AS SELECT * FROM fact JOIN dim ON fact.col1 = dim.col1 UNSEGMENTED ALL NODES; => INSERT INTO dim VALUES('ß'); => COMMIT; The following INSERT statement fails with a "nonexistent FK" error even though 'ß' is in the dim table, and in the German locale 'SS' and 'ß' refer to the same character. => INSERT INTO fact VALUES('SS'); ERROR: Nonexistent foreign key value detected in FK-PK join (fact x dim) using subquery and dim_node0001; value SS => ROLLBACK; => DROP TABLE dim, fact CASCADE; l When the locale is non-binary, the collation function is used to transform the input to a binary string which sorts in the proper order. This transformation increases the number of bytes required for the input according to this formula: result_column_width = input_octet_width * CollationExpansion + 4 CollationExpansion defaults to 5, so, for example, a 20-octet input requires 20 * 5 + 4 = 104 bytes. l CHAR fields are displayed as fixed length, including any trailing spaces. When CHAR fields are processed internally, they are first stripped of trailing spaces. For VARCHAR fields, trailing spaces are usually treated as significant characters; however, trailing spaces are ignored when sorting or comparing either type of character string field using a non-BINARY locale. Change Transaction Isolation Levels By default, HP Vertica uses the READ COMMITTED isolation level for every session. If you prefer, you can change the default isolation level for the database or for a specific session. To change the isolation level for a specific session, use the SET SESSION CHARACTERISTICS command. To change the isolation level for the database, use the TransactionIsolationLevel configuration parameter. Once modified, HP Vertica uses the new transaction level for every new session. 
The following examples set the default isolation for the database to SERIALIZABLE and then back to READ COMMITTED: => SELECT SET_CONFIG_PARAMETER('TransactionIsolationLevel','SERIALIZABLE'); => SELECT SET_CONFIG_PARAMETER('TransactionIsolationLevel','READ COMMITTED'); Administrator's Guide Configuring the Database HP Vertica Analytic Database (7.0.x) Page 89 of 997
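To change the level for just one session rather than for the whole database, use the SET SESSION CHARACTERISTICS command mentioned above. The sketch below assumes the syntax documented under SET SESSION CHARACTERISTICS in the SQL Reference Manual:

```sql
-- Session-scoped change: affects only transactions started later in this session
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Return this session to the database default
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

Unlike the SET_CONFIG_PARAMETER examples above, these statements affect neither other sessions nor the database-wide default.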
  • 90. Notes l A change to isolation level only applies to future sessions. Existing sessions and their transactions continue to use the original isolation level. A transaction retains its isolation level until it completes, even if the session's transaction isolation level changes mid-transaction. HP Vertica internal processes (such as the Tuple Mover and refresh operations) and DDL operations are always run at SERIALIZABLE isolation level to ensure consistency. See Also l Transactions l Configuration Parameters Administrator's Guide Configuring the Database HP Vertica Analytic Database (7.0.x) Page 90 of 997
  • 91. Configuration Parameters You can modify certain parameters to configure your HP Vertica database using one of the following options: l Dynamically through the Management Console browser-based interface l At the command line directly l From vsql Important: Before you modify a database parameter, review all documentation about the parameter to determine the context under which you can change it. Some parameter changes take effect only after you restart the database. Configuring HP Vertica Settings Using MC To change database settings for any MC-managed database, click the Settings tab at the bottom of the Overview, Activity, or Manage pages. The database must be running. The Settings page defaults to parameters in the General category. To change other parameters, click an option from the tab panel on the left. Some settings require that you restart the database, and MC will prompt you to do so. You can ignore the prompt, but those changes will not take effect until after you restart the database. If you want to change settings that are specific to Management Console, such as changing MC or agent port assignments, see Managing MC Settings for more information. Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 91 of 997
  • 92. See Also l Configuration Parameters Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 92 of 997
  • 93. Configuring HP Vertica At the Command Line The tables in this section list parameters for configuring HP Vertica at the command line. General Parameters The following table describes the general parameters for configuring HP Vertica. Parameters Default Description AnalyzeRowCountInterval 60 seconds Automatically runs every 60 seconds to collect the number of rows in the projection and aggregates row counts calculated during loads. See Collecting Statistics. CompressCatalogOnDisk 0 When enabled (set the value to 1 or 2), compresses the size of the catalog on disk. Values can be: l 1—Compress checkpoints, but not logs l 2—Compress checkpoints and logs Consider enabling this parameter if the catalog disk partition is small (<50GB) and the metadata is large (hundreds of tables, partitions, or nodes). CompressNetworkData 0 When enabled (set the value to 1), HP Vertica compresses all of the data it sends over the network. This speeds up network traffic at the expense of added CPU load. You can enable this if you find that the network is throttling your database performance. EnableCooperativeParse 1 Enabled by default. Implements multi-threaded cooperative parsing capabilities for delimited and fixed-width loads. EnableResourcePoolCPUAffinity 1 Enabled by default. When disabled (set the value to 0), queries run on any CPU, regardless of the CPU_AFFINITY_SET of the resource pool. Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 93 of 997
  • 94. Parameters Default Description ExternalTablesExceptionsLimit 100 Determines the maximum number of COPY exceptions and rejections that can occur when a SELECT statement references an external table. Setting this parameter to -1 removes any exceptions limit. For more information, see Validating External Tables. FencedUDxMemoryLimitMB -1 Sets the maximum amount of memory (in MB) that a fenced-mode UDF can use. Any UDF that attempts to allocate more memory than this limit triggers an exception. When set to -1, there is no limit on the amount of memory a UDF can allocate. For more information, see Fenced Mode in the Programmer's Guide. FlexTableDataTypeGuessMultiplier 2.0 Specifies the multiplier to use for a key value when creating a view for a flex keys table. See Specifying Unstructured Parameters. FlexTableRawSize 130000 The default value (in bytes) of the __raw__ column size of a flex table. The maximum value is 32000000. You can change the default value as with other configuration parameters, or update the __raw__ column size on a per-table basis using ALTER TABLE once an unstructured table exists. See Specifying Unstructured Parameters. JavaBinaryForUDx The full path to the Java executable that HP Vertica uses to execute Java UDxs. See Installing Java on HP Vertica Hosts in the Programmer's Guide. JavaClassPathForUDx ${vertica_home}/packages/hcat/lib/* Sets the Java classpath for the JVM that executes Java UDxs. This parameter must list all directories containing JAR files that Java UDxs import. See Handling Java UDx Dependencies in the Programmer's Guide. Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 94 of 997
  • 95. Parameters Default Description MaxAutoSegColumns 32 Specifies the number of columns (0 - 1024) to segment automatically when creating auto-projections from COPY and INSERT INTO statements. Setting this parameter to zero (0) indicates to use all columns in the hash segmentation expression. MaxClientSessions 50 Determines the maximum number of client sessions that can run on a single node of the database. The default value allows for 5 additional administrative logins, which prevents DBAs from being locked out of the system if the limit is reached by non-dbadmin users. Tip: Setting this parameter to 0 is useful for preventing new client sessions from being opened while you are shutting down the database. Be sure to restore the parameter to its original setting once you've restarted the database. See the section "Interrupting and Closing Sessions" in Managing Sessions. PatternMatchAllocator 0 If set to 1, overrides the heap memory allocator for the Perl Compatible Regular Expressions (PCRE) pattern-match library, which evaluates regular expressions. You must restart the database for this parameter to take effect. For more information, see Regular Expression Functions. PatternMatchStackAllocator 1 If set to 1, overrides the stack memory allocator for the Perl Compatible Regular Expressions (PCRE) pattern-match library, which evaluates regular expressions. You must restart the database for this parameter to take effect. For more information, see Regular Expression Functions. Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 95 of 997
  • 96. Parameters Default Description SegmentAutoProjection 1 Determines whether auto-projections are segmented by default. Setting this parameter to zero (0) disables the feature. TransactionIsolationLevel READ COMMITTED Changes the isolation level for the database. Once modified, HP Vertica uses the new transaction level for every new session. Existing sessions and their transactions continue to use the original isolation level. See Change Transaction Isolation Levels. TransactionMode READ WRITE Controls whether transactions are read/write or read-only. Read/write is the default. Existing sessions and their transactions continue to use the original transaction mode. Setting Configuration Parameters You can set a new value for a configuration parameter with a SELECT statement, as follows. These examples illustrate changing the parameters listed in the table: SELECT SET_CONFIG_PARAMETER ('AnalyzeRowCountInterval',3600); SELECT SET_CONFIG_PARAMETER ('CompressNetworkData',1); SELECT SET_CONFIG_PARAMETER ('ExternalTablesExceptionsLimit',-1); SELECT SET_CONFIG_PARAMETER ('MaxClientSessions', 100); SELECT SET_CONFIG_PARAMETER ('PatternMatchAllocator',1); SELECT SET_CONFIG_PARAMETER ('PatternMatchStackAllocator',0); SELECT SET_CONFIG_PARAMETER ('TransactionIsolationLevel','SERIALIZABLE'); SELECT SET_CONFIG_PARAMETER ('TransactionMode','READ ONLY'); SELECT SET_CONFIG_PARAMETER ('FlexTableRawSize',16000000); SELECT SET_CONFIG_PARAMETER ('CompressCatalogOnDisk',2); Tuple Mover Parameters These parameters control how the Tuple Mover operates. Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 96 of 997
  • 97. Parameters Description ActivePartitionCount Sets the number of partitions, called active partitions, that are currently being loaded. For information about how the Tuple Mover treats active (and inactive) partitions during a mergeout operation, see Understanding the Tuple Mover. Default Value: 1 Example: SELECT SET_CONFIG_PARAMETER ('ActivePartitionCount', 2); MergeOutInterval The number of seconds the Tuple Mover waits between checks for new ROS files to merge out. If ROS containers are added frequently, you may need to decrease this value. Default Value: 600 Example: SELECT SET_CONFIG_PARAMETER ('MergeOutInterval', 1200); MoveOutInterval The number of seconds the Tuple Mover waits between checks for new data in the WOS to move to ROS. Default Value: 300 Example: SELECT SET_CONFIG_PARAMETER ('MoveOutInterval', 600); MoveOutMaxAgeTime The interval (in seconds) after which the Tuple Mover is forced to write the WOS to disk. The default interval is 30 minutes. Tip: If you had been running the force_moveout.sh script in previous releases, you no longer need to run it. Default Value: 1800 Example: SELECT SET_CONFIG_PARAMETER ('MoveOutMaxAgeTime', 1200); MoveOutSizePct The percentage of the WOS that can be filled with data before the Tuple Mover performs a moveout operation. Default Value: 0 Example: SELECT SET_CONFIG_PARAMETER ('MoveOutSizePct', 50); Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 97 of 997
  • 98. Epoch Management Parameters The following table describes the epoch management parameters for configuring HP Vertica. Parameters Description AdvanceAHMInterval Determines how frequently (in seconds) HP Vertica checks the history retention status. By default, the AHM interval is set to 180 seconds (3 minutes). Note: AdvanceAHMInterval cannot be set to a value less than the EpochMapInterval. Default Value: 180 Example: SELECT SET_CONFIG_PARAMETER ('AdvanceAHMInterval', '3600'); EpochMapInterval Determines the granularity of mapping between epochs and time available to historical queries. When a historical query AT TIME T is issued, HP Vertica maps it to an epoch within a granularity of EpochMapInterval seconds. It similarly affects the time reported for Last Good Epoch during Failure Recovery. Note that it does not affect the internal precision of epochs themselves. By default, EpochMapInterval is set to 180 seconds (3 minutes). Tip: Decreasing this interval increases the number of epochs saved on disk. Therefore, you might want to reduce the HistoryRetentionTime parameter to limit the number of history epochs that HP Vertica retains. Default Value: 180 Example: SELECT SET_CONFIG_PARAMETER ('EpochMapInterval', '300'); Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 98 of 997
  • 99. Parameters Description HistoryRetentionTime Determines how long deleted data is saved (in seconds) as a historical reference. The default is 0, which means that HP Vertica saves historical data only when nodes are down. Once the specified time has passed since the delete, the data is eligible to be purged. Use the -1 setting if you prefer to use HistoryRetentionEpochs to determine which deleted data can be purged. Note: The default setting of 0 effectively prevents the use of the Administration Tools 'Roll Back Database to Last Good Epoch' option because the AHM remains close to the current epoch and a rollback is not permitted to an epoch prior to the AHM. Tip: If you rely on the Roll Back option to remove recently loaded data, consider setting a day-wide window for removing loaded data; for example: SELECT SET_CONFIG_PARAMETER ('HistoryRetentionTime', '86400'); Default Value: 0 Example: SELECT SET_CONFIG_PARAMETER ('HistoryRetentionTime', '240'); HistoryRetentionEpochs Specifies the number of historical epochs to save, and therefore, the amount of deleted data. Unless you have a reason to limit the number of epochs, HP recommends that you specify the time over which deleted data is saved. The -1 setting disables this configuration parameter. If both History parameters are specified, HistoryRetentionTime takes precedence, and if both parameters are set to -1, all historical data is preserved. See Setting a Purge Policy. Default Value: -1 Example: SELECT SET_CONFIG_PARAMETER ('HistoryRetentionEpochs','40'); Monitoring Parameters The following table describes the monitoring parameters for configuring HP Vertica. Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 99 of 997
  • 100. Parameters Description SnmpTrapDestinationsList Defines where HP Vertica sends traps for SNMP. See Configuring Reporting for SNMP. Default Value: none Example: SELECT SET_CONFIG_PARAMETER ('SnmpTrapDestinationsList', 'localhost 162 public' ); SnmpTrapsEnabled Enables event trapping for SNMP. See Configuring Reporting for SNMP. Default Value: 0 Example: SELECT SET_CONFIG_PARAMETER ('SnmpTrapsEnabled', 1 ); SnmpTrapEvents Defines which events HP Vertica traps through SNMP. See Configuring Reporting for SNMP. Default Value: Low Disk Space, Read Only File System, Loss of K Safety, Current Fault Tolerance at Critical Level, Too Many ROS Containers, WOS Over Flow, Node State Change, Recovery Failure, and Stale Checkpoint Example: SELECT SET_CONFIG_PARAMETER ('SnmpTrapEvents', 'Low Disk Space, Recovery Failure'); SyslogEnabled Enables event trapping for syslog. See Configuring Reporting for Syslog. Default Value: 0 Example: SELECT SET_CONFIG_PARAMETER ('SyslogEnabled', 1 ); Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 100 of 997
  • 101. Parameters Description SyslogEvents Defines events that generate a syslog entry. See Configuring Reporting for Syslog. Default Value: none Example: SELECT SET_CONFIG_PARAMETER ('SyslogEvents', 'Low Disk Space, Recovery Failure'); SyslogFacility Defines which SyslogFacility HP Vertica uses. See Configuring Reporting for Syslog. Default Value: user Example: SELECT SET_CONFIG_PARAMETER ('SyslogFacility' , 'ftp'); Profiling Parameters The following table describes the profiling parameters for configuring HP Vertica. See Profiling Database Performance for more information on profiling queries. Parameters Description GlobalEEProfiling Enables profiling for query execution runs in all sessions, on all nodes. Default Value: 0 Example: SELECT SET_CONFIG_PARAMETER ('GlobalEEProfiling',1); GlobalQueryProfiling Enables query profiling for all sessions on all nodes. Default Value: 0 Example: SELECT SET_CONFIG_PARAMETER ('GlobalQueryProfiling',1); GlobalSessionProfiling Enables session profiling for all sessions on all nodes. Default Value: 0 Example: SELECT SET_CONFIG_PARAMETER ('GlobalSessionProfiling',1); Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 101 of 997
  • 102. Security Parameters The following table describes the parameters for configuring the client authentication method and enabling SSL for HP Vertica. Parameters Description ClientAuthentication Configures client authentication. By default, HP Vertica uses the user name and password (if supplied) to grant access to the database. The preferred method for establishing client authentication is to use the Administration Tools. See Implementing Client Authentication and How to Create Authentication Records. Default Value: local all trust Example: SELECT SET_CONFIG_PARAMETER ('ClientAuthentication', 'hostnossl dbadmin 0.0.0.0/0 trust'); EnableSSL Configures SSL for the server. See Implementing SSL. Default Value: 0 Example: SELECT SET_CONFIG_PARAMETER('EnableSSL', '1'); See Also Kerberos Authentication Parameters Database Designer Parameters The following table describes the parameters for configuring the HP Vertica Database Designer. Parameter Description DBDCorrelationSampleRowCount Minimum number of table rows at which Database Designer discovers and records correlated columns. Default value: 4000 Example: SELECT SET_CONFIG_PARAMETER ('DBDCorrelationSampleRowCount', 3000); Internationalization Parameters The following table describes the internationalization parameters for configuring HP Vertica. Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 102 of 997
  • 103. Parameters Description DefaultIntervalStyle Sets the default interval style to use. If set to 0 (default), the interval is in PLAIN style (the SQL standard), with no interval units on output. If set to 1, the interval is in UNITS on output. This parameter does not take effect until the database is restarted. Default Value: 0 Example: SELECT SET_CONFIG_PARAMETER ('DefaultIntervalStyle', 1); DefaultSessionLocale Sets the default session startup locale for the database. This parameter does not take effect until the database is restarted. Default Value: en_US@collation=binary Example: SELECT SET_CONFIG_PARAMETER ('DefaultSessionLocale','en_GB'); EscapeStringWarning Issues a warning when backslashes are used in a string literal. This is provided to help locate backslashes that are being treated as escape characters so they can be fixed to follow the standard-conforming string syntax instead. Default Value: 1 Example: SELECT SET_CONFIG_PARAMETER ('EscapeStringWarning','1'); StandardConformingStrings In HP Vertica 4.0, determines whether ordinary string literals ('...') treat backslashes (\) as string literals or escape characters. When set to '1', backslashes are treated as string literals; when set to '0', backslashes are treated as escape characters. Tip: To treat backslashes as escape characters, use the Extended string syntax: (E'...'); See String Literals (Character) in the SQL Reference Manual. Default Value: 1 Example: SELECT SET_CONFIG_PARAMETER('StandardConformingStrings' ,'0'); Data Collector Parameters The following table lists the Data Collector parameter for configuring HP Vertica. Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 103 of 997
  • 104. Parameter Description EnableDataCollector Enables and disables the Data Collector (the Workload Analyzer's internal diagnostics utility) for all sessions on all nodes. Default is 1, enabled. Use 0 to turn off data collection. Default value: 1 Example: SELECT SET_CONFIG_PARAMETER ('EnableDataCollector', 0); For more information, see the following topics in the SQL Reference Manual: l Data Collector Functions l ANALYZE_WORKLOAD l V_MONITOR.DATA_COLLECTOR l V_MONITOR.TUNING_RECOMMENDATIONS See also the following topics in the Administrator's Guide l Retaining Monitoring Information l Analyzing Workloads and Tuning Recommendations l Analyzing Workloads Through Management Console and Through an API Kerberos Authentication Parameters The following parameters let you configure the HP Vertica principal for Kerberos authentication and specify the location of the Kerberos keytab file. Parameter Description KerberosServiceName Provides the service name portion of the HP Vertica Kerberos principal. By default, this parameter is 'vertica'. For example: vertica/host@EXAMPLE.COM. Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 104 of 997
  • 105. Parameter Description KerberosHostname [Optional] Provides the instance or host name portion of the HP Vertica Kerberos principal. For example: vertica/host@EXAMPLE.COM Notes: l If you omit the optional KerberosHostname parameter, HP Vertica uses the return value from the gethostname() function. Assuming each cluster node has a different host name, those nodes will each have a different principal, which you must manage in that node's keytab file. l Consider specifying the KerberosHostname parameter to get a single, cluster-wide principal that is easier to manage than multiple principals. KerberosRealm Provides the realm portion of the HP Vertica Kerberos principal. A realm is the authentication administrative domain and is usually formed in uppercase letters; for example: vertica/host@EXAMPLE.COM. KerberosKeytabFile Provides the location of the keytab file that contains credentials for the HP Vertica Kerberos principal. By default, this file is located in /etc. For example: KerberosKeytabFile=/etc/krb5.keytab. Notes: l The principal must take the form KerberosServiceName/KerberosHostName@KerberosRealm l The keytab file must be readable by the file owner who is running the process (typically the Linux dbadmin user assigned file permissions 0600). HCatalog Connector Parameters The following table describes the parameters for configuring the HCatalog Connector. See Using the HCatalog Connector in the Hadoop Integration Guide for more information. Parameter Description Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 105 of 997
  • 106. HCatConnectionTimeout The number of seconds the HCatalog Connector waits for a successful connection to the WebHCat server before returning a timeout error. A value of 0 (the default) means wait indefinitely. Default Value: 0 Requires Restart: No Example: SELECT SET_CONFIG_PARAMETER('HCatConnectionTimeout', 30); HCatSlowTransferLimit The lowest transfer speed (in bytes per second) that the HCatalog Connector allows when retrieving data from the WebHCat server. If the data transfer rate from the WebHCat server to HP Vertica is below this threshold after the number of seconds set in the HCatSlowTransferTime parameter, the HCatalog Connector cancels the query and closes the connection. Default Value: 65536 Requires Restart: No Example:  SELECT SET_CONFIG_PARAMETER('HCatSlowTransferLimit', 32000); HCatSlowTransferTime The number of seconds the HCatalog Connector waits before testing whether the data transfer from the WebHCat server is too slow. See the HCatSlowTransferLimit parameter. Default Value: 60 Requires Restart: No Example: SELECT SET_CONFIG_PARAMETER('HCatSlowTransferTime', 90); Note: These configuration parameters can be overridden when creating an HCatalog schema. See CREATE HCATALOG SCHEMA in the SQL Reference Manual for an explanation. Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 106 of 997
  • 107. Designing a Logical Schema Designing a logical schema for an HP Vertica database is no different than designing for any other SQL database. A logical schema consists of objects such as Schemas, Tables, Views and Referential Integrity constraints that are visible to SQL users. HP Vertica supports any relational schema design of your choice. Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 107 of 997
  • 108. Using Multiple Schemas Using a single schema is effective if there is only one database user or if a few users cooperate in sharing the database. In many cases, however, it makes sense to use additional schemas to allow users and their applications to create and access tables in separate namespaces. For example, using additional schemas allows: l Many users to access the database without interfering with one another. Individual schemas can be configured to grant specific users access to the schema and its tables while restricting others. l Third-party applications to create tables that have the same name in different schemas, preventing table collisions. Unlike in other RDBMSs, a schema in an HP Vertica database is not a collection of objects bound to one user. Multiple Schema Examples This section provides examples of when and how you might want to use multiple schemas to separate database users. These examples fall into two categories: using multiple private schemas and using a combination of private schemas (i.e., schemas limited to a single user) and shared schemas (i.e., schemas shared across multiple users). Using Multiple Private Schemas Using multiple private schemas is an effective way of separating database users from one another when sensitive information is involved. Typically, a user is granted access to only one schema and its contents, thus providing database security at the schema level. Database users can be running different applications, multiple copies of the same application, or even multiple instances of the same application. This enables you to consolidate applications on one database to reduce management overhead and use resources more effectively. The following examples highlight using multiple private schemas. l Using Multiple Schemas to Separate Users and Their Unique Applications In this example, both database users work for the same company. 
One user (HRUser) uses a Human Resource (HR) application with access to sensitive personal data, such as salaries, while another user (MedUser) accesses information regarding company healthcare costs through a healthcare management application. HRUser should not be able to access company healthcare cost information and MedUser should not be able to view personal employee data. To grant these users access to data they need while restricting them from data they should not see, two schemas are created with appropriate user access, as follows: Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 108 of 997
  • 109. n HRSchema—A schema owned by HRUser that is accessed by the HR application. n HealthSchema—A schema owned by MedUser that is accessed by the healthcare management application. l Using Multiple Schemas to Support Multitenancy This example is similar to the last example in that access to sensitive data is limited by separating users into different schemas. In this case, however, each user is using a virtual instance of the same application. An example of this is a retail marketing analytics company that provides data and software as a service (SaaS) to large retailers to help them determine which of their promotional methods are most effective at driving customer sales. In this example, each database user equates to a retailer, and each user has access only to its own schema. The retail marketing analytics company provides a virtual instance of the same application to each retail customer, and each instance points to the user’s specific schema in which to create and update tables. The tables in these schemas use the same names because they are created by instances of the same application, but they do not conflict because they are in separate schemas. Examples of schemas in this database could be: n MartSchema—A schema owned by MartUser, a large department store chain. n PharmSchema—A schema owned by PharmUser, a large drug store chain. l Using Multiple Schemas to Migrate to a Newer Version of an Application Using multiple schemas is an effective way of migrating to a new version of a software application. In this case, a new schema is created to support the new version of the software, and the old schema is kept as long as necessary to support the original version of the software. This is called a “rolling application upgrade.” For example, a company might use an HR application to store employee data. 
The following schemas could be used for the original and updated versions of the software:

- HRSchema: A schema owned by HRUser, the schema user for the original HR application.
- V2HRSchema: A schema owned by V2HRUser, the schema user for the new version of the HR application.
Using Combinations of Private and Shared Schemas

The previous examples illustrate cases in which all schemas in the database are private and no information is shared between users. However, users might want to share common data. In the retail case, for example, MartUser and PharmUser might want to compare their per-store sales of a particular product against the industry per-store sales average. Because this information is an industry average and is not specific to any retail chain, it can be placed in a schema on which both users are granted USAGE privileges. (For more information about schema privileges, see Schema Privileges.) Examples of schemas in this database could be:

- MartSchema: A schema owned by MartUser, a large department store chain.
- PharmSchema: A schema owned by PharmUser, a large drug store chain.
- IndustrySchema: A schema owned by DBUser (from the retail marketing analytics company) on which both MartUser and PharmUser have USAGE privileges. It is unlikely that retailers would be given any privileges beyond USAGE on the schema and SELECT on one or more of its tables.

Creating Schemas

You can create as many schemas as necessary for your database. For example, you could create a schema for each database user. However, schemas and users are not synonymous as they are in Oracle.
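The private/shared schema combination described above can be sketched as follows. This is an illustrative example: the user name, password, schema names, and the store_averages table are hypothetical, and the exact grants you need depend on your security policy.

```sql
-- Create a tenant user and a private schema owned by that user
CREATE USER MartUser IDENTIFIED BY 'example_password';
CREATE SCHEMA MartSchema AUTHORIZATION MartUser;

-- Create a shared schema owned by the provider, readable by the tenant
CREATE SCHEMA IndustrySchema;
GRANT USAGE ON SCHEMA IndustrySchema TO MartUser;
GRANT SELECT ON TABLE IndustrySchema.store_averages TO MartUser;
```

Granting only USAGE on the schema plus SELECT on specific tables keeps the shared data read-only for tenants, matching the access model described above.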
By default, only a superuser can create a schema or give a user the right to create a schema. (See GRANT (Database) in the SQL Reference Manual.) To create a schema, use the CREATE SCHEMA statement, as described in the SQL Reference Manual.

Specifying Objects in Multiple Schemas

Once you create two or more schemas, each SQL statement or function must identify the schema associated with the object you are referencing. You can specify an object within multiple schemas by:

- Qualifying the object name, using the schema name and object name separated by a dot. For example, to specify MyTable, located in Schema1, qualify the name as Schema1.MyTable.
- Using a search path that includes the desired schemas when a referenced object is unqualified. By setting search paths, you direct HP Vertica to automatically search the specified schemas to find the object.

Setting Search Paths

The search path is a list of schemas where HP Vertica looks for tables and User Defined Functions (UDFs) that are referenced without a schema name. For example, if a statement references a table named Customers without naming the schema that contains the table, and the search path is public, Schema1, and Schema2, HP Vertica first searches the public schema for a table named Customers. If it does not find a table named Customers in public, it searches Schema1 and then Schema2. HP Vertica uses the first table or UDF it finds that matches the unqualified reference. If the table or UDF is not found in any schema in the search path, HP Vertica reports an error.

Note: HP Vertica searches only for tables and UDFs in schemas to which the user has access privileges. If the user does not have access to a schema in the search path, HP Vertica silently skips the schema. It does not report an error or warning if the user's search path contains one or more schemas to which the user does not have access privileges.
Any schemas in the search path that do not exist (for example, schemas that have been deleted since being added to the search path) are also silently ignored. The first schema in the search path to which the user has access is called the current schema. This is the schema where HP Vertica creates tables if a CREATE TABLE statement does not specify a schema name. The default schema search path is "$user", public, v_catalog, v_monitor, v_internal.

=> SHOW SEARCH_PATH;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)
The $user entry in the search path is a placeholder that resolves to the current user name, and public references the public schema. The v_catalog and v_monitor schemas contain HP Vertica system tables, and the v_internal schema is for HP Vertica's internal use.

Note: HP Vertica always ensures that the v_catalog, v_monitor, and v_internal schemas are part of the schema search path.

With the default search path, HP Vertica searches for unqualified tables first in the user's own schema, assuming that a separate schema exists for each user and that the schema is named after the user. If such a user schema does not exist, or if HP Vertica cannot find the table there, HP Vertica next searches the public schema, and then the v_catalog and v_monitor built-in schemas.

A database administrator can set a user's default search path when creating the user by using the SEARCH_PATH parameter of the CREATE USER statement. An administrator or the user can change the user's default search path using the ALTER USER statement's SEARCH_PATH parameter. Changes made to the default search path using ALTER USER affect only new user sessions; they do not affect any current sessions. A user can use the SET SEARCH_PATH statement to override the schema search path for the current session.

Tip: The SET SEARCH_PATH statement is equivalent in function to the CURRENT_SCHEMA statement found in some other databases.

To see the current search path, use the SHOW SEARCH_PATH statement. To view the current schema, use SELECT CURRENT_SCHEMA(). The function SELECT CURRENT_SCHEMA() also shows the resolved name of $user.
The following example demonstrates displaying and altering the schema search path for the current user session:

=> SHOW SEARCH_PATH;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", PUBLIC, v_catalog, v_monitor, v_internal
(1 row)

=> SET SEARCH_PATH TO SchemaA, "$user", public;
SET
=> SHOW SEARCH_PATH;
    name     |                           setting
-------------+------------------------------------------------------------
 search_path | SchemaA, "$user", public, v_catalog, v_monitor, v_internal
(1 row)

You can use the DEFAULT keyword to reset the schema search path to the default:

=> SET SEARCH_PATH TO DEFAULT;
SET
=> SHOW SEARCH_PATH;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)

To view the default schema search path for a user, query the search_path column of the V_CATALOG.USERS system table:

=> SELECT search_path FROM USERS WHERE user_name = 'ExampleUser';
                    search_path
---------------------------------------------------
 "$user", public, v_catalog, v_monitor, v_internal
(1 row)

=> ALTER USER ExampleUser SEARCH_PATH SchemaA,"$user",public;
ALTER USER
=> SELECT search_path FROM USERS WHERE user_name = 'ExampleUser';
                         search_path
------------------------------------------------------------
 SchemaA, "$user", public, v_catalog, v_monitor, v_internal
(1 row)

=> SHOW SEARCH_PATH;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)

Note that changing the default search path has no effect on the user's current session. Even the SET SEARCH_PATH DEFAULT statement does not set the search path to the newly defined default value; the new default takes effect only in new sessions.

See Also

- System Tables

Creating Objects That Span Multiple Schemas

HP Vertica supports views or pre-join projections that reference tables across multiple schemas. For example, a user might need to compare employee salaries to industry averages. In this case, the application would query a shared schema (IndustrySchema) for salary averages in addition to its own private schema (HRSchema) for company-specific salary information.
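A cross-schema view of the kind described above might look like the following sketch. The table and column names (HRSchema.employee_dimension, IndustrySchema.salary_averages, job_title, annual_salary, industry_avg) are hypothetical; the point is that every table reference is schema-qualified, so the view does not depend on any user's search path:

```sql
CREATE VIEW salary_comparison AS
SELECT e.job_title,
       AVG(e.annual_salary) AS company_avg,
       MAX(i.industry_avg)  AS industry_avg
FROM HRSchema.employee_dimension e
JOIN IndustrySchema.salary_averages i ON e.job_title = i.job_title
GROUP BY e.job_title;
```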
Best Practice: When creating objects that span schemas, use qualified table names. This naming convention avoids confusion if the query path or table structure within the schemas changes at a later date.

Tables in Schemas

In HP Vertica you can create both base tables and temporary tables, depending on what you are trying to accomplish. For example, base tables are created in the HP Vertica logical schema, while temporary tables are useful for dividing complex query processing into multiple steps. For more information, see Creating Tables and Creating Temporary Tables.

About Base Tables

The CREATE TABLE statement creates a table in the HP Vertica logical schema. The example databases described in the Getting Started Guide include sample SQL scripts that demonstrate this procedure. For example:

CREATE TABLE vendor_dimension (
   vendor_key        INTEGER      NOT NULL PRIMARY KEY,
   vendor_name       VARCHAR(64),
   vendor_address    VARCHAR(64),
   vendor_city       VARCHAR(64),
   vendor_state      CHAR(2),
   vendor_region     VARCHAR(32),
   deal_size         INTEGER,
   last_deal_update  DATE
);

Automatic Projection Creation

To get your database up and running quickly, HP Vertica automatically creates a default projection for each table created through the CREATE TABLE and CREATE TEMPORARY TABLE statements. Each projection created automatically (or manually) includes a base projection name
prefix. You must use the projection prefix when altering or dropping a projection (ALTER PROJECTION RENAME, DROP PROJECTION). How you use the CREATE TABLE statement determines when the projection is created:

- If you create a table without providing the projection-related clauses, HP Vertica automatically creates a superprojection for the table when you use an INSERT INTO or COPY statement to load data into the table for the first time. The projection is created in the same schema as the table. Once HP Vertica has created the projection, it loads the data.
- If you use CREATE TABLE AS SELECT to create a table from the results of a query, the table is created first and a projection is created immediately after, using some of the properties of the underlying SELECT query.
- (Advanced users only) If you use any of the following parameters, the default projection is created immediately upon table creation, using the specified properties:
  - column-definition (ENCODING encoding-type and ACCESSRANK integer)
  - ORDER BY table-column
  - hash-segmentation-clause
  - UNSEGMENTED { NODE node | ALL NODES }
  - KSAFE

Note: Before you define a superprojection in the above manner, read Creating Custom Designs in the Administrator's Guide.

See Also

- Creating Base Tables
- Projection Concepts
- CREATE TABLE

About Temporary Tables

A common use case for a temporary table is to divide complex query processing into multiple steps. Typically, a reporting tool holds intermediate results while reports are generated (for example, first get a result set, then query the result set, and so on). You can also write subqueries.

Note: The default retention when creating temporary tables is ON COMMIT DELETE ROWS, which discards data at transaction completion. The non-default value is ON COMMIT PRESERVE ROWS, which discards data when the current session ends.

You create temporary tables using the CREATE TEMPORARY TABLE statement.
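The two retention behaviors described in the note above can be sketched as follows. The table and column names are illustrative only:

```sql
-- Default retention: rows are discarded when the transaction commits
CREATE GLOBAL TEMPORARY TABLE temp_results (
   customer_key INTEGER,
   total        NUMERIC
);

-- Non-default retention: rows survive until the session ends
CREATE LOCAL TEMPORARY TABLE session_results (
   customer_key INTEGER,
   total        NUMERIC
) ON COMMIT PRESERVE ROWS;
```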
Global Temporary Tables

HP Vertica creates global temporary tables in the public schema, with the data contents private to the transaction or session through which data is inserted. Global temporary table definitions are accessible to all users and sessions, so that two (or more) users can access the same global table concurrently. However, whenever a user commits or rolls back a transaction, or ends the session, HP Vertica removes the global temporary table data automatically, so users see only data specific to their own transactions or session. Global temporary table definitions persist in the database catalogs until they are removed explicitly through a DROP TABLE statement.

Local Temporary Tables

Local temporary tables are created in the V_TEMP_SCHEMA namespace and inserted into the user's search path transparently. Each local temporary table is visible only to the user who creates it, and only for the duration of the session in which the table is created. When the session ends, HP Vertica automatically drops the table definition from the database catalogs. You cannot preserve non-empty, session-scoped temporary tables using the ON COMMIT PRESERVE ROWS clause. Creating local temporary tables is significantly faster than creating regular tables, so you should use them whenever possible.

Automatic Projection Creation and Characteristics

Once a local or global temporary table exists, HP Vertica creates auto-projections for it whenever you load or insert data. The default auto-projection for a temporary table has the following characteristics:

- It is a superprojection.
- It uses the default encoding-type AUTO.
- It is automatically unsegmented on the initiator node, if you do not specify a hash-segmentation-clause.
- The projection is not pinned.
- Temporary tables are not recoverable, so the superprojection is not K-safe (K-SAFE=0), and you cannot make it so.
Auto-projections are defined by the table properties and creation methods, as follows:
- If the table is created from an input stream (COPY or INSERT INTO): sort order is the same as the input stream, if sorted. Segmentation is on the PK column (if any), on all FK columns (if any), then on the first 31 configurable columns of the table.
- If the table is created from a CREATE TABLE AS SELECT query: sort order is the same as the input stream, if sorted; if not sorted, it is sorted using the following rules. Segmentation uses the same segmentation columns if the query output is segmented, or is the same as the load if the query output is unsegmented or unknown.
- If the table has FK and PK constraints: sort order is the FK columns first, then the PK columns. Segmentation is on the PK columns.
- If the table has FK constraints only (no PK): sort order is the FK columns first, then the remaining columns. Segmentation is on small data type (< 8 byte) columns first, then large data type columns.
- If the table has PK constraints only (no FK): sort order is the PK columns. Segmentation is on the PK columns.
- If the table has no FK or PK constraints: sort order is on all columns. Segmentation is on small data type (< 8 byte) columns first, then large data type columns.

Advanced users can modify the default projection created through the CREATE TEMPORARY TABLE statement by defining one or more of the following parameters:

- column-definition (temp table) (ENCODING encoding-type and ACCESSRANK integer)
- ORDER BY table-column
- hash-segmentation-clause
- UNSEGMENTED { NODE node | ALL NODES }
- NO PROJECTION

Note: Before you define the superprojection in this manner, read Creating Custom Designs in the Administrator's Guide.

See Also

- Creating Temporary Tables
- Projection Concepts
- CREATE TEMPORARY TABLE
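The advanced projection parameters listed above can be combined in a single statement, as in this sketch. The table name, columns, and encoding choice are hypothetical, shown only to illustrate the clause placement:

```sql
CREATE LOCAL TEMPORARY TABLE step1_results (
   customer_key INTEGER ENCODING RLE,
   total        NUMERIC
)
ON COMMIT PRESERVE ROWS
ORDER BY customer_key
UNSEGMENTED ALL NODES;
```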
Implementing Views

A view is a stored query that dynamically accesses and computes data from the database at execution time. It differs from a projection in that it is not materialized: it does not store data on disk. This means that it does not need to be refreshed whenever the data in the underlying tables changes, but it does require additional time to access and compute data. Views are read-only, and they support references to tables, temp tables, and other views. They do not support inserts, deletes, or updates.

You can use a view as an abstraction mechanism to:

- Hide the complexity of SELECT statements from users for support or security purposes. For example, you could create a view that selects specific columns from specific tables to ensure that users have easy access to the information they need while restricting them from confidential information.
- Encapsulate the details of the structure of your tables, which could change as your application evolves, behind a consistent user interface.

Creating Views

A view contains one or more SELECT statements that reference any combination of one or more tables, temp tables, or views. Additionally, views can specify the column names used to display results. The user who creates the view must be a superuser or have the following privileges:

- CREATE on the schema in which the view is created.
- SELECT on all the tables and views referenced within the view's defining query.
- USAGE on all the schemas that contain the tables and views referenced within the view's defining query.

To create a view:

1. Use the CREATE VIEW statement to create the view.
2. Use the GRANT (View) statement to grant users the privilege to use the view.

Note: Once created, a view cannot be actively altered. It can only be deleted and recreated.

Using Views

Views can be used in the FROM clause of any SQL query or subquery.
At execution, HP Vertica internally substitutes the name of the view used in the query with the actual contents of the view. The following example defines a view (ship) and illustrates how a query that refers to the view is transformed internally at execution time.
New view:

=> CREATE VIEW ship AS SELECT * FROM public.shipping_dimension;

Original query:

=> SELECT * FROM ship;

Transformed query:

=> SELECT * FROM (SELECT * FROM public.shipping_dimension) AS ship;

Tip: To use a view, a user must be granted SELECT permissions on the view. See GRANT (View).

The following example creates a view named myview that sums all individual incomes of customers listed in the store.store_sales_fact table by state. The results are grouped in ascending order by state.

=> CREATE VIEW myview AS
   SELECT SUM(annual_income), customer_state
   FROM public.customer_dimension
   WHERE customer_key IN (SELECT customer_key FROM store.store_sales_fact)
   GROUP BY customer_state
   ORDER BY customer_state ASC;

The following example uses the myview view with a WHERE clause that limits the results to combined salaries of greater than 2,000,000,000.

=> SELECT * FROM myview WHERE sum > 2000000000;
     SUM     | customer_state
-------------+----------------
  2723441590 | AZ
 29253817091 | CA
  4907216137 | CO
  3769455689 | CT
  3330524215 | FL
  4581840709 | IL
  3310667307 | IN
  2793284639 | MA
  5225333668 | MI
  2128169759 | NV
  2806150503 | PA
  2832710696 | TN
 14215397659 | TX
  2642551509 | UT
(14 rows)

Notes

If HP Vertica does not have to evaluate an expression that would generate a run-time error in order to answer a query, the run-time error might not occur. See the following sequence of commands for an example of this scenario.

If you run a query like the following, HP Vertica returns an error:

=> SELECT TO_DATE('F','dd mm yyyy') FROM customer_dimension;
ERROR: Invalid input for DD: "F"

Now create a view using the same query. Note that the view is created successfully, even though you might expect it to return the same error:

=> CREATE VIEW temp AS SELECT TO_DATE('F','dd mm yyyy') FROM customer_dimension;
CREATE VIEW

The view, however, cannot be used in all queries without generating the same error message. For example, the following query returns the same error, which is what you would expect:

=> SELECT * FROM temp;
ERROR: Invalid input for DD: "F"

When you instead issue a COUNT query, which does not need to evaluate the expression, the returned row count is correct:

=> SELECT COUNT(*) FROM temp;
 COUNT
-------
   100
(1 row)

This behavior works as intended. You might want to create views that contain subqueries, where not every row is intended to pass the predicate.
Creating a Database Design

Data in HP Vertica is physically stored in projections. When you initially load data into a table using INSERT, COPY (or COPY LOCAL), HP Vertica creates a default superprojection for the table. This superprojection ensures that all of the data is available for queries. However, these superprojections might not optimize database performance, resulting in slow query performance and low data compression. To improve performance, create a physical design for your database that optimizes both query performance and data compression. You can use the Database Designer or create this design by hand.

Database Designer is a tool that recommends a design (a set of projections) that provides the best query performance. Using Database Designer minimizes the time you spend on manual database tuning and provides the ability to redesign the database incrementally to optimize for changing workloads over time. Database Designer runs as a background process. If multiple non-superusers run Database Designer on, or deploy designs for, the same tables at the same time, Database Designer may not be able to complete.

Tip: HP recommends that you first globally optimize your database using the Comprehensive setting in Database Designer. If the performance of the comprehensive design is not adequate, you can design custom projections using an incremental design or manually, as described in Creating Custom Designs.

What Is a Design?

A design is a physical storage plan that optimizes query performance. Database Designer uses sophisticated strategies to create a design that provides excellent performance for ad-hoc queries and specific queries while using disk space efficiently. Database Designer bases the design on the following information that you provide:

- Design type (comprehensive or incremental)
- Optimization objective (query, load, or balanced)
- K-safety
- Design queries: typical queries that you run during normal database operations.
Each query can be assigned a weight that indicates its relative importance, so that Database Designer can prioritize it when creating the design. Database Designer groups queries that affect the design in the same way and considers them as one weighted query when creating a design.

- Design tables that contain sample data.
- A setting that specifies that Database Designer create only unsegmented projections.
- A setting that specifies that Database Designer analyze statistics before creating the design.

The result of a Database Designer run is:

- A design script that creates the projections for the design in a way that meets the optimization objectives and distributes data uniformly across the cluster.
- A deployment script that creates and refreshes the projections for your design. For comprehensive designs, the deployment script contains commands that remove non-optimized projections. The deployment script includes the full design script.
- A backup script that contains SQL statements to deploy the design that existed on the system before deployment. This file is useful in case you need to revert to the pre-deployment design.

While running Database Designer, you can choose to deploy your design automatically after the deployment script is created, or to deploy it manually, after you have reviewed and tested the design. HP Vertica recommends that you test the design on a non-production server before deploying the design to your production server.

How Database Designer Creates a Design

During the design process, Database Designer analyzes the logical schema definition, sample data, and sample queries, and creates a physical schema (projections) in the form of a SQL script that you deploy automatically or manually. The script creates a minimal set of superprojections to ensure K-safety. In most cases, the projections that Database Designer creates provide excellent query performance within physical constraints while using disk space efficiently.

Database Designer:

- Recommends buddy projections with the same sort order, which can significantly improve load, recovery, and site node performance. All buddy projections have the same base name so that they can be identified as a group.
Note: If you manually create projections, Database Designer recommends a buddy with the same sort order, if one does not already exist. By default, Database Designer recommends both super and non-super segmented projections with a buddy of the same sort order and segmentation.

- Automatically rebalances data after you add or remove nodes.
- Accepts queries as design input.
- Runs the design and deployment processes in the background.
This is useful if you have a large design that you want to run overnight. An active SSH session is not required, so design/deploy operations continue to run uninterrupted, even if the session is terminated.

- Accepts a file of sample queries for Database Designer to consider when creating a design. Providing this file is optional for comprehensive designs. If you do not provide this file, Database Designer recommends a generic design that does not consider specific queries. For incremental designs, you must provide sample queries; the query file can contain up to 100 queries.
- Accepts unlimited queries for a comprehensive design.
- Allows you to analyze column correlations, which the Database Designer and query optimizer exploit to improve data compression and query performance. Correlation analysis typically needs to be performed only once, and only if the table has more than DBDCorrelationSampleRowCount (default: 4000) rows. By default, Database Designer does not analyze column correlations. To set the correlation analysis mode, use DESIGNER_SET_ANALYZE_CORRELATIONS_MODE.
- Identifies similar design queries and assigns them a signature. Of queries with the same signature, Database Designer weights the queries depending on how many have that signature and considers the weighted query when creating a design.
- Creates projections in a way that minimizes data skew by distributing data uniformly across the cluster.
- Produces higher-quality designs by considering UPDATE and DELETE statements, as well as SELECT statements.
- Does not sort, segment, or partition projections on LONG VARBINARY and LONG VARCHAR columns.

Who Can Run Database Designer

To use Administration Tools to run Database Designer and create an optimal database design, you must be a DBADMIN user.
To run Database Designer programmatically or using Management Console, you must either:

- Be a DBADMIN user, or
- Have been assigned the DBDUSER role and be the owner of the database tables for which you are creating a design.

Granting and Enabling the DBDUSER Role

For a non-DBADMIN user to be able to run Database Designer using Management Console, follow the steps described in Allowing the DBDUSER to Run Database Designer Using Management
Console. For a non-DBADMIN user to be able to run Database Designer programmatically, follow the steps described in Allowing the DBDUSER to Run Database Designer Programmatically.

Important: When you grant the DBDUSER role, make sure to associate a resource pool with that user to manage resources during Database Designer runs. (For instructions about how to associate a resource pool with a user, see User Profiles.) Multiple users can run Database Designer concurrently without interfering with each other or using up all the cluster resources. When a user runs Database Designer, either using the Management Console or programmatically, its execution is mostly contained by the user's resource pool, but may spill over into system resource pools for less-intensive tasks.

Allowing the DBDUSER to Run Database Designer Using Management Console

To allow a user with the DBDUSER role to run Database Designer using Management Console, you first need to create the user on the HP Vertica server. As DBADMIN, take these steps on the server:

1. Add a /tmp folder to all cluster nodes.
   => SELECT ADD_LOCATION('/tmp');
2. Create the user who needs access to Database Designer.
   => CREATE USER new_user;
3. Grant the user the privilege to create schemas on the database for which they want to create a design.
   => GRANT CREATE ON new_database TO new_user;
4. Grant the DBDUSER role to the new user.
   => GRANT DBDUSER TO new_user;
5. On all nodes in the cluster, grant the user access to the /tmp folder.
   => GRANT ALL ON LOCATION '/tmp' TO new_user;
6. Grant the new user access to the database schema and its tables.
   => GRANT ALL ON SCHEMA user_schema TO new_user;
   => GRANT ALL ON ALL TABLES IN SCHEMA user_schema TO new_user;

After you have completed this task, do the following to map the MC user to the new_user you created in the previous steps:

1. Log in to Management Console as an MC Super user.
2. Click MC Settings.
3. Click User Management.
4. To create a new MC user, click Add. To use an existing MC user, select the user and click Edit.
5. Next to the DB access level window, click Add.
6. In the Add Permissions window, do the following:
   a. From the Choose a database drop-down list, select the database for which you want the user to be able to create a design.
   b. In the Database username field, enter the user name you created on the HP Vertica server, new_user in this example.
   c. In the Database password field, enter the password for the database you selected in step a.
   d. In the Restrict access drop-down list, select the level of MC user you want for this user.
7. Click OK to save your changes.
8. Log out of the MC Super user account.

The MC user is now mapped to the user that you created on the HP Vertica server. Log in as the MC user and use Database Designer to create an optimized design for your database. For more information about mapping MC users, see Mapping an MC User to a Database User's Privileges.

Allowing the DBDUSER to Run Database Designer Programmatically

To allow a user with the DBDUSER role to run Database Designer programmatically, take these steps:
1. The DBADMIN user must grant the DBDUSER role:
   => GRANT DBDUSER TO <username>;
   This role persists until the DBADMIN user revokes it.
2. For a non-DBADMIN user to run Database Designer programmatically or using Management Console, one of the following must happen first:
   - If the user's default role is already DBDUSER, skip this step. Otherwise, the user must enable the DBDUSER role:
     => SET ROLE DBDUSER;
   - The DBADMIN must add DBDUSER as the default role for that user:
     => ALTER USER <username> DEFAULT ROLE DBDUSER;

DBDUSER Capabilities and Limitations

The DBDUSER role has the following capabilities and limitations:

- A DBDUSER cannot create a design with a K-safety less than the system K-safety. If a design violates the current K-safety by not having enough buddy projections for the tables, the design does not complete.
- A DBDUSER cannot explicitly change the ancient history mark (AHM), even during deployment of their design.

When you create a design, you automatically have privileges to manipulate the design. Other tasks may require that the DBDUSER have additional privileges:

To add design tables, the DBDUSER must have:
- USAGE privilege on the design table schema
- OWNER privilege on the design table

To add a single design query, the DBDUSER must have:
- Privilege to execute the design query

To add a file of design queries, the DBDUSER must have:
- Read privilege on the storage location that contains the query file
- Privilege to execute all the queries in the file
Add design queries from the result of a user query l Privilege to execute the user query l Privilege to execute each design query retrieved from the results of the user query Create the design and deployment scripts l WRITE privilege on the storage location of the design script l WRITE privilege on the storage location of the deployment script Workflow for Running Database Designer HP Vertica provides three ways to run Database Designer: l Using Management Console to Create a Design l Using Administration Tools to Create a Design l About Running Database Designer Programmatically The following workflow is common to all these ways to run Database Designer:
[Figure: Database Designer workflow diagram]
Specifying Parameters for Database Designer Before you run Database Designer to create a design, provide information that allows Database Designer to create the optimal physical schema: l Design Name l Design Types l Optimization Objectives l Design Tables with Sample Data l Design Queries l K-Safety for Design l Replicated and Unsegmented Projections l Statistics Analysis Design Name All designs that Database Designer creates must have a name that you specify. The design name can contain only alphanumeric and underscore (_) characters, and can be no more than 32 characters long. (Administrative Tools and Management Console limit the design name to 16 characters.) The design name becomes part of the files that Database Designer generates, including the deployment script, allowing the files to be easily associated with a particular Database Designer run. Design Types The Database Designer can create two distinct design types. The design you choose depends on what you are trying to accomplish: l Comprehensive Design l Incremental Design Comprehensive Design A comprehensive design creates an initial or replacement design for all the tables in the specified schemas. Create a comprehensive design when you are creating a new database. To help Database Designer create an efficient design, load representative data into the tables before you begin the design process. When you load data into a table, HP Vertica creates an
unoptimized superprojection so that Database Designer has projections to optimize. If a table has no data, Database Designer cannot optimize it. Optionally, supply Database Designer with representative queries that you plan to use so Database Designer can optimize the design for them. If you do not supply any queries, Database Designer creates a generic optimization of the superprojections that minimizes storage, with no query-specific projections. During a comprehensive design, Database Designer creates deployment scripts that: l Create new projections to optimize query performance, but only when they do not already exist. l Create replacement buddy projections when Database Designer changes the encoding of pre-existing projections that it has decided to keep. Incremental Design An incremental design creates an enhanced design with additional projections, if required, that are optimized specifically for the queries that you provide. Create an incremental design when you have one or more queries that you want to optimize. Optimization Objectives When creating a design, Database Designer can optimize the design for one of three objectives: l Load Database Designer creates a design that is optimized for loads, minimizing database size, potentially at the expense of query performance. l Performance Database Designer creates a design that is optimized for fast query performance. Because this design favors query performance, it might recommend more projections than the Load or Balanced objectives do, potentially resulting in a larger database storage size. l Balanced Database Designer creates a design whose objectives are balanced between database size and query performance. Design Tables with Sample Data You must specify one or more design tables for Database Designer to deploy a design. If your schema is empty, it does not appear as a design table option.
When you specify design tables, consider the following: l To create the most efficient projections for your database, load a moderate amount of representative data into tables before running Database Designer. Database Designer considers the data in these tables when creating the design.
l If your design tables have a large amount of data, the Database Designer run takes a long time; if your tables have too little data, the design is not optimized. HP Vertica considers 10 GB of sample data sufficient for creating an optimal design. l If you submit a design table with no data, Database Designer ignores it. l If one of your design tables has been dropped, you will not be able to build or deploy your design. Design Queries If you supply representative queries that you run on your database to Database Designer, it optimizes the performance of those queries. If you are creating an incremental design, you must supply design queries; if you are creating a comprehensive design, HP Vertica recommends you supply design queries to create an optimal design. Database Designer checks the validity of all queries when you add them to your design and again when it builds the design. If a query is invalid, Database Designer ignores it. Query Repository Using Management Console, you can submit design queries from the QUERY_REQUESTS system table. This is called the query repository. The QUERY_REQUESTS table contains queries that users have run recently. For a comprehensive design, you can submit up to 200 queries from the QUERY_REQUESTS table to Database Designer to be considered when creating the design. For an incremental design, you can submit up to 100 queries from the QUERY_REQUESTS table. K-Safety for Design When you create a comprehensive design, you can set a K-safety value for your design. Valid values are 0, 1, and 2. The value you specify is limited by the maximum K-safety allowed by the number of nodes in your cluster. Note: If you are not a DBADMIN user, you cannot set the design K-safety to a value less than the system K-safety. The default K-safety is as follows: l If your cluster has one or two nodes, the default K-safety is 0. l If your cluster has three or more nodes, the default K-safety is 1.
For a comprehensive design, you can make the following changes to the design K-safety before deploying the design:
l If your cluster has one or two nodes, you cannot change the K-safety. l If your cluster has three or four nodes, you can change the K-safety to 1 or 0. l If your cluster has five or more nodes, you can change the K-safety to 2, 1, or 0. You cannot change the K-safety value of an incremental design. Incremental designs assume the K-safety value of your cluster. For more information about K-safety, see K-Safety in the Concepts Guide. Replicated and Unsegmented Projections When creating a comprehensive design, Database Designer creates projections based on data statistics and queries. It also reviews the submitted design tables to decide whether projections should be segmented (distributed across the cluster nodes) or replicated (duplicated on all cluster nodes). For detailed information, see the following sections: l Replicated Projections l Unsegmented Projections Replicated Projections Replication occurs when HP Vertica stores identical copies of data across all nodes in a cluster. If you are running on a single-node database, all projections are replicated because segmentation is not possible in a single-node database. Assuming that largest-row-count equals the number of rows in the design table with the largest number of rows, Database Designer recommends that a projection be replicated if any one of the following is true: l Condition 1: largest-row-count < 1,000,000 and number of rows in the table <= 10% of largest-row-count. l Condition 2: largest-row-count >= 10,000,000 and number of rows in the table <= 1% of largest-row-count. l Condition 3: The number of rows in the table <= 100,000. For more information about replication, see High Availability Through Projections in the Concepts Guide.
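The three replication conditions above reduce to a simple rule over row counts. As an illustrative sketch only, the rule could be expressed as a query; my_table_counts(table_name, row_count) is a hypothetical helper table, not a Vertica system table:

```sql
-- Hypothetical sketch: classify each design table under the replication rule.
-- my_table_counts is an illustrative table holding per-table row counts.
=> SELECT t.table_name,
          CASE
            WHEN m.largest < 1000000   AND t.row_count <= 0.10 * m.largest THEN 'replicate'
            WHEN m.largest >= 10000000 AND t.row_count <= 0.01 * m.largest THEN 'replicate'
            WHEN t.row_count <= 100000                                     THEN 'replicate'
            ELSE 'segment'
          END AS recommendation
   FROM my_table_counts t
   CROSS JOIN (SELECT MAX(row_count) AS largest FROM my_table_counts) m;
```

For example, with a largest table of 20,000,000 rows, a 150,000-row table is replicated (150,000 <= 1% of 20,000,000), while a 500,000-row table is segmented.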
Unsegmented Projections Segmentation occurs when HP Vertica distributes data evenly across multiple database nodes so that all nodes participate in query execution. Projection segmentation provides high availability and recovery, and optimizes query execution. When running Database Designer programmatically or using Management Console, you can specify to allow Database Designer to recommend unsegmented projections in the design. If you do not specify this, Database Designer recommends only segmented projections. Database Designer recommends segmented superprojections for large tables when deploying to multiple node clusters, and recommends replicated superprojections for smaller tables. Database Designer does not segment projections on: l Single-node clusters l LONG VARCHAR and LONG VARBINARY columns For more information about segmentation, see High Availability Through Projections in the Concepts Guide. Statistics Analysis By default, Database Designer analyzes statistics for the design tables when adding them to the design. This analysis is optional, but HP Vertica recommends that you analyze statistics because accurate statistics help Database Designer optimize compression and query performance. Analyzing statistics takes time and resources. If the current statistics for the design tables are up to date, you can skip this step. When in doubt, analyze the statistics to make sure they are current. For more information, see Collecting Statistics. Building a Design After you have created design tables and loaded data into them, and then specified the parameters you want Database Designer to use when creating the physical schema, direct Database Designer to create the scripts necessary to build the design. Note: You cannot stop a running database if Database Designer is building a database design.
When you build a database design, HP Vertica generates two scripts: l Deployment script—<design_name>_deploy.sql—Contains the SQL statements that create projections for the design you are deploying, deploy the design, and drop unused projections. When the deployment script runs, it creates the optimized design. For details about how to run this script and deploy the design, see Deploying a Design.
l Design script—<design_name>_design.sql—Contains the CREATE PROJECTION statements that Database Designer uses to create the design. Review this script to make sure you are happy with the design. The design script is a subset of the deployment script. It serves as a backup of the DDL for the projections that the deployment script creates. If you run Database Designer from Administrative Tools, HP Vertica also creates a backup script named <design_name>_projection_backup_<unique id #>.sql. This script contains SQL statements to deploy the design that existed on the system before deployment. This file is useful in case you need to revert to the pre-deployment design. When you create a design using Management Console: l If you submit a large number of queries to your design and build it immediately, a timing issue could cause the queries not to load before deployment starts. If this occurs, you may see one of the following errors: n No queries to optimize for n No tables to design projections for To accommodate this timing issue, you may need to reset the design, check the Queries tab to make sure the queries have been loaded, and then rebuild the design. Detailed instructions are in: n Using the Wizard to Create a Design n Creating a Design Manually l The scripts are deleted when deployment completes. To save a copy of the deployment script after the design is built but before the deployment completes, go to the Output window and copy and paste the SQL statements to a file. Resetting a Design You must reset a design when: l You build a design and the output scripts described in Building a Design are not created. l You build a design but Database Designer cannot complete the design because the queries it expects are not loaded. Resetting a design discards all the run-specific information of the previous Database Designer build, but retains its configuration (design type, optimization objectives, K-safety, etc.) and tables and queries.
After you reset a design, review the design to see what changes you need to make. For example, you can fix errors, change parameters, or check for and add additional tables or queries. Then you can rebuild the design.
You can only reset a design in Management Console or by using the DESIGNER_RESET_DESIGN function.
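For example, resetting and rebuilding a design programmatically might look like the following sketch. DESIGNER_RESET_DESIGN is named above; the rebuild call belongs to the same Database Designer function family, but its exact argument list varies by release, so verify it in the SQL Reference Manual before using it:

```sql
-- Reset the design named 'vmart_design'; its configuration, tables, and
-- queries are retained, so it can be rebuilt directly.
=> SELECT DESIGNER_RESET_DESIGN('vmart_design');

-- Rebuild (the argument order shown here is illustrative; check the
-- SQL Reference Manual for your release):
=> SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY(
     'vmart_design',
     '/tmp/vmart_design_design.sql',  -- design script output path
     '/tmp/vmart_design_deploy.sql'   -- deployment script output path
   );
```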
Deploying a Design After running Database Designer to generate a deployment script, HP Vertica recommends that you test your design on a non-production server before you deploy it to your production server. Both the design and deployment processes run in the background. This is useful if you have a large design that you want to run overnight. Because an active SSH session is not required, the design/deploy operations continue to run uninterrupted, even if the session is terminated. Note: You cannot stop a running database if Database Designer is building or deploying a database design. Database Designer runs as a background process. Multiple users can run Database Designer concurrently without interfering with each other or using up all the cluster resources. However, if multiple users are deploying a design on the same tables at the same time, Database Designer may not be able to complete the deployment. To avoid problems, consider the following: l Schedule potentially conflicting Database Designer processes to run sequentially overnight so that there are no concurrency problems. l Avoid scheduling Database Designer runs on the same set of tables at the same time. There are two ways to deploy your design: l Deploying Designs Using Database Designer l Deploying Designs Manually Deploying Designs Using Database Designer HP recommends that you run Database Designer and deploy optimized projections right after loading your tables with sample data because Database Designer provides projections optimized for the current state of your database. If you choose to allow Database Designer to automatically deploy your script during a comprehensive design and are running Administrative Tools, Database Designer creates a backup script of your database's current design. This script helps you re-create the design of projections that may have been dropped by the new design. The backup script is located in the output directory you specified during the design process.
If you choose not to have Database Designer automatically run the deployment script (for example, if you want to maintain projections from a pre-existing deployment), you can manually run the deployment script later. See Deploying Designs Manually. To deploy a design while running Database Designer, do one of the following: l In Management Console, select the design and click Deploy Design. l In the Administration Tools, select Deploy design in the Design Options window.
If you are running Database Designer programmatically, use DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY and set the deploy parameter to 'true'. Once you have deployed your design, query the DEPLOY_STATUS system table to see the steps that the deployment took: vmartdb=> SELECT * FROM V_MONITOR.DEPLOY_STATUS; Deploying Designs Manually If you chose not to have Database Designer deploy your design at design time, you can deploy the design later using the deployment script: 1. Make sure that you have a database that contains the same tables and projections as the database on which you ran Database Designer. The database should also contain sample data. 2. To deploy the projections to a test or production environment, use the following vsql command to execute the deployment script, where <design_name> is the name of the database design: => \i <design_name>_deploy.sql How to Create a Design There are three ways to create a design using Database Designer: l From Management Console, open a database and select the Design page at the bottom of the window. For details about using Management Console to create a design, see Using Management Console to Create a Design. l Programmatically, using the techniques described in About Running Database Designer Programmatically in the Programmer's Guide. To run Database Designer programmatically, you must be a DBADMIN or have been granted the DBDUSER role and enabled that role. l From the Administration Tools menu, by selecting Configuration Menu > Run Database Designer. You must be a DBADMIN user to run Database Designer from the Administration Tools. For details about using Administration Tools to create a design, see Using Administration Tools to Create a Design. The following table shows what Database Designer capabilities are available in each tool:
Database Designer Capability                          Management Console   Programmatically   Administrative Tools
Create design                                         Yes                  Yes                Yes
Design name length (# of characters)                  16                   32                 16
Build design (create design and deployment scripts)   Yes                  Yes                Yes
Create backup script                                  --                   --                 Yes
Set design type (comprehensive or incremental)        Yes                  Yes                Yes
Set optimization objective                            Yes                  Yes                Yes
Add design tables                                     Yes                  Yes                Yes
Add design queries file                               Yes                  Yes                Yes
Add single design query                               --                   Yes                --
Use query repository                                  Yes                  Yes                --
Set K-safety                                          Yes                  Yes                Yes
Analyze statistics                                    Yes                  Yes                Yes
Require all unsegmented projections                   Yes                  Yes                --
View event history                                    Yes                  Yes                --
Set correlation analysis mode (Default = 0)           --                   Yes                --
Using Management Console to Create a Design To use Management Console to create an optimized design for your database, you must be a DBADMIN user or have been assigned the DBDUSER role. Management Console provides two ways to create a design: l Wizard—This option walks you through the process of configuring a new design. Click Back and Next to navigate through the Wizard steps, or Cancel to cancel creating a new design. To learn how to use the Wizard to create a design, see Using the Wizard to Create a Design. l Manual—This option creates and saves a design with the default parameters.
To learn how to create a design manually, see Creating a Design Manually. Tip: If you have many design tables that you want Database Designer to consider, it might be easier to use the Wizard to create your design. In the Wizard, you can submit all the tables in a schema at once; creating a design manually requires that you submit the design tables one at a time. Using the Wizard to Create a Design Take these steps to create a design using the Management Console's Wizard: 1. Log in to Management Console, select and start your database, and click Design at the bottom of the window. The Database Designer window appears. If there are no existing designs, the New Design window appears. The left-hand side of the Database Designer window lists the database designs for which you are the owner, with the most recent design you worked on selected. That pane also lists the current status of the design. The main pane contains details about the selected design.
2. To create a new design, click New Design. 3. Enter a name for your design, and click Wizard. For more information, see Design Name. 4. Navigate through the Wizard using the Back and Next buttons. 5. To build the design immediately after exiting the Wizard, on the Execution Options window, select Auto-build. Important: Hewlett-Packard does not recommend that you auto-deploy the design from the Wizard. There may be a delay in adding the queries to the design, so if the design is deployed but the queries have not yet loaded, deployment may fail. If this happens, reset the design, check the Queries tab to make sure the queries have been loaded, and deploy the design. 6. When you have entered all the information, the Wizard displays a summary of your choices. Click Submit Design to build your design.
Creating a Design Manually To create a design using Management Console and specifying the configuration, take these steps: 1. Log in to Management Console, select and start your database, and click Design at the bottom of the window. The Database Designer window appears. The left-hand side of the Database Designer window lists the database designs for which you are the owner, with the most recent design you worked on highlighted. That pane also lists the current status of the design. The main pane contains details about the selected design.
2. To create a new design, click New Design. 3. Enter a name for your design and select Manual. After a few seconds, the main Database Design window opens, displaying the default design parameters. HP Vertica has created and saved a design with the name you specified, and assigned it the default parameters. For more information, see Design Name. 4. On the General window, modify the design type, optimization objectives, K-safety, and the setting that allows Database Designer to create unsegmented projections. If you choose Incremental, the design automatically optimizes for load and the K-safety defaults to the value of the cluster K-safety; you cannot change these values for an incremental design. 5. Click the Tables tab. You must submit tables to your design.
6. To add tables of sample data to your design, click Add Tables. A list of available tables appears; select the tables you want and click Save. If you want to remove tables from your design, click the tables you want to remove, and click Remove Selected. If a design table has been dropped from the database, a red circle with a white exclamation point appears next to the table name. Before you can build or deploy the design, you must remove any dropped tables from the design. To do this, select the dropped tables and click Remove Selected. You cannot build or deploy a design if any of the design tables have been dropped. 7. Click the Queries tab. To add queries to your design, do one of the following: n To add queries from the QUERY_REQUESTS system table, click Query Repository, select the desired queries and click Save. All valid queries that you selected appear in the Queries window. n To add queries from a file, select Choose File. All valid queries in the file that you select are added to the design and appear in the Queries window. Database Designer checks the validity of the queries when you add the queries to the design and again when you build the design. If it finds invalid queries, it ignores them. If you have a large number of queries, it may take time to load them. Make sure that all the queries you want Database Designer to consider when creating the design are listed in the Queries window. 8. Once you have specified all the parameters for your design, you should build the design. To do this, select your design and click Build Design. 9. Select Analyze Statistics if you want Database Designer to analyze the statistics before building the design. For more information, see Statistics Analysis. 10. If you do not need to review the design before deploying it, select Deploy Immediately. Otherwise, leave that option unselected. 11. Click Start. On the left-hand pane, the status of your design displays as Building until it is complete. 12.
To follow the progress of a build, click Event History. Status messages appear in this window and you can see the current phase of the build operation. The information in the Event History tab contains data from the OUTPUT_EVENT_HISTORY system table. 13. When the build completes, the left-hand pane displays Built. To view the deployment script, select your design and click Output.
14. After you deploy the design using Management Console, the deployment script is deleted. To keep a permanent copy of the deployment script, copy and paste the SQL commands from the Output window to a file. 15. Once you have reviewed your design and are ready to deploy it, select the design and click Deploy Design. 16. To follow the progress of the deployment, click Event History. Status messages appear in this window and you can see the current phase of the deployment operation. In the Event History window, while the design is running, you can do one of the following: n Click the blue button next to the design name to refresh the event history listing. n Click Cancel Design Run to cancel the design in progress. n Click Force Delete Design to cancel and delete the design in progress. 17. When the deployment completes, the left-hand pane displays Deployment Completed. To view the deployment script, select your design and click Output. Your database is now optimized according to the parameters you set. Using Administration Tools to Create a Design To use the Administration Tools interface to create an optimized design for your database, you must be a DBADMIN user. Follow these steps: 1. Log in as the dbadmin user and start Administration Tools. 2. From the main menu, start the database for which you want to create a design. The database must be running before you can create a design for it. 3. On the main menu, select Configuration Menu and click OK. 4. On the Configuration Menu, select Run Database Designer and click OK. 5. On the Select a database to design window, enter the name of the database for which you are creating a design and click OK. 6. On the Enter the directory for Database Designer output window, enter the full path to the directory to contain the design script, deployment script, backup script, and log files, and click OK. For information about the scripts, see Building a Design. 7.
On the Database Designer window, enter a name for the design and click OK. For more information about design names, see Design Name.
8. On the Design Type window, choose which type of design to create and click OK. For a description of the design types, see Design Types. 9. The Select schema(s) to add to query search path window lists all the schemas in the database that you selected. Select the schemas that contain representative data that you want Database Designer to consider when creating the design and click OK. For more information about choosing schemas and tables to submit to Database Designer, see Design Tables with Sample Data. 10. On the Optimization Objectives window, select the objective you want for the database optimization: n Optimize with Queries. For more information, see Design Queries. n Update statistics. For more information, see Statistics Analysis. n Deploy design. For more information, see Deploying a Design. For details about these objectives, see Optimization Objectives. 11. The final window summarizes the choices you have made and offers you two choices: n Proceed with building the design, and deploying it if you specified to deploy it immediately. If you did not specify to deploy, you can review the design and deployment scripts and deploy them manually, as described in Deploying Designs Manually. n Cancel the design and go back to change some of the parameters as needed. 12. Creating a design can take a long time. To cancel a running design from the Administration Tools window, enter Ctrl+C. To create a design for the VMart example database, see Using Database Designer to Create a Comprehensive Design in the Getting Started Guide.
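Whichever tool you use, you can confirm what the deployment created by querying the projections system table. A hedged example (the column names shown are assumed from the standard v_catalog.projections table; verify them in the SQL Reference Manual):

```sql
-- List the projections that now exist after deploying the design.
=> SELECT projection_schema, projection_name, is_super_projection
   FROM v_catalog.projections
   ORDER BY projection_schema, projection_name;
```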
Creating Custom Designs HP strongly recommends that you use the physical schema design produced by Database Designer, which provides K-safety, excellent query performance, and efficient use of storage space. If you find that any of your queries are not running as efficiently as you would like, you can use the Database Designer incremental design process to optimize the database design for the query. If the projections created by Database Designer still do not meet your needs, you can write custom projections, from scratch or based on projection designs created by Database Designer. If you are unfamiliar with writing custom projections, start by modifying an existing design generated by Database Designer. The Design Process To customize an existing design or create a new one, take these steps: 1. Plan the design or design modification. As with most successful projects, a good design requires some up-front planning. See Planning Your Design. 2. Create or modify projections. For an overview of the CREATE PROJECTION statement and guidelines for creating common projections, see Design Fundamentals. The CREATE PROJECTION section in the SQL Reference Manual also provides more detail. 3. Deploy the projections to a test environment. See Writing and Deploying Custom Projections. 4. Test the projections. 5. Modify the projections as necessary. 6. Once you have finalized the design, deploy the projections to the production environment.
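For step 2 of the process above, a minimal custom superprojection might look like the following sketch; the table store.store_sales and its columns are hypothetical, so substitute your own schema:

```sql
-- Hypothetical superprojection: includes every column of the anchor table,
-- sorts by the most common predicate columns, and segments across all nodes.
=> CREATE PROJECTION store.store_sales_super (
     sale_date ENCODING RLE,
     store_id,
     amount
   ) AS
   SELECT sale_date, store_id, amount
   FROM store.store_sales
   ORDER BY sale_date, store_id
   SEGMENTED BY HASH(store_id) ALL NODES KSAFE 1;
```

The KSAFE 1 clause asks HP Vertica to create the buddy projections needed for a K-safety of 1.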
Planning Your Design The syntax for creating a design is straightforward for anyone who is familiar with SQL. As with any successful project, however, a successful design requires some initial planning. Before you create your first design: l Become familiar with standard design requirements and plan your design to include them. See Design Requirements. l Determine how many projections you need to include in the design. See Determining the Number of Projections to Use. l Determine the type of compression and encoding to use for columns. See Data Encoding and Compression. l Determine whether or not you want the database to be K-safe. HP Vertica recommends that all production databases have a minimum K-safety of one (K=1). Valid K-safety values are 0, 1, and 2. See Designing for K-Safety. Design Requirements A physical schema design is a script that contains CREATE PROJECTION statements. These statements determine which columns are included in projections and how they are optimized. If you use Database Designer as a starting point, it automatically creates designs that meet all fundamental design requirements. If you intend to create or modify designs manually, be aware that all designs must meet the following requirements: l Every design must create at least one superprojection for every table in the database that is used by the client application. These projections provide complete coverage that enables users to perform ad-hoc queries as needed. They can contain joins and they are usually configured to maximize performance through sort order, compression, and encoding. l Query-specific projections are optional. If you are satisfied with the performance provided through superprojections, you do not need to create additional projections. However, you can maximize performance by tuning for specific query workloads. l HP recommends that all production databases have a minimum K-safety of one (K=1) to support high availability and recovery.
(K-safety can be set to 0, 1, or 2.) See High Availability Through Projections in the Concepts Guide and Designing for K-Safety. Determining the Number of Projections to Use In many cases, a design that consists of a set of superprojections (and their buddies) provides satisfactory performance through compression and encoding. This is especially true if the sort orders for the projections have been used to maximize performance for one or more query predicates (WHERE clauses). Administrator's Guide HP Vertica Analytic Database (7.0.x) Page 147 of 997
However, you might want to add query-specific projections to increase the performance of queries that run slowly, are used frequently, or are run as part of business-critical reporting. The number of additional projections (and their buddies) that you create should be determined by:

- Your organization's needs
- The amount of disk space you have available on each node in the cluster
- The amount of time available for loading data into the database

As the number of projections that are tuned for specific queries increases, the performance of these queries improves. However, the amount of disk space used and the amount of time required to load data increases as well. Therefore, you should create and test designs to determine the optimum number of projections for your database configuration. On average, organizations that choose to implement query-specific projections achieve optimal performance through the addition of a few query-specific projections.

Designing for K-Safety

Before creating custom physical schema designs, determine whether you want the database to be K-safe, and adhere to the appropriate design requirements for K-safe databases or databases with no K-safety. HP requires that all production databases have a minimum K-safety of one (K=1). Valid K-safety values for production databases are 1 and 2. Non-production databases do not have to be K-safe and can be set to 0. You can start by creating a physical schema design with no K-safety, and then modify it to be K-safe at a later point in time.

See High Availability and Recovery and High Availability Through Projections in the Concepts Guide for an explanation of how HP Vertica implements high availability and recovery through replication and segmentation.

Requirements for a K-Safe Physical Schema Design

Database Designer automatically generates designs with a K-safety of 1 for clusters that contain at least three nodes. (If your cluster has one or two nodes, it generates designs with a K-safety of 0.) You can modify a design created for a three-node (or greater) cluster, and the K-safe requirements are already set.

If you create custom projections, your physical schema design must meet the following requirements to be able to successfully recover the database in the event of a failure:

- Segmented projections must be segmented across all nodes. Refer to Designing for Segmentation and Designing Segmented Projections for K-Safety.
- Replicated projections must be replicated on all nodes. See Designing Replicated Projections for K-Safety.
- Segmented projections must have K buddy projections (projections that have identical columns and segmentation criteria, except that corresponding segments are placed on different nodes).

You can use the MARK_DESIGN_KSAFE function to find out whether your schema design meets requirements for K-safety.
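For example, after deploying your design you can ask HP Vertica to verify and mark it as K-safe. This is a minimal sketch; the exact message returned on success may vary by version:

```sql
-- Verify that the current physical schema design can tolerate one node
-- failure. This succeeds only if every segmented projection has a buddy and
-- every replicated projection exists on all nodes; otherwise it returns an
-- error identifying the projections that do not meet the requirements.
SELECT MARK_DESIGN_KSAFE(1);

-- To mark the design as having no K-safety instead:
SELECT MARK_DESIGN_KSAFE(0);
```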
Requirements for a Physical Schema Design with No K-Safety

If you use Database Designer to generate a comprehensive design that you can modify, and you do not want the design to be K-safe, set the K-safety level to 0 (zero).

If you want to start from scratch, do the following to establish minimal projection requirements for a functioning database with no K-safety (K=0):

1. Define at least one superprojection for each table in the logical schema.
2. Replicate (define an exact copy of) each dimension table superprojection on each node.

Designing Replicated Projections for K-Safety

If you are creating or modifying a design for a K-safe database, make sure that projections for dimension tables are replicated on each node in the database. You can accomplish this using a single CREATE PROJECTION command for each dimension table. The UNSEGMENTED ALL NODES syntax within the segmentation clause automatically creates an unsegmented projection on each node in the database. When you run your design script, HP Vertica generates a list of nodes based on the number of nodes in the database and replicates the projection accordingly.

Replicated projections have the name:

   projection-name_node-name

If, for example, the nodes are named NODE01, NODE02, and NODE03, the projections for a projection named ABC are named ABC_NODE01, ABC_NODE02, and ABC_NODE03.

Note: This naming convention can affect functions that provide information about projections, for example, GET_PROJECTIONS or GET_PROJECTION_STATUS, where you must provide the name ABC_NODE01 instead of just ABC. To view a list of the nodes in a database, use the View Database command in the Administration Tools.

The following script uses the UNSEGMENTED ALL NODES syntax to create one unsegmented superprojection for the store_dimension table on each node.
CREATE PROJECTION store_dimension (
  C0_store_dimension_floor_plan_type ENCODING RLE ,
  C1_store_dimension_photo_processing_type ENCODING RLE ,
  C2_store_dimension_store_key ,
  C3_store_dimension_store_name ,
  C4_store_dimension_store_number ,
  C5_store_dimension_store_street_address ,
  C6_store_dimension_store_city ,
  C7_store_dimension_store_state ,
  C8_store_dimension_store_region ,
  C9_store_dimension_financial_service_type ,
  C10_store_dimension_selling_square_footage ,
  C11_store_dimension_total_square_footage ,
  C12_store_dimension_first_open_date ,
  C13_store_dimension_last_remodel_date
)
AS SELECT T_store_dimension.floor_plan_type,
  T_store_dimension.photo_processing_type,
  T_store_dimension.store_key,
  T_store_dimension.store_name,
  T_store_dimension.store_number,
  T_store_dimension.store_street_address,
  T_store_dimension.store_city,
  T_store_dimension.store_state,
  T_store_dimension.store_region,
  T_store_dimension.financial_service_type,
  T_store_dimension.selling_square_footage,
  T_store_dimension.total_square_footage,
  T_store_dimension.first_open_date,
  T_store_dimension.last_remodel_date
FROM store_dimension T_store_dimension
ORDER BY T_store_dimension.floor_plan_type,
  T_store_dimension.photo_processing_type
UNSEGMENTED ALL NODES;

Note: Large dimension tables can be segmented. A dimension table is considered to be large when it is approximately the same size as a fact table.

Designing Segmented Projections for K-Safety

If you are creating or modifying a design for a K-safe database, you need to create K-safe projections for fact tables and large dimension tables. (A dimension table is considered to be large if it is similar in size to a fact table.) To accomplish this, you must:

- Create a segmented projection for each fact and large dimension table.
- Create segmented buddy projections for each of these projections. The total number of projections in a buddy set must be two for a K=1 database or three for a K=2 database.

For an overview of segmented projections and their buddies, see Projection Segmentation in the Concepts Guide. For information about designing for K-safety, see Designing for K-Safety and Designing for Segmentation.

Segmenting Projections

To segment a projection, use the segmentation clause to specify the:

- Segmentation method to use.
- Column to use to segment the projection.
- Nodes on which to segment the projection.
You can segment projections across all the nodes, or across just the number of nodes necessary to maintain K-safety: three for a K=1 database or five for a K=2 database.
See the CREATE PROJECTION statement in the SQL Reference Manual.

The following segmentation clause uses hash segmentation to segment the projection across all nodes based on the T_retail_sales_fact.pos_transaction_number column:

CREATE PROJECTION retail_sales_fact_P1...
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES;

Creating Buddy Projections

To create a buddy projection, copy the original projection and modify it as follows:

- Rename it to something similar to the name of the original projection. For example, a projection named retail_sales_fact_P1 could have buddies named retail_sales_fact_P1_B1 and retail_sales_fact_P1_B2.
- Modify the sort order as needed.
- Create an offset to store the segments for the buddy on different nodes. For example, the first buddy in a projection set would have an offset of one (OFFSET 1), the second buddy in a projection set would have an offset of two (OFFSET 2), and so on.

To create a buddy for the projection created in the previous example:

CREATE PROJECTION retail_sales_fact_P1_B1...
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES OFFSET 1;

Designing for Segmentation

You segment projections using hash segmentation. Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across multiple nodes, resulting in optimal query execution. In a projection, the data to be hashed consists of one or more column values, each having a large number of unique values and an acceptable amount of skew in the value distribution. Primary key columns that meet these criteria can be an excellent choice for hash segmentation.

Note: For detailed information about using hash segmentation in a projection, see CREATE PROJECTION in the SQL Reference Manual.

When segmenting projections, determine which columns to use to segment the projection.
Choose one or more columns that have a large number of unique data values and acceptable skew in their data distribution. Primary key columns are an excellent choice for hash segmentation. The columns must be unique across all the tables being used in a query.
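Before committing to a segmentation column, you can check its cardinality and skew with ordinary SQL. The following queries are a sketch; the table and column names come from the retail_sales_fact examples used elsewhere in this guide:

```sql
-- Cardinality: a good segmentation column has many distinct values.
SELECT COUNT(DISTINCT pos_transaction_number) AS distinct_values,
       COUNT(*) AS total_rows
FROM retail_sales_fact;

-- Skew: no single value should dominate. If the top values account for a
-- large fraction of rows, hash segmentation on this column will be uneven.
SELECT pos_transaction_number, COUNT(*) AS occurrences
FROM retail_sales_fact
GROUP BY pos_transaction_number
ORDER BY occurrences DESC
LIMIT 10;
```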
Design Fundamentals

Although you can write custom projections from scratch, HP Vertica recommends that you use Database Designer to create a design to use as a starting point. This ensures that you have projections that meet basic requirements.

Writing and Deploying Custom Projections

Before you write custom projections, be sure to review the topics in Planning Your Design carefully. Failure to follow these considerations can result in non-functional projections.

To manually modify or create a projection:

1. Write a script to create the projection, using the CREATE PROJECTION statement.
2. Use the \i meta-command in vsql to run the script.
   Note: You must have a database loaded with a logical schema.
3. For a K-safe database, use the function SELECT get_projections('table_name') to verify that the projections were properly created. Good projections are noted as being "safe." This means that the projection has enough buddies to be K-safe.
4. If you added the new projection to a database that already has projections that contain data, you need to update the newly created projection to work with the existing projections. By default, the new projection is out-of-date (not available for query processing) until you refresh it.
5. Use the MAKE_AHM_NOW function to set the Ancient History Mark (AHM) to the greatest allowable epoch (now).
6. Use the DROP_PROJECTION function to drop any previous projections that are no longer needed. These projections can waste disk space and reduce load speed if they remain in the database.
7. Run the ANALYZE_STATISTICS function on all projections in the database. This function collects and aggregates data samples and storage information from all nodes on which a projection is stored, and then writes statistics into the catalog. For example:

   => SELECT ANALYZE_STATISTICS ('');

Anatomy of a Projection

The CREATE PROJECTION statement defines the individual elements of a projection, as the following graphic shows.
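The annotated graphic is not reproduced in this text version; the elements it calls out can be sketched in a single statement. The table and column names below follow the retail_sales_fact examples used elsewhere in this guide:

```sql
CREATE PROJECTION retail_sales_fact_P1 (
  -- Column list and encoding: every projection column with its encoding
  C1_retail_sales_fact_store_key ENCODING RLE ,
  C2_retail_sales_fact_pos_transaction_number ,
  C3_retail_sales_fact_sales_dollar_amount
)
AS
-- Base query: identifies the columns to incorporate, through column name
-- and table name references
SELECT T_retail_sales_fact.store_key,
       T_retail_sales_fact.pos_transaction_number,
       T_retail_sales_fact.sales_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
-- Sort order: optimizes for query predicates (the ORDER BY clause)
ORDER BY T_retail_sales_fact.store_key
-- Segmentation: distributes the projection across nodes (or UNSEGMENTED)
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES;
```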
The previous example contains the following significant elements:

Column List and Encoding

Lists every column in the projection and defines the encoding for each column. Unlike traditional database architectures, HP Vertica operates on encoded data representations. Therefore, HP recommends that you use data encoding because it results in less disk I/O.

Base Query

Identifies all the columns to incorporate in the projection through column name and table name references. The base query for large table projections can contain PK/FK joins to smaller tables.

Sort Order

The sort order optimizes for a specific query or commonalities in a class of queries based on the query predicate. The best sort orders are determined by the WHERE clauses. For example, if a projection's sort order is (x, y), and the query's WHERE clause specifies (x=1 AND y=2), all of the needed data is found together in the sort order, so the query runs almost instantaneously. You can also optimize a query by matching the projection's sort order to the query's GROUP BY clause. If you do not specify a sort order, HP Vertica uses the order in which columns are specified in the column definition as the projection's sort order.

The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values so that a disk read can pick up many results at once. For maximum performance, do not sort projections on LONG VARBINARY and LONG VARCHAR columns.
Segmentation

The segmentation clause determines whether a projection is segmented across nodes within the database. Segmentation distributes contiguous pieces of projections, called segments, for large and medium tables across database nodes. Segmentation maximizes database performance by distributing the load. Use SEGMENTED BY HASH to segment large table projections.

For small tables, use the UNSEGMENTED keyword to direct HP Vertica to replicate these tables, rather than segment them. Replication creates and stores identical copies of projections for small tables across all nodes in the cluster. Replication ensures high availability and recovery.

For maximum performance, do not segment projections on LONG VARBINARY and LONG VARCHAR columns.

Designing Superprojections

Superprojections have the following requirements:

- They must contain every column within the table.
- For a K-safe design, superprojections must either be replicated on all nodes within the database cluster (for dimension tables) or paired with buddies and segmented across all nodes (for very large tables and medium-large tables). See Physical Schema and High Availability Through Projections in the Concepts Guide for an overview of projections and how they are stored. See Designing for K-Safety for design specifics.

To provide maximum usability, superprojections need to minimize storage requirements while maximizing query performance. To achieve this, the sort order for columns in superprojections is based on storage requirements and commonly used queries.

Minimizing Storage Requirements

Minimizing storage not only saves on physical resources, it increases performance by requiring the database to perform less disk I/O. To minimize storage space for a projection:

- Analyze the type of data stored in each projection column and choose the most effective encoding method. See the CREATE PROJECTION statement and encoding-type in the SQL Reference Manual. The HP Vertica optimizer gives Run-Length Encoding (RLE) preference, so be sure to use it whenever appropriate. RLE replaces sequences (runs) of identical values with a single pair that contains the value and the number of occurrences. Therefore, use it only when the run length is large, such as when sorting low-cardinality columns.
- Prioritize low-cardinality columns in the column sort order. This minimizes the number of rows that HP Vertica stores and accesses to retrieve query results.
For more information about minimizing storage requirements, see Choosing Sort Order: Best Practices.

Maximizing Query Performance

In addition to minimizing storage requirements, the column sort order facilitates the most commonly used queries for the table. This means that the column sort order prioritizes the lowest-cardinality columns that are actually used in queries. For examples that take into account both storage and query requirements, see Choosing Sort Order: Best Practices.

Note: For maximum performance, do not sort projections on LONG VARBINARY and LONG VARCHAR columns.

Projections within a buddy set can all have different sort orders. This enables you to maximize query performance for groups of queries with common WHERE clauses, but different sort orders. If, for example, you have a three-node cluster, your buddy set contains three interrelated projections, each having its own sort order.

In a database with a K-safety of 1 or 2, buddy projections are used for data recovery. If a node fails, it queries the other nodes to recover data through buddy projections. (See How Result Sets are Stored in the Concepts Guide.) If a projection's buddies use different sort orders, it takes longer to recover the projection because the data has to be re-sorted during recovery to match the sort order of the projection. Therefore, consider using identical sort orders for tables that are rarely queried or that are repeatedly accessed by the same query, and use multiple sort orders for tables that are accessed by queries with common WHERE clauses, but different sort orders.

If you have queries that access multiple tables, or you want to maintain the same sort order for projections within buddy sets, create query-specific projections. Designs that contain projections for specific queries are called optimized designs.
Projection Design for Merge Operations

The HP Vertica query optimizer automatically picks the best projections to use for queries, but you can help improve the performance of MERGE operations by ensuring projections are designed for optimal use. Good projection design lets HP Vertica choose the faster merge join between the target and source tables without having to perform additional sort and data transfer operations.

HP recommends that you first use Database Designer to generate a comprehensive design and then customize projections, as needed. Be sure to first review the topics in Planning Your Design. Failure to follow those considerations could result in non-functioning projections.

In the following MERGE statement, HP Vertica inserts and/or updates records from the source table's column b into the target table's column a:

=> MERGE INTO target t USING source s ON t.a = s.b WHEN ....

HP Vertica can use a local merge join if tables target and source use one of the following projection designs, where their inputs are pre-sorted through the CREATE PROJECTION ORDER BY clause:

- Replicated projections that are sorted on:
  - Column a for target
  - Column b for source
- Segmented projections that are identically segmented on:
  - Column a for target
  - Column b for source
  - Corresponding segmented columns

Tip: For best merge performance, the source table should be smaller than the target table.

See Also

- Optimized Versus Non-Optimized MERGE
- Best Practices for Optimizing MERGE Statements
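As a sketch of the segmented design described above, projections like the following would let the optimizer consider a local merge join. The target and source tables and their columns a and b come from the MERGE statement above; the extra columns c and d are hypothetical:

```sql
-- Target projection: sorted and segmented on the join column a.
CREATE PROJECTION target_p (a, c)
AS SELECT a, c FROM target
ORDER BY a
SEGMENTED BY HASH(a) ALL NODES;

-- Source projection: sorted and segmented on the join column b.
-- Identical segmentation expressions over the corresponding columns place
-- matching rows on the same node, avoiding data transfer during the merge.
CREATE PROJECTION source_p (b, d)
AS SELECT b, d FROM source
ORDER BY b
SEGMENTED BY HASH(b) ALL NODES;
```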
Maximizing Projection Performance

This section explains how to design your projections in order to optimize their performance.

Choosing Sort Order: Best Practices

When choosing sort orders for your projections, HP Vertica has several recommendations that can help you achieve maximum query performance, as illustrated in the following examples.

Combine RLE and Sort Order

When dealing with predicates on low-cardinality columns, use a combination of RLE and sorting to minimize storage requirements and maximize query performance. Suppose you have a students table containing the following values and encoding types:

Column     | # of Distinct Values                       | Encoded With
-----------|--------------------------------------------|-------------
gender     | 2 (M or F)                                 | RLE
pass_fail  | 2 (P or F)                                 | RLE
class      | 4 (freshman, sophomore, junior, or senior) | RLE
name       | 10000 (too many to list)                   | Auto

You might have queries similar to this one:

=> SELECT name FROM students
   WHERE gender = 'M' AND pass_fail = 'P' AND class = 'senior';

The fastest way to access the data is to work through the low-cardinality columns with the smallest number of distinct values before the high-cardinality columns. The following sort order minimizes storage and maximizes query performance for queries that have equality restrictions on gender, class, pass_fail, and name. Specify the ORDER BY clause of the projection as follows:

ORDER BY students.gender, students.pass_fail, students.class, students.name

In this example, the gender column is represented by two RLE entries, the pass_fail column is represented by four entries, and the class column is represented by 16 entries, regardless of the cardinality of the students table. HP Vertica efficiently finds the set of rows that satisfy all the predicates, resulting in a huge reduction of search effort for RLE-encoded columns that occur early in the sort order. Consequently, if you use low-cardinality columns in local predicates, as in the previous example, put those columns early in the projection sort order, in increasing order of distinct cardinality (that is, in increasing order of the number of distinct values in each column).
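A projection embodying that sort order might look like the following sketch. The students table, its columns, and the encodings come from the example above; the choice of UNSEGMENTED here is an assumption made only for brevity (segmentation is a separate design decision):

```sql
CREATE PROJECTION students_p (
  gender    ENCODING RLE ,
  pass_fail ENCODING RLE ,
  class     ENCODING RLE ,
  name      ENCODING AUTO
)
AS SELECT gender, pass_fail, class, name
FROM students
-- Low-cardinality RLE columns first, in increasing order of distinct values
ORDER BY gender, pass_fail, class, name
UNSEGMENTED ALL NODES;
```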
If you sort this table with student.class first, you improve the performance of queries that restrict only on the student.class column, and you improve the compression of the student.class column (which contains the largest number of distinct values), but the other columns do not compress as well. Determining which projection is better depends on the specific queries in your workload, and their relative importance.

Storage savings with compression decrease as the cardinality of the column increases; however, storage savings with compression increase as the number of bytes required to store values in that column increases.

Maximize the Advantages of RLE

To maximize the advantages of RLE encoding, use it only when the average run length of a column is greater than 10 when sorted. For example, suppose you have a table with the following columns, sorted in order of cardinality from low to high:

address.country, address.region, address.state, address.city, address.zipcode

The zipcode column might not have 10 sorted entries in a row with the same zip code, so there is probably no advantage to run-length encoding that column, and it could make compression worse. But there are likely to be more than 10 countries in a sorted run length, so applying RLE to the country column can improve performance.

Put Lower-Cardinality Columns First for Functional Dependencies

In general, put columns that you use for local predicates (as in the previous example) earlier in the join order to make predicate evaluation more efficient. In addition, if a lower-cardinality column is uniquely determined by a higher-cardinality column (like city_id uniquely determining a state_id), it is always better to put the lower-cardinality, functionally determined column earlier in the sort order than the higher-cardinality column.
For example, in the following sort order, the Area_Code column is sorted before the Number column in the customer_info table:

ORDER BY customer_info.Area_Code, customer_info.Number, customer_info.Address

In the query, put the Area_Code column first, so that only the values in the Number column that start with 978 are scanned:

SELECT Address FROM customer_info
WHERE Area_Code = '978' AND Number = '9780123457';

Sort for Merge Joins

When processing a join, the HP Vertica optimizer chooses from two algorithms:

- Merge join: If both inputs are pre-sorted on the join column, the optimizer chooses a merge join, which is faster and uses less memory.
- Hash join: Using the hash join algorithm, HP Vertica uses the smaller (inner) joined table to build an in-memory hash table on the join column. A hash join has no sort requirement, but it consumes more memory because HP Vertica builds a hash table with the values in the inner table. The optimizer chooses a hash join when projections are not sorted on the join columns.

If both inputs are pre-sorted, merge joins do not have to do any pre-processing, making the join perform faster. HP Vertica uses the term sort-merge join to refer to the case when at least one of the inputs must be sorted prior to the merge join. HP Vertica sorts the inner input side, but only if the outer input side is already sorted on the join columns.

To give the HP Vertica query optimizer the option to use an efficient merge join for a particular join, create projections on both sides of the join that put the join column first in their respective projections. This is primarily important to do if both tables are so large that neither table fits into memory. If all tables that a table will be joined to can be expected to fit into memory simultaneously, the benefits of merge join over hash join are sufficiently small that it probably isn't worth creating a projection for any one join column.
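Following that advice, a sketch using the sample tables from this guide might look like the following. The assumption that these tables join on store_key, and the small column lists, are illustrative only:

```sql
-- Both projections put the join column (store_key) first in ORDER BY, so the
-- optimizer can consider a merge join without re-sorting either input.
CREATE PROJECTION retail_sales_fact_joincol (store_key, sales_dollar_amount)
AS SELECT store_key, sales_dollar_amount FROM retail_sales_fact
ORDER BY store_key
SEGMENTED BY HASH(store_key) ALL NODES;

CREATE PROJECTION store_dimension_joincol (store_key, store_name)
AS SELECT store_key, store_name FROM store_dimension
ORDER BY store_key
UNSEGMENTED ALL NODES;
```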
Sort on Columns in Important Queries

If you have an important query, one that you run on a regular basis, you can save time by putting the columns specified in the WHERE clause or the GROUP BY clause of that query early in the sort order. If that query uses a high-cardinality column such as Social Security number, you may sacrifice storage by placing this column early in the sort order of a projection, but your most important query will be optimized.

Sort Columns of Equal Cardinality By Size

If you have two columns of equal cardinality, put the column that is larger first in the sort order. For example, a CHAR(20) column takes up 20 bytes, but an INTEGER column takes up 8 bytes. By putting the CHAR(20) column ahead of the INTEGER column, your projection compresses better.

Sort Foreign Key Columns First, From Low to High Distinct Cardinality

Suppose you have a fact table where the first four columns in the sort order make up a foreign key to another table. For best compression, choose a sort order for the fact table such that the foreign keys appear first, and in increasing order of distinct cardinality. Other factors also apply to the design of projections for fact tables, such as partitioning by a time dimension, if any.

In the following example, the table inventory stores inventory data, and product_key and warehouse_key are foreign keys to the product_dimension and warehouse_dimension tables:

=> CREATE TABLE inventory (
     date_key INTEGER NOT NULL,
     product_key INTEGER NOT NULL,
     warehouse_key INTEGER NOT NULL, ...
   );
=> ALTER TABLE inventory
   ADD CONSTRAINT fk_inventory_warehouse
   FOREIGN KEY(warehouse_key) REFERENCES warehouse_dimension(warehouse_key);
=> ALTER TABLE inventory
   ADD CONSTRAINT fk_inventory_product
   FOREIGN KEY(product_key) REFERENCES product_dimension(product_key);

The inventory table should be sorted by warehouse_key and then product_key, since the cardinality of the warehouse_key column is probably lower than the cardinality of the product_key column.
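That advice can be expressed as a projection sketch over the inventory table defined above. The placement of date_key last in the sort order and the choice of product_key as the segmentation column are assumptions, not recommendations from the original text:

```sql
-- Foreign keys first, in increasing order of distinct cardinality:
-- warehouse_key (few warehouses) before product_key (many products).
CREATE PROJECTION inventory_p (date_key, product_key, warehouse_key)
AS SELECT date_key, product_key, warehouse_key FROM inventory
ORDER BY warehouse_key, product_key, date_key
SEGMENTED BY HASH(product_key) ALL NODES;
```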
Prioritizing Column Access Speed

If you measure and set the performance of storage locations within your cluster, HP Vertica uses this information to determine where to store columns based on their rank. For more information, see Setting Storage Performance.

How Columns are Ranked

HP Vertica stores columns included in the projection sort order on the fastest storage locations. Columns not included in the projection sort order are stored on slower disks. Columns for each projection are ranked as follows:

- Columns in the sort order are given the highest priority (numbers > 1000).
- The last column in the sort order is given the rank number 1001.
- The next-to-last column in the sort order is given the rank number 1002, and so on, until the first column in the sort order is given 1000 + the number of sort columns.
- The remaining columns are given numbers from 1000 down to 1, starting with 1000 and decrementing by one per column.

HP Vertica then stores columns on disk from the highest ranking to the lowest ranking, with the highest-ranking columns placed on the fastest disks and the lowest-ranking columns placed on the slowest disks.

Overriding Default Column Ranking

You can modify which columns are stored on fast disks by manually overriding the default ranks for these columns. To accomplish this, set the ACCESSRANK keyword in the column list. Make sure to use an integer that is not already being used for another column. For example, if you want to give a column the fastest access rank, use a number that is significantly higher than 1000 + the number of sort columns. This allows you to enter more columns over time without bumping into the access rank you set.

The following example sets the access rank for the C1_retail_sales_fact_store_key column to 1500:

CREATE PROJECTION retail_sales_fact_P1 (
  C1_retail_sales_fact_store_key ENCODING RLE ACCESSRANK 1500,
  C2_retail_sales_fact_pos_transaction_number ,
  C3_retail_sales_fact_sales_dollar_amount ,
  C4_retail_sales_fact_cost_dollar_amount
)
Projection Examples

This section provides examples that show you how to create projections.

New K-Safe=2 Database

In this example, projections are created for a new five-node database with a K-safety of 2. To simplify the example, this database contains only two tables: retail_sales_fact and store_dimension. Creating projections for this database consists of creating the following segmented and unsegmented (replicated) superprojections:

- Segmented projections: To support K-safety=2, the database requires three segmented projections (one projection and two buddy projections) for each fact table. In this case, it requires three segmented projections for the retail_sales_fact table:

  Projection | Description
  -----------|------------
  P1         | The primary projection for the retail_sales_fact table.
  P1_B1      | The first buddy projection for P1. This buddy is required to provide K-safety=1.
  P1_B2      | The second buddy projection for P1. This buddy is required to provide K-safety=2.

- Unsegmented projections: To support the database, one unsegmented superprojection must be created for each dimension table on each node. In this case, one unsegmented superprojection must be created on each node for the store_dimension table:

  Node   | Unsegmented Projection
  -------|-----------------------
  Node01 | store_dimension_Node01
  Node02 | store_dimension_Node02
  Node03 | store_dimension_Node03
  Node04 | store_dimension_Node04
  Node05 | store_dimension_Node05

Creating Segmented Projections Example

The following SQL script creates the P1 projection and its buddies, P1_B1 and P1_B2, for the retail_sales_fact table. The following syntax is significant:
l CREATE PROJECTION creates the named projection (retail_sales_fact_P1, retail_sales_fact_P1_B1, or retail_sales_fact_P1_B2).
l ALL NODES automatically segments the projections across all five nodes in the cluster without specifically referring to each node.
l HASH evenly distributes the data across these nodes.
l OFFSET ensures that the same data is not stored on the same nodes for each of the buddies. The first buddy uses OFFSET 1 to shift the storage locations by 1, and the second buddy uses OFFSET 2 to shift the storage locations by 2. This is critical to ensure K-safety.

CREATE PROJECTION retail_sales_fact_P1 (
  C1_retail_sales_fact_store_key ENCODING RLE ,
  C2_retail_sales_fact_pos_transaction_number ,
  C3_retail_sales_fact_sales_dollar_amount ,
  C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
  T_retail_sales_fact.pos_transaction_number,
  T_retail_sales_fact.sales_dollar_amount,
  T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES;
----------------------------------------------------------
-- Projection #                : 6
-- Projection storage (KBytes) : 4.8e+06
-- Note: This is a super projection for table: retail_sales_fact
CREATE PROJECTION retail_sales_fact_P1_B1 (
  C1_retail_sales_fact_store_key ENCODING RLE ,
  C2_retail_sales_fact_pos_transaction_number ,
  C3_retail_sales_fact_sales_dollar_amount ,
  C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
  T_retail_sales_fact.pos_transaction_number,
  T_retail_sales_fact.sales_dollar_amount,
  T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES OFFSET 1;
----------------------------------------------------------
-- Projection #                : 6
-- Projection storage (KBytes) : 4.8e+06
-- Note: This is a super projection for table: retail_sales_fact
CREATE PROJECTION retail_sales_fact_P1_B2 (
  C1_retail_sales_fact_store_key ENCODING RLE ,
  C2_retail_sales_fact_pos_transaction_number ,
  C3_retail_sales_fact_sales_dollar_amount ,
  C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
  T_retail_sales_fact.pos_transaction_number,
  T_retail_sales_fact.sales_dollar_amount,
  T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES OFFSET 2;
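After running scripts like these, you may want to confirm that the buddy projections exist and that the design is in fact K-safe. The following queries are a minimal sketch; MARK_DESIGN_KSAFE and the v_catalog.projections system table are part of HP Vertica 7.0, but the exact columns returned can vary by release.

```sql
-- Ask the catalog to confirm the design supports K=1
-- (returns an error if any table lacks sufficient buddy projections).
SELECT MARK_DESIGN_KSAFE(1);

-- List the projections created above and whether each is segmented.
SELECT projection_name, is_segmented
FROM v_catalog.projections
WHERE anchor_table_name = 'retail_sales_fact';
```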
----------------------------------------------------------

Creating Unsegmented Projections Example

The following script uses the UNSEGMENTED ALL NODES syntax to create one unsegmented superprojection for the store_dimension table on each node.

CREATE PROJECTION store_dimension (
  C0_store_dimension_floor_plan_type ENCODING RLE ,
  C1_store_dimension_photo_processing_type ENCODING RLE ,
  C2_store_dimension_store_key ,
  C3_store_dimension_store_name ,
  C4_store_dimension_store_number ,
  C5_store_dimension_store_street_address ,
  C6_store_dimension_store_city ,
  C7_store_dimension_store_state ,
  C8_store_dimension_store_region ,
  C9_store_dimension_financial_service_type ,
  C10_store_dimension_selling_square_footage ,
  C11_store_dimension_total_square_footage ,
  C12_store_dimension_first_open_date ,
  C13_store_dimension_last_remodel_date )
AS SELECT T_store_dimension.floor_plan_type,
  T_store_dimension.photo_processing_type,
  T_store_dimension.store_key,
  T_store_dimension.store_name,
  T_store_dimension.store_number,
  T_store_dimension.store_street_address,
  T_store_dimension.store_city,
  T_store_dimension.store_state,
  T_store_dimension.store_region,
  T_store_dimension.financial_service_type,
  T_store_dimension.selling_square_footage,
  T_store_dimension.total_square_footage,
  T_store_dimension.first_open_date,
  T_store_dimension.last_remodel_date
FROM store_dimension T_store_dimension
ORDER BY T_store_dimension.floor_plan_type,
  T_store_dimension.photo_processing_type
UNSEGMENTED ALL NODES;

Adding a Node to a Database

In this example, a fourth node (Node04) is being added to a three-node database cluster. The database contains two tables: retail_sales_fact and store_dimension. It also contains the following segmented and unsegmented (replicated) superprojections:

l Segmented projections: P1 and its buddy, B1, are projections for the retail_sales_fact table. They were created using the ALL NODES syntax, so HP Vertica automatically segments the projections across all three nodes.
l Unsegmented projections: Currently, three unsegmented superprojections exist for the store_dimension table, one for each node, as follows:

  Node      Unsegmented Projection
  Node01    store_dimension_Node01
  Node02    store_dimension_Node02
  Node03    store_dimension_Node03

To support an additional node, replacement projections need to be created for the segmented projections, P1 and B1. The new projections could be called P2 and B2, respectively. Additionally, an unsegmented superprojection (store_dimension_Node04) needs to be created for the dimension table on the new node (Node04).

Creating Segmented Projections Example

The following SQL script creates the original P1 projection and its buddy, B1, for the retail_sales_fact table. Because the script uses the ALL NODES syntax, creating a new projection that includes the fourth node is as easy as copying the script and changing the names of the projection and its buddy to unique names (for example, P2 for the projection and P2_B2 for its buddy). The names that need to be changed are highlighted within the example.
CREATE PROJECTION retail_sales_fact_P1 (
  C1_retail_sales_fact_store_key ENCODING RLE ,
  C2_retail_sales_fact_pos_transaction_number ,
  C3_retail_sales_fact_sales_dollar_amount ,
  C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
  T_retail_sales_fact.pos_transaction_number,
  T_retail_sales_fact.sales_dollar_amount,
  T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES;
----------------------------------------------------------
-- Projection #                : 6
-- Projection storage (KBytes) : 4.8e+06
-- Note: This is a super projection for table: retail_sales_fact
CREATE PROJECTION retail_sales_fact_P1_B1 (
  C1_retail_sales_fact_store_key ENCODING RLE ,
  C2_retail_sales_fact_pos_transaction_number ,
  C3_retail_sales_fact_sales_dollar_amount ,
  C4_retail_sales_fact_cost_dollar_amount )
AS SELECT T_retail_sales_fact.store_key,
  T_retail_sales_fact.pos_transaction_number,
  T_retail_sales_fact.sales_dollar_amount,
  T_retail_sales_fact.cost_dollar_amount
FROM retail_sales_fact T_retail_sales_fact
ORDER BY T_retail_sales_fact.store_key
SEGMENTED BY HASH(T_retail_sales_fact.pos_transaction_number) ALL NODES OFFSET 1;
----------------------------------------------------------

Creating Unsegmented Projections Example

The following script uses the UNSEGMENTED ALL NODES syntax to create the original three unsegmented superprojections for the store_dimension table, one per node. The following syntax is significant:

l CREATE PROJECTION creates a superprojection called store_dimension.
l ALL NODES automatically places a complete copy of the superprojection on each of the three original nodes.

CREATE PROJECTION store_dimension (
  C0_store_dimension_floor_plan_type ENCODING RLE ,
  C1_store_dimension_photo_processing_type ENCODING RLE ,
  C2_store_dimension_store_key ,
  C3_store_dimension_store_name ,
  C4_store_dimension_store_number ,
  C5_store_dimension_store_street_address ,
  C6_store_dimension_store_city ,
  C7_store_dimension_store_state ,
  C8_store_dimension_store_region ,
  C9_store_dimension_financial_service_type ,
  C10_store_dimension_selling_square_footage ,
  C11_store_dimension_total_square_footage ,
  C12_store_dimension_first_open_date ,
  C13_store_dimension_last_remodel_date )
AS SELECT T_store_dimension.floor_plan_type,
  T_store_dimension.photo_processing_type,
  T_store_dimension.store_key,
  T_store_dimension.store_name,
  T_store_dimension.store_number,
  T_store_dimension.store_street_address,
  T_store_dimension.store_city,
  T_store_dimension.store_state,
  T_store_dimension.store_region,
  T_store_dimension.financial_service_type,
  T_store_dimension.selling_square_footage,
  T_store_dimension.total_square_footage,
  T_store_dimension.first_open_date,
  T_store_dimension.last_remodel_date
FROM store_dimension T_store_dimension
ORDER BY T_store_dimension.floor_plan_type,
  T_store_dimension.photo_processing_type
UNSEGMENTED ALL NODES;

To create another copy of the superprojection on the fourth node (Node04), the best approach is to create a copy of that projection on Node04 only. This means avoiding the ALL NODES syntax.
The following script shows how to create the fourth superprojection. The following syntax is significant:
l CREATE PROJECTION creates a superprojection called store_dimension_Node04.
l UNSEGMENTED NODE Node04 creates the projection on just Node04.

CREATE PROJECTION store_dimension_Node04 (
  C0_store_dimension_floor_plan_type ENCODING RLE ,
  C1_store_dimension_photo_processing_type ENCODING RLE ,
  C2_store_dimension_store_key ,
  C3_store_dimension_store_name ,
  C4_store_dimension_store_number ,
  C5_store_dimension_store_street_address ,
  C6_store_dimension_store_city ,
  C7_store_dimension_store_state ,
  C8_store_dimension_store_region ,
  C9_store_dimension_financial_service_type ,
  C10_store_dimension_selling_square_footage ,
  C11_store_dimension_total_square_footage ,
  C12_store_dimension_first_open_date ,
  C13_store_dimension_last_remodel_date )
AS SELECT T_store_dimension.floor_plan_type,
  T_store_dimension.photo_processing_type,
  T_store_dimension.store_key,
  T_store_dimension.store_name,
  T_store_dimension.store_number,
  T_store_dimension.store_street_address,
  T_store_dimension.store_city,
  T_store_dimension.store_state,
  T_store_dimension.store_region,
  T_store_dimension.financial_service_type,
  T_store_dimension.selling_square_footage,
  T_store_dimension.total_square_footage,
  T_store_dimension.first_open_date,
  T_store_dimension.last_remodel_date
FROM store_dimension T_store_dimension
ORDER BY T_store_dimension.floor_plan_type,
  T_store_dimension.photo_processing_type
UNSEGMENTED NODE Node04;
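Newly created projections do not contain data until they are refreshed. As a sketch of the usual follow-up after adding replacement projections (projection names are from the examples above; START_REFRESH and DROP PROJECTION exist in HP Vertica 7.0, but verify the procedure against your release before dropping anything):

```sql
-- Populate the new projections in the background.
SELECT START_REFRESH();

-- After the replacement projections are refreshed and the design is
-- still K-safe, the superseded projections (P1 and its buddy) can go.
DROP PROJECTION retail_sales_fact_P1;
DROP PROJECTION retail_sales_fact_P1_B1;
```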
Implementing Security

In HP Vertica, there are three primary security concerns:

l Client authentication prevents unauthorized access to the database.
l Connection encryption prevents the interception of data, as well as authenticating the identity of the server and the client.
l Client authorization (managing users and privileges) controls what users can access and change in the database.

Client Authentication

To gain access to HP Vertica, a user or client application must supply the name of a valid user account. You can configure HP Vertica to require just a user name, but a more common practice is to require an additional means of authentication, such as a password. There are several ways to implement this added authentication:

l Password authentication using passwords stored in the database.
l Authentication using outside means, such as LDAP or Kerberos.

You can use different authentication methods based on:

l Connection type
l Client IP address range
l User name for the client that is attempting to access the server

See Implementing Client Authentication.

Connection Encryption

To secure the connection between the client and the server, you can configure HP Vertica and database clients to use Secure Socket Layer (SSL) to communicate. HP Vertica uses SSL to:

l Authenticate the server so the client can confirm the server's identity. HP Vertica supports mutual authentication in which the server can also confirm the identity of the client. This authentication helps prevent "man-in-the-middle" attacks.
l Encrypt data sent between the client and database server to significantly reduce the likelihood that the data can be read if the connection between the client and server is compromised.
l Verify that data sent between the client and server has not been altered during transmission.

See Implementing SSL.
Client Authorization

Database users should have access to just the database resources they need to perform their tasks. For example, some users need to query only specific sets of data. To prevent unauthorized access to additional data, you can limit their access to just the data that they need to perform their queries. Other users should be able to read the data but not modify or insert new data. Still other users might need more permissive access, such as the right to create and modify schemas, tables, and views, or even to grant other users access to database resources.

A collection of SQL statements controls authorization for the resources users can access. See Managing Users and Privileges, in particular About Database Privileges. You can also use roles to grant users access to a set of privileges, rather than directly granting the privileges to each user. See About Database Roles.

Use the GRANT statements to assign privileges to users and the REVOKE statements to repeal privileges. See the SQL Reference Manual for details.
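As an illustrative sketch of this privilege model (the role reader_role, the user alice, and the schema and table names are hypothetical examples, not objects defined elsewhere in this guide):

```sql
-- Create a role that bundles read-only privileges.
CREATE ROLE reader_role;
GRANT USAGE ON SCHEMA store TO reader_role;
GRANT SELECT ON TABLE store.store_dimension TO reader_role;

-- Grant the role to a user and make it active by default.
CREATE USER alice;
GRANT reader_role TO alice;
ALTER USER alice DEFAULT ROLE reader_role;

-- Repeal a privilege later with REVOKE.
REVOKE SELECT ON TABLE store.store_dimension FROM reader_role;
```

Granting privileges to a role rather than to individual users keeps the policy in one place: revoking SELECT from reader_role removes it from every user who holds the role.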
Implementing Client Authentication

When a client (the user who runs a client application, or the client application itself) connects to the HP Vertica database server, it supplies the HP Vertica database user name to gain access. HP Vertica restricts which database users can connect through client authentication, a process in which the database server establishes the identity of the requesting client and determines whether that client is authorized to connect to the HP Vertica server using the supplied credentials.

HP Vertica offers several client authentication methods, which you set up using the Administration Tools (see How to Create Authentication Records). Although you can configure HP Vertica to require just a user name for connections, you likely require a more secure means of authentication, such as a password at a minimum.

Supported Client Authentication Types

HP Vertica supports the following types of authentication to prove a client's identity. For information about syntax and formatting rules, see Authentication Record Format and Rules.

l Trust authentication—Authorizes any user that connects to the server using a valid user name.
l Reject authentication—Blocks the connection and prevents additional records from being evaluated for the requesting client.
l Kerberos authentication—Uses a secure, single-sign-on, trusted third-party, mutual authentication service to connect to HP Vertica using one of the following methods:
  n krb5 uses the MIT Kerberos APIs (deprecated in HP Vertica 7.0; use the gss method).
  n gss uses the GSSAPI standard and provides better compatibility with non-MIT Kerberos implementations, such as those for Java and Windows clients.
l Password authentication—Uses either the md5 or password method, which are similar except for the manner in which the password is sent across the network:
  n The md5 method sends encrypted MD5-hashed passwords over the network, and the server provides the client with salt.
  n The password method sends passwords over the network in clear text.
l LDAP authentication—Works like password authentication, except that the ldap method authenticates the client against a Lightweight Directory Access Protocol (LDAP) server.
l Ident-based authentication—Authenticates the client against the user name in an Ident server.

The method HP Vertica uses to authenticate a particular client connection can be automatically selected on the basis of the connection type, client IP address, and user name. See About External Authentication for more information.
If You Want Communication Layer Authentication

Topics in this section describe authentication methods supported at the database server layer. For communication layer authentication between server and client, see Implementing SSL.
Password Authentication

The simplest method to authenticate a client connection is to assign the user account a password in HP Vertica. If a user account has a password set, then the user or client using the account to connect to the database must supply the correct password. If the user account does not have a password set and HP Vertica is not configured to use another form of client authentication, the user account is always allowed to log in.

Passwords are stored in the database in an encrypted format to prevent others from potentially stealing them. However, the password is transmitted to HP Vertica in plain text, so it is possible for a "man-in-the-middle" attack to intercept it. To secure the login, consider implementing SSL security or MD5 authentication.

About Password Creation and Modification

A superuser creates passwords for user accounts when he or she runs the CREATE USER statement. You can add a password afterward using the ALTER USER statement. To change a password, use ALTER USER or the vsql \password command. A superuser can set any user account's password. Users can change their own passwords.

To make password authentication more effective, enforce password policies that control how often users are forced to change passwords and the required content of a password. These policies are set using Profiles.

Default Password Authentication

By default, the vertica.conf file does not contain any authentication records. When the file is empty, HP Vertica defaults to using password authentication for user accounts that have passwords. If you add authentication methods to vertica.conf, even for remote hosts, password authentication is disabled. You must explicitly enable password authentication.
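For example, password creation and modification with CREATE USER and ALTER USER might look like the following (the user name and password values are hypothetical):

```sql
-- A superuser creates the account with an initial password.
CREATE USER dbuser IDENTIFIED BY 'Initial$Passw0rd';

-- The user (or a superuser) changes the password later.
ALTER USER dbuser IDENTIFIED BY 'New$Passw0rd';
```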
To always enable local users to log in using password authentication, you would add the following to the vertica.conf file:

ClientAuthentication = local all password

The above entry allows users logged into a database host to connect to the database using HP Vertica-based passwords, rather than some other form of authentication.

Profiles

You set password policies using profiles. A profile is a group of parameters that sets requirements for user passwords. You assign users to a profile to set their password policy. A profile controls:
l How often users must change their passwords.
l How many times users must change their passwords before they can reuse an old password.
l How many times users can fail to log in before their account is locked.
l The required length and content of the password (maximum and minimum number of characters, and the minimum number of letters, capital letters, lowercase letters, digits, and symbols that must be in a password).

You can create multiple profiles to enforce different password policies for different users. For example, you might decide to create one profile for interactive users that requires them to frequently change their passwords, and another profile for user accounts that applications use to access the database, which are not required to change passwords.

How You Create and Modify Profiles

You create profiles using the CREATE PROFILE statement and change profiles using ALTER PROFILE. You can assign a user to a profile when you create the user (CREATE USER), or afterwards using the ALTER USER statement. A user can be assigned to only one profile at a time.

All newly-created databases contain an initial profile named DEFAULT. All database users are assigned to the DEFAULT profile if:

l You do not explicitly assign users a profile when you create them
l You drop the profile to which a user is currently assigned

You can change the policy parameters in the DEFAULT profile, but you cannot delete it.

Note: When upgrading from versions of HP Vertica prior to version 5.0, a DEFAULT profile is added to each database, and all users are assigned to it.

The profiles you create can inherit some or all of their policy parameters from the DEFAULT profile. When creating a profile using the CREATE PROFILE statement, any parameter you set to the special value DEFAULT, or any parameter to which you do not assign a value, inherits its value from the DEFAULT profile.
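A sketch of a profile definition follows; the profile name and limit values are hypothetical examples, not recommendations, and the full parameter list is in the CREATE PROFILE entry of the SQL Reference Manual.

```sql
CREATE PROFILE interactive_users LIMIT
    PASSWORD_LIFE_TIME 90      -- days before a password expires
    PASSWORD_GRACE_TIME 5      -- days a user can still log in after expiration
    PASSWORD_MIN_LENGTH 8      -- minimum characters in a password
    FAILED_LOGIN_ATTEMPTS 3    -- failed logins before the account locks
    PASSWORD_LOCK_TIME 1;      -- days a locked account stays locked

-- Assign an existing user to the profile.
ALTER USER dbuser PROFILE interactive_users;
```

Any parameter omitted here (for example, PASSWORD_REUSE_MAX) is inherited from the DEFAULT profile.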
Changing a parameter in the DEFAULT profile changes that parameter's value in every profile that inherited the parameter from DEFAULT.

When you assign users to a profile (or alter an existing profile that has users assigned to it), the profile's policies for password content (maximum and minimum length, number of specific types of characters) do not have an immediate effect on the users—HP Vertica does not test users' passwords to ensure they comply with the new password criteria. These settings only affect the users the next time they change their password. If you want to ensure users comply with the new password policy, use the ALTER USER statement to expire user passwords. Users with expired passwords are prompted to change their passwords when they next log in.

Note: Only the profile settings for how many failed login attempts trigger Account Locking and how long accounts are locked have an effect on external password authentication methods such as LDAP or Kerberos. All password complexity, reuse, and lifetime settings affect only passwords managed by HP Vertica.

See Also

l PROFILES

Password Expiration

Use profiles to control how often users must change their passwords. Initially, the DEFAULT profile is set so that passwords never expire. You can change this default value, or you can create additional profiles that set time limits for passwords and assign users to them.

When a password expires, the user is required to change his or her password when next logging in, unless the profile to which the user is assigned has a PASSWORD_GRACE_TIME set. In that case, the user is allowed to log in after the expiration, but HP Vertica warns about the password expiration. Once the grace period elapses, the user is forced to change the password, unless they have manually changed it during the grace period. Password expiration has no effect on any of the user's current sessions.

Note: You can expire a user's password immediately using the ALTER USER statement's PASSWORD EXPIRE argument. Expiring a password is useful to force users to comply with a change to their password policy, or when setting a new password for users who have forgotten their old one.

Account Locking

One password policy you can set in a profile is how many consecutive failed login attempts (giving the wrong password when trying to log in) a user account is allowed before the account is locked. You set this value using the FAILED_LOGIN_ATTEMPTS parameter in the CREATE PROFILE or ALTER PROFILE statement.

HP Vertica locks any user account that has more sequential failed login attempts than the value to which you set FAILED_LOGIN_ATTEMPTS. A locked account is not allowed to log in, even if the user supplies the correct password.
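Both expiring a password and unlocking an account are performed with ALTER USER; for example (the user name is hypothetical):

```sql
-- Expire a password immediately so the user must change it at next login.
ALTER USER dbuser PASSWORD EXPIRE;

-- Manually unlock an account that was locked by too many failed logins.
ALTER USER dbuser ACCOUNT UNLOCK;
```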
How to Unlock a Locked Account

There are two ways to unlock an account:

l A superuser can manually unlock the account using the ALTER USER command.
l HP Vertica automatically unlocks the account after the number of days set in the PASSWORD_LOCK_TIME parameter of the user's profile has passed. However, if this parameter is set to UNLIMITED, the user's account is never automatically unlocked, and a superuser must manually unlock it.

This locking mechanism helps prevent dictionary-style brute-force attempts to crack users' passwords.

Note: A superuser account cannot be locked, since it is the only user that can unlock accounts. For this reason, you should ensure that you choose a very secure password for a superuser account. See Password Guidelines for suggestions on choosing a secure password.

The following example demonstrates failing to log in to an account whose profile is set to lock accounts after three failed tries:

> vsql -U dbuser
Password:
vsql: FATAL:  Invalid username or password
> vsql -U dbuser
Password:
vsql: FATAL:  Invalid username or password
> vsql -U dbuser
Password:
vsql: FATAL:  The user account "dbuser" is locked due to too many invalid logins
HINT:  Please contact the database administrator
> vsql -U dbuser
Password:
vsql: FATAL:  The user account "dbuser" is locked due to too many invalid logins
HINT:  Please contact the database administrator

Password Guidelines

For passwords to be effective, they must be hard to guess. You need to protect passwords from:

l Dictionary-style brute-force attacks
l Users who have knowledge of the password holder (family names, dates of birth, and so on)

Use Profiles to enforce good password practices (password length and required content), and make sure database users know not to use personal information in their passwords.

What to Use

Consider the following password guidelines, published by the Internet Engineering Task Force (IETF), when you create passwords:

l Use mixed-case characters.
l Use non-alphabetic characters (for example, numeric digits and punctuation).
l Use a password that is easy to remember, so you don't need to write it down; for example, i3atSandw1ches! instead of !a#^*!$&D)z.
l Use a password that you can type quickly without having to look at the keyboard.

What to Avoid

Avoid the following practices when you create a password:

l Do not use your login or user name in any form (as-is, reversed, capitalized, doubled, and so on).
l Do not use your first, middle, or last name in any form.
l Do not use your spouse's, partner's, child's, parent's, friend's, or pet's name in any form.
l Do not use other information easily obtained about you, including your date of birth, license plate number, telephone number, Social Security number, make of your automobile, house address, and so on.
l Do not use a password of all digits or all the same letter.
l Do not use a word contained in English or foreign language dictionaries, spelling lists, acronym or abbreviation lists, or other lists of words.
l Do not use a password that contains fewer than six characters.
l Do not give your password to another person for any reason.

See Also

l Creating a Database Name and Password
About External Authentication

To help you implement external authentication methods, HP Vertica provides an editing environment within the Administration Tools that lets you create, edit, and maintain authentication records. The Administration Tools also verifies that the authentication records are correctly formed, inserts the records into the vertica.conf configuration file, and implements the changes on all cluster nodes.

The vertica.conf file supports multiple records, one per line, to provide options for client sessions that might require a variety of authentication methods. Each record establishes the authentication method to use based on:

l Connection type
l Client IP address range
l User name for the client that is attempting to access the database

For example, you could use multiple records to have application logins authenticated using HP Vertica-based passwords, and interactive users authenticated using LDAP. See Example Authentication Records.

HP Vertica uses the first record with a matching connection type, client address, and user name to authenticate that connection. If authentication fails, the client is denied access to HP Vertica. Access is also denied if no records match the client session. If, however, there are no records (the DBA did not configure vertica.conf), HP Vertica reverts to using the user name and password (if created) to control client access to the database.

Setting up Your Environment to Create Authentication Records

Editing of vertica.conf is performed by the text editor set in your Linux or UNIX account's VISUAL or EDITOR environment variable. If you have not specified a text editor, HP Vertica uses the vim (vi) editor. To switch your editor from vi to GNU Emacs, run the following command before you run the Administration Tools:

$ export EDITOR=/usr/bin/emacs

You can also add the above line to the .profile file in your home directory to always use GNU Emacs to edit the authentication records.
Caution: Never edit vertica.conf directly; use the Administration Tools, which performs error checking on your entries before adding them to vertica.conf.
About Local Password Authentication

If you add authentication methods to vertica.conf but still want password authentication to work locally, you must explicitly add a password authentication entry. See Password Authentication for details.

How to Create Authentication Records

In this procedure, you use the Administration Tools to specify the authentication methods to use for various client sessions.

1. On the Main Menu in the Administration Tools, select View Database Cluster State, verify that all cluster nodes are UP, and click OK.
2. Select Configuration Menu, and click OK.
3. On the Configuration Menu, select Edit Authentication, and click OK.
4. Select the database you want to create authentication records for, and click OK. Your system's default editor opens the vertica.conf file.
5. Enter one or more authentication records.
   Tip: See Authentication Record Format and Rules for information about the content and rules required to create a record.
6. When you have finished entering authentication records, exit the editor. For example, in vi, press the Esc key and type :wq to complete your editing session.
   The Administration Tools verifies that the records are correctly formed and does one of the following:
   n If the records are properly formed, they are inserted into the vertica.conf file, and the file is automatically copied to the other nodes in the database cluster. You are prompted to restart the database. Click OK and go to step 7.
   n If the records are not properly formed, a message describes the problem and gives you the opportunity to edit your errors (e), exit without saving your changes (a), or save and implement your changes anyway (s). Saving your changes is not recommended, because it can cause client authentication to fail.
7. Restart the database.
If You Do Not Specify a Client Authentication Method

If you do not insert records into the vertica.conf file, HP Vertica defaults to the user name and password (if supplied) to grant access to the database. If you later add authentication methods, the user name/password default is no longer enabled. To continue using password authentication, you must explicitly add it as described in Password Authentication.

See Also

l How to Modify Authentication Records

Authentication Record Format and Rules

If the ClientAuthentication record introduced in Security Parameters does not exist, HP Vertica uses the password method to authenticate client connections. Otherwise, each authentication record takes the following format:

Syntax

ClientAuthentication = connection_type user_name address method
Arguments

connection_type
  The access method the client uses to connect to an instance. Valid values are:
  l local — Matches connection attempts made using local domain sockets. When using the local connection type, do not specify the address parameter.
  l host — Matches connection attempts made using TCP/IP. Connection attempts can be made using a plain (non-SSL) or SSL-wrapped TCP socket.
  l hostssl — Matches an SSL TCP connection only.
  l hostnossl — Matches a plain TCP socket only.

  Notes about client connections:
  l Avoid using -h <hostname> from the client if a local connection type is specified and you want to match the client authentication entry.
  l Use -h <hostname> from the client if you specify a Kerberos (gss or krb5) connection method. See the vsql command-line option -h hostname (--host hostname).

user_name
  Identifies the client user names that match this record. Valid user names are:
  l all — Matches all users.
  l One or more specific user names.
  The user_name argument accepts either a single value or concatenated values. To concatenate values, use a plus sign between the values, for example: user1+user2.
address
  Identifies the client machine IP address range that matches this record. Use the format <IP_Address>/<netmask_value>. You must specify the IP address numerically, not as domain or host names. HP Vertica supports the following formats:
  l CIDR format: w.x.y.z/<mask_length> (for example, 10.10.0.25/24). The mask length indicates the number of high-order bits of the client IP address that must match. Do not insert white space between the IP address, the slash (/), and the Classless Inter-Domain Routing (CIDR) mask length.
  l Separate dotted-IP address and mask values (for example, 10.10.0.25 255.255.255.0).
  l To allow users to connect from any IP address, use the value 0.0.0.0/0.

  Note: If you are working with a multi-node cluster, be aware that any IP/netmask settings in host-based ClientAuthentication parameters (host, hostssl, or hostnossl) must match all nodes in the cluster. This setup allows the database owner to authenticate with and administer every node in the cluster.

  For example, specifying 10.10.0.8/30 allows a CIDR address range of 10.10.0.8–10.10.0.11.
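Putting the arguments together, a vertica.conf might contain records like the following. The user name and addresses are hypothetical examples; remember that HP Vertica uses the first matching record, so the specific reject rule is placed before the broader rules.

```
ClientAuthentication = host baduser 0.0.0.0/0 reject
ClientAuthentication = local all password
ClientAuthentication = host all 10.10.0.0/24 md5
```

Here the first record blocks the hypothetical account baduser from any address, the second lets local connections use HP Vertica-managed passwords, and the third requires MD5-hashed passwords from clients on the 10.10.0.x subnet.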
method
Identifies the authentication method to use for clients that match this record. Use one of the following methods:
• trust — Authenticates clients based on valid user names only. You might implement trust if a user connection has already been authenticated through some external means, such as SSL or a firewall.
• reject — Rejects the connection and prevents additional records from being evaluated for the client. This method is useful for blocking clients by user name or IP address.
• gss — Authenticates the client using the GSSAPI standard, allowing for better compatibility with non-MIT Kerberos implementations, such as Java and Windows clients. (HP Vertica follows RFC 1964.)
• krb5 — Authenticates the client using the MIT Kerberos APIs. This method is deprecated in HP Vertica 7.0. Use the gss method for all new records and modify existing krb5 records to use gss as soon as possible.
• password — Requires that the client supply an unencrypted password for authentication. Because the password is sent over the network in clear text, never use this method on untrusted networks.
• md5 — Requires that the client supply a Message-Digest Algorithm 5 (MD5) password across the network for authentication. By default, passwords are stored as MD5 hashes, and the server provides the client with salt.
• ldap — Authenticates the client against a Lightweight Directory Access Protocol (LDAP) server. This method is useful if your application uses LDAP to query directory services.
• ident — Authenticates the client using an Ident server (HP Vertica follows RFC 1413). Use this method only when the Ident server is installed on the same system as the HP Vertica database server.

Formatting Rules
When you create authentication records, be aware of the following formatting rules:
• Only one authentication record is allowed per line.
• Each authentication record must be on one line.
• Fields that make up the authentication record can be separated by white space or tabs.
• Other than IP addresses and mask columns, field values cannot contain white space.
• Place more specific rules (a specific user or IP address) before broader rules (all users or a range of IP addresses).
Note: The order of rules is important. HP Vertica scans the list of rules from top to bottom and uses the first rule that matches the incoming connection.

See Also
• Security Parameters
• How to Create Authentication Records
• Example Authentication Records

Configuring LDAP Authentication
Lightweight Directory Access Protocol (LDAP) authentication works like password authentication. The main difference is that the LDAP method authenticates clients trying to access your HP Vertica database against an LDAP or Active Directory server. Use LDAP authentication when your database needs to authenticate a user with an LDAP or Active Directory server. For details about configuring LDAP authentication, see:
• What You Need to Know to Configure LDAP Authentication
• LDAP Configuration Considerations
• Workflow for Configuring LDAP Bind
• Workflow for Configuring LDAP Bind and Search
• Configuring Multiple LDAP Servers

What You Need to Know to Configure LDAP Authentication
Before you configure LDAP authentication for your HP Vertica database, review the following information:
• Prerequisites for LDAP Authentication
• Terminology for LDAP Authentication
• Bind vs. Bind and Search
• LDAP Anonymous Binding
• Using LDAP over SSL and TLS
• LDAP Configuration Considerations
• Workflow for Configuring LDAP Bind
• Workflow for Configuring LDAP Bind and Search
• Configuring Multiple LDAP Servers

Prerequisites for LDAP Authentication
Before you configure LDAP authentication for your HP Vertica database, you must have:
• The IP address and host name of the LDAP server
• Your organization's Active Directory information
• A service account for bind and search
• Administrative access to your HP Vertica database
• The open-ldap-tools package installed on at least one node. This package includes ldapsearch.

Terminology for LDAP Authentication
An LDAP authentication record contains the following components:
• Host—IP address or host name of the LDAP server.
• Common name (CN)—Depending on your LDAP environment, this value can be the user's username or the user's full first and last name.
• Distinguished name (DN)—Name that uniquely identifies an entry in the directory, typically built from components such as the CN, OU, and DC values of your organization's domain (for example, domain.com).
• Organizational unit (OU)—Unit in the organization with which the user is associated, for example, Vertica Users.
• Domain component (DC)—Comma-separated list that contains your organization's domain component broken up into separate values, for example: dc=vertica, dc=com.
• sAMAccountName—An Active Directory user account field. This value is usually the attribute to be searched when you use bind and search against a Microsoft Active Directory server.
• UID—A commonly used LDAP account attribute used to store a username.
• Bind—LDAP authentication method that allows basic binding using the CN or UID.
• Bind and search—LDAP authentication method that needs to log in to the LDAP server to search on the specified attribute.
• Service account—An LDAP user account that can be used to log in to the LDAP server during bind and search. This account's password is usually shared.
• Anonymous binding—Allows a client to connect and search the directory (bind and search) without needing to log in.
• ldapsearch—A command-line utility to search the LDAP directory. It returns information that you use to configure LDAP bind and search.
• basedn—Distinguished name where the directory search should begin.
• binddn—Domain name to find in the directory search.
• searchattribute—UID of the user trying to connect.

DBADMIN Authentication Access and LDAP
The DBADMIN user should have easy access to the database at all times. Typically, the DBADMIN account does not use LDAP authentication to access the HP Vertica database. The DBADMIN account should authenticate against the database using local trust or local password authentication.
The Administrative Tools utility uses host authentication. Hewlett-Packard recommends that you also configure host authentication for the DBADMIN user. Otherwise, you may run into authentication problems when using the Administration Tools and LDAP authentication. For example, add lines like the following to the vertica.conf file:
ClientAuthentication = local dbadmin trust;
ClientAuthentication = host dbadmin password;
• The first line configures local authentication for the DBADMIN user. The user can use vsql without the -h option and does not need to enter a password.
• The second line configures host authentication for DBADMIN, allowing the user to access the HP Vertica database using the assigned password. The DBADMIN user can access the database using vsql -h, the Administrative Tools, or any other tool that connects to HP Vertica.

Bind vs. Bind and Search
There are two LDAP methods that you use to authenticate your HP Vertica database against an LDAP server.
• Bind—Use LDAP bind when HP Vertica connects to the LDAP server and binds using the CN and password (the username and password of the user logging in to the database). Use the bind method when your LDAP account's CN field matches the username defined in your database.
• Bind and search—Use LDAP bind and search when your LDAP account's CN field is a user's full name or does not match the username defined in your database. For bind and search, the username is usually in another field, such as UID or sAMAccountName in a standard Active Directory environment. Bind and search requires your organization's Active Directory information so that HP Vertica can log in to the LDAP server and search on the specified username field.
If you are using bind and search, having a service account simplifies your server-side configuration. In addition, you do not need to store your Active Directory password in clear text.

LDAP Anonymous Binding
Anonymous binding is an LDAP server function. Anonymous binding allows a client to connect and search the directory (bind and search) without logging in, because binddn and bindpasswd are not needed. You also do not need to log in when you configure LDAP authentication using Management Console.

Using LDAP over SSL and TLS
If the URL in the authentication record in vertica.conf includes ldaps, the HP Vertica server uses SSL on the specified port or on the LDAPS port (636). If the LDAP server does not support SSL on that port, authentication fails.
If the URL in the authentication record includes ldap, the HP Vertica server sends a StartTLS request. This request determines if the LDAP server supports TLS on the specified port or on the default LDAP port (389). If the LDAP server does not support TLS on that port, the HP Vertica server proceeds with the authentication without TLS.
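As an illustration of the two URL schemes (the server name and base DN below are placeholders, not part of any real configuration), an ldaps URL forces SSL, while a plain ldap URL attempts StartTLS first:

```
ClientAuthentication = host all 10.0.0.0/8 ldap "ldaps://ldap.example.com/basedn;cn=;,dc=example,dc=com"
ClientAuthentication = host all 10.0.0.0/8 ldap "ldap://ldap.example.com/basedn;cn=;,dc=example,dc=com"
```

With the first record, authentication fails outright if SSL is unavailable on the port; with the second, HP Vertica falls back to an unencrypted connection if the server does not support TLS.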
To use LDAP over SSL and TLS, you must specify the location of your certificate file in your ldap.conf file on all nodes:
TLS_CACERT /full-path-to-ca-certificate-file.crt

LDAP Configuration Considerations
Before you configure LDAP authentication for your HP Vertica database, consider the following recommendations. They can improve the effectiveness of LDAP-based security on your system:
• Create a service account with your LDAP server. A service account is a single account that is specifically set up so that individual account names and passwords are not used for LDAP access. Use the service account in your LDAP URL to avoid using individual account names and passwords, which change often. If you add, remove, or change users, you do not have to modify the LDAP URL. Having a service account allows you to restrict individual users from searching the LDAP server, while still allowing applications like HP Vertica to search the server.
• Set up an organizational unit (OU). Create an Active Directory OU, add all the HP Vertica users to the OU, and specify it in the LDAP URL. Doing so allows the LDAP server to search just the HP Vertica OU for the user, minimizing the search time. In addition, using OUs ensures that changes to the users' OUs for other applications have no impact on HP Vertica.
• Make sure that the DBADMIN user can always access the database locally. The DBADMIN user should not be required to authenticate through LDAP. If a problem occurs with the LDAP authentication that blocks all users from logging in, the DBADMIN user needs access to correct the problem.

Workflow for Configuring LDAP Bind
To configure your HP Vertica database to authenticate clients using LDAP bind, follow these steps:
1. Verify that the user's LDAP account attribute that you search for matches their Vertica username. For example, if John Smith's Active Directory (AD) sAMAccountName = jsmith, his HP Vertica username must also be jsmith.
Note: For detailed information about creating database users, see CREATE USER.
2. Run ldapsearch from an HP Vertica node against your LDAP or AD server. Verify the connection to the server and identify the values of relevant fields. Running ldapsearch helps you build the client authentication string needed to configure LDAP authentication.
In the following example, ldapsearch returns the CN, DN, and sAMAccountName fields (if they exist) for any user whose CN contains the username jsmith. This search succeeds only for LDAP servers that allow anonymous binding:
ldapsearch -x -h 10.10.10.10 -b "ou=Vertica Users,dc=CompanyCorp,dc=com" '(cn=jsmith*)' cn dn uid sAMAccountName
ldapsearch returns the following results. The relevant information for LDAP bind is in the dn and cn values:
# extended LDIF
#
# LDAPv3
# base <ou=Vertica Users,dc=CompanyCorp,dc=com> with scope subtree
# filter: (cn=jsmith*)
# requesting: cn dn uid sAMAccountName
#
# jsmith, Users, CompanyCorp.com
dn: cn=jsmith,ou=Vertica Users,dc=CompanyCorp,dc=com
cn: jsmith
uid: jsmith
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1
3. Assemble the ClientAuthentication string as follows:
ClientAuthentication = host all 0.0.0.0/0 ldap "ldap://10.10.10.10/basedn;cn=;,OU=Vertica Users,DC=CompanyCorp,DC=com"
Because the CN in the LDAP entry is the username jsmith, you do not need to set it. HP Vertica automatically sets the CN to the username of the user who is trying to connect and uses that CN to bind against the LDAP server. You can also use the UID for this account to perform the same bind operation.

Workflow for Configuring LDAP Bind and Search
To configure your HP Vertica database to authenticate clients using LDAP bind and search, follow these steps:
1. If required, obtain a service account, as described in LDAP Configuration Considerations.
2. Verify that the user's LDAP account attribute that you search on matches their Vertica username. For example, if John Smith's Active Directory (AD) sAMAccountName = jsmith, his HP Vertica username must also be jsmith. HP Vertica usernames cannot have spaces.
3. Run ldapsearch from an HP Vertica node against your LDAP or AD server. Verify the connection to the server and identify the values of relevant fields. Running ldapsearch helps you build the client authentication string needed to configure LDAP authentication. In the following example, ldapsearch returns the CN, DN, and sAMAccountName fields (if they exist) for any user whose CN contains the username John. This search succeeds only for LDAP servers that allow anonymous binding:
ldapsearch -x -h 10.10.10.10 -b 'OU=Vertica Users,DC=CompanyCorp,DC=com' -s sub -D 'CompanyCorp\jsmith' -W '(cn=John*)' cn dn uid sAMAccountName
4. ldapsearch returns the following results. The relevant information for bind and search is in the dn and sAMAccountName values:
# extended LDIF
#
# LDAPv3
# base <OU=Vertica Users,DC=CompanyCorp,DC=com> with scope subtree
# filter: (cn=John*)
# requesting: cn dn sAMAccountName
#
# John Smith, Vertica Users, CompanyCorp.com
dn: CN=John Smith,OU=Vertica Users,DC=CompanyCorp,DC=com
cn: John Smith
sAMAccountName: jsmith
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1
5. Assemble the ClientAuthentication string. Because the sAMAccountName attribute contains the username you want, jsmith, set your search attribute to that field so that the search finds the appropriate account. These strings act just like any other ClientAuthentication string. See Authentication Record Format and Rules for detailed information about setting up these strings in your database configuration.
ClientAuthentication = host all 0.0.0.0/0 ldap "ldap://10.10.10.10/search;basedn=OU=Vertica Users,DC=CompanyCorp,DC=com;binddn=COMPANYCORP\jsmith;bindpasswd=<password>;searchattribute=sAMAccountName"

Configuring Multiple LDAP Servers
In the vertica.conf file, the ClientAuthentication record can contain multiple LDAP URLs, separated by single spaces. If the search using the first URL returns no results or otherwise fails, HP Vertica retries the search using the next URL. For example:
ClientAuthentication = host all 10.0.0.0/8 ldap "ldap://ldap.example.com/search;
basedn=dc=example,dc=com" "ldap://ldap.example.com/search;basedn=dc=example,dc=com"

Configuring Ident Authentication
The Ident protocol, defined in RFC 1413, identifies the system user of a particular connection. You can configure HP Vertica client authentication to query an Ident server to determine whether that system user can log in as a certain database user without specifying a password. With this feature, system users can run automated scripts to execute tasks on the HP Vertica server.
Caution: Ident responses can be easily spoofed by untrusted servers. Ident authentication should take place only on local connections, where the Ident server is installed on the same computer as the HP Vertica database server.

ClientAuthentication Records for Ident Authentication
To configure Ident authentication, the ClientAuthentication record in the vertica.conf file must have one of the following formats:
ClientAuthentication = local <database_user> ident systemusers=<systemuser1:systemuser2:...> [ continue ]
ClientAuthentication = local <database_user> ident [ continue ]
Where:
• local indicates that the Ident server is installed on the same computer as the database, a requirement for Ident authentication on HP Vertica.
• <database_user>: The name of any valid user of the database. To allow the specified system users to log in as any database user, use the word all instead of a database user name.
• <systemuser1:systemuser2:...>: Colon-delimited list of system user names.
• continue: Allows system users not specified in the systemusers list to authenticate using methods specified in subsequent ClientAuthentication records. The continue keyword can be used with or without the systemusers list.
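The systemusers matching described above can be sketched in Python. This is an illustrative model of the record format, not HP Vertica's implementation; the wildcard and no-list cases follow the behavior described in this section.

```python
def ident_allows(record_systemusers, system_user):
    """Model whether an Ident record's systemusers list admits a system user.

    record_systemusers is the value after 'systemusers=' in the record:
    a colon-delimited list of names, or '*' to match any system user.
    A record with no systemusers list (None here) also matches any user.
    """
    if record_systemusers is None or record_systemusers == "*":
        return True
    return system_user in record_systemusers.split(":")

# Mirrors the examples in this section:
print(ident_allows("jsmith:tbrown:root", "tbrown"))  # True
print(ident_allows("jsmith:tbrown:root", "alice"))   # False
print(ident_allows("*", "anyone"))                   # True
```

Whether the admitted system user may then act as the record's database user depends on the <database_user> field (a specific name, or all).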
The following examples show how to configure Ident authentication in HP Vertica:
• Allow the system's root user to log in to the database as the dbadmin user:
ClientAuthentication = local dbadmin ident systemusers=root
• Allow system users jsmith, tbrown, and root to log in as database user user1:
ClientAuthentication = local user1 ident systemusers=jsmith:tbrown:root
• Allow system user jsmith to log in as any database user:
ClientAuthentication = local all ident systemusers=jsmith
• Allow any system user to log in as the database user of the same name:
ClientAuthentication = local all ident
• Allow any system user to log in as user1:
ClientAuthentication = local user1 ident systemusers=*
• Allow the system user backup to log in as dbadmin without a password, and allow all other system users to log in as dbadmin with a password:
ClientAuthentication = local dbadmin ident systemusers=backup continue, local dbadmin password
• Allow all system users to log in as the database user with the same name without a password, and to log in as other database users with a password:
ClientAuthentication = local all ident continue, local all password

Installing and Configuring an Ident Server
To use Ident authentication, you must install the oidentd server and enable it on your HP Vertica server. oidentd is an Ident daemon that is compatible with HP Vertica and compliant with RFC 1413.
To install and configure oidentd on Red Hat Linux for use with your HP Vertica database, take these steps:
1. To install oidentd on Red Hat Linux, run this command:
$ yum install oidentd
Note: The source code and installation instructions for oidentd are available at the oidentd website.
2. For Ident authentication to work, the Ident server must accept IPv6 connections. To make sure this happens, start oidentd with the argument -a ::. In the script /etc/init.d/oidentd, change the line
exec="/usr/sbin/oidentd"
to
exec="/usr/sbin/oidentd -a ::"
3. Restart the server with the following command:
/etc/init.d/oidentd restart

Example Authentication Records
The following examples show several different authentication records.

Using an IP Range and Trust Authentication Method
The following example allows the dbadmin account to connect from any IP address in the range of 10.0.0.0 to 10.255.255.255 without a password, as long as the connection is made without using SSL:
ClientAuthentication = hostnossl dbadmin 10.0.0.0/8 trust
Note: If this is the only authentication record in the vertica.conf file, dbadmin will be the only user who is able to log in.

Using Multiple Authentication Records
When the vertica.conf file contains multiple authentication records, HP Vertica scans them from top to bottom and uses the first entry that matches the incoming connection to authenticate (or reject) the user. If the user fails to authenticate using the method specified in the record, HP Vertica denies access to that user. You can use this behavior to include records that enable or reject specific connections, ending with one or more "catch-all" records. The following example demonstrates setting up some specific records, followed by some catch-all records:
ClientAuthentication = host alice 192.168.1.100/32 reject
ClientAuthentication = host alice 192.168.1.101/32 trust
ClientAuthentication = host all 0.0.0.0/0 password
ClientAuthentication = local all password
The first two records apply only to the user alice. If alice attempts to connect from 192.168.1.100, the first record is used to authenticate her, which rejects her connection attempt. If she attempts to connect from 192.168.1.101, she is allowed to connect automatically. If alice attempts to log in from any other remote system, the third record matches, and she must enter a password. Finally, if she attempts to connect locally from a node in the cluster, the fourth record applies, and she again has to enter a password to authenticate herself.
All other users are authenticated through the third and fourth records, using password authentication. The first two records are ignored for them, because their user names do not match the name in those records.

Record Order
The ordering of the records is important. If the order of the records were reversed, so that the wildcard rule was first, the rules that are specific to alice would never be used. The wildcard or local rule would always match, and HP Vertica would use password authentication, no matter where alice connected from.

How to Modify Authentication Records
To modify an existing authentication record, use the Administration Tools or set the ClientAuthentication configuration parameter.

Using the Administration Tools
The advantages of using the Administration Tools are:
• You do not have to connect to the database.
• The editor verifies that records are correctly formed.
• The editor maintains records so they are available to you to edit later.
Note: You must restart the database to implement your changes.
For information about using the Administration Tools to create and edit authentication records, see How to Create Authentication Records.
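The top-to-bottom, first-match scan described in Using Multiple Authentication Records and Record Order above can be modeled with a short Python sketch. This is an illustrative model of the matching rules, not HP Vertica code; it checks only connection type, user name, and address against the alice example records.

```python
import ipaddress

# The example records, as (connection_type, user, address, method) tuples.
records = [
    ("host", "alice", "192.168.1.100/32", "reject"),
    ("host", "alice", "192.168.1.101/32", "trust"),
    ("host", "all", "0.0.0.0/0", "password"),
    ("local", "all", None, "password"),
]

def method_for(conn_type, user, address):
    """Return the method of the first record matching the connection."""
    for rec_type, rec_user, rec_addr, method in records:
        if rec_type != conn_type:
            continue
        if rec_user != "all" and rec_user != user:
            continue
        if rec_addr is not None and (
            ipaddress.ip_address(address) not in ipaddress.ip_network(rec_addr)
        ):
            continue
        return method  # first match wins; later records are never consulted
    return None  # no record matches

print(method_for("host", "alice", "192.168.1.100"))  # reject
print(method_for("host", "alice", "192.168.1.101"))  # trust
print(method_for("host", "alice", "10.1.2.3"))       # password (third record)
print(method_for("host", "bob", "192.168.1.100"))    # password (third record)
```

Reordering the list so the catch-all record comes first would make method_for return password for every host connection, which is exactly why specific records must precede broad ones.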
Using the ClientAuthentication Configuration Parameter
The advantage of using the ClientAuthentication configuration parameter is that the changes are implemented immediately across all nodes within the database cluster. You do not need to restart the database.
However, all the database nodes must be up and you must connect to the database before you set this parameter. Most importantly, this method does not verify that records are correctly formed, and it does not maintain the records so you can modify them later.
New authentication records are appended to the list of existing authentication records. Because HP Vertica scans the list of records from top to bottom and uses the first record that matches the incoming connection, you might find that your newly added record has no effect if HP Vertica used an earlier record instead.
To configure client authentication through a connection parameter, use the SET_CONFIG_PARAMETER function:
=> SELECT SET_CONFIG_PARAMETER('ClientAuthentication', 'connection type user name address method');
When you specify authentication records, make sure to adhere to the following guidelines:
• Fields that make up the record can be separated by white space or tabs.
• Other than IP addresses and mask columns, field values cannot contain white space.
For more information, see Authentication Record Format and Rules.

Examples
The following example creates an authentication record for the trust method:
=> SELECT SET_CONFIG_PARAMETER('ClientAuthentication', 'hostnossl dbadmin 0.0.0.0/0 trust');
The following example creates an authentication record for the LDAP method:
=> SELECT SET_CONFIG_PARAMETER('ClientAuthentication', 'host all 10.0.0.0/8 ldap "ldap://summit.vertica.com;cn=;,dc=vertica,dc=com"');
The following example specifies three authentication records. In a single command, separate each authentication record by a comma:
=> SELECT SET_CONFIG_PARAMETER('ClientAuthentication', 'hostnossl dbadmin 0.0.0.0/0 trust, hostnossl all 0.0.0.0/0 md5, local all trust');

Implementing Kerberos Authentication
Kerberos authentication is different from user name/password authentication.
Instead of authenticating each user to each network service, Kerberos uses symmetric encryption through a trusted third party, called the Key Distribution Center (KDC). In this environment, clients and servers validate their authenticity by obtaining a shared secret (ticket) from the KDC, after which clients and servers can talk to each other directly.
Note: Topics in this section describe how to configure the HP Vertica server and clients for Kerberos authentication. This section does not describe how to install, configure, or administer a Key Distribution Center. You can obtain the Kerberos 5 GSSAPI distribution for your operating system from the MIT Kerberos Distribution Page.

Kerberos Prerequisites
At a minimum, you must meet the following requirements to use Kerberos authentication with the HP Vertica server and client drivers.

Kerberos server
Your network administrator should have already installed and configured one or more Kerberos Key Distribution Centers (KDC), and the KDC must be accessible from every node in your Vertica Analytic Database cluster. The KDC must support Kerberos 5 via GSSAPI. For details, see the MIT Kerberos Distribution Page.

Client package
The Kerberos 5 client package contains software that communicates with the KDC server. This package is not included as part of the HP Vertica Analytics Platform installation. If the Kerberos 5 client package is not present on your system, you must download and install it on all clients and servers involved in Kerberos authentication (for example, each HP Vertica node and each HP Vertica client), with the exception of the KDC itself. Kerberos software is built into Microsoft Windows. If you are using another operating system, you must obtain and install the client package. For installation instructions, refer to the Kerberos documentation, such as the MIT Kerberos Distribution page on the MIT website.

Client/server identity
Each client (a user or application that will connect to HP Vertica) and the HP Vertica server must be configured as Kerberos principals. These principals authenticate using the KDC. Each client platform has a different security framework, so the steps required to configure and authenticate against Kerberos differ among clients.
See the following topics for more information:
• Configure HP Vertica for Kerberos Authentication
• Configure Clients for Kerberos Authentication
Configure HP Vertica for Kerberos Authentication
To set up HP Vertica for Kerberos authentication, perform a series of short procedures described in the following sections:
• Install the Kerberos 5 client package
• Create the HP Vertica principal
• Create the keytab
• Specify the location of the keytab file
• Point machines at the KDC and configure realms
• Inform HP Vertica about the Kerberos principal
• Configure the authentication method for all clients
• Restart the database
• Get the ticket and authenticate HP Vertica with the KDC

Install the Kerberos 5 client package
See Kerberos Prerequisites.

Create the HP Vertica principal
You can create the Vertica Analytic Database principal on any machine in the Kerberos realm, though generally you'll perform this task on the KDC. This section describes how to create the Vertica Analytic Database principal on Linux and Active Directory KDCs.

Creating the Vertica Analytic Database principal on a Linux KDC
Start the Kerberos 5 database administration utility (kadmin or kadmin.local) to create a Vertica Analytic Database principal on a Linux KDC.
• Use kadmin if you are accessing the KDC on a remote server. You can use kadmin on any machine that has the Kerberos 5 client package installed, as long as you have access to the Kerberos administrator's password. When you start kadmin, the utility prompts you for the Kerberos administrator's password. You might need root privileges on the client system in order to run kadmin.
• Use kadmin.local if the KDC is on the machine you're logging in to and you have root privileges on that server. You might also need to modify your path to include the location of the kadmin.local command; for example, try setting the following path: /usr/kerberos/sbin/kadmin.local.
The following example creates the principal vertica on the EXAMPLE.COM Kerberos realm:
$ sudo /usr/kerberos/sbin/kadmin.local
kadmin.local: add_principal vertica/vcluster.example.com
For more information about the kadmin command, refer to the kadmin documentation.

Creating the Vertica Analytic Database principal on an Active Directory KDC
To configure Vertica Analytic Database for Kerberos authentication on Active Directory, you can most likely add the Vertica Analytic Database server and clients to an existing Active Directory domain. You'll need to modify the Kerberos configuration file (krb5.conf) on the Vertica Analytic Database server to make sure all parties support the encryption types used by the Active Directory KDC. If you need to configure encryption on the KDC:
• Open the Local Group Policy Editor (gpedit.msc), expand Computer Configuration > Windows Settings > Security Settings > Local Policies > Security Options, and double-click Network security: Configure encryption types allowed for Kerberos.
• Refresh the local and Active Directory-based Group Policy settings, including security settings, by running the command gpupdate /force.
• Use the ktpass command to configure the server principal name for the host or service in Active Directory and generate a .keytab file; for example:
ktpass -out ./host.vcluster.example.com.keytab -princ host/vcluster.example.com@EXAMPLE.COM -mapuser vcluster -mapop set -pass <password> -crypto <encryption_type> -ptype <principal_type>
ktpass -out ./vertica.vcluster.example.com.keytab -princ vertica/vcluster.example.com@EXAMPLE.COM -mapuser vertica -mapop set -pass <password> -crypto <encryption_type> -ptype <principal_type>
For more information, see the Technet.Microsoft.com Ktpass page. See also "Create the keytab" below.
You can view a list of the Service Principal Names that a computer has registered with Active Directory by running the setspn -L hostname command, where hostname is the host name of the computer object that you want to query; for example:
setspn -L vertica
Registered ServicePrincipalNames for CN=vertica,CN=Users,DC=example,DC=com:
vertica/vcluster.example.com
   setspn -L vcluster
   Registered ServicePrincipalNames for CN=vertica,CN=Users,DC=example,DC=com:
       host/vcluster.example.com

Create the keytab

The keytab is an encrypted, local copy of the host's key that contains the credentials for the HP Vertica principal (its own principal and key), so the HP Vertica server can authenticate itself to the KDC. The keytab is required so that Vertica Analytic Database doesn't have to prompt for a password.

Before you create the keytab file

You do not need to create a keytab file on the KDC; however, a keytab file must reside on each node in the HP Vertica cluster. The absolute path to the keytab file must be the same on every cluster node. You can generate a keytab on any machine in the Kerberos realm that already has the Kerberos 5 client package installed, as long as you have access to the Kerberos administrator's password. You can then copy that file to each Vertica Analytic Database node.

Generating a keytab file on Linux systems

On Linux, the default location for the keytab file is /etc/krb5.keytab. You might need root privileges on the client system to run the kadmin utility.

1. To generate a keytab or add a principal to an existing keytab entry, use the ktadd command from the kadmin utility, as in the following example:

   $ sudo /usr/kerberos/sbin/kadmin -p vertica/vcluster.example.com -q "ktadd vertica/vcluster.example.com" -r EXAMPLE.COM
   Authenticating as principal vertica/vcluster.example.com with password.

2. Make the keytab file readable by the file owner who is running the process (typically the Linux dbadmin user); for example, you can change ownership of the files to dbadmin as follows:

   $ sudo chown dbadmin *.keytab

   Important: In a production environment, you must control who can access the keytab file to prevent unauthorized users from impersonating your server.

3. Copy the keytab file to the /etc folder on each cluster node.
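The ownership and permission steps above can be verified from the shell. A minimal sketch using a stand-in file (the real keytab is typically /etc/krb5.keytab owned by dbadmin; stat -c assumes GNU coreutils, i.e. Linux):

```shell
# Stand-in for the real keytab file.
touch example.keytab
chmod 600 example.keytab      # owner read/write only, as step 2 requires

# Verify the octal mode: 600 means no group/other access.
stat -c '%a' example.keytab   # prints: 600
```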
After you create the keytab file, you can use the klist command to view keys stored in the file:
   $ sudo /usr/kerberos/bin/klist -ke -t
   Keytab name: FILE:/etc/krb5.keytab
   KVNO Principal
   ---- -------------------------------------------------
      4 vertica/vcluster.example.com@EXAMPLE.COM
      4 vertica/vcluster.example.com@EXAMPLE.COM

Generating a keytab file for Active Directory

Use the ktutil command to read, write, or edit entries in a Kerberos 5 keytab file. The keytab files in this example were created earlier, when ktpass created a principal name for the host and service in Active Directory.

1. On any Vertica Analytic Database node, use the ktutil command to read the Kerberos 5 keytab files into the current keylist (from the keytab entries you created using ktpass -out):

   $ /usr/kerberos/sbin/ktutil
   ktutil: rkt host.vcluster.example.com.keytab
   ktutil: rkt vertica.vcluster.example.com.keytab
   ktutil: list
   KVNO Principal
   ---- ----------------------------------------------------
      3 host/vcluster.example.com@EXAMPLE.COM
     16 vertica/vcluster.example.com@EXAMPLE.COM
   ktutil: wkt vcluster.example.com.keytab
   ktutil: exit

2. Make the keytab file readable by the file owner who is running the process (the administrator), with no permissions for group or other:

   $ chmod 600 vcluster.example.com.keytab

3. Copy the keytab file to the catalog directory.

Specify the location of the keytab file

Using the Administration Tools, log in to the database as the database administrator (usually dbadmin) and set the KerberosKeytabFile configuration parameter to point to the location of the keytab file; for example:

   > SELECT set_config_parameter('KerberosKeytabFile', '/etc/krb5.keytab');

See Kerberos Authentication Parameters for more information.
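The keytab is one half of the setup; the other half is the krb5.conf file covered next. As a preview, a minimal sketch of such a file, written and sanity-checked from the shell (every host and realm name here is hypothetical):

```shell
# Write a minimal krb5.conf with the three sections the next topic
# describes; all names are hypothetical placeholders.
cat > krb5.conf.sample <<'EOF'
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
EOF

# Sanity check: the three required section headers are present.
grep -c '^\[' krb5.conf.sample   # prints: 3
```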
Point machines at the KDC and configure realms

Each client and Vertica Analytic Database server in the Kerberos realm must have a valid, identically configured Kerberos configuration (krb5.conf) file in order to know how to reach the KDC. If you use Microsoft Active Directory, you won't need to perform this step; refer to the Kerberos documentation for your platform for more information about the Kerberos configuration file on Active Directory.

At a minimum, you must configure the following sections in the krb5.conf file. See the Kerberos documentation for information about other sections in this configuration file.

- [libdefaults]: Settings used by the Kerberos 5 library
- [realms]: Realm-specific contact information and settings
- [domain_realm]: Maps server host names to Kerberos realms

You need to update the /etc/krb5.conf file to reflect your site's Kerberos configuration. The easiest way to ensure consistency among all clients and servers in the Kerberos realm is to copy the /etc/krb5.conf file from the KDC to the /etc directory on each HP Vertica cluster node.

Inform HP Vertica about the Kerberos principal

Follow these steps to inform HP Vertica about the KerberosServiceName/KerberosHostname@KerberosRealm principal. This procedure uses an example principal called vertica/vcluster@EXAMPLE.COM.

About the host name parameter

If you omit the optional KerberosHostname parameter in Step 2 below, HP Vertica uses the return value from the gethostname() function. Assuming each cluster node has a different host name, those nodes will each have a different principal, which you must manage in that node's keytab file. HP recommends that you specify the KerberosHostname parameter to get a single, cluster-wide principal that is easier to manage than multiple principals.

Configure the Vertica Analytic Database principal parameters

For information about the parameters that you'll set in this procedure, see Kerberos Authentication Parameters.

1.
Log in to the database as an administrator (typically dbadmin) and set the service name for the HP Vertica principal; for example, vertica:

   > SELECT set_config_parameter('KerberosServiceName', 'vertica');

2. Optionally, provide the instance or host name portion of the principal; for example, vcluster
(see "About the host name parameter" above this procedure):

   > SELECT set_config_parameter('KerberosHostname', 'vcluster.example.com');

3. Provide the realm portion of the principal; for example, EXAMPLE.COM:

   > SELECT set_config_parameter('KerberosRealm', 'EXAMPLE.COM');

Configure the authentication method for all clients

To make sure that all clients use the gss Kerberos authentication method, run the following command:

   > SELECT set_config_parameter('ClientAuthentication', 'host all 0.0.0.0/0 gss');

For more information, see Implementing Client Authentication.

Restart the database

For all settings to take effect, you must restart the database.

Get the ticket and authenticate HP Vertica with the KDC

The example below shows how to get the ticket and authenticate Vertica Analytic Database with the KDC using the kinit command. You'll commonly perform this final step from the vsql client.

/etc/krb5.conf:

   EXAMPLE.COM = {
       kdc = myserver.example.com:11
       admin_server = myadminserver.example.com:000
       kpasswd_protocol = SET_CHANGE
       default_domain = example
   }

Calling the kinit utility requests a ticket from the KDC server:

   $ kinit kuser@EXAMPLE.COM
   Password for kuser@EXAMPLE.COM:

Configure Clients for Kerberos Authentication

Each supported platform has a different security framework, so the steps required to configure and authenticate against Kerberos differ among clients. On the server side, you construct the HP Vertica Kerberos service name principal using this format:
   KerberosServiceName/KerberosHostname@KerberosRealm

On the client side, the GSS libraries require the following format for the HP Vertica service principal:

   KerberosServiceName@KerberosHostName

The realm portion of the principal can be omitted because the GSS libraries use the realm name of the configured default realm (KerberosRealm). For information about client connection strings, see the following topics in the Programmer's Guide:

- ODBC DSN Parameters
- JDBC Connection Properties
- ADO.NET Connection Properties
- (vsql) Command Line Options

Note: A few scenarios exist in which the HP Vertica server's principal name might not match the host name in the connection string. See Troubleshooting Kerberos Authentication for more information.

In This Section

- Configure ODBC and vsql Clients on Linux, HP-UX, AIX, MAC OSX, and Solaris
- Configure ODBC and vsql Clients on Windows and ADO.NET
- Configure JDBC Clients on all Platforms

Configure ODBC and vsql Clients on Linux, HP-UX, AIX, MAC OSX, and Solaris

This topic describes the requirements for configuring an ODBC or vsql client on Linux, HP-UX, AIX, MAC OSX, or Solaris.

Install the Kerberos 5 client package

See Kerberos Prerequisites.

Provide clients with a valid Kerberos configuration file

The Kerberos configuration (krb5.conf) file contains Kerberos-specific information, such as how to reach the KDC, the default realm name, the domain, the path to log files, DNS lookup, encryption types to
use, ticket lifetime, and other settings. The default location for the Kerberos configuration file is /etc/krb5.conf.

Each client participating in Kerberos authentication must have a valid, identically configured krb5.conf file in order to communicate with the KDC. When configured properly, the client can authenticate with Kerberos and retrieve a ticket through the kinit utility. Likewise, the server can then use ktutil to store its credentials in a keytab file.

Tip: The easiest way to ensure consistency among clients, Vertica Analytic Database, and the KDC is to copy the /etc/krb5.conf file from the KDC to the client's /etc directory.

Authenticate and connect clients

ODBC and vsql use the client's ticket established by kinit to perform Kerberos authentication. These clients rely on the security library's default mechanisms to find the ticket file and the Kerberos configuration file.

To authenticate against Kerberos, call the kinit utility to obtain a ticket from the Kerberos KDC server. The following two examples show how to send the ticket request using ODBC and vsql clients.

ODBC authentication request/connection

1. On an ODBC client, call the kinit utility to acquire a ticket for the kuser user:

   $ kinit kuser@EXAMPLE.COM
   Password for kuser@EXAMPLE.COM:

2. Connect to HP Vertica and provide the principals in the connection string:

   char outStr[100];
   SQLLEN len;
   SQLDriverConnect(handle, NULL, "Database=VMart;User=kuser;
     Server=myserver.example.com;Port=5433;KerberosHostname=vcluster.example.com",
     SQL_NTS, outStr, &len);

vsql authentication request/connection

If the vsql client is on the same machine you're connecting to, vsql connects through a UNIX domain socket and bypasses Kerberos authentication. When you are authenticating with Kerberos, especially if the ClientAuthentication record connection_type is 'local', you must include the -h hostname option, described in Command Line Options in the Programmer's Guide.

1.
On the vsql client, call the kinit utility:

   $ kinit kuser@EXAMPLE.COM
   Password for kuser@EXAMPLE.COM:

2. Connect to HP Vertica and provide the host and user principals in the connection string:

   $ ./vsql -K vcluster.example.com -h myserver.example.com -U kuser
   Welcome to vsql, the Vertica Analytic Database interactive terminal.
   Type:  \h or \? for help with vsql commands
          \g or terminate with semicolon to execute query
          \q to quit

In the future, when you log in to vsql as kuser, vsql uses your cached ticket without prompting you for a password. You can verify the authentication method by querying the SESSIONS system table:

   kuser=> SELECT authentication_method FROM sessions;
    authentication_method
   -----------------------
    GSS-Kerberos
   (1 row)

See Also

- Kerberos Client/Server Requirements
- ODBC DSN Parameters in the Programmer's Guide
- (vsql) Command Line Options in the Programmer's Guide

Configure ADO.NET, ODBC, and vsql Clients on Windows

The HP Vertica client drivers support the Windows SSPI library for Kerberos authentication. Windows Kerberos configuration is stored in the registry.

You can choose between two different setup scenarios for Kerberos authentication on ADO.NET, ODBC, and vsql clients on Windows:

- Windows KDC on Active Directory with Windows built-in Kerberos client and HP Vertica
- Linux KDC with Windows built-in Kerberos client and HP Vertica
Windows KDC on Active Directory with Windows built-in Kerberos client and HP Vertica

Kerberos authentication on Windows is commonly used with Active Directory, Microsoft's enterprise directory service/Kerberos implementation, and is most likely set up by your organization's network or IT administrator. Windows clients have Kerberos authentication built into the login process: when you log in to Windows from a client machine, and your Windows instance has been configured to use Kerberos through Active Directory, your login credentials authenticate you to the Kerberos server (KDC). There is no additional software to set up. To use Kerberos authentication on Windows clients, log in as REALM\user.

ADO.NET IntegratedSecurity

When you use the ADO.NET driver to connect to HP Vertica, you can optionally specify IntegratedSecurity=true in the connection string. This Boolean setting informs the driver to authenticate the calling user against his or her Windows credentials. As a result, you do not need to include a user name or password in the connection string. If you add a user=<username> entry to the connection string, the ADO.NET driver ignores it.

Linux KDC with Windows built-in Kerberos client and HP Vertica

A simpler, but less common, scenario is to configure Windows to authenticate against a non-Windows KDC. In this implementation, you use the ksetup utility to point the Windows operating system's native Kerberos capabilities at a non-Active Directory KDC. The act of logging in to Windows obtains a ticket-granting ticket, similar to the Active Directory implementation, except that in this case Windows is internally communicating with a Linux KDC. See the Microsoft Windows Server Ksetup page for more information.
Configuring Windows clients for Kerberos authentication

Depending on which implementation you want to configure, refer to one of the following pages on the Microsoft Server website:

- To set up Windows clients with Active Directory, refer to Step-by-Step Guide to Kerberos 5 (krb5 1.0) Interoperability.
- To set up Windows clients with the ksetup utility, refer to the Ksetup page.

Authenticate and connect clients

This section shows how to authenticate an ADO.NET client and a vsql client to the KDC.

Note: Use the fully qualified domain name as the server in your connection string; for example, use host.example.com instead of just host.

ADO.NET authentication request/connection
The following example uses the IntegratedSecurity=true setting, which instructs the ADO.NET driver to authenticate the calling user's Windows credentials:

   VerticaConnection conn = new VerticaConnection("Database=VMart;Server=host.example.com;
     Port=5433;IntegratedSecurity=true;
     KerberosServiceName=vertica;KerberosHostname=vcluster.example.com");
   conn.Open();

vsql authentication request/connection

1. Log in to your Windows client, for example as EXAMPLE\kuser.

2. Run the vsql client and supply the connection string to HP Vertica:

   C:\Users\kuser\Desktop>vsql.exe -h host.example.com -K vcluster -U kuser
   Welcome to vsql, the Vertica Analytic Database interactive terminal.
   Type:  \h or \? for help with vsql commands
          \g or terminate with semicolon to execute query
          \q to quit

See Also

- Kerberos Client/Server Requirements
- vsql Command Line Options in the Programmer's Guide
- ADO.NET Connection Properties in the Programmer's Guide

Configure JDBC Clients on All Platforms

Kerberos authentication on JDBC clients uses the Java Authentication and Authorization Service (JAAS) to acquire the initial Kerberos credentials. JAAS is an API framework that hides platform-specific authentication details and provides a consistent interface for other applications.

The client login process is determined by the JAAS login configuration file, which contains options that specify the authentication method and other settings to use for Kerberos. The options allowed in the configuration file are defined by a class called the LoginModule. The JDBC client principal is crafted as jdbc-username@server-from-connection-string.

About the LoginModule

Many vendors can provide a LoginModule implementation that you can use for Kerberos authentication, but HP recommends that you use the JAAS public class com.sun.security.auth.module.Krb5LoginModule provided in the Java Runtime Environment (JRE).
The Krb5LoginModule authenticates users using Kerberos protocols and is implemented differently on non-Windows and Windows platforms:
- On non-Windows platforms: The Krb5LoginModule defers to a native Kerberos client implementation, which means you can use the same /etc/krb5.conf setup that you use to configure ODBC and vsql clients on Linux, HP-UX, AIX, MAC OSX, and Solaris platforms.

- On Windows platforms: The Krb5LoginModule uses a custom Kerberos client implementation bundled with the Java Runtime Environment (JRE). Windows settings are stored in a %WINDIR%\krb5.ini file, which has syntax and conventions similar to the non-Windows krb5.conf file. You can copy a krb5.conf from a non-Windows client to %WINDIR%\krb5.ini.

Documentation for the LoginModules is in the com.sun.security.auth package and on the Krb5LoginModule web page.

Create the JAAS login configuration

The JAASConfigName connection property identifies a specific configuration within a JAAS configuration file that contains the Krb5LoginModule and its settings. The JAASConfigName setting lets multiple JDBC applications with different Kerberos settings coexist on a single host. The default configuration name is verticajdbc.

Note: Carefully construct the JAAS login configuration file. If the syntax is incorrect, authentication will fail.

You can configure JAAS-related settings in the java.security master security properties file, which is located in the lib/security directory of the JRE. For more information, see Appendix A in the Java Authentication and Authorization Service (JAAS) Reference Guide.
Create a JDBC login context

The following example creates a login context for Kerberos authentication on a JDBC client. It uses the default JAASConfigName of verticajdbc and specifies that:

- The ticket-granting ticket will be obtained from the ticket cache
- The user will not be prompted for a password if credentials can't be obtained from the cache, the keytab file, or through shared state

   verticajdbc {
       com.sun.security.auth.module.Krb5LoginModule required
       useTicketCache=true
       doNotPrompt=true;
   };

JDBC authentication request/connection

You can configure the Krb5LoginModule to use a cached ticket or a keytab, or the driver can acquire one automatically if the calling user provides a password.
In the previous example, the login process uses a cached ticket and won't ask for a password, because both useTicketCache and doNotPrompt are set to true. If doNotPrompt=false and you provide a user name and password during the login process, the driver provides that information to the LoginModule and calls the kinit utility on your behalf.

1. On a JDBC client, call the kinit utility to acquire a ticket:

   $ kinit kuser@EXAMPLE.COM

   If you prefer to use a password instead of calling the kinit utility, see "Have the driver call kinit for you" below.

2. Connect to HP Vertica:

   Properties props = new Properties();
   props.setProperty("user", "kuser");
   props.setProperty("KerberosServiceName", "vertica");
   props.setProperty("KerberosHostName", "vcluster.example.com");
   props.setProperty("JAASConfigName", "verticajdbc");
   Connection conn = DriverManager.getConnection(
       "jdbc:vertica://myserver.example.com:5433/VMart", props);

Have the driver call kinit for you

If you want to bypass calling the kinit utility yourself but still benefit from encrypted, mutual authentication, you can optionally pass the driver a clear-text password to acquire the ticket from the KDC. The password is encrypted when sent across the network. In the example below, useTicketCache and doNotPrompt are both false, which means that the calling user's credentials will not be obtained through the ticket cache or keytab:

   verticajdbc {
       com.sun.security.auth.module.Krb5LoginModule required
       useTicketCache=false
       doNotPrompt=false;
   };

With this configuration, the driver no longer looks for a cached ticket and you don't have to call kinit. Instead, the driver takes the password and user name and calls kinit on your behalf.

Note: The above is an example to demonstrate the flexibility of JAAS.
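The JAAS pieces above come together by writing the login configuration to a file and pointing the JVM at it. A sketch (the jaas.conf file name and MyJdbcApp class are arbitrary placeholders; -Djava.security.auth.login.config is the standard JAAS system property for naming a login configuration file):

```shell
# Write the default-named (verticajdbc) JAAS configuration to a file.
cat > jaas.conf <<'EOF'
verticajdbc {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  doNotPrompt=true;
};
EOF

# A JVM is then pointed at it with the standard JAAS system property
# (not executed here):
#   java -Djava.security.auth.login.config=jaas.conf MyJdbcApp
grep -c 'Krb5LoginModule' jaas.conf   # prints: 1
```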
See Also

- Kerberos Client/Server Requirements
- JDBC Connection Properties in the Programmer's Guide
- Java Authentication and Authorization Service (JAAS) Reference Guide (external website)

Determining the Client Authentication Method

To determine the type of client authentication used for a particular user session, query the V_MONITOR.SESSIONS system table. The following example output indicates that the GSS Kerberos authentication method is set:

   > SELECT authentication_method FROM sessions;
    authentication_method
   -----------------------
    GSS-Kerberos
   (1 row)

For a list of possible values in the authentication_method column, see Implementing Client Authentication.

Tip: You can also view details about the authentication method by querying the V_MONITOR.USER_SESSIONS system table.

Troubleshooting Kerberos Authentication

This topic provides tips to help you avoid and troubleshoot issues related to Kerberos authentication with Vertica Analytic Database.

Server's principal name doesn't match the host name

In some cases during client connection, the HP Vertica server's principal name might not match the host name in the connection string. (See also "Using the ODBC Data Source Configuration utility" in this topic.)

On ODBC, JDBC, and ADO.NET clients, you set the host name portion of the server's principal using the KerberosHostName connection string parameter. See the following topics in the Programmer's Guide:

- ODBC DSN Parameters
- JDBC Connection Properties
- ADO.NET Connection Properties
On vsql clients, you set the host name portion of the server's principal name using the -K command line option, whose default value comes from the -h switch (the host name of the machine on which the HP Vertica server is running). -K is equivalent to the drivers' KerberosHostName connection string parameter. See Command Line Options in the Programmer's Guide.

Principal/host mismatch issues and resolutions

The following are common scenarios in which the server's principal name might not match the host name, with workarounds where available.

- The KerberosHostName configuration parameter has been overridden. For example, consider the following connection string:

   jdbc:vertica://node01.example.com/vmart?user=kuser

  Because the above connection string includes no explicit KerberosHostName parameter, the driver defaults to the host in the URL (node01.example.com). If you overrode the server-side KerberosHostName parameter as "abc", the client would generate an incorrect principal. To resolve this issue, explicitly add the client's KerberosHostName to the connection string; for example:

   jdbc:vertica://node01.example.com/vmart?user=kuser&kerberoshostname=abc

- Connection load balancing is enabled, but the node against which the client authenticates might not be the node in the connection string. In this situation, consider setting all nodes to use the same KerberosHostName setting. When you default to the host originally specified in the connection string, load balancing cannot interfere with Kerberos authentication.

- You have a DNS name that does not match the Kerberos host name. For example, imagine a cluster of six servers, where you want hr-servers and finance-servers to connect to different nodes on the Vertica Analytic Database cluster. Kerberos authentication, however, occurs on a single (the same) KDC.
In this example, the Kerberos service host name of the servers is server.example.com. Here is the list of example servers:

   server1.example.com 192.168.10.11
   server2.example.com 192.168.10.12
   server3.example.com 192.168.10.13
   server4.example.com 192.168.10.14
   server5.example.com 192.168.10.15
   server6.example.com 192.168.10.16

Now assume you have the following DNS entries:
   finance-servers.example.com 192.168.10.11, 192.168.10.12, 192.168.10.13
   hr-servers.example.com 192.168.10.14, 192.168.10.15, 192.168.10.16

  When you connect to finance-servers.example.com or hr-servers.example.com, specify the Kerberos host name with the -K option as server.example.com; for example:

   $ vsql -h finance-servers.example.com -K server.example.com

- You do not have DNS set up on the client machine, so you must connect by IP address only. To resolve this issue, specify the -h option with the IP address and the -K option with server.example.com; for example:

   $ vsql -h 192.168.1.12 -K server.example.com

- A load balancer (virtual IP) is involved, but there is no DNS name for the VIP. Specify the -h option with the virtual IP address and the -K option with server.example.com; for example:

   $ vsql -h <virtual IP> -K server.example.com

- You connect to HP Vertica using an IP address, but there is no host name from which to construct the Kerberos principal name.

- You set the server-side KerberosHostName configuration parameter to a name other than the HP Vertica node's host name, but the client can't determine the host name based on the host name in the connection string alone.

JDBC client authentication

If Kerberos authentication fails on a JDBC client, check the JAAS login configuration file for syntax issues. If the syntax is incorrect, authentication will fail.

Working Domain Name Service (DNS)

Make sure that the DNS entries and hosts on the network are all properly configured. Refer to the Kerberos documentation for your platform for details.
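Returning to the principal/host mismatch scenarios above: the defaulting rule the drivers apply can be sketched as a small helper. The client-side principal is service@host, where host comes from KerberosHostName (or -K) when set and otherwise falls back to the host in the connection string. All names below are hypothetical:

```shell
# client_principal SERVICE CONNECT_HOST [KERBEROS_HOST_NAME]
# Sketch of the drivers' defaulting rule: use the override when
# given, otherwise the host from the connection string.
client_principal() {
    printf '%s@%s\n' "$1" "${3:-$2}"
}

client_principal vertica node01.example.com       # prints: vertica@node01.example.com
client_principal vertica node01.example.com abc   # prints: vertica@abc
```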
Clock synchronization

System clocks in your network must remain in sync for Kerberos authentication to work properly. To ensure this:

- Install NTP on the Kerberos server (KDC)
- Install NTP on each server in your network
- Synchronize the system clocks on all machines that participate in the Kerberos realm to within a few minutes of the KDC and of each other

Clock skew can be problematic on Linux virtual machines that need to sync with Windows Time Service. You can try the following to keep time in sync:

1. Use any text editor to open /etc/ntp.conf.

2. Under the Undisciplined Local Clock section, add the IP address for the Vertica Analytic Database server and remove the existing server entries.

3. Log in to the server as root and set up a cron job to sync time with the added IP address every two hours, or as often as needed; for example:

   0 */2 * * * /etc/init.d/ntpd restart

4. Alternatively, run the following command to force a clock sync immediately:

   $ sudo /etc/init.d/ntpd restart

For more information, see Set Up Time Synchronization in the Installation Guide and the Network Time Protocol website.

Encryption algorithm choices

Kerberos is based on symmetric encryption, so be sure that all parties in the Kerberos realm agree on the encryption algorithm to use. If they don't agree, authentication fails. You can review the exceptions in vertica.log. On a Windows client, be sure the encryption types match the types set on Active Directory (see Configure HP Vertica for Kerberos Authentication).

Be aware that Kerberos secures only the login process. After the login process completes, information travels between client and server without encryption by default. If you want to encrypt traffic, use SSL. For details, see Implementing SSL.
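On the clock-synchronization point above: the KDC rejects requests whose timestamps differ from its own clock by more than an allowed skew, which in MIT Kerberos defaults to 300 seconds (the clockskew setting). The check amounts to simple arithmetic, sketched here with made-up timestamps:

```shell
# MIT Kerberos default maximum clock skew, in seconds.
max_skew=300

check_skew() {
    skew=$(( $1 - $2 ))
    skew=${skew#-}            # absolute difference in seconds
    if [ "$skew" -le "$max_skew" ]; then
        echo "within tolerance"
    else
        echo "clock skew too large"
    fi
}

check_skew 1000 1120   # 120 s apart -> prints: within tolerance
check_skew 1000 1400   # 400 s apart -> prints: clock skew too large
```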
Kerberos passwords

If you change your Kerberos password, you must re-create all of your keytab files.

Using the ODBC Data Source Configuration utility

On Windows vsql clients, if you use the ODBC Data Source Configuration utility and supply a client Data Source, be sure to enter a Kerberos host name in the Client Settings tab to avoid client connection failures with the Vertica Analytic Database server.
Implementing SSL

To ensure privacy and verify data integrity, you can configure HP Vertica and database clients to use Secure Sockets Layer (SSL) to communicate and to secure the connection between client and server. The SSL protocol uses a trusted third party called a Certificate Authority (CA); both the owner of a certificate and the party that relies on the certificate trust the CA.

Certificate Authority

The CA issues electronic certificates to identify one or both ends of a transaction and to certify ownership of a public key by the name on the certificate.

Public/private keys

A CA issues digital certificates that contain a public key and the identity of the owner. The public key is available to all users through a publicly accessible directory, while private keys are confidential to their respective owners. The keys in a pair are complementary: what one key encrypts, only the other key in the pair can decrypt.

- Data encrypted with a public key can be decrypted only by its corresponding private key.
- Data encrypted with a private key can be decrypted only by its corresponding public key.

For example, if Alice wants to send confidential data to Bob and ensure that only Bob can read it, she encrypts the data with Bob's public key. Only Bob has access to his corresponding private key; therefore, he is the only person who can decrypt Alice's encrypted data back into its original form, even if someone else gains access to the encrypted data.

HP Vertica uses SSL to:

- Authenticate the server, so the client can confirm the server's identity. HP Vertica also supports mutual authentication, in which the server can confirm the identity of the client. This authentication helps prevent man-in-the-middle attacks.
- Encrypt data sent between the client and database server, to significantly reduce the likelihood that the data can be read if the connection between the client and server is compromised.

- Verify that data sent between the client and server has not been altered during transmission.

HP Vertica supports the following authentication methods under the SSL v3/Transport Layer Security (TLS) 1.0 protocol:

- SSL server authentication: Lets the client confirm the server's identity by verifying that the server's certificate and public key are valid and were issued by a certificate authority (CA) listed
in the client's list of trusted CAs. See "Required Prerequisites for SSL Server Authentication and SSL Encryption" in SSL Prerequisites and Configuring SSL.
- SSL client authentication — (Optional) Lets the server confirm the client's identity by verifying that the client's certificate and public key are valid and were issued by a certificate authority (CA) listed in the server's list of trusted CAs. Client authentication is optional because HP Vertica can achieve authentication at the application protocol level through user name and password credentials. See "Additional Prerequisites for SSL Server and Client Mutual Authentication" in SSL Prerequisites.
- Encryption — Encrypts data sent between the client and database server to significantly reduce the likelihood that the data can be read if the connection between the client and server is compromised. Encryption works both ways, regardless of whether SSL client authentication is enabled. See "Required Prerequisites for SSL Server Authentication and SSL Encryption" in SSL Prerequisites and Configuring SSL.
- Data integrity — Verifies that data sent between the client and server has not been altered during transmission.

Note: For server authentication, HP Vertica supports using RSA encryption with ephemeral Diffie-Hellman (DH). DH is the key agreement protocol.

SSL Prerequisites

Before you implement SSL security, obtain the appropriate certificate signed by a certificate authority (CA) and private key files, and then copy the certificate to your system. (See the OpenSSL documentation.) These files must be in Privacy-Enhanced Mail (PEM) format.

Prerequisites for SSL Server Authentication and SSL Encryption

Follow these steps to set up SSL authentication of the server by the clients, which is also required to provide encrypted communication between server and client.

1.
On each server host in the cluster, copy the server certificate file (server.crt) and private key (server.key) to the HP Vertica catalog directory. (See Distributing Certificates and Keys.) The public key contained within the certificate and the corresponding private key allow the SSL connection to encrypt the data and ensure its integrity.

Note: The server.key file must have read and write permissions for the dbadmin user only. Do not grant any additional permissions or extend them to any other users. Under Linux, for example, the file permissions would be 0600.

2. If you are using mutual SSL authentication, copy the root.crt file to each client so that
the clients can verify the server's certificate. If you are using vsql, copy the file to /home/dbadmin/.vsql/. This ability is not available for ODBC clients at this time. The root.crt file contains either the server's certificate or the CA that issued the server certificate.

Note: If you do not perform this step, the SSL connection is set up and ensures message integrity and confidentiality via encryption; however, the client cannot authenticate the server and is therefore susceptible to a fake server with a valid certificate file masquerading as the real server. If the root.crt file is present but does not match the CA used to sign the certificate, the database does not start.

Optional Prerequisites for SSL Server and Client Mutual Authentication

Follow these additional steps to optionally configure authentication of clients by the server. Setting up client authentication by the server is optional because the server can use alternative techniques, such as database-level password authentication, to verify the client's identity. Follow these steps only if you want the server and client to mutually authenticate themselves with SSL keys.

1. On each server host in the cluster, copy the root.crt file to the HP Vertica catalog directory. (See Distributing Certificates and Keys.) The root.crt file has the same name on the client and server. However, these files do not need to be identical; they would be identical only if the client and server certificates were signed by the same root certificate authority (CA).

2. On each client, copy the client certificate file (client.crt) and private key (client.key) to the client. If you are using vsql, copy the files to /home/dbadmin/.vsql/. If you are using either ODBC or JDBC, you can place the files anywhere on your system and provide the location in the connection string (ODBC/JDBC) or ODBCINI (ODBC only). See Configuring SSL for ODBC Clients and Configuring SSL for JDBC Clients.
Note: If you're using ODBC, the private key file (client.key) must have read and write permissions for the dbadmin user only. Do not grant any additional permissions or extend them to any other users. Under Linux, for example, the file permissions would be 0600.

Generating SSL Certificates and Keys

This section describes the following:
- Generating Certificate Authority (CA) certificates and keys that can then be used to sign server and client keys.
- Creating a server private key and requesting a new server certificate that includes a public key.
- Creating a client private key and requesting a new client certificate that includes a public key.

In a production environment, always use certificates signed by a CA. For more detailed information on creating signed certificates, refer to the OpenSSL documentation. What follows are sample procedures for creating certificates and keys. All examples are hypothetical; the commands shown allow many other possible options not used in these examples. Create your commands based upon your specific environment.

Create a CA Private Key and Public Certificate

Create a CA private key and public certificate:

1. Use the openssl genrsa command to create a CA private key.

openssl genrsa -out new_servercakey.pem 1024

2. Use the openssl req command to create a CA public certificate.

openssl req -config openssl_req_ca.conf -new -x509 -days 3650 -key new_servercakey.pem -out new_serverca.crt

What follows are sample CA certificate values you enter in response to openssl command-line prompts. Rather than entering these values at command-line prompts, you can provide the same information within .conf files (as shown in the command examples in this section).

Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:Massachusetts
Locality Name (e.g., city) [Newbury]:Cambridge
Organization Name (e.g., company) [My Company Ltd]:HP Vertica
Organizational Unit Name (e.g., section) []:Support_CA
Common Name (e.g., your name or your server's hostname) []:myhost
Email Address []:myhost@vertica.com

Note that when you create a certificate, there must be one unique name (a Distinguished Name (DN)), which is different for each certificate that you create. The examples in this section use the Organizational Unit Name for the DN.
Result: You now have a CA private key, new_servercakey.pem, and a CA public certificate, new_serverca.crt. You use both in the procedures that follow for creating server and client certificates.

Creating the Server Private Key and Certificate

Create the server's private key file and certificate request, and sign the server certificate using the CA private key file:

1. Use the openssl genrsa command to create the server's private key file.

openssl genrsa -out new_server.key 1024

HP Vertica supports unencrypted key files only; do not use a -des3 argument.

2. Use the openssl req command to create the server certificate request.

openssl req -config openssl_req_server.conf -new -key new_server.key -out new_server_reqout.txt

The configuration file (openssl_req_server.conf) includes information that is incorporated into your certificate request. (Without the .conf file, you would enter the information in response to command-line prompts.) In this example, the Organizational Unit Name contains the unique DN, Support_server.

Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:Massachusetts
Locality Name (e.g., city) [Newbury]:Cambridge
Organization Name (e.g., company) [My Company Ltd]:HP Vertica
Organizational Unit Name (e.g., section) []:Support_server
Common Name (e.g., your name or your server's hostname) []:myhost
Email Address []:myhost@vertica.com

3. Use the openssl x509 command to sign the server's certificate using the CA private key file and public certificate.

openssl x509 -req -in new_server_reqout.txt -days 3650 -sha1 -CAcreateserial -CA new_serverca.crt -CAkey new_servercakey.pem -out new_server.crt

Result: You created the server private key file, new_server.key, and then signed the server certificate using the CA private key (new_servercakey.pem) and CA public certificate (new_serverca.crt), producing a new server certificate, new_server.crt.
Create the Client Private Key and Certificate

Create the client's private key file and certificate request, and sign the client certificate using the CA private key file:

1. Use the openssl genrsa command to create the client's private key file.

openssl genrsa -out new_client.key 1024

2. Use the openssl req command to create the client certificate request.

openssl req -config openssl_req_client.conf -new -key new_client.key -out new_client_reqout.txt

The configuration file (openssl_req_client.conf) includes information that is incorporated into your certificate request. (Without the .conf file, you would enter the information in response to command-line prompts.) In this example, the Organizational Unit Name contains the unique DN, Support_client.

Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:Massachusetts
Locality Name (e.g., city) [Newbury]:Cambridge
Organization Name (e.g., company) [My Company Ltd]:HP Vertica
Organizational Unit Name (e.g., section) []:Support_client
Common Name (e.g., your name or your server's hostname) []:myhost
Email Address []:myhost@vertica.com

3. Use the openssl x509 command to sign the client's certificate using the CA private key file and public certificate.

openssl x509 -req -in new_client_reqout.txt -days 3650 -sha1 -CAcreateserial -CA new_serverca.crt -CAkey new_servercakey.pem -out new_client.crt

Result: You created the client private key file, new_client.key, and then signed the client certificate using the CA private key (new_servercakey.pem) and CA public certificate (new_serverca.crt), producing a new client certificate, new_client.crt.
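Before setting file permissions and distributing the files, you can confirm that both signed certificates chain back to the CA public certificate. This is a minimal sketch, assuming openssl is installed and the file names from the procedures above exist in the current directory; it is wrapped in a small function so you can run it from wherever the files reside:

```shell
# Sanity check: verify that the server and client certificates signed above
# chain back to the CA public certificate. Assumes the file names used in
# the preceding examples exist in the current directory.
check_chain() {
    openssl verify -CAfile new_serverca.crt new_server.crt new_client.crt
}
```

On success, openssl verify prints one "OK" line per certificate; any other output indicates a certificate was not signed by that CA.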
Summary Illustration (Generating Certificates and Keys)

Set Server and Client Key and Certificate Permissions

Set permissions for server and client certificates and keys:

chmod 700 new_server.crt new_server.key
chmod 700 new_client.crt new_client.key

Note that you can create a shell function to generate SSL certificates and keys.
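Such a function might look like the following sketch. It bundles the CA, server, and client steps from the preceding pages, substituting -subj strings for the interactive prompts; the DN values are placeholders for your own. It also uses 2048-bit keys and openssl's default digest, which current openssl releases prefer over the 1024-bit/-sha1 options shown in the examples above — treat it as a starting point rather than the documented procedure.

```shell
# Hypothetical helper that generates a CA, then signed server and client
# certificates, using the file names from the examples above. Assumes
# openssl is installed; the -subj values are placeholders for your own DNs.
gen_ssl_certs() {
    local days=3650
    # CA private key and self-signed public certificate
    openssl genrsa -out new_servercakey.pem 2048
    openssl req -new -x509 -days "$days" -key new_servercakey.pem \
        -subj "/C=US/O=HP Vertica/OU=Support_CA/CN=myhost" \
        -out new_serverca.crt
    # Server and client: unencrypted private key, certificate request,
    # then a certificate signed with the CA private key
    local side
    for side in server client; do
        openssl genrsa -out "new_${side}.key" 2048
        openssl req -new -key "new_${side}.key" \
            -subj "/C=US/O=HP Vertica/OU=Support_${side}/CN=myhost" \
            -out "new_${side}_reqout.txt"
        openssl x509 -req -in "new_${side}_reqout.txt" -days "$days" \
            -CAcreateserial -CA new_serverca.crt -CAkey new_servercakey.pem \
            -out "new_${side}.crt"
    done
    # Restrict permissions as described above
    chmod 700 new_server.crt new_server.key new_client.crt new_client.key
}
```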
JDBC Certificates

The Java Runtime Environment (JRE) manages all keys and certificates with a keystore and a truststore.

- A keystore is a repository that includes private keys and trusted certificates. The private keys have public key certificates associated with them. The keystore provides credentials for authentication.
- A truststore contains the certificates of the trusted third parties you might communicate with. The truststore verifies credentials.

After you have generated the SSL certificates and keys, you need to ensure that the JRE is aware of the certificate authority's certificate. This sample procedure adds a client certificate to a keystore/truststore.

1. Use the openssl x509 command to convert the CA certificate to a form that Java understands.

openssl x509 -in new_serverca.crt -out new_serverca.crt.der -outform der

2. Use the keytool utility with the -keystore option to create and add credentials to a truststore (truststore). The -noprompt option allows you to proceed without prompts; you could add the commands given in this procedure to a script. Note that the alias and storepass values in the following example are arbitrary examples rather than mandatory values you would use in your environment.

keytool -noprompt -keystore truststore -alias verticasql -storepass vertica -importcert -file new_serverca.crt.der

3. Use the openssl pkcs12 command to add your client certificate and key into a pkcs12 file. This interim step is needed because you cannot directly import both the certificate and key into your keystore.

openssl pkcs12 -export -in new_client.crt -inkey new_client.key -password pass:vertica -certfile new_client.crt -out keystore.p12

4. Use the keytool utility to import your certificate and key into your keystore.
keytool -noprompt -importkeystore -srckeystore keystore.p12 -srcstoretype pkcs12 -destkeystore verticastore -deststoretype JKS -deststorepass vertica -srcstorepass vertica
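The four steps above can be collected into a small script. The following sketch assumes openssl and the JDK's keytool are on the PATH, and reuses the example file names, alias, and the placeholder password vertica:

```shell
# Hypothetical wrapper around steps 1-4: convert the CA certificate to DER,
# import it into a truststore, bundle the client certificate and key into a
# PKCS#12 file, and import that into a JKS keystore. File names, the alias,
# and the "vertica" passwords mirror the examples above and are placeholders.
build_jdbc_stores() {
    openssl x509 -in new_serverca.crt -out new_serverca.crt.der -outform der
    keytool -noprompt -keystore truststore -alias verticasql \
        -storepass vertica -importcert -file new_serverca.crt.der
    openssl pkcs12 -export -in new_client.crt -inkey new_client.key \
        -password pass:vertica -certfile new_client.crt -out keystore.p12
    keytool -noprompt -importkeystore -srckeystore keystore.p12 \
        -srcstoretype pkcs12 -destkeystore verticastore -deststoretype JKS \
        -deststorepass vertica -srcstorepass vertica
}
```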
To complete the setup for mutual authentication, you must perform a similar procedure to add your server certificate to a keystore/truststore.

Summary Illustration (JDBC Certificates)

Generating Certificates and Keys for MC

A certificate signing request (CSR) is a block of encrypted text that you generate on the server on which the certificate will be used. You send the CSR to a certificate authority (CA) to apply for a digital identity certificate. The certificate authority uses the CSR to create your SSL certificate from the information in your request; for example, organization name, common (domain) name, city, country, and so on.

MC uses a combination of OAuth (Open Authorization), Secure Sockets Layer (SSL), and locally encrypted passwords to secure HTTPS requests between a user's browser and MC, as well as between MC and the agents. Authentication occurs through MC and between agents within the cluster. Agents also authenticate and authorize jobs.

The MC configuration process sets up SSL automatically, but you must have the openssl package installed on your Linux environment first.
When you connect to MC through a client browser, HP Vertica assigns each HTTPS request a self-signed certificate, which includes a timestamp. To increase security and protect against password replay attacks, the timestamp is valid for several seconds only, after which it expires. To avoid being locked out of MC, synchronize time on the hosts in your HP Vertica cluster, as well as on the MC host if it resides on a dedicated server. To recover from loss or lack of synchronization, resync the system time and the Network Time Protocol. See Set Up Time Synchronization in the Installation Guide.

If you want to generate your own certificates and keys for MC, see Generating Certificates and Keys for MC.

Signed Certificates

For production, you need to use certificates that are signed by a certificate authority. You can create and submit a CSR now and import the certificate into MC when it returns from the CA. To generate a new CSR, enter the following command in a terminal window:

openssl req -new -key /opt/vertica/config/keystore.key -out server.csr

When you press Enter, you are prompted to enter information that will be incorporated into your certificate request. Some fields contain a default value, which you should change; some you can leave blank, such as the password and optional company name. Enter '.' to leave a field blank.

Important: The keystore.key value for the -key option creates the private key for the keystore. If you generate a new key and import it using the Management Console interface, the MC process will not restart properly. You will have to restore the original keystore.jks file and restart Management Console.
Here's an example of the information contained in the CSR, showing both the default and replacement values:

Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:Massachusetts
Locality Name (eg, city) [Newbury]:Billerica
Organization Name (eg, company) [My Company Ltd]:HP
Organizational Unit Name (eg, section) []:Information Management
Common Name (eg, your name or your server's hostname) []:console.vertica.com
Email Address []:mcadmin@vertica.com

The Common Name field is the fully qualified domain name of your server. The entry must be an exact match for what you type in your web browser, or you will receive a name mismatch error.

Self-Signed Certificates

If you want to test your new SSL implementation, you can self-sign a CSR using either a temporary certificate or your own internal CA, if one is available.

Note: A self-signed certificate will generate a browser-based error notifying you that the
signing certificate authority is unknown and not trusted. For testing purposes, accept the risks and continue.

The following command generates a temporary certificate, which is good for 365 days:

openssl x509 -req -days 365 -in server.csr -signkey /opt/vertica/config/keystore.key -out server.crt

Here's an example of the command's output to the terminal window:

Signature ok
subject=/C=US/ST=Massachusetts/L=Billerica/O=HP/OU=IT/CN=console.vertica.com/emailAddress=mcadmin@vertica.com
Getting Private key

You can now import the self-signed certificate, server.crt, into Management Console.

For additional information about certificates and keys, refer to the following external web sites:

Note: At the time of publication, the above links were valid. HP does not control this content, which could change between HP Vertica documentation releases.

See Also
- How to Configure SSL
- Key and Certificate Management Tool

Importing a New Certificate to MC

To generate a new certificate for Management Console, you must use the keystore.key file, which is located in /opt/vconsole/config on the server on which you installed MC. Any other generated key/certificate pair will cause MC to restart incorrectly. You will then have to restore the original keystore.jks file and restart Management Console. See Generating Certificates and Keys for MC.

To Import a New Certificate

1. Connect to Management Console and log in as an administrator.
2. On the Home page, click MC Settings.
3. In the button panel at left, click SSL certificates.
4. To the right of "Upload a new SSL certificate," click Browse to import the new certificate.
5. Click Apply.
6. Restart Management Console.

Distributing Certificates and Keys

Once you have created the prerequisite certificates and keys for one host, you can easily distribute them cluster-wide by using the Administration Tools. Client files cannot be distributed through the Administration Tools.

To distribute certificates and keys to all hosts in a cluster:

1. Log on to a host that contains the certificates and keys you want to distribute, and start the Administration Tools. See Using the Administration Tools for information about accessing the Administration Tools.
2. On the Main Menu in the Administration Tools, select Configuration Menu, and click OK.
3. On the Configuration Menu, select Distribute Config Files, and click OK.
4. Select SSL Keys and click OK.
5. Select the database where you want to distribute the files and click OK.
6. Fill in the fields with the paths to the root.crt, server.crt, and server.key files to distribute the files.
7. Configure SSL.

Configuring SSL

Configure SSL for each server in the cluster.

To enable SSL:

1. Ensure that you have performed the steps listed in SSL Prerequisites, minimally for server authentication and encryption, and optionally for mutual authentication.
2. Set the EnableSSL parameter to 1. By default, EnableSSL is set to 0 (disabled).

=> SELECT SET_CONFIG_PARAMETER('EnableSSL', '1');

Note: HP Vertica fails to start if SSL has been enabled and the server certificate files (server.crt, server.key) are not in the expected location.
3. Restart the database.
4. If you are using either ODBC or JDBC, configure SSL for the appropriate client:
   - Configuring SSL for ODBC Clients
   - Configuring SSL for JDBC Clients

vsql automatically attempts to make connections using SSL. If a connection fails, vsql attempts to make a second connection over clear text.

See Also
- Configuration Parameters

Configuring SSL for ODBC Clients

Configuring SSL for ODBC clients requires that you set the SSLMode parameter. If you want to configure optional SSL client authentication, you also need to configure the SSLKeyFile and SSLCertFile parameters.

The method you use to configure the DSN depends on the type of client operating system you are using:

- Linux and UNIX — Enter the parameters in the odbc.ini file. See Creating an ODBC DSN for Linux, Solaris, AIX, and HP-UX Clients.
- Microsoft Windows — Enter the parameters in the Windows Registry. See Creating an ODBC DSN for Windows Clients.

SSLMode Parameter

Set the SSLMode parameter to one of the following values for the DSN:

- require — Requires the server to use SSL. If the server cannot provide an encrypted channel, the connection fails.
- prefer (the default) — Prefers that the server use SSL. The first connection to the database tries to use SSL. If that fails, a second connection is attempted over a clear channel.
- allow — The first connection to the database tries to use a clear channel. If that fails, a second connection is attempted over SSL.
- disable — Never connects to the server using SSL. This setting is typically used for troubleshooting.
SSLKeyFile Parameter

To configure optional SSL client authentication, set the SSLKeyFile parameter to the file path and name of the client's private key. This key can reside anywhere on the client.

SSLCertFile Parameter

To configure optional SSL client authentication, set the SSLCertFile parameter to the file path and name of the client's public certificate. This file can reside anywhere on the client.

Configuring SSL for JDBC Clients

1. Set the required properties.
2. Troubleshoot, if necessary.

Setting Required Properties

1. If you are using a keystore/truststore location that is not the default, set the following system properties so that the JRE can find your keystore/truststore:

javax.net.ssl.keyStore
javax.net.ssl.trustStore

2. If your keystore/truststore is password protected, set the following system properties so that the JRE has access to the keystore/truststore:

javax.net.ssl.keyStorePassword
javax.net.ssl.trustStorePassword

3. Enable SSL in the JDBC driver by setting the SSL property to true. There are a number of ways to set the SSL property: you can set ssl=true in a connection string/URL, call SslEnabled(true) on the DataSource, or use a Properties object parameter.

Troubleshooting

The following command-line option turns on the debug utility for SSL:

-Djavax.net.debug=ssl

There are a number of debug specifiers (options) you can use with the debug utility. The specifiers help narrow the scope of the debugging information that is returned. For example, you could specify one of the options that prints handshake messages or session activity.
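Assembled on a java command line, the system properties and debug flag described above might look like the following sketch. The class name MyVerticaApp, the store paths, and the vertica passwords are placeholders; the stores are the ones built in JDBC Certificates:

```shell
# Hypothetical launcher: pass keystore/truststore locations, their
# passwords, and the SSL debug flag to a JDBC application. All names,
# paths, and passwords here are placeholders for your own.
run_with_ssl_debug() {
    java \
        -Djavax.net.ssl.trustStore=/home/dbadmin/truststore \
        -Djavax.net.ssl.trustStorePassword=vertica \
        -Djavax.net.ssl.keyStore=/home/dbadmin/verticastore \
        -Djavax.net.ssl.keyStorePassword=vertica \
        -Djavax.net.debug=ssl \
        MyVerticaApp
}
```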
For information on the debug utility and its options, see Debugging Utilities in the Oracle document JSSE Reference Guide. For information on interpreting debug information, refer to the Oracle document Debugging SSL/TLS Connections.

Requiring SSL for Client Connections

You can require clients to use SSL when connecting to HP Vertica by creating a client authentication record for them that has a connection_type of hostssl. You can choose to limit specific users to connecting only over SSL (useful for specific clients that you know connect through an insecure network), or require all clients to use SSL. See Implementing Client Authentication for more information about creating client authentication records.
Managing Users and Privileges

Database users should have access to only the database resources they need to perform their tasks. For example, most users should be able to read data but not modify or insert new data, while other users might need more permissive access, such as the right to create and modify schemas, tables, and views, as well as rebalance nodes on a cluster and start or stop a database. It is also possible to allow certain users to grant other users access to the appropriate database resources.

Client authentication controls what database objects users can access and change in the database. To prevent unauthorized access, a superuser limits access to what is needed, granting privileges directly to users or to roles through a series of GRANT statements. Roles can then be granted to users, as well as to other roles.

A Management Console administrator can also grant MC users access to one or more HP Vertica databases through the MC interface. See About MC Users and About MC Privileges and Roles for details.

This section introduces the privilege and role model in HP Vertica and describes how to create and manage users.

See Also
- About Database Privileges
- About Database Roles
- GRANT Statements
- REVOKE Statements
About Database Users

Every HP Vertica database has one or more users. When users connect to a database, they must log on with valid credentials (username and password) that a superuser defined in the database.

Database users own the objects they create in a database, such as tables, procedures, and storage locations.

Note: By default, users have the right to create temporary tables in a database.

See Also
- Creating a Database User
- CREATE USER
- About MC Users
Types of Database Users

In an HP Vertica database, there are three types of users:

- Database administrator (DBADMIN)
- Object owner
- Everyone else (PUBLIC)

Note: External to an HP Vertica database, an MC administrator can create users through the Management Console and grant them database access. See About MC Users for details.

DBADMIN User

When you create a new database, a single database administrator account, DBADMIN, is automatically created along with the PUBLIC role. This database superuser bypasses all permission checks and has the authority to perform all database operations, such as bypassing all GRANT/REVOKE authorizations. The same is true of any user granted the PSEUDOSUPERUSER role.

Note: Although the dbadmin user has the same name as the Linux database administrator account, do not confuse the concept of a database superuser with Linux superuser (root) privilege; they are not the same. A database superuser cannot have Linux superuser privileges.

The DBADMIN user can start and stop a database without a database password. To connect to the database, a password is required.

See Also
- DBADMIN Role
- PSEUDOSUPERUSER Role
- PUBLIC Role

Object Owner

An object owner is the user who creates a particular database object and can perform any operation on that object. By default, only an owner (or a superuser) can act on a database object. In order to allow other users to use an object, the owner or superuser must grant privileges to those users using one of the GRANT statements.

Note: Object owners are PUBLIC users for objects that other users own.

See About Database Privileges for more information.
PUBLIC User

All users other than the DBADMIN (superuser) and object owners are PUBLIC users.

Note: Object owners are PUBLIC users for objects that other users own.

Newly-created users do not have access to schema PUBLIC by default. Make sure to GRANT USAGE ON SCHEMA PUBLIC to all users you create.

See Also
- PUBLIC Role

Creating a Database User

This procedure describes how to create a new user on the database.

1. From vsql, connect to the database as a superuser.
2. Issue the CREATE USER statement with optional parameters.
3. Run a series of GRANT statements to grant the new user privileges.

Notes

- Newly-created users do not have access to schema PUBLIC by default. Make sure to GRANT USAGE ON SCHEMA PUBLIC to all users you create.
- By default, database users have the right to create temporary tables in the database.
- If you plan to create users on Management Console, the database user account needs to exist before you can associate an MC user with the database.
- You can change information about a user, such as his or her password, by using the ALTER USER statement. If you want to configure a user to not have any password authentication, you can set the empty password '' in CREATE USER or ALTER USER statements, or omit the IDENTIFIED BY parameter in CREATE USER.

Example

The following series of commands adds user Fred to a database with the password 'password'. The second command grants USAGE privileges to Fred on the public schema:

=> CREATE USER Fred IDENTIFIED BY 'password';
=> GRANT USAGE ON SCHEMA PUBLIC to Fred;

User names created with double-quotes are case sensitive. For example:
=> CREATE USER "FrEd1";

In the above example, the logon name must be an exact match. If the user name was created without double-quotes (for example, FRED1), then the user can log on as FRED1, FrEd1, fred1, and so on.

ALTER USER and DROP USER syntax is not case sensitive.

See Also
- Granting and Revoking Privileges
- Granting Access to Database Roles
- Creating an MC User

Locking/Unlocking a User's Database Access

A superuser can manually lock an existing database user's account with the ALTER USER statement. For example, the following command prevents user Fred from logging in to the database:

=> ALTER USER Fred ACCOUNT LOCK;
=> \c - Fred
FATAL 4974: The user account "Fred" is locked
HINT: Please contact the database administrator

To restore Fred's database access, use the UNLOCK syntax with the ALTER USER command:

=> ALTER USER Fred ACCOUNT UNLOCK;
=> \c - Fred
You are now connected as user "Fred".

Using CREATE USER to Lock an Account

Although not as common, you can create a new user with a locked account; for example, you might want to set up an account for a user who doesn't need immediate database access, as in the case of an employee who will join the company at a future date.

=> CREATE USER Bob ACCOUNT LOCK;
CREATE USER

CREATE USER also supports UNLOCK syntax; however, UNLOCK is the default, so you don't need to specify the keyword when you create a new user to whom you want to grant immediate database access.
Locking an Account Automatically

Instead of manually locking an account, a superuser can automate account locking by setting a maximum number of failed login attempts through the CREATE PROFILE statement. See Profiles.

Changing a User's Password

A superuser can change another user's database account, including resetting a password, with the ALTER USER statement. Making changes to a database user account does not affect current sessions.

=> ALTER USER Fred IDENTIFIED BY 'newpassword';

In the above command, Fred's password is now newpassword.

Note: Non-DBA users can change their own passwords using the IDENTIFIED BY 'new-password' option along with the REPLACE 'old-password' clause. See ALTER USER for details.

Changing a User's MC Password

On MC, users with ADMIN or IT privileges can reset a user's non-LDAP password from the MC interface. Non-LDAP passwords on MC are for MC access only and are not related to a user's logon credentials on the HP Vertica database.

1. Sign in to Management Console and navigate to MC Settings > User management.
2. Click to select the user to modify and click Edit.
3. Click Edit password and enter the new password twice.
4. Click OK and then click Save.
About MC Users
Unlike database users, which you create on the HP Vertica database and then grant privileges and roles through SQL statements, you create MC users on the Management Console interface. MC users are external to the database; their information is stored in an internal database on the MC application/web server, and their access to both MC and to MC-managed databases is controlled by groups of privileges (also referred to as access levels). MC users are not system (Linux) users; they are entries in the MC internal database.

Permission Group Types
There are two types of permission groups on MC, those that apply to MC configuration and those that apply to database access:
l MC configuration privileges are made up of roles that control what users can configure on the MC, such as modify MC settings, create/import HP Vertica databases, restart MC, create an HP Vertica cluster through the MC interface, and create and manage MC users.
l MC database privileges are made up of roles that control what users can see or do on an MC-managed HP Vertica database, such as view the database cluster state, query and session activity, monitor database messages and read log files, replace cluster nodes, and stop databases.
If you are using MC, you might want to allow one or more users in your organization to configure and manage MC, and you might want other users to have database access only. You can meet these requirements by creating MC users and granting them a role from each privileges group. See Creating an MC User for details.

MC User Types
There are four types of role-based users on MC:
l The default superuser administrator (Linux account) who gets created when you install and configure MC and oversees the entire MC. See SUPER Role (mc).
l Users who can configure all aspects of MC and control all MC-managed databases. See ADMIN Role (mc).
l Users who can configure some aspects of MC and monitor all MC-managed databases. See IT Role (mc).
l Users who cannot configure MC and have access to one or more MC-managed databases only. See NONE Role (mc).
You create users and grant them privileges (through roles) on the MC Settings page: select User management to add users who will be authenticated against the MC, or select Authentication to authenticate MC users through your organization's LDAP repository.
Creating Users and Choosing an Authentication Method
You create users and grant them privileges (through roles) on the MC Settings page, where you can also choose how to authenticate their access to MC; for example:
l To add users who will be authenticated against the MC, click User Management
l To add users who will be authenticated through your organization's LDAP repository, click Authentication
MC supports only one method for authentication, so if you choose MC, all MC users will be authenticated using their MC login credentials.

Default MC Users
The MC super account is the only default user. The super account or another MC administrator must create all other MC users.

See Also
l Management Console
l About MC Privileges and Roles
l Granting Database Access to MC Users
l Mapping an MC User to a Database user's Privileges

Creating an MC User
MC provides two authentication schemes for MC users: LDAP or MC (internal). The method you choose is the method MC uses to authenticate all MC users. It is not possible to authenticate some MC users against LDAP and other MC users against credentials in the database through MC.
l MC (internal) authentication. Internal user authorization is specific to the MC itself, where you create a user with a username and password combination. This method stores MC user information in an internal database on the MC application/web server, and encrypts passwords. Note that these MC users are not system (Linux) users; they are entries in the MC's internal database.
l LDAP authentication. All MC users, except for the MC super administrator (which is a Linux account), will be authenticated based on search criteria against your organization's LDAP repository. MC uses information from LDAP for authentication purposes only and does not modify LDAP information. Also, MC does not store LDAP passwords but passes them to the LDAP server for authentication.
Instructions for creating new MC users are in this topic.
l If you chose MC authentication, follow the instructions under Create a new MC-authenticated user.
l If you chose LDAP authentication, follow the instructions under Create a new user from LDAP.
See About MC Users and Configuring LDAP Authentication for more information.

Prerequisites
Before you create an MC user, verify that you have already:
l Created a database directly on the server or through the MC interface, or imported an existing database cluster into the MC interface. See Managing Database Clusters on MC.
l Created a database user account (source user) on the server, which has the privileges and/or roles you want to map to the new (target) MC user. See Creating a Database User.
l Decided what MC privileges you want to grant the new MC user. See About MC Privileges and Roles.
l Become familiar with the concept of mapping MC users to database users.
If you have not yet met the first two prerequisites above, you can still create new MC users; you just won't be able to map them to a database until after the database and target database user exist. To grant MC users database access later, see Granting Database Access to MC Users.

Create a New MC-authenticated User
1. Sign in to Management Console as an administrator and navigate to MC Settings > User management.
2. Click Add.
3. Enter the MC username.
Note: It is not necessary to give the MC user the exact same name as the database user account you'll map the MC user to in Step 7. What matters is that the source database user has privileges and/or roles similar to the database role you want to grant the MC user. The most likely scenario is that you will map multiple MC users to a single database user account. See MC Database Privileges and Mapping an MC User to a Database user's Privileges for more information.
4. Let MC generate a password or create one by clicking Edit password. If LDAP has been configured, the MC password field will not appear.
5. Optionally enter the user's e-mail address.
6. Select an MC configuration permissions level. See MC Configuration Privileges.
7. Next to the DB access levels section, click Add to grant this user database permissions. If you want to grant access later, proceed to Step 8. If you want to grant database access now, provide the following information:
   i. Choose a database. Select a database from the list of MC-discovered databases (databases that were created on or imported into the MC interface).
   ii. Database username. Enter an existing database user name or, if the database is running, click the ellipses […] to browse a list of database users, and select a name from the list.
   iii. Database password. Enter the password for the database user account (not this MC user's password).
   iv. Restricted access. Choose a database level (ADMIN, IT, or USER) for this user.
   v. Click OK to close the Add permissions dialog box.
   See Mapping an MC User to a Database user's Privileges for additional information about associating the two user accounts.
8. Leave the user's Status as enabled (the default). If you need to prevent this user from accessing MC, select disabled.
9. Click Add User to finish.

Create a New LDAP-authenticated User
When you add a user from LDAP on the MC interface, options on the Add a new user dialog box are slightly different from when you create users without LDAP authentication. Because passwords are stored externally (on the LDAP server), the password field does not appear. An MC administrator can override the default LDAP search string if the user is found in another branch of the tree. The Add user field is pre-populated with the default search path entered when LDAP was configured.
1. Sign in to Management Console and navigate to MC Settings > User management.
2. Click Add and provide the following information:
   a. LDAP user name.
   b. LDAP search string.
   c. User attribute, and click Verify user.
   d. User's email address.
   e. MC configuration role. NONE is the default. See MC Configuration Privileges for details.
   f. Database access level. See MC Database Privileges for details.
   g. Accept or change the default user's Status (enabled).
3. Click Add user.
If you encounter issues when creating new users from LDAP, contact your organization's IT department.

How MC Validates New Users
After you click OK to close the Add permissions dialog box, MC tries to validate the database username and password entered against the selected MC-managed database or against your organization's LDAP directory. If the credentials are found to be invalid, you are asked to re-enter them. If the database is not available at the time you create the new user, MC saves the username/password and prompts for validation when the user accesses the Database and Clusters page later.

See Also
l Configuring MC
l About MC Users
l About MC Privileges and Roles
l Granting Database Access to MC Users
l Creating a Database User
l Mapping an MC User to a Database user's Privileges
l Adding Multiple Users to MC-managed Databases

Managing MC Users
You manage MC users through the following pages on the Management Console interface:
l MC Settings > User management
l MC Settings > Resource access

Who Manages Users
The MC superuser administrator (SUPER Role (mc)) and users granted ADMIN Role (mc) manage all aspects of users, including their access to MC and to MC-managed databases.
Users granted IT Role (mc) can enable and disable user accounts. See About MC Users and About MC Privileges and Roles for more information.
Editing an MC user's information follows the same steps as creating a new user, except the user's information is pre-populated, which you then edit and save. The only user account you cannot alter or remove from the MC interface is the MC super account.

What Kind of User Information You Can Manage
You can change the following user properties:
l MC password
l Email address. This field is optional; if the user is authenticated against LDAP, the email field is pre-populated with that user's email address if one exists.
l MC Configuration Privileges role
l MC Database Privileges role
You can also change a user's status (enable/disable access to MC) and delete users.

About User Names
After you create and save a user, you cannot change that user's MC user name, but you can delete the user account and create a new user account under a new name. The only thing you lose by deleting a user account is its audit activity, but MC immediately resumes logging activity under the user's new account.

See Also
l About MC Users
l About MC Privileges and Roles
About Database Privileges
When a database object is created, such as a schema, table, or view, that object is assigned an owner: the person who executed the CREATE statement. By default, database administrators (superusers) and object owners are the only users who can do anything with the object. To allow other users to use an object, or to remove a user's right to use an object, an authorized user must grant (or revoke) privileges on the object.
Privileges are granted (or revoked) through a collection of GRANT/REVOKE statements that assign the privilege, a type of permission that lets users perform an action on a database object, such as:
l Create a schema
l Create a table (in a schema)
l Create a view
l View (select) data
l Insert, update, or delete table data
l Drop tables or schemas
l Run procedures
Before HP Vertica executes a statement, it determines if the requesting user has the necessary privileges to perform the operation. For more information about the privileges associated with these resources, see Privileges That Can Be Granted on Objects.
Note: HP Vertica logs information about each grant (grantor, grantee, privilege, and so on) in the V_CATALOG.GRANTS system table.

See Also
l GRANT Statements
l REVOKE Statements

Default Privileges for All Users
To set the minimum level of privilege for all users, HP Vertica has the special PUBLIC Role, which it grants to each user automatically. This role is automatically enabled, but the database administrator or a superuser can also grant higher privileges to users separately using GRANT statements.
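For example, granting and revoking a table privilege follows this pattern (the schema, table, and user names here are illustrative):

=> GRANT USAGE ON SCHEMA online_sales TO Fred;
=> GRANT SELECT ON TABLE online_sales.orders TO Fred;
=> REVOKE SELECT ON TABLE online_sales.orders FROM Fred;

Note that Fred still needs the USAGE privilege on the schema for the SELECT grant on the table to be usable, as described in the topics that follow.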
The following topics discuss those higher privileges.

Default Privileges for MC Users
Privileges on Management Console (MC) are managed through roles, which determine a user's access to MC and to MC-managed HP Vertica databases through the MC interface. MC privileges do not alter or override HP Vertica privileges or roles. See About MC Privileges and Roles for details.

Privileges Required for Common Database Operations
This topic lists the required privileges for database objects in HP Vertica. Unless otherwise noted, superusers can perform all of the operations shown in the following tables without any additional privilege requirements. Object owners have the necessary rights to perform operations on their own objects, by default.

Schemas
The PUBLIC schema is present in any newly-created HP Vertica database, and newly-created users have only USAGE privilege on PUBLIC. A database superuser must explicitly grant new users CREATE privileges, as well as grant them individual object privileges, so the new users can create or look up objects in the PUBLIC schema.

Operation: CREATE SCHEMA
Required privileges: CREATE privilege on database

Operation: DROP SCHEMA
Required privileges: Schema owner

Operation: ALTER SCHEMA RENAME
Required privileges: CREATE privilege on database

Tables

Operation: CREATE TABLE
Required privileges: CREATE privilege on schema
Note: Referencing sequences in the CREATE TABLE statement requires the following privileges:
l SELECT privilege on sequence object
l USAGE privilege on sequence schema

Operation: DROP TABLE
Required privileges: USAGE privilege on the schema that contains the table, or schema owner
Operation: TRUNCATE TABLE
Required privileges: USAGE privilege on the schema that contains the table, or schema owner

Operation: ALTER TABLE ADD/DROP/RENAME/ALTER-TYPE COLUMN
Required privileges: USAGE privilege on the schema that contains the table

Operation: ALTER TABLE ADD/DROP CONSTRAINT
Required privileges: USAGE privilege on the schema that contains the table

Operation: ALTER TABLE PARTITION (REORGANIZE)
Required privileges: USAGE privilege on the schema that contains the table

Operation: ALTER TABLE RENAME
Required privileges: USAGE and CREATE privilege on the schema that contains the table

Operation: ALTER TABLE SET SCHEMA
Required privileges:
l CREATE privilege on new schema
l USAGE privilege on the old schema

Operation: SELECT
Required privileges:
l SELECT privilege on table
l USAGE privilege on schema that contains the table

Operation: INSERT
Required privileges:
l INSERT privilege on table
l USAGE privilege on schema that contains the table

Operation: DELETE
Required privileges:
l DELETE privilege on table
l USAGE privilege on schema that contains the table
l SELECT privilege on the referenced table when executing a DELETE statement that references table column values in a WHERE or SET clause

Operation: UPDATE
Required privileges:
l UPDATE privilege on table
l USAGE privilege on schema that contains the table
l SELECT privilege on the table when executing an UPDATE statement that references table column values in a WHERE or SET clause

Operation: REFERENCES
Required privileges:
l REFERENCES privilege on table to create foreign key constraints that reference this table
l USAGE privileges on schema that contains the constrained table and the source of the foreign key

Operation: ANALYZE_STATISTICS()
Required privileges:
l INSERT/UPDATE/DELETE privilege on table
l USAGE privilege on schema that contains the table
Operation: ANALYZE_HISTOGRAM()
Required privileges:
l INSERT/UPDATE/DELETE privilege on table
l USAGE privilege on schema that contains the table

Operation: DROP_STATISTICS()
Required privileges:
l INSERT/UPDATE/DELETE privilege on table
l USAGE privilege on schema that contains the table

Operation: DROP_PARTITION()
Required privileges: USAGE privilege on schema that contains the table

Operation: MERGE_PARTITIONS()
Required privileges: USAGE privilege on schema that contains the table

Views

Operation: CREATE VIEW
Required privileges:
l CREATE privilege on the schema to contain a view
l SELECT privileges on base objects (tables/views)
l USAGE privileges on schema that contains the base objects

Operation: DROP VIEW
Required privileges: USAGE privilege on schema that contains the view, or schema owner

Operation: SELECT ... FROM VIEW
Required privileges:
l SELECT privilege on view
l USAGE privilege on the schema that contains the view
Note: Privileges required on base objects for the view owner must be directly granted, not through roles:
l The view owner must have SELECT ... WITH GRANT OPTION privileges on the view's base tables or views if a non-owner runs a SELECT query on the view. This privilege must be directly granted to the owner, not through a role.
l The view owner must have SELECT privilege directly granted (not through a role) on a view's base objects (table or view) if the owner runs a SELECT query on the view.
Projections

Operation: CREATE PROJECTION
Required privileges:
l SELECT privilege on base tables
l USAGE privilege on schema that contains base tables, or schema owner
l CREATE privilege on schema to contain the projection
Note: If a projection is implicitly created with the table, no additional privilege is needed other than privileges for table creation.

Operation: AUTO/DELAYED PROJECTION
Required privileges: On projections created during INSERT..SELECT or COPY operations:
l SELECT privilege on base tables
l USAGE privilege on schema that contains base tables

Operation: ALTER PROJECTION RENAME
Required privileges: USAGE and CREATE privilege on schema that contains the projection

Operation: DROP PROJECTION
Required privileges: USAGE privilege on schema that contains the projection, or schema owner

External Procedures

Operation: CREATE PROCEDURE
Required privileges: Superuser

Operation: DROP PROCEDURE
Required privileges: Superuser

Operation: EXECUTE
Required privileges:
l EXECUTE privilege on procedure
l USAGE privilege on schema that contains the procedure

Libraries

Operation: CREATE LIBRARY
Required privileges: Superuser

Operation: DROP LIBRARY
Required privileges: Superuser
User-Defined Functions
The following abbreviations are used in the UDF table:
l UDF = Scalar
l UDT = Transform
l UDAnF = Analytic
l UDAF = Aggregate

Operation: CREATE FUNCTION (SQL), CREATE FUNCTION (UDF), CREATE TRANSFORM FUNCTION (UDT), CREATE ANALYTIC FUNCTION (UDAnF), CREATE AGGREGATE FUNCTION (UDAF)
Required privileges:
l CREATE privilege on schema to contain the function
l USAGE privilege on base library (if applicable)

Operation: DROP FUNCTION, DROP TRANSFORM FUNCTION, DROP ANALYTIC FUNCTION, DROP AGGREGATE FUNCTION
Required privileges:
l Superuser or function owner
l USAGE privilege on schema that contains the function

Operation: ALTER FUNCTION RENAME TO
Required privileges: USAGE and CREATE privilege on schema that contains the function

Operation: ALTER FUNCTION SET SCHEMA
Required privileges:
l USAGE privilege on schema that currently contains the function (old schema)
l CREATE privilege on the schema to which the function will be moved (new schema)

Operation: EXECUTE (SQL/UDF/UDT/UDAF/UDAnF function)
Required privileges:
l EXECUTE privilege on function
l USAGE privilege on schema that contains the function

Sequences

Operation: CREATE SEQUENCE
Required privileges: CREATE privilege on schema to contain the sequence
Note: Referencing a sequence in the CREATE TABLE statement requires SELECT privilege on the sequence object and USAGE privilege on the sequence schema.
Operation: CREATE TABLE with SEQUENCE
Required privileges:
l SELECT privilege on sequence
l USAGE privilege on sequence schema

Operation: DROP SEQUENCE
Required privileges: USAGE privilege on schema containing the sequence, or schema owner

Operation: ALTER SEQUENCE RENAME TO
Required privileges: USAGE and CREATE privileges on schema

Operation: ALTER SEQUENCE SET SCHEMA
Required privileges:
l USAGE privilege on the schema that currently contains the sequence (old schema)
l CREATE privilege on new schema to contain the sequence

Operation: CURRVAL() / NEXTVAL()
Required privileges:
l SELECT privilege on sequence
l USAGE privilege on sequence schema

Resource Pools

Operation: CREATE RESOURCE POOL
Required privileges: Superuser

Operation: ALTER RESOURCE POOL
Required privileges:
Superuser on the resource pool to alter:
l MAXMEMORYSIZE
l PRIORITY
l QUEUETIMEOUT
UPDATE privilege on the resource pool to alter:
l PLANNEDCONCURRENCY
l SINGLEINITIATOR
l MAXCONCURRENCY

Operation: SET SESSION RESOURCE_POOL
Required privileges:
l USAGE privilege on the resource pool
l Users can only change their own resource pool setting using ALTER USER syntax

Operation: DROP RESOURCE POOL
Required privileges: Superuser
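As an illustration of the resource pool operations above, a user with the appropriate privileges might run statements like the following; the pool name and parameter values are illustrative:

=> ALTER RESOURCE POOL batch_pool PLANNEDCONCURRENCY 4;
=> SET SESSION RESOURCE_POOL = batch_pool;

The first statement requires UPDATE privilege on batch_pool (PLANNEDCONCURRENCY is in the UPDATE-alterable group), while the second requires only USAGE privilege on the pool.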
Users/Profiles/Roles

Operation: CREATE USER, CREATE PROFILE, CREATE ROLE
Required privileges: Superuser

Operation: ALTER USER, ALTER PROFILE, ALTER ROLE RENAME
Required privileges: Superuser

Operation: DROP USER, DROP PROFILE, DROP ROLE
Required privileges: Superuser

Object Visibility
You can use one or a combination of vsql \d [pattern] meta-commands and SQL system tables to view objects on which you have privileges to view.
l Use \dn [pattern] to view schema names and owners
l Use \dt [pattern] to view all tables in the database, as well as the system table V_CATALOG.TABLES
l Use \dj [pattern] to view projections showing the schema, projection name, owner, and node, as well as the system table V_CATALOG.PROJECTIONS

Operation: Look up schema
Required privileges: At least one privilege on schema that contains the object
Operation: Look up object in schema or in system tables
Required privileges: USAGE privilege on schema, plus at least one privilege on any of the following objects:
l TABLE
l VIEW
l FUNCTION
l PROCEDURE
l SEQUENCE

Operation: Look up projection
Required privileges:
l At least one privilege on all base tables
l USAGE privilege on schema of all base tables

Operation: Look up resource pool
Required privileges: SELECT privilege on the resource pool

Operation: Existence of object
Required privileges: USAGE privilege on the schema that contains the object

I/O Operations

Operation: CONNECT / DISCONNECT
Required privileges: None
Operation: EXPORT TO HP Vertica
Required privileges:
l SELECT privileges on the source table
l USAGE privilege on source table schema
l INSERT privileges for the destination table in target database
l USAGE privilege on destination table schema

Operation: COPY FROM HP Vertica
Required privileges:
l SELECT privileges on the source table
l USAGE privilege on source table schema
l INSERT privileges for the destination table in target database
l USAGE privilege on destination table schema

Operation: COPY FROM file
Required privileges: Superuser

Operation: COPY FROM STDIN
Required privileges:
l INSERT privilege on table
l USAGE privilege on schema

Operation: COPY LOCAL
Required privileges:
l INSERT privilege on table
l USAGE privilege on schema
Comments

Operation: COMMENT ON { object }, where object is one of:
l AGGREGATE FUNCTION
l ANALYTIC FUNCTION
l COLUMN
l CONSTRAINT
l FUNCTION
l LIBRARY
l NODE
l PROJECTION
l SCHEMA
l SEQUENCE
l TABLE
l TRANSFORM FUNCTION
l VIEW
Required privileges: Object owner or superuser

Transactions

Operation: COMMIT
Required privileges: None

Operation: ROLLBACK
Required privileges: None

Operation: RELEASE SAVEPOINT
Required privileges: None

Operation: SAVEPOINT
Required privileges: None
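For example, the owner of a table (or a superuser) can attach a comment to it as described above; the table name and comment text are illustrative:

=> COMMENT ON TABLE public.orders IS 'Daily order fact table, loaded by the nightly ETL job';

No privilege beyond object ownership (or superuser status) is needed for this operation.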
Sessions

Operation: SET { parameter }, where parameter is one of:
l DATESTYLE
l ESCAPE_STRING_WARNING
l INTERVALSTYLE
l LOCALE
l ROLE
l SEARCH_PATH
l SESSION AUTOCOMMIT
l SESSION CHARACTERISTICS
l SESSION MEMORYCAP
l SESSION RESOURCE POOL
l SESSION RUNTIMECAP
l SESSION TEMPSPACE
l STANDARD_CONFORMING_STRINGS
l TIMEZONE
Required privileges: None

Operation: SHOW { name | ALL }
Required privileges: None

Tuning Operations

Operation: PROFILE
Required privileges: Same privileges required to run the query being profiled

Operation: EXPLAIN
Required privileges: Same privileges required to run the query for which you use the EXPLAIN keyword
Privileges That Can Be Granted on Objects
The following table provides an overview of privileges that can be granted on (or revoked from) database objects in HP Vertica:

See Also
l GRANT Statements
l REVOKE Statements

Database Privileges
Only a database superuser can create a database. In a new database, the PUBLIC Role is granted USAGE on the automatically-created PUBLIC schema. It is up to the superuser to grant further privileges to users and roles.
The only privilege a superuser can grant on the database itself is CREATE, which allows the user to create a new schema in the database. For details on granting and revoking privileges on a database, see the GRANT (Database) and REVOKE (Database) topics in the SQL Reference Manual.

Privilege: CREATE
Grantor: Superuser
Description: Allows a user to create a schema.
Schema Privileges
By default, only a superuser and the schema owner have privileges to create objects within a schema. Additionally, only the schema owner or a superuser can drop or alter a schema. See DROP SCHEMA and ALTER SCHEMA.
You must grant all new users access to the PUBLIC schema by running GRANT USAGE ON SCHEMA PUBLIC. Then grant new users CREATE privileges and privileges to individual objects in the schema. This enables new users to create or locate objects in the PUBLIC schema. Without USAGE privilege, objects in the schema cannot be used or altered, even by the object owner.
CREATE gives the schema owner or user WITH GRANT OPTION permission to create new objects in the schema, including renaming an object in the schema or moving an object into this schema.
Note: The schema owner is typically the user who creates the schema. However, a superuser can create a schema and assign ownership of the schema to a different user at creation.
All other access to the schema and its objects must be explicitly granted to users or roles by the superuser or schema owner. This prevents unauthorized users from accessing the schema and its objects. A user can be granted one of the following privileges through the GRANT statement:

Privilege: CREATE
Description: Allows the user to create new objects within the schema. This includes the ability to create a new object, rename existing objects, and move objects into the schema from other schemas.

Privilege: USAGE
Description: Permission to select, access, alter, and drop objects in the schema. The user must also be granted access to the individual objects in order to alter them. For example, a user would need to be granted USAGE on the schema and SELECT on a table to be able to select data from the table. You receive an error message if you attempt to query a table that you have SELECT privileges on, but do not have USAGE privileges for the schema that contains the table.
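For example, a superuser might set up a new user in the PUBLIC schema as described above; the user and table names here are illustrative:

=> GRANT USAGE ON SCHEMA PUBLIC TO Fred;
=> GRANT CREATE ON SCHEMA PUBLIC TO Fred;
=> GRANT SELECT ON TABLE PUBLIC.customer_dimension TO Fred;

After these grants, Fred can create objects in PUBLIC and select from customer_dimension; without the first USAGE grant, the SELECT grant alone would not let him query the table.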
Note the following error messages related to granting privileges on a schema or an object:
l If you attempt to grant a privilege on a schema but do not have USAGE privilege for the schema, you receive an error message that the schema does not exist.
l If you attempt to grant a privilege on an object within a schema, and you have USAGE privilege on the schema but no privilege on the individual object, you receive an error denying permission for that object.

Schema Privileges and the Search Path
The search path determines to which schema unqualified objects in SQL statements belong.
When a user specifies an object name in a statement without supplying the schema in which the object exists (called an unqualified object name), HP Vertica behaves in one of two ways, depending on whether the object is being created or accessed.

Creating an object
When a user creates an object (such as a table, view, sequence, procedure, or function) with an unqualified name, HP Vertica tries to create the object in the current schema (the first schema in the schema search path), returning an error if the schema does not exist or if the user does not have CREATE privileges in that schema. Use the SHOW search_path command to view the current search path:

=> SHOW search_path;
    name     |                      setting
-------------+---------------------------------------------------
 search_path | "$user", public, v_catalog, v_monitor, v_internal
(1 row)

Note: The first schema in the search path is the current schema, and the $user setting is a placeholder that resolves to the current user's name.

Accessing or altering an object
When a user accesses or alters an object with an unqualified name, those statements search through all schemas for a matching object, starting with the current schema, where:
l The object name in the schema matches the object name in the statement.
l The user has USAGE privileges on the schema, in order to access objects in it.
l The user has at least one privilege on the object.

See Also
l Setting Search Paths
l GRANT (Schema)
l REVOKE (Schema)

Table Privileges
By default, only a superuser and the table owner (typically the person who creates a table) have access to a table. The ability to drop or alter a table is also reserved for a superuser or table owner; this right cannot be granted to other users. All other users or roles (including the user who owns the schema, if he or she does not also own the table) must be explicitly granted access to the table.
These are the table privileges a superuser or table owner can grant:
Privilege: SELECT
Description: Permission to run SELECT queries on the table.

Privilege: INSERT
Description: Permission to INSERT data into the table.

Privilege: DELETE
Description: Permission to DELETE data from the table, as well as SELECT privilege on the table when executing a DELETE statement that references table column values in a WHERE or SET clause.

Privilege: UPDATE
Description: Permission to UPDATE and change data in the table, as well as SELECT privilege on the table when executing an UPDATE statement that references table column values in a WHERE or SET clause.

Privilege: REFERENCES
Description: Permission to CREATE foreign key constraints that reference this table.

To use any of the above privileges, the user must also have USAGE privileges on the schema that contains the table. See Schema Privileges for details.
Referencing a sequence in the CREATE TABLE statement requires the following privileges:
l SELECT privilege on sequence object
l USAGE privilege on sequence schema
For details on granting and revoking table privileges, see GRANT (Table) and REVOKE (Table) in the SQL Reference Manual.

Projection Privileges
Because projections are the underlying storage construct for tables, they are atypical in that they do not have an owner or privileges associated with them directly. Instead, the privileges to create, access, or alter a projection are based on the anchor and base tables that the projection references, as well as the schemas that contain them.
To run a query involving a projection, a user must have SELECT privileges on the table or tables that the projection references, and USAGE privileges on all the schemas that contain those tables.
There are two ways to create projections: explicitly and implicitly.
Explicit Projection Creation and Privileges

To explicitly create a projection using the CREATE PROJECTION statement, a user must be a superuser or owner of the anchor table, or have the following privileges:

• CREATE privilege on the schema in which the projection is created
• SELECT privilege on all the base tables referenced by the projection
• USAGE privilege on all the schemas that contain the base tables referenced by the projection
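For instance, the prerequisites above might be satisfied as follows for a hypothetical user Ann creating a projection over table s1.t1 (the user name and projection definition are illustrative; t1's columns match the example later in this section):

```sql
-- As a superuser: give Ann the schema and table privileges she needs.
GRANT USAGE, CREATE ON SCHEMA s1 TO Ann;  -- USAGE to access the schema, CREATE to create the projection
GRANT SELECT ON s1.t1 TO Ann;             -- SELECT on the base table the projection references

-- As Ann: explicitly create a projection anchored on s1.t1.
CREATE PROJECTION s1.t1_proj AS SELECT id, sourceID FROM s1.t1 ORDER BY id;
```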
Explicitly created projections can be dropped only by the owner of the table on which the projection is based (for a single-table projection), or by the owner of the anchor table (for pre-join projections).

Implicit Projection Creation and Privileges

Projections are implicitly created when you insert data into a table, an operation that automatically creates a superprojection for the table. Implicitly created projections do not require any additional privileges to create or drop, other than privileges for table creation. Users who can create or drop a table can also create and drop the associated superprojection.

Selecting From Projections

Selecting from projections requires the following privileges:

• SELECT privilege on each of the base tables
• USAGE privilege on the corresponding containing schemas

HP Vertica does not associate privileges directly with projections, since they are the underlying storage construct. Privileges can be granted only on the logical storage containers: the tables and views.

Dropping Projections

Dropping projections is handled much the same way HP Vertica creates them:

• Explicitly, with the DROP PROJECTION statement
• Implicitly, when you drop the table

View Privileges

By default, only a superuser and the view owner (typically the person who creates the view) have access to the base object for a view. All other users and roles must be directly granted access to the view. For example:

• If a non-owner runs a SELECT query on the view, the view owner must also have SELECT ... WITH GRANT OPTION privileges on the view's base tables or views. This privilege must be directly granted to the owner, rather than through a role.
• If a view owner runs a SELECT query on the view, the owner must also have SELECT privilege directly granted (not through a role) on the view's base objects (table or view).

The only privilege that can be granted to a user or role is SELECT, which allows the user to execute SELECT queries on the view.
The user or role also needs USAGE privilege on the schema containing the view to be able to run queries on the view.
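A minimal sketch of granting view access, assuming a hypothetical view public.applog_v defined over the applog table and a hypothetical owner view_owner:

```sql
-- The view owner needs SELECT ... WITH GRANT OPTION on the base table,
-- granted directly (not through a role), before non-owners can query the view.
GRANT SELECT ON public.applog TO view_owner WITH GRANT OPTION;

-- SELECT is the only privilege that can be granted on the view itself.
GRANT SELECT ON public.applog_v TO Bob;

-- Bob also needs USAGE on the schema that contains the view.
GRANT USAGE ON SCHEMA public TO Bob;
```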
Privilege    Description
SELECT       Permission to run SELECT queries on the view.
USAGE        Permission on the schema that contains the view.

For details on granting and revoking view privileges, see GRANT (View) and REVOKE (View) in the SQL Reference Manual.

Sequence Privileges

To create a sequence, a user must have CREATE privileges on the schema that contains the sequence. Only the owner and superusers can initially access the sequence; all other users must be granted access to the sequence by a superuser or the owner.

Only the sequence owner (typically the person who creates the sequence) can drop or rename a sequence, or change the schema in which the sequence resides:

• DROP SEQUENCE: Only a sequence owner or schema owner can drop a sequence.
• ALTER SEQUENCE RENAME TO: A sequence owner must have USAGE and CREATE privileges on the schema that contains the sequence to be renamed.
• ALTER SEQUENCE SET SCHEMA: A sequence owner must have USAGE privilege on the schema that currently contains the sequence (the old schema), as well as CREATE privilege on the schema to which the sequence will be moved (the new schema).

The following table lists the privileges that can be granted to users or roles on sequences. The only privilege that can be granted to a user or role is SELECT, which allows the user to use CURRVAL() and NEXTVAL() on the sequence and to reference the sequence in a table. The user or role also needs USAGE privilege on the schema containing the sequence.

Privilege    Description
SELECT       Permission to use CURRVAL() and NEXTVAL() on the sequence and to reference the sequence in a table.
USAGE        Permission on the schema that contains the sequence.

Note: Referencing a sequence in a CREATE TABLE statement requires SELECT privilege on the sequence object and USAGE privilege on the sequence schema.

For details on granting and revoking sequence privileges, see GRANT (Sequence) and REVOKE (Sequence) in the SQL Reference Manual.
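For example, a sequence owner might expose a hypothetical sequence public.order_seq to user Bob like this:

```sql
-- SELECT on the sequence allows CURRVAL()/NEXTVAL() and referencing the
-- sequence in table definitions; USAGE on the schema is also required.
GRANT SELECT ON SEQUENCE public.order_seq TO Bob;
GRANT USAGE ON SCHEMA public TO Bob;

-- Bob can now advance and read the sequence:
SELECT NEXTVAL('public.order_seq');
SELECT CURRVAL('public.order_seq');
```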
See Also

• Using Named Sequences
External Procedure Privileges

Only a superuser is allowed to create or drop an external procedure. By default, users cannot execute external procedures; a superuser must grant users and roles this right, using the GRANT (Procedure) statement. Additionally, users must have USAGE privileges on the schema that contains the procedure in order to call it.

Privilege    Description
EXECUTE      Permission to run an external procedure.
USAGE        Permission on the schema that contains the procedure.

For details on granting and revoking external procedure privileges, see GRANT (Procedure) and REVOKE (Procedure) in the SQL Reference Manual.

User-Defined Function Privileges

User-defined functions (described in CREATE FUNCTION Statements) can be created by superusers, or by users with CREATE privileges on the schema that will contain the function, as well as USAGE privileges on the base library (if applicable). Users or roles other than the function owner can use a function only if they have been granted EXECUTE privileges on it. They must also have USAGE privileges on the schema that contains the function to be able to call it.

Privilege    Description
EXECUTE      Permission to call a user-defined function.
USAGE        Permission on the schema that contains the function.

• DROP FUNCTION: Only a superuser or the function owner can drop the function.
• ALTER FUNCTION RENAME TO: A superuser or function owner must have USAGE and CREATE privileges on the schema that contains the function to be renamed.
• ALTER FUNCTION SET SCHEMA: A superuser or function owner must have USAGE privilege on the schema that currently contains the function (the old schema), as well as CREATE privilege on the schema to which the function will be moved (the new schema).
For details on granting and revoking user-defined function privileges, see the following topics in the SQL Reference Manual:

• GRANT (User Defined Extension)
• REVOKE (User Defined Extension)
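A sketch of the grants described above, assuming a hypothetical scalar UDF public.add2ints(int, int) has already been created from a loaded library:

```sql
-- Allow Bob to call the function; the argument list identifies the overload.
GRANT EXECUTE ON FUNCTION public.add2ints(int, int) TO Bob;

-- Bob also needs USAGE on the schema that contains the function.
GRANT USAGE ON SCHEMA public TO Bob;
```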
Library Privileges

Only a superuser can load an external library using the CREATE LIBRARY statement. By default, only a superuser can create user-defined functions (UDFs) based on a loaded library. A superuser can use the GRANT USAGE ON LIBRARY statement to allow users to create UDFs based on classes in the library. The user must also have CREATE privileges on the schema that will contain the UDF.

Privilege    Description
USAGE        Permission to create UDFs based on classes in the library.

Once created, only a superuser or the user who created a UDF can use it by default. Either of them can grant other users or roles the ability to call the function using the GRANT EXECUTE ON FUNCTION statement. See the GRANT (Function) and REVOKE (Function) topics in the SQL Reference Manual for more information on granting and revoking privileges on functions. In addition to EXECUTE privilege, users and roles also require USAGE privilege on the schema in which the function resides in order to execute it.

For more information about libraries and UDFs, see Developing and Using User Defined Functions in the Programmer's Guide.

Resource Pool Privileges

Only a superuser can create, alter, or drop a resource pool. By default, users are granted USAGE rights to the GENERAL pool, from which their queries and other statements allocate memory and get their priorities. A superuser must grant users USAGE rights to any additional resource pools by using the GRANT USAGE ON RESOURCE POOL statement. Once granted access to a resource pool, users can use the SET SESSION RESOURCE POOL statement and the RESOURCE POOL clause of the ALTER USER statement to have their queries draw resources from the new pool.

Privilege    Description
USAGE        Permission to use a resource pool.
SELECT       Permission to look up resource pool information and status in system tables.
UPDATE       Permission to adjust the tuning parameters of the pool.
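The resource pool workflow described above might look like this, assuming a hypothetical pool named batch_pool (the pool name and memory size are illustrative):

```sql
-- As a superuser: create the pool and grant Bob the right to use it.
CREATE RESOURCE POOL batch_pool MEMORYSIZE '4G';
GRANT USAGE ON RESOURCE POOL batch_pool TO Bob;

-- As Bob: switch the current session to the new pool...
SET SESSION RESOURCE POOL batch_pool;

-- ...or, as a superuser, make it Bob's default pool.
ALTER USER Bob RESOURCE POOL batch_pool;
```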
For details on granting and revoking resource pool privileges, see GRANT (Resource Pool) and REVOKE (Resource Pool) in the SQL Reference Manual.

Storage Location Privileges

Users and roles without superuser privileges can copy data to and from storage locations as long as the following conditions are met, where a superuser:
1. Creates a special class of storage location (using the ADD_LOCATION function), specifying the 'USER' argument, which indicates that the specified area is accessible to non-dbadmin users.
2. Grants users or roles READ and/or WRITE access to the specified location using the GRANT (Storage Location) statement.

Note: GRANT/REVOKE (Storage Location) statements are applicable only to 'USER' storage locations.

Once such storage locations exist and the appropriate privileges are granted, users and roles granted READ privileges can copy data from files in the storage location into a table. Those granted WRITE privileges can export data from a table to the storage location on which they have been granted access. WRITE privileges also let users save COPY statement exceptions and rejected data files from HP Vertica to the specified storage location.

Only a superuser can add, alter, retire, drop, and restore a location, as well as set and measure location performance. All non-dbadmin users or roles require READ and/or WRITE permissions on the location.

Privilege    Description
READ         Allows the user to copy data from files in the storage location into a table.
WRITE        Allows the user to copy data to the specific storage location. Users with WRITE privileges can also save COPY statement exceptions and rejected data files to the specified storage location.

See Also

• GRANT (Storage Location)
• Storage Management Functions
• ADD_LOCATION

Role, Profile, and User Privileges

Only a superuser can create, alter, or drop a:

• role
• profile
• user

By default, only the superuser can grant or revoke a role to another user or role. A user or role can be given the privilege to grant and revoke a role by using the WITH ADMIN OPTION clause of the GRANT statement.
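For example, delegation of role administration might be sketched as follows, using the logadmin role and users that appear elsewhere in this section:

```sql
-- Only a superuser can create the role.
CREATE ROLE logadmin;

-- WITH ADMIN OPTION lets Bob grant and revoke the logadmin role himself.
GRANT logadmin TO Bob WITH ADMIN OPTION;

-- Bob can now pass the role on to other users:
GRANT logadmin TO Ted;
```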
For details on granting and revoking role privileges, see GRANT (Role) and REVOKE (Role) in the SQL Reference Manual.

See Also

• CREATE USER
• ALTER USER
• DROP USER
• CREATE PROFILE
• ALTER PROFILE
• DROP PROFILE
• CREATE ROLE
• ALTER ROLE RENAME
• DROP ROLE

Metadata Privileges

A superuser has unrestricted access to all database metadata. Other users have significantly reduced access to metadata based on their privileges, as follows:
Catalog objects (tables, columns, constraints, sequences, external procedures, projections, ROS containers, WOS):
Users must possess USAGE privilege on the schema and any type of access (SELECT) or modify privilege on the object to see catalog metadata about the object. See also Schema Privileges. For internal objects like projections, WOS, and ROS containers that do not have access privileges directly associated with them, the user must instead possess the requisite privileges on the associated schema and table objects. For example, to see whether a table has any data in the WOS, you need USAGE on the table schema and at least SELECT on the table itself. See also Table Privileges and Projection Privileges.

User sessions and functions, and system tables related to these sessions:
Users can only access information about their own, current sessions. The following functions provide restricted functionality to users:
• CURRENT_DATABASE
• CURRENT_SCHEMA
• CURRENT_USER
• HAS_TABLE_PRIVILEGE
• SESSION_USER (same as CURRENT_USER)
The SESSIONS system table also provides restricted functionality to users.

Storage locations:
Users require READ permissions to copy data from storage locations. Only a superuser can add or retire storage locations.

I/O Privileges

Users need no special permissions to connect to and disconnect from an HP Vertica database. To EXPORT TO and COPY FROM HP Vertica, the user must have:
• SELECT privileges on the source table
• USAGE privilege on the source table schema
• INSERT privileges for the destination table in the target database
• USAGE privilege on the destination table schema

To COPY FROM STDIN and use local COPY, a user must have INSERT privileges on the table and USAGE privilege on the schema.

Note: Only a superuser can COPY from file.

Comment Privileges

A comment lets you add, revise, or remove a textual message on a database object. You must be an object owner or a superuser in order to COMMENT ON one of the following objects:

• COLUMN
• CONSTRAINT
• FUNCTION (including AGGREGATE and ANALYTIC)
• LIBRARY
• NODE
• PROJECTION
• SCHEMA
• SEQUENCE
• TABLE
• TRANSFORM FUNCTION
• VIEW

Other users must have VIEW privileges on an object to view its comments.

Transaction Privileges

No special permissions are required for the following database operations:
• COMMIT
• ROLLBACK
• RELEASE SAVEPOINT
• SAVEPOINT

Session Privileges

No special permissions are required for users to use the SHOW statement or any of the SET statements.

Tuning Privileges

To PROFILE a single SQL statement, or to return a query plan's execution strategy to standard output using the EXPLAIN command, users must have the same privileges that are required to run the same query without the PROFILE or EXPLAIN keyword.

Granting and Revoking Privileges

To grant or revoke a privilege using one of the SQL GRANT or REVOKE statements, the user must have the following permissions for the GRANT/REVOKE statement to succeed:

• Superuser or privilege WITH GRANT OPTION
• USAGE privilege on the schema
• Appropriate privileges on the object

The syntax for granting and revoking privileges is different for each database object, such as schema, database, table, view, sequence, procedure, function, resource pool, and so on. Normally, a superuser first creates a user and then uses GRANT syntax to define the user's privileges or roles or both. For example, the following series of statements creates user Carol and grants Carol access to the apps database in the PUBLIC schema and also lets Carol grant SELECT privileges to other users on the applog table:

=> CREATE USER Carol;
=> GRANT USAGE ON SCHEMA PUBLIC to Carol;
=> GRANT ALL ON DATABASE apps TO Carol;
=> GRANT SELECT ON applog TO Carol WITH GRANT OPTION;

See GRANT Statements and REVOKE Statements in the SQL Reference Manual.

About Superuser Privileges

A superuser (DBADMIN) is the automatically created database user who has the same name as the Linux database administrator account and who can bypass all GRANT/REVOKE authorization,
as well as supersede any user that has been granted the PSEUDOSUPERUSER role.

Note: Database superusers are not the same as a Linux superuser with (root) privilege and cannot have Linux superuser privilege.

A superuser can grant privileges on all database object types to other users, as well as grant privileges to roles; users who have been granted the role will then gain the privilege as soon as they enable it. Superusers may grant or revoke any object privilege on behalf of the object owner, which means a superuser can grant or revoke an object privilege if the object owner could have granted or revoked the same privilege. A superuser may revoke a privilege that an object owner granted, and the reverse is also true. Since a superuser is acting on behalf of the object owner, the GRANTOR column of the V_CATALOG.GRANTS table displays the object owner rather than the superuser who issued the GRANT statement. A superuser can also alter ownership of table and sequence objects.

See Also

DBADMIN Role

About Schema Owner Privileges

By default, the schema owner has privileges to create objects within a schema. Additionally, the schema owner can drop any object in the schema, requiring no additional privilege on the object. The schema owner is typically the user who creates the schema. Schema ownership does not by itself confer access to objects in the schema; access to objects requires the appropriate privilege at the object level. All other access to the schema and its objects must be explicitly granted to users or roles by a superuser or the schema owner, to prevent unauthorized users from accessing the schema and its objects. See Schema Privileges.

About Object Owner Privileges

The database, along with every object in it, has an owner. The object owner is usually the person who created the object, although a superuser can alter ownership of objects such as tables and sequences.
Given the appropriate schema privileges, object owners can access, alter, rename, move, or drop any object they own without any additional privileges. An object owner can also:
• Grant privileges on their own objects to other users. The WITH GRANT OPTION clause specifies that a user can grant the permission to other users. For example, if user Bob creates a table, Bob can grant privileges on that table to users Ted, Alice, and so on.
• Grant privileges to roles. Users who are granted the role gain the privilege.

How to Grant Privileges

As described in Granting and Revoking Privileges, specific users grant privileges using the GRANT statement, with or without the optional WITH GRANT OPTION, which allows the user to grant the same privileges to other users.

• A superuser can grant privileges on all object types to other users.
• A superuser or object owner can grant privileges to roles. Users who have been granted the role then gain the privilege.
• An object owner can grant privileges on the object to other users using the optional WITH GRANT OPTION clause.
• The user needs to have USAGE privilege on the schema and appropriate privileges on the object.

When a user grants an explicit list of privileges, such as GRANT INSERT, DELETE, REFERENCES ON applog TO Bob:

• The GRANT statement succeeds only if all the privileges are granted successfully. If any grant operation fails, the entire statement rolls back.
• HP Vertica returns an ERROR if the user does not have grant options for the privileges listed.

When a user grants ALL privileges, such as GRANT ALL ON applog TO Bob, the statement always succeeds. HP Vertica grants all the privileges on which the grantor has the WITH GRANT OPTION and skips those privileges without the optional WITH GRANT OPTION. For example, if the granting user has only DELETE privileges with the optional grant option on the applog table, only DELETE privileges are granted to Bob, and the statement succeeds:

=> GRANT DELETE ON applog TO Bob WITH GRANT OPTION;
GRANT PRIVILEGE

For details, see the GRANT Statements in the SQL Reference Manual.
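To check whether a grant took effect, the HAS_TABLE_PRIVILEGE function mentioned under Metadata Privileges can be used; a sketch using the applog table and user Bob:

```sql
-- Returns t if Bob holds the named privilege on the table, f otherwise.
SELECT HAS_TABLE_PRIVILEGE('Bob', 'applog', 'INSERT');
SELECT HAS_TABLE_PRIVILEGE('Bob', 'applog', 'DELETE');
```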
How to Revoke Privileges

In general, only the user who originally granted a privilege can revoke it using a REVOKE statement. That user must have superuser privilege or have the optional WITH GRANT OPTION
on the privilege. The user also must have USAGE privilege on the schema and appropriate privileges on the object for the REVOKE statement to succeed.

In order to revoke a privilege, the privilege must have previously been granted to the specified grantee by this grantor. If HP Vertica finds that to be the case, the REVOKE statement removes the privilege (and the WITH GRANT OPTION privilege, if supplied) from the grantee. Otherwise, HP Vertica prints a NOTICE that the operation failed, as in the following example:

=> REVOKE SELECT ON applog FROM Bob;
NOTICE 0:  Cannot revoke "SELECT" privilege(s) for relation "applog" that you did not grant to "Bob"
REVOKE PRIVILEGE

In order to revoke the grant option for a privilege, the grantor must have previously granted the grant option for the privilege to the specified grantee; otherwise, HP Vertica prints a NOTICE. The following REVOKE statement removes the grant option only, but leaves the privilege intact:

=> GRANT INSERT on applog TO Bob WITH GRANT OPTION;
GRANT PRIVILEGE
=> REVOKE GRANT OPTION FOR INSERT ON applog FROM Bob;
REVOKE PRIVILEGE

When a user revokes an explicit list of privileges, such as REVOKE INSERT, DELETE, REFERENCES ON applog FROM Bob:

• The REVOKE statement succeeds only if all the privileges are revoked successfully. If any revoke operation fails, the entire statement rolls back.
• HP Vertica returns an ERROR if the user does not have grant options for the privileges listed.
• HP Vertica returns a NOTICE when revoking privileges that this user had not previously granted.

When a user revokes ALL privileges, such as REVOKE ALL ON applog FROM Bob, the statement always succeeds. HP Vertica revokes all the privileges on which the grantor has the optional WITH GRANT OPTION and skips those privileges without the WITH GRANT OPTION.
For example, if the user Bob has DELETE privileges with the optional grant option on the applog table, only the grant option is revoked from Bob, and the statement succeeds without a NOTICE:

=> REVOKE GRANT OPTION FOR DELETE ON applog FROM Bob;

For details, see the REVOKE Statements in the SQL Reference Manual.
Privilege Ownership Chains

The ability to revoke privileges on objects can cascade throughout an organization. If the grant option is revoked from a user, the privilege that this user granted to other users is also revoked. If a privilege was granted to a user or role by multiple grantors, then to completely revoke this privilege from the grantee, the privilege has to be revoked by each original grantor. The only exception is that a superuser may revoke privileges granted by an object owner, and the reverse is also true.

In the following example, the SELECT privilege on table t1 is granted through a chain of users, from a superuser through User3.

• A superuser grants User1 CREATE privileges on the schema s1:

=> \c - dbadmin
You are now connected as user "dbadmin".
=> CREATE USER User1;
CREATE USER
=> CREATE USER User2;
CREATE USER
=> CREATE USER User3;
CREATE USER
=> CREATE SCHEMA s1;
CREATE SCHEMA
=> GRANT USAGE on SCHEMA s1 TO User1, User2, User3;
GRANT PRIVILEGE
=> CREATE ROLE reviewer;
CREATE ROLE
=> GRANT CREATE ON SCHEMA s1 TO User1;
GRANT PRIVILEGE

• User1 creates new table t1 within schema s1 and then grants SELECT WITH GRANT OPTION privilege on s1.t1 to User2:

=> \c - User1
You are now connected as user "User1".
=> CREATE TABLE s1.t1(id int, sourceID VARCHAR(8));
CREATE TABLE
=> GRANT SELECT on s1.t1 to User2 WITH GRANT OPTION;
GRANT PRIVILEGE

• User2 grants SELECT WITH GRANT OPTION privilege on s1.t1 to User3:

=> \c - User2
You are now connected as user "User2".
=> GRANT SELECT on s1.t1 to User3 WITH GRANT OPTION;
GRANT PRIVILEGE

• User3 grants SELECT privilege on s1.t1 to the reviewer role:
=> \c - User3
You are now connected as user "User3".
=> GRANT SELECT on s1.t1 to reviewer;
GRANT PRIVILEGE

Users cannot revoke privileges upstream in the chain. For example, User2 did not grant User1's CREATE privilege on schema s1, so when User2 runs the following REVOKE command, HP Vertica rolls back the command:

=> \c - User2
You are now connected as user "User2".
=> REVOKE CREATE ON SCHEMA s1 FROM User1;
ROLLBACK 0:  "CREATE" privilege(s) for schema "s1" could not be revoked from "User1"

Users can revoke privileges indirectly from users who received privileges through a cascading chain, like the one shown in the example above. Here, users can use the CASCADE option to revoke privileges from all users "downstream" in the chain. A superuser or User1 can use the CASCADE option to revoke the SELECT privilege on table s1.t1 from all users. For example, a superuser or User1 can execute the following statement to revoke the SELECT privilege from all users and roles within the chain:

=> \c - User1
You are now connected as user "User1".
=> REVOKE SELECT ON s1.t1 FROM User2 CASCADE;
REVOKE PRIVILEGE

When a superuser or User1 executes the above statement, the SELECT privilege on table s1.t1 is revoked from User2, User3, and the reviewer role. The GRANT privilege is also revoked from User2 and User3, which a superuser can verify by querying the V_CATALOG.GRANTS system table.

=> SELECT * FROM grants WHERE object_name = 's1' AND grantee ILIKE 'User%';
 grantor | privileges_description | object_schema | object_name | grantee
---------+------------------------+---------------+-------------+---------
 dbadmin | USAGE                  |               | s1          | User1
 dbadmin | USAGE                  |               | s1          | User2
 dbadmin | USAGE                  |               | s1          | User3
(3 rows)
Modifying Privileges

A superuser or object owner can use one of the ALTER statements to modify a privilege, such as changing a sequence owner or table owner. Reassignment to the new owner does not transfer grants from the original owner to the new owner; grants made by the original owner are dropped.

Changing a Table Owner

The ability to change table ownership is useful when moving a table from one schema to another. Ownership reassignment is also useful when a table owner leaves the company or changes job responsibilities. Because you can change the table owner, the tables do not have to be completely rewritten, so you can avoid a loss in productivity. The syntax looks like this:

ALTER TABLE [[db-name.]schema.]table-name OWNER TO new-owner-name

In order to alter table ownership, you must be either the table owner or a superuser. A change in table ownership transfers just the owner and not privileges; grants made by the original owner are dropped, and all existing privileges on the table are revoked from the previous owner. However, altering the table owner transfers ownership of dependent sequence objects (associated IDENTITY/AUTO-INCREMENT sequences) but does not transfer ownership of other referenced sequences. See ALTER SEQUENCE for details on transferring sequence ownership.

Notes

• Table privileges are separate from schema privileges; therefore, a table privilege change or table owner change does not result in any schema privilege change.
• Because projections define the physical representation of the table, HP Vertica does not require separate projection owners. The ability to create or drop projections is based on the privileges on the table on which the projection is anchored.
• During the alter operation, HP Vertica updates projections anchored on the table owned by the old owner to reflect the new owner. For pre-join projection operations, HP Vertica checks for privileges on the referenced table.
Example

In this example, user Bob connects to the database, looks up the tables, and transfers ownership of table t33 from himself to user Alice.

=> \c - Bob
You are now connected as user "Bob".
=> \d
 Schema |  Name  | Kind  |  Owner  | Comment
--------+--------+-------+---------+---------
 public | applog | table | dbadmin |
 public | t33    | table | Bob     |
(2 rows)

=> ALTER TABLE t33 OWNER TO Alice;
ALTER TABLE

Notice that when Bob looks up database tables again, he no longer sees table t33.

=> \d
              List of tables
 Schema |  Name  | Kind  |  Owner  | Comment
--------+--------+-------+---------+---------
 public | applog | table | dbadmin |
(1 row)

When user Alice connects to the database and looks up tables, she sees she is the owner of table t33.

=> \c - Alice
You are now connected as user "Alice".
=> \d
          List of tables
 Schema | Name | Kind  | Owner | Comment
--------+------+-------+-------+---------
 public | t33  | table | Alice |
(1 row)

Either Alice or a superuser can transfer table ownership back to Bob. In the following case, a superuser performs the transfer.

=> \c - dbadmin
You are now connected as user "dbadmin".
=> ALTER TABLE t33 OWNER TO Bob;
ALTER TABLE
=> \d
               List of tables
 Schema |   Name   | Kind  |  Owner  | Comment
--------+----------+-------+---------+---------
 public | applog   | table | dbadmin |
 public | comments | table | dbadmin |
 public | t33      | table | Bob     |
 s1     | t1       | table | User1   |
(4 rows)

You can also query the V_CATALOG.TABLES system table to view table and owner information. Note that a change in ownership does not change the table ID. In the below series of commands, the superuser changes table ownership back to Alice and queries the TABLES system table.

=> ALTER TABLE t33 OWNER TO Alice;
ALTER TABLE
=> SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
  table_schema_id  | table_schema |     table_id      | table_name |     owner_id      | owner_name
-------------------+--------------+-------------------+------------+-------------------+------------
 45035996273704968 | public       | 45035996273713634 | applog     | 45035996273704962 | dbadmin
 45035996273704968 | public       | 45035996273724496 | comments   | 45035996273704962 | dbadmin
 45035996273730528 | s1           | 45035996273730548 | t1         | 45035996273730516 | User1
 45035996273704968 | public       | 45035996273793876 | foo        | 45035996273724576 | Alice
 45035996273704968 | public       | 45035996273795846 | t33        | 45035996273724576 | Alice
(5 rows)

Now the superuser changes table ownership back to Bob and queries the TABLES table again. Nothing changes but the owner_name value for t33, from Alice to Bob.

=> ALTER TABLE t33 OWNER TO Bob;
ALTER TABLE
=> SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
  table_schema_id  | table_schema |     table_id      | table_name |     owner_id      | owner_name
-------------------+--------------+-------------------+------------+-------------------+------------
 45035996273704968 | public       | 45035996273713634 | applog     | 45035996273704962 | dbadmin
 45035996273704968 | public       | 45035996273724496 | comments   | 45035996273704962 | dbadmin
 45035996273730528 | s1           | 45035996273730548 | t1         | 45035996273730516 | User1
 45035996273704968 | public       | 45035996273793876 | foo        | 45035996273724576 | Alice
 45035996273704968 | public       | 45035996273795846 | t33        | 45035996273714428 | Bob
(5 rows)

Table Reassignment with Sequences

Altering the table owner transfers ownership only of associated IDENTITY/AUTO-INCREMENT sequences, but not of other referenced sequences.
For example, in the below series of commands, ownership of sequence s1 does not change:

=> CREATE USER u1;
CREATE USER
=> CREATE USER u2;
CREATE USER
=> CREATE SEQUENCE s1 MINVALUE 10 INCREMENT BY 2;
CREATE SEQUENCE
=> CREATE TABLE t1 (a INT, id INT DEFAULT NEXTVAL('s1'));
CREATE TABLE
=> CREATE TABLE t2 (a INT, id INT DEFAULT NEXTVAL('s1'));
CREATE TABLE
=> SELECT sequence_name, owner_name FROM sequences;
 sequence_name | owner_name
---------------+------------
 s1            | dbadmin
(1 row)
  • 274. => ALTER TABLE t1 OWNER TO u1; ALTER TABLE => SELECT sequence_name, owner_name FROM sequences; sequence_name | owner_name ---------------+------------ s1 | dbadmin (1 row) => ALTER TABLE t2 OWNER TO u2; ALTER TABLE => SELECT sequence_name, owner_name FROM sequences; sequence_name | owner_name ---------------+------------ s1 | dbadmin (1 row) See Also l Changing a Sequence Owner Changing a Sequence Owner The ALTER SEQUENCE command lets you change the attributes of an existing sequence. All changes take effect immediately, within the same session. Any parameters not set during an ALTER SEQUENCE statement retain their prior settings. If you need to change sequence ownership, such as if an employee who owns a sequence leaves the company, you can do so with the following ALTER SEQUENCE syntax: ALTER SEQUENCE sequence-name OWNER TO new-owner-name; This operation immediately reassigns the sequence from the current owner to the specified new owner. Only the sequence owner or a superuser can change ownership, and reassignment does not transfer grants from the original owner to the new owner; grants made by the original owner are dropped. Note: Changing a table owner transfers ownership of dependent sequence objects (associated IDENTITY/AUTO-INCREMENT sequences) but does not transfer ownership of other referenced sequences. See Changing a Table Owner. Example The following example reassigns sequence ownership from the current owner to user Bob: => ALTER SEQUENCE sequential OWNER TO Bob; See ALTER SEQUENCE in the SQL Reference Manual for details. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 274 of 997
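To confirm that a reassignment took effect, you can query the V_CATALOG.SEQUENCES system table (shown earlier in this section). A minimal sketch, reusing the sequence name from the example above; the output assumes the ALTER SEQUENCE succeeded and no other sequences match:

```sql
-- Verify the new owner after reassignment
=> SELECT sequence_name, owner_name FROM sequences
   WHERE sequence_name = 'sequential';
 sequence_name | owner_name
---------------+------------
 sequential    | Bob
(1 row)
```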
  • 275. Viewing Privileges Granted on Objects HP Vertica logs information about privileges granted on various objects, including the grantor and grantee, in the V_CATALOG.GRANTS system table. The order of columns in the table corresponds to the order in which they appear in the GRANT command. An asterisk in the output means the privilege was granted WITH GRANT OPTION. The following command queries the GRANTS system table: => SELECT * FROM grants ORDER BY grantor, grantee; grantor | privileges_description | object_schema | object_name | grantee ---------+-------------------------------------------------+---------------+------------- +----------- Bob | | | commentor | Alice dbadmin | CREATE | | schema2 | Bob dbadmin | | | commentor | Bob dbadmin | | | commentor | Bob dbadmin | | | logadmin | Bob dbadmin | USAGE | | general | Bob dbadmin | INSERT, UPDATE, DELETE, REFERENCES | public | applog | Bob dbadmin | | | logadmin | Ted dbadmin | USAGE | | general | Ted dbadmin | USAGE | | general | Sue dbadmin | CREATE, CREATE TEMP | | vmart | Sue dbadmin | USAGE | | public | Sue dbadmin | SELECT* | public | applog | Sue dbadmin | USAGE | | general | Alice dbadmin | INSERT, SELECT | public | comments | commentor dbadmin | INSERT, SELECT | public | applog | commentor dbadmin | | | logwriter | logadmin dbadmin | | | logreader | logadmin dbadmin | DELETE | public | applog | logadmin dbadmin | SELECT | public | applog | logreader Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 275 of 997
  • 276. dbadmin | INSERT | public | applog | logwriter dbadmin | USAGE | | v_internal | public dbadmin | CREATE TEMP | | vmart | public dbadmin | USAGE | | public | public dbadmin | USAGE | | v_catalog | public dbadmin | USAGE | | v_monitor | public dbadmin | CREATE*, CREATE TEMP* | | vmart | dbadmin dbadmin | USAGE*, CREATE* | | schema2 | dbadmin dbadmin | INSERT*, SELECT*, UPDATE*, DELETE*, REFERENCES* | public | comments | dbadmin dbadmin | INSERT*, SELECT*, UPDATE*, DELETE*, REFERENCES* | public | applog | dbadmin (30 rows) To quickly find all of the privileges that have been granted to all users on the schema named myschema, run the following statement: => SELECT grantee, privileges_description FROM GRANTS WHERE object_name='myschema'; grantee | privileges_description ---------+------------------------ Bob | USAGE, CREATE Alice | CREATE (2 rows) Note that the vsql commands, \dp and \z, both return similar information to GRANTS: => \dp Access privileges for database "apps" Grantee | Grantor | Privileges | Schema | Name -----------+---------+-------------------------------------------------+--------+------------ public | dbadmin | USAGE | | v_internal public | dbadmin | USAGE | | v_catalog public | dbadmin | USAGE | | v_monitor logadmin | dbadmin | | | logreader logadmin | dbadmin | | | logwriter Fred | dbadmin | USAGE | | general Fred | dbadmin | | | logadmin Bob | dbadmin | USAGE | | general dbadmin | dbadmin | USAGE*, CREATE* | | schema2 Bob | dbadmin | CREATE | | schema2 Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 276 of 997
  • 277. Sue | dbadmin | USAGE | | general public | dbadmin | USAGE | | public Sue | dbadmin | USAGE | | public public | dbadmin | CREATE TEMP | | appdat dbadmin | dbadmin | CREATE*, CREATE TEMP* | | appdat Sue | dbadmin | CREATE, CREATE TEMP | | appdat dbadmin | dbadmin | INSERT*, SELECT*, UPDATE*, DELETE*, REFERENCES* | public | applog logreader | dbadmin | SELECT | public | applog logwriter | dbadmin | INSERT | public | applog logadmin | dbadmin | DELETE | public | applog Sue | dbadmin | SELECT* | public | applog (22 rows) See GRANT Statements in the SQL Reference Manual. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 277 of 997
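You can also filter GRANTS by grantee to list everything granted to a single user. A minimal sketch; the user name is illustrative, and the output depends on the grants that exist in your database:

```sql
-- List all objects and privileges granted to one user
=> SELECT object_name, privileges_description
   FROM grants
   WHERE grantee = 'Bob';
```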
  • 278. About Database Roles To make managing permissions easier, use roles. A role is a collection of privileges that a superuser can grant to (or revoke from) one or more users or other roles. Using roles avoids having to manually grant sets of privileges user by user. For example, several users might be assigned to the administrator role. You can grant or revoke privileges to or from the administrator role, and all users with access to that role are affected by the change. Note: Users must first enable a role before they gain all of the privileges that have been granted to it. See Enabling Roles. Role Hierarchies You can also use roles to build hierarchies of roles; for example, you can create an administrator role that has the privileges granted to non-administrator roles, as well as the privileges granted directly to the administrator role. See also Role Hierarchy. Roles do not supersede manually-granted privileges, so privileges directly assigned to a user are not altered by roles. Roles just give additional privileges to the user. Creating and Using a Role Using a role follows this general flow: 1. A superuser creates a role using the CREATE ROLE statement. 2. A superuser or object owner grants privileges to the role using one of the GRANT statements. 3. A superuser or users with administrator access to the role grant users and other roles access to the role. 4. Users granted access to the role use the SET ROLE command to enable that role and gain the role's privileges. You can do steps 2 and 3 in any order. However, granting access to a role means little until the role has privileges granted to it. Tip: You can query the V_CATALOG system tables ROLES, GRANTS, and USERS to see any directly-assigned roles; however, these tables do not indicate whether a role is available to a user when roles could be available through other roles (indirectly). See the HAS_ROLE() function for additional information. 
Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 278 of 997
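The four-step flow above can be sketched end to end as follows. This is a minimal illustration only; the role, table, and user names (reporter, sales, Alice) are placeholders, not objects from this guide's examples:

```sql
=> CREATE ROLE reporter;                -- step 1: superuser creates the role
=> GRANT SELECT ON sales TO reporter;   -- step 2: grant privileges to the role
=> GRANT reporter TO Alice;             -- step 3: grant the role to a user
-- Then, in Alice's session:
=> SET ROLE reporter;                   -- step 4: enable the role
```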
  • 279. Roles on Management Console When users sign in to the Management Console (MC), what they can view or do is governed by MC roles. For details, see About MC Users and About MC Privileges and Roles. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 279 of 997
  • 280. Types of Database Roles HP Vertica has four predefined roles: l PUBLIC l PSEUDOSUPERUSER l DBADMIN l DBDUSER Predefined roles cannot be dropped or renamed. Other roles may not be granted to (or revoked from) predefined roles except to/from PUBLIC, but predefined roles may be granted to other roles or users or both. Individual privileges may be granted to/revoked from predefined roles. See the SQL Reference Manual for all of the GRANT and REVOKE statements. DBADMIN Role Every database has the special DBADMIN role. A superuser (or someone with the PSEUDOSUPERUSER Role) can grant this role to or revoke this role from any user or role. Users who enable the DBADMIN role gain these privileges: l Create or drop users l Create or drop schemas l Create or drop roles l View all system tables l View and terminate user sessions The DBADMIN role does NOT allow users to: l Start and stop a database l Change DBADMIN privileges l Set configuration parameters (set_config_parameter) You can assign additional privileges to the DBADMIN role, but you cannot assign any additional roles; for example, the following is not allowed: => CREATE ROLE appviewer; CREATE ROLE Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 280 of 997
  • 281. => GRANT appviewer TO dbadmin; ROLLBACK 2347: Cannot alter predefined role "dbadmin" You can, however, grant the DBADMIN role to other roles to augment a set of privileges. See Role Hierarchy for more information. View a List of Database Superusers To see who is a superuser, run the vsql \du meta-command. In this example, only dbadmin is a superuser. => \du List of users User name | Is Superuser -----------+-------------- dbadmin | t Fred | f Bob | f Sue | f Alice | f User1 | f User2 | f User3 | f u1 | f u2 | f (10 rows) See Also DBADMIN User DBDUSER Role The special DBDUSER role is a predefined role that must be explicitly granted by a superuser. The DBDUSER role allows non-DBADMIN users to access Database Designer using command-line functions. Users with the DBDUSER role cannot access Database Designer using the Administration Tools. Only DBADMIN users can run Administration Tools. You cannot assign any additional privileges to the DBDUSER role, but you can grant the DBDUSER role to other roles to augment a set of privileges. Once you have been granted the DBDUSER role, you must enable it before you can run Database Designer using command-line functions. For more information, see About Running Database Designer Programmatically. Important: When you create a DBADMIN user or grant the DBDUSER role, make sure to associate a resource pool with that user to manage resources during Database Designer runs. Multiple users can run Database Designer concurrently without interfering with each other or using up all the cluster resources. When a user runs Database Designer, either using the Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 281 of 997
  • 282. Administration Tools or programmatically, its execution is mostly contained by the user's resource pool, but may spill over into some system resource pools for less-intensive tasks. PSEUDOSUPERUSER Role The special PSEUDOSUPERUSER role is automatically created in each database. A superuser (or someone with the PSEUDOSUPERUSER role) can grant this role to another user, or revoke the role from another user. The PSEUDOSUPERUSER cannot revoke or change any superuser privileges. Users with the PSEUDOSUPERUSER role enabled have all of the privileges of the database superuser, including the ability to: l Create schemas l Create and grant privileges to roles l Bypass all GRANT/REVOKE authorization l Set user account's passwords l Lock and unlock user accounts l Create or drop a UDF library l Create or drop a UDF function l Create or drop an external procedure l Add or edit comments on nodes l Create or drop password profiles You can assign additional privileges to the PSEUDOSUPERUSER role, but you cannot assign any additional roles; for example, the following is not allowed: => CREATE ROLE appviewer; CREATE ROLE => GRANT appviewer TO pseudosuperuser; ROLLBACK 2347: Cannot alter predefined role "pseudosuperuser" PUBLIC Role By default, every database has the special PUBLIC role. HP Vertica grants this role to each user automatically, and it is automatically enabled. You grant privileges to this role that every user should have by default. You can also grant access to roles to PUBLIC, which allows any user to access the role using the SET ROLE statement. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 282 of 997
  • 283. Note: The PUBLIC role can never be dropped, nor can it be revoked from users or roles. Example In the following example, if the superuser hadn't granted INSERT privileges on the table publicdata to the PUBLIC group, the INSERT statement executed by user bob would fail: => CREATE TABLE publicdata (a INT, b VARCHAR); CREATE TABLE => GRANT INSERT, SELECT ON publicdata TO PUBLIC; GRANT PRIVILEGE => CREATE PROJECTION publicdataproj AS (SELECT * FROM publicdata); CREATE PROJECTION dbadmin=> c - bob You are now connected as user "bob". => INSERT INTO publicdata VALUES (10, 'Hello World'); OUTPUT -------- 1 (1 row) See Also PUBLIC User Default Roles for Database Users By default, no roles (other than the default PUBLIC Role) are enabled at the start of a user session. => SHOW ENABLED_ROLES; name | setting ---------------+--------- enabled roles | (1 row) A superuser can set one or more default roles for a user, which are automatically enabled at the start of the user's session. Setting a default role is a good idea if users normally rely on the privileges granted by one or more roles to carry out the majority of their tasks. To set a default role, use the DEFAULT ROLE parameter of the ALTER USER statement as superuser: => c vmart apps You are now connected to database "apps" as user "dbadmin". => ALTER USER Bob DEFAULT ROLE logadmin; ALTER USER => c - Bob; You are now connected as user "Bob" Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 283 of 997
  • 284. => SHOW ENABLED_ROLES; name | setting ---------------+---------- enabled roles | logadmin (1 row) Notes l Only roles that the user already has access to can be made default. l Unlike granting a role, setting a default role or roles overwrites any previously-set defaults. l To clear any default roles for a user, use the keyword NONE as the role name in the DEFAULT ROLE argument. l Default roles only take effect at the start of a user session. They do not affect the roles enabled in the user's current session. l Avoid giving users default roles that have administrative or destructive privileges (the PSEUDOSUPERUSER role or DROP privileges, for example). By forcing users to explicitly enable these privileges, you can help prevent accidental data loss. Using Database Roles There are several steps to using roles: 1. A superuser creates a role using the CREATE ROLE statement. 2. A superuser or object owner grants privileges to the role. 3. A superuser or users with administrator access to the role grant users and other roles access to the role. 4. Users granted access to the role run the SET ROLE command to make that role active and gain the role's privileges. You can do steps 2 and 3 in any order. However, granting access to a role means little until the role has privileges granted to it. Tip: Query system tables ROLES, GRANTS, and USERS to see any directly-assigned roles. Because these tables do not indicate whether a role is available to a user when roles could be available through other roles (indirectly), see the HAS_ROLE() function for additional information. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 284 of 997
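As the notes above mention, using the keyword NONE as the role name in the DEFAULT ROLE argument clears a user's default roles. A minimal sketch, reusing user Bob from the example above:

```sql
-- Clear all default roles for user Bob; takes effect at the start of his next session
=> ALTER USER Bob DEFAULT ROLE NONE;
ALTER USER
```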
  • 285. See Also l About MC Privileges and Roles Role Hierarchy In addition to granting roles to users, you can also grant roles to other roles. This lets you build hierarchies of roles, with more privileged roles (an administrator, for example) being assigned all of the privileges of lesser-privileged roles (a user of a particular application), in addition to the privileges you assign to it directly. By organizing your roles this way, any privilege you add to the application role (reading or writing to a new table, for example) is automatically made available to the more-privileged administrator role. Example The following example creates two roles, assigns them privileges, then assigns them to a new administrative role. 1. Create new table applog: => CREATE TABLE applog (id int, sourceID VARCHAR(32), data TIMESTAMP, event VARCHAR(256)); 2. Create a new role called logreader: => CREATE ROLE logreader; 3. Grant the logreader role read-only access on the applog table: => GRANT SELECT ON applog TO logreader; 4. Create a new role called logwriter: => CREATE ROLE logwriter; 5. Grant the logwriter write access on the applog table: => GRANT INSERT ON applog to logwriter; 6. Create a new role called logadmin, which will rule the other two roles: Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 285 of 997
  • 286. => CREATE ROLE logadmin; 7. Grant the logadmin role privileges to delete data: => GRANT DELETE ON applog to logadmin; 8. Grant the logadmin role the same privileges as the logreader and logwriter roles: => GRANT logreader, logwriter TO logadmin; 9. Create new user Bob: => CREATE USER Bob; 10. Give Bob logadmin privileges: => GRANT logadmin TO Bob; The user Bob can now enable the logadmin role, which also includes the logreader and logwriter roles. Note that Bob cannot enable either the logreader or logwriter role directly. A user can only enable explicitly-granted roles. Hierarchical roles also work with administrative access to a role: => GRANT logreader, logwriter TO logadmin WITH ADMIN OPTION; GRANT ROLE => GRANT logadmin TO Bob; => \c - bob; -- connect as Bob You are now connected as user "Bob". => SET ROLE logadmin; -- Enable logadmin role SET => GRANT logreader TO Alice; GRANT ROLE Note that the user Bob only has administrative access to the logreader and logwriter roles through the logadmin role. He doesn't have administrative access to the logadmin role, since it wasn't granted to him with the optional WITH ADMIN OPTION argument: => GRANT logadmin TO Alice; WARNING: Some roles were not granted GRANT ROLE For Bob to be able to grant the logadmin role, a superuser would have had to explicitly grant him administrative access. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 286 of 997
  • 287. See Also l About MC Privileges and Roles Creating Database Roles A superuser creates a new role using the CREATE ROLE statement. Only a superuser can create or drop roles. => CREATE ROLE administrator; CREATE ROLE The newly-created role has no privileges assigned to it, and no users or other roles are initially granted access to it. A superuser must grant privileges and access to the role. Deleting Database Roles A superuser can delete a role with the DROP ROLE statement. Note that if any user or other role has been assigned the role you are trying to delete, the DROP ROLE statement fails with a dependency message. => DROP ROLE administrator; NOTICE: User Bob depends on Role administrator ROLLBACK: DROP ROLE failed due to dependencies DETAIL: Cannot drop Role administrator because other objects depend on it HINT: Use DROP ROLE ... CASCADE to remove granted roles from the dependent users/roles Supply the optional CASCADE parameter to drop the role and its dependencies. => DROP ROLE administrator CASCADE; DROP ROLE Granting Privileges to Roles A superuser or owner of a schema, table, or other database object can assign privileges to a role, just as they would assign privileges to an individual user by using the GRANT statements described in the SQL Reference Manual . See About Database Privileges for information about which privileges can be granted. Granting a privilege to a role immediately affects active user sessions. When you grant a new privilege, it becomes immediately available to every user with the role active. Example The following example creates two roles and assigns them different privileges on a single table called applog. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 287 of 997
  • 288. 1. Create a table called applog: => CREATE TABLE applog (id int, sourceID VARCHAR(32), data TIMESTAMP, event VARCHAR(256)); 2. Create a new role called logreader: => CREATE ROLE logreader; 3. Assign read-only privileges to the logreader role on table applog: => GRANT SELECT ON applog TO logreader; 4. Create a role called logwriter: => CREATE ROLE logwriter; 5. Assign write privileges to the logwriter role on table applog: => GRANT INSERT ON applog TO logwriter; See the SQL Reference Manual for the different GRANT statements. Revoking Privileges From Roles Use one of the REVOKE statements to revoke a privilege from a role. => REVOKE INSERT ON applog FROM logwriter; REVOKE PRIVILEGE Revoking a privilege immediately affects any user sessions that have the role active. When you revoke a privilege, it is immediately removed from users that rely on the role for the privilege. See the SQL Reference Manual for the different REVOKE statements. Granting Access to Database Roles A superuser can assign any role to a user or to another role using the GRANT command. The simplest form of this command is: GRANT role [, ...] TO { user | role } [, ...] HP Vertica will return a NOTICE if you grant a role with or without admin option to a grantee who has already been granted that role. For example: Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 288 of 997
  • 289. => GRANT commenter to Bob; NOTICE 4622: Role "commenter" was already granted to user "Bob" See GRANT (Role) in the SQL Reference Manual for details. Example The following process illustrates how to create a role called commenter and grant user Bob access to that role. 1. Connect to the database as a superuser: \c - dbadmin 2. Create a table called comments: => CREATE TABLE comments (id INT, comment VARCHAR); 3. Create a new role called commenter: => CREATE ROLE commenter; 4. Grant privileges to the new role on the comments table: => GRANT INSERT, SELECT ON comments TO commenter; 5. Grant the commenter role to user Bob: => GRANT commenter TO Bob; Enable the newly-granted role 1. Connect to the database as user Bob: => \c - Bob 2. User Bob enables the role: => SET ROLE commenter; Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 289 of 997
  • 290. 3. Now insert some values into the comments table: => INSERT INTO comments VALUES (1, 'Hello World'); Based on the privileges granted to Bob by the commenter role, Bob can insert and query the comments table. 4. Query the comments table: => SELECT * FROM comments; id | comment ----+------------- 1 | Hello World (1 row) 5. Commit the transaction: => COMMIT; Note that Bob does not have proper permissions to drop the table: => DROP TABLE comments; ROLLBACK 4000: Must be owner of relation comments See Also l Granting Database Access to MC Users Revoking Access From Database Roles A superuser can revoke any role from a user or from another role using the REVOKE command. The simplest form of this command is: REVOKE role [, ...] FROM { user | role | PUBLIC } [, ...] See REVOKE (Role) in the SQL Reference Manual for details. Example To revoke access from a role, use the REVOKE (Role) statement: 1. Connect to the database as a superuser: \c - dbadmin Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 290 of 997
  • 291. 2. Revoke the commenter role from user Bob: => REVOKE commenter FROM bob; Granting Administrative Access to a Role A superuser can assign a user or role administrative access to a role by supplying the optional WITH ADMIN OPTION argument to the GRANT statement. Administrative access allows the user to grant and revoke access to the role for other users (including granting them administrative access). Giving users the ability to grant roles lets a superuser delegate role administration to other users. Example The following example demonstrates granting the user bob administrative access to the commenter role, then connecting as bob and granting a role to another user. 1. Connect to the database as a superuser (or a user with administrative access): => \c - dbadmin 2. Grant administrative access on the commenter role to Bob: => GRANT commenter TO Bob WITH ADMIN OPTION; 3. Connect to the database as user Bob: => \c - Bob 4. As user Bob, grant the commenter role to Alice: => GRANT commenter TO Alice; Users with administrative access to a role can also grant other users administrative access: => GRANT commenter TO alice WITH ADMIN OPTION; GRANT ROLE As with all user privilege models, database superusers should be cautious when granting any user a role with administrative privileges. For example, if the database superuser grants two users a role with administrative privileges, both users can revoke the role of the other user. This example shows granting the appadmin role (with administrative privileges) to users bob and alice. After each user has been granted the appadmin role, either user can revoke the role from the other. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 291 of 997
  • 292. => GRANT appadmin TO bob, alice WITH ADMIN OPTION; GRANT ROLE => connect - bob You are now connected as user "bob". => REVOKE appadmin FROM alice; REVOKE ROLE Revoking Administrative Access From a Role A superuser can revoke administrative access from a role using the ADMIN OPTION parameter with the REVOKE statement. Giving users the ability to revoke roles lets a superuser delegate role administration to other users. Example The following example demonstrates revoking administrative access from Alice for the commenter role. 1. Connect to the database as a superuser (or a user with administrative access) c - dbadmin 2. Issue the REVOKE command with ADMIN OPTION parameters: => REVOKE ADMIN OPTION FOR commenter FROM alice; Enabling Roles By default, roles aren't enabled automatically for a user account. (See Default Roles for Database Users for a way to make roles enabled automatically.) Users must explicitly enable a role using the SET ROLE statement. When users enable a role in their session, they gain all of the privileges assigned to that role. Enabling a role does not affect any other roles that the users have active in their sessions. They can have multiple roles enabled simultaneously, gaining the combined privileges of all the roles they have enabled, plus any of the privileges that have been granted to them directly. => SELECT * FROM applog; ERROR: permission denied for relation applog => SET ROLE logreader; SET => SELECT * FROM applog; id | sourceID | data | event ----+----------+----------------------------+-------------------------------------------- Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 292 of 997
  • 293. -- 1 | Loader | 2011-03-31 11:00:38.494226 | Error: Failed to open source file 2 | Reporter | 2011-03-31 11:00:38.494226 | Warning: Low disk space on volume /scratch- a (2 rows) You can enable all of the roles available to your user account using the SET ROLE ALL statement. => SET ROLE ALL;SET => SHOW ENABLED_ROLES; name | setting ---------------+------------------------------ enabled roles | logreader, logwriter (1 row) See Also l Viewing a User's Role Disabling Roles To disable all roles, use the SET ROLE NONE statement: => SET ROLE NONE;SET => SHOW ENABLED_ROLES; name | setting ---------------+--------- enabled roles | (1 row) Viewing Enabled and Available Roles You can list the roles you have enabled in your session using the SHOW ENABLED ROLES statement: => SHOW ENABLED_ROLES; name | setting ---------------+---------- enabled roles | logreader (1 row) You can find the roles available to your account using the SHOW AVAILABLE ROLES statement: Bob=> SHOW AVAILABLE_ROLES; name | setting -----------------+----------------------------- Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 293 of 997
  • 294. available roles | logreader, logwriter (1 row) Viewing Named Roles To view the names of all roles users can access, along with any roles that have been assigned to those roles, query the V_CATALOG.ROLES system table. => SELECT * FROM roles; role_id | name | assigned_roles -------------------+-----------------+---------------- 45035996273704964 | public | 45035996273704966 | dbduser | 45035996273704968 | dbadmin | dbduser* 45035996273704972 | pseudosuperuser | dbadmin* 45035996273704974 | logreader | 45035996273704976 | logwriter | 45035996273704978 | logadmin | logreader, logwriter (7 rows) Note: An asterisk (*) in the output means that role was granted WITH ADMIN OPTION. Viewing a User's Role The HAS_ROLE() function lets you see if a role has been granted to a user. Non-superusers can check their own role membership using HAS_ROLE('role_name'), but only a superuser can look up other users' memberships using the user_name parameter. Omitting the user_name parameter will return role results for the superuser who is calling the function. How to View a User's Role In this example, user Bob wants to see if he's been assigned the logwriter role. The output returns Boolean value t for true, denoting that Bob is assigned the specified logwriter role: Bob=> SELECT HAS_ROLE('logwriter'); HAS_ROLE ---------- t (1 row) In this example, a superuser wants to verify that the logadmin role has been granted to user Ted: dbadmin=> SELECT HAS_ROLE('Ted', 'logadmin'); The output returns boolean value t for true, denoting that Ted is assigned the specified logadmin role: Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 294 of 997
  • 295. HAS_ROLE ---------- t (1 row) Note that if a superuser omits the user_name argument, the function looks up that superuser's role. The following output indicates that this superuser is not assigned the logadmin role: dbadmin=> SELECT HAS_ROLE('logadmin'); HAS_ROLE ---------- f (1 row) Output of the function call with user Alice indicates that she is not granted the logadmin role: dbadmin=> SELECT HAS_ROLE('Alice', 'logadmin'); HAS_ROLE ---------- f (1 row) To view additional information about users, roles and grants, you can also query the following system tables in the V_CATALOG schema to show directly-assigned roles: l ROLES l GRANTS l USERS Note that the system tables do not indicate whether a role is available to a user when roles could be available through other roles (indirectly). You need to call the HAS_ROLE() function for that information. Users This command returns all columns from the USERS system table: => SELECT * FROM users; -[ RECORD 1 ] ------------------+--------------------------- user_id | 45035996273704962 user_name | dbadmin is_super_user | t profile_name | default is_locked | f lock_time | resource_pool | general memory_cap_kb | unlimited Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 295 of 997
  • 296. temp_space_cap_kb | unlimited run_time_cap | unlimited all_roles | dbadmin*, pseudosuperuser* default_roles | dbadmin*, pseudosuperuser* Note: An asterisk (*) in table output for all_roles and default_roles columns indicates a role granted WITH ADMIN OPTION. Roles The following command returns all columns from the ROLES system table: => SELECT * FROM roles; role_id | name | assigned_roles -------------------+-----------------+------------------- 45035996273704964 | public | 45035996273704966 | dbduser | 45035996273704968 | dbadmin | dbduser* 45035996273704972 | pseudosuperuser | dbadmin* Grants The following command returns all columns from the GRANTS system table: => SELECT * FROM grants; grantor | privileges_description | object_schema | object_name | grantee ---------+------------------------+---------------+-------------+--------- dbadmin | USAGE | | public | public dbadmin | USAGE | | v_internal | public dbadmin | USAGE | | v_catalog | public dbadmin | USAGE | | v_monitor | public (4 rows) Viewing User Roles on Management Console You can see an MC user's roles and database resources through the MC Settings > User management page on the Management Console interface. For more information, see About MC Privileges and Roles. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 296 of 997
About MC Privileges and Roles

As introduced in About MC Users, you control user access to Management Console through groups of privileges (also referred to as access levels) that fall into two types: those that apply to MC configuration, and those that apply to MC-managed HP Vertica databases.

MC Permission Groups

• MC configuration privileges are made up of roles that control what users can configure on the MC, such as modify MC settings, create/import HP Vertica databases, restart MC, create an HP Vertica cluster through the MC interface, and create and manage MC users.

• MC database privileges are made up of roles that control what users can see or do on an MC-managed HP Vertica database, such as view the database cluster state, query and session activity, monitor database messages and read log files, replace cluster nodes, and stop databases.

Note: When you grant an MC user a database role, that user inherits the privileges assigned to the database user account to which the MC user is mapped. For maximum access, use the dbadmin username and password. MC database privileges cannot alter or override the HP Vertica database user's privileges and roles. MC user/database user association is described in Mapping an MC User to a Database user's Privileges.

MC's Configuration Privileges and Database Access

The following table shows MC role-based users and summarizes the levels of access they have on the MC interface, as well as to any MC-managed databases.

User type: MC administrators (SUPER and ADMIN)
  MC config permissions: Perform all administrative operations on MC, including configure and restart the MC process and add, change, and remove all user accounts.
  MC database permissions: Automatically inherit the database privileges of the main database user account used to set up one or more databases on the MC interface. By default, MC administrators have access to all MC-managed databases.

User type: IT users (IT)
  MC config permissions: Monitor all MC-managed databases, view MC-level (non-database) messages, logs, and alerts, disable or enable user access to MC, and reset non-LDAP user passwords.
  MC database permissions: Inherit no database privileges. You must grant the IT user access to one or more MC-managed databases, which you do by mapping this user to the database user account. The MC IT user then inherits the privileges assigned to the database user to which he/she is mapped.

User type: Database users (NONE)
  MC config permissions: Perform no administrative operations on MC. View and/or manage databases that you assign them.
  MC database permissions: Inherit no database privileges. You must grant the database (NONE) user access to one or more MC-managed databases, which you do by mapping this user to the database user account. The database user inherits the privileges assigned to the database user to which he/she is mapped.

User types are described in About MC Users; MC config permissions in MC Configuration Privileges; MC database permissions in MC Database Privileges.

See Also

• About MC Users
• Creating an MC User
• Mapping an MC User to a Database user's Privileges

MC Configuration Privileges

When you create an MC user, you assign them an MC configuration access level (role). For the most part, MC configuration permissions control a user's ability to create users and manage MC settings on the MC interface. You can grant a maximum of one role to each MC user, choosing from one of the following:

• ADMIN Role (mc)—Full access to all MC functionality, including any MC-managed database
• IT Role (mc)—Full access to all MC functionality, but database access is assigned
• NONE Role (mc)—Database access only, according to the databases an administrator assigns

You grant MC configuration permissions at the same time you create the user's account, through the MC Settings page. You can change MC access levels through the same page later, if necessary. See Creating an MC User for details.

You will also likely grant non-administrators (users with the IT and NONE roles) access to one or more MC-managed databases. See MC Database Privileges for details.
MC Configuration Privileges By User Role

The following table summarizes MC configuration permissions by role. For details, see each role in the above list.

MC access privileges                                ADMIN  IT   NONE
Configure MC settings:                               Yes
  • Configure storage locations and ports
  • Upload an HP Vertica license
  • Upload new SSL certificates
  • Manage LDAP authentication
Create and manage databases and clusters:            Yes
  • Create a new database or import an existing one
  • Create a new cluster or import an existing one
  • Remove database/cluster from the MC interface
Configure user settings:                             Yes
  • Add, edit, delete users
  • Enable/disable user access to MC
  • Add, change, delete user permissions
  • Map users to one or more databases
Monitor user activity on MC                          Yes
Reset MC to its original, preconfigured state        Yes
Restart Management Console                           Yes
Disable or enable user access to MC interface        Yes    Yes
Reset users' (non-LDAP) passwords                    Yes    Yes
Monitor all console-managed databases                Yes    Yes
View MC log and non-database MC alerts               Yes    Yes
  • 300. See Also l About MC Users l About MC Privileges and Roles l MC Database Privileges l Creating an MC User l Granting Database Access to MC Users l Mapping an MC User to a Database user's Privileges SUPER Role (mc) The default superuser administrator, called Super on the MC UI, is a Linux user account that gets created when you install and configure MC. During the configuration process, you can assign the Super any name you like; it need not be dbadmin. The MC SUPER role, a superset of the ADMIN Role (mc), has the following privileges: l Oversees the entire Management Console, including all MC-managed database clusters Note: This user inherits the privileges/roles of the user name supplied when importing an HP Vertica database into MC. HP recommends that you use the database administrator's credentials. l Creates the first MC user accounts and assigns them an MC configuration role l Grants MC users access to one or more MC-managed HP Vertica databases by assigning MC Database Privileges to each user The MC super administrator account is unique. Unlike other MC users you create, including other MC administrators, the MC super account cannot be altered or dropped, and you cannot grant the SUPER role to other MC users. The only property you can change for the MC super is the password. Otherwise the SUPER role has the same privileges on MC as the ADMIN Role (mc). On MC-managed HP Vertica databases, SUPER has the same privileges as ADMIN Role (db). The MC super account does not exist within the LDAP server. This account is also different from the special dbadmin account that gets created during an HP Vertica installation, whose privileges are governed by the DBADMIN Role. The HP Vertica-created dbadmin is a Linux account that owns the database catalog and storage locations and can bypass database authorization rules, such as creating or dropping schemas, roles, and users. The MC super does not have the same privileges as dbadmin. 
  • 301. See Also l Configuring MC l About MC Privileges and Roles l Creating an MC User l Granting Database Access to MC Users l Adding Multiple Users to MC-managed Databases l Mapping an MC User to a Database user's Privileges l Managing MC Users ADMIN Role (mc) This user account is the user who can perform all administrative operations on Management Console, including configure and restart the MC process and add, change, and remove all user accounts. By default, MC administrators inherit the database privileges of the main database user account used to set up the database on the MC interface. Therefore, MC administrators have access to all MC-managed databases. Grant the ADMIN role to users you want to be MC administrators. The difference between this ADMIN user and the default Linux account, the MC SUPER role, is you cannot alter or delete the MC SUPER account, and you can't grant the SUPER role to any other MC users. You can, however, change the access level for other MC administrators, and you can delete this user's accounts from the MC interface. The following list highlights privileges granted to the ADMIN role: l Modify MC settings, such as storage locations and ports, restart the MC process, and reset MC to its original, unconfigured state l Audit license activity and install/upgrade an HP Vertica license l Upload a new SSL certificate l Use LDAP for user authentication l View the MC log, alerts and messages l Add new users and map them to one or more HP Vertica databases by granting an MC database-level role l Select a database and add multiple users at once l Manage user roles and their access to MC Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 301 of 997
• Remove users from the MC
• Monitor user activity on the MC interface
• Stop and start any MC-managed database
• Create new databases/clusters and import existing databases/clusters into MC
• Remove databases/clusters from the MC interface
• View all databases/clusters imported into MC

About the MC Database Administrator Role

There is also an MC database administrator (ADMIN) role that controls a user's access to MC-managed databases. The two ADMIN roles are similar, but they are not the same, and you do not need to grant users with the ADMIN (mc) role an ADMIN (db) role because MC ADMIN users automatically inherit all database privileges of the main database user account that was created on or imported into MC.

The following table summarizes the primary difference between the two ADMIN roles, but see ADMIN Role (db) for details specific to MC-managed database administrators.

MC configuration ADMIN role: Perform all administrative operations on the MC itself, including restarting the MC process. Privileges extend to monitoring all MC-created and imported databases, but anything database-related beyond that scope depends on the user's privileges granted on the database through GRANT statements.

MC database ADMIN role: Perform database-specific activities, such as stop and start the database, and monitor query and user activity and resources. Other database operations depend on that user's privileges on the specific database. This ADMIN role cannot configure MC.

See Also

• About MC Privileges and Roles
• ADMIN Role (db)
• Creating an MC User
• Granting Database Access to MC Users
• Adding Multiple Users to MC-managed Databases
• Mapping an MC User to a Database user's Privileges
• Managing MC Users
IT Role (mc)

MC IT users can monitor all MC-managed databases, view MC-level (non-database) messages, logs, and alerts, disable or enable user access to MC, and reset non-LDAP user passwords. You can also assign MC IT users specific database privileges, which you do by mapping IT users to a user on a database. In this way, the MC IT user inherits the privileges assigned to the database user to which he/she is mapped.

About the MC IT (database) Role

There is also an IT database administrator (IT) role that controls a user's access to MC-managed databases. If you grant an MC user both IT roles, it means the user can perform some configuration on MC and also has access to one or more MC-managed databases. The database mapping is not required, but it gives the IT user wider privileges. The two IT roles are similar, but they are not the same. The following table summarizes the primary difference between them, but see IT Role (db) for details.

MC configuration IT role: Monitor MC-managed databases, view non-database messages, and manage user access.

MC database IT role: Monitor databases on which the user has privileges; view the database overview and activity pages; monitor the node state; view messages and mark them read/unread; view database settings. Can also be mapped to one or more HP Vertica databases.

See Also

• About MC Privileges and Roles
• IT Role (db)
• Mapping an MC User to a Database user's Privileges

NONE Role (mc)

The default role for all newly-created users on MC is NONE, which prevents users granted this role from configuring the MC. When you create MC users with the NONE role, you grant them an MC database-level role. This assignment maps the MC user to a user account on a specific database and specifies that the NONE user inherits the privileges of the database user to which he or she is mapped.
Which database-level role you grant this user with NONE privileges—whether ADMIN (db) or IT (db) or USER (db)—depends on the level of access you want the user to have on the MC-managed database. Database roles have no impact on the ADMIN and IT roles at the MC configuration level. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 303 of 997
  • 304. See Also l About MC Privileges and Roles l About MC Users l MC Database Privileges l ADMIN Role (db) l IT Role (db) l USER Role (db) MC Database Privileges When you create MC users, you first assign them MC configuration privileges, which controls what they can do on the MC itself. In the same user-creation operation, you grant access to one or more MC-managed databases. MC database access does not give the MC user privileges directly on HP Vertica; it provides MC users varying levels of access to assigned database functionality through the MC interface. Assign users an MC database level through one of the following roles: l ADMIN Role (db)—Full access to all MC-managed databases. Actual privileges ADMINs inherit depend on the database user account used to create or import the HP Vertica database into the MC interface. l IT Role (db)—Can start and stop a database but cannot remove it from the MC interface or drop it. l USER Role (db)—Can only view database information through the database Overview and Activities pages but is restricted from viewing more detailed data. When you assign an MC database level to an MC user, you need to map the MC user account to a database user account. Mapping lets the MC user inherit the privileges assigned to that database user and ensures that the MC user cannot do or see anything that is not allowed by the privileges set up for the user account on the server database. Privileges assigned to the database user always supersede privileges of the MC user if there is a conflict, such as stopping a database. When the MC user logs in to MC, using his or her MC user name and password, MC privileges for database-related activities are compared to the user privileges on the database itself (the account you mapped the MC user to). Only when the user has both MC privileges and corresponding database privileges will the operations be exposed to that user in the MC interface. 
Tip: As a best practice, you should identify, in advance, the appropriate HP Vertica database user account that has privileges and/or roles similar to one of the MC database roles. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 304 of 997
See Creating an MC User and Mapping an MC User to a Database user's Privileges for more information.

MC Database Privileges By Role

The following tables summarize MC database-level privileges by user role. The first table shows the default privileges, and the second table shows, for the ADMIN role only, which operations are dependent on the database user account's own privileges and/or roles.

Default database-level privileges         ADMIN  IT   USER
View messages                              Yes   Yes  Yes
Delete messages and mark read/unread       Yes   Yes
View database Overview page                Yes   Yes  Yes
View database Activity page                Yes   Yes  Yes
View database grid page                    Yes   Yes  Yes
Start a database                           Yes
Stop a node                                Yes
View node state                            Yes   Yes
View MC settings                           Yes   Yes

Privileges governed by the HP Vertica database user account:

Database-specific privileges              ADMIN
Audit license activity                     Yes
Install new license                        Yes
View WLA tuning recommendations            Yes
View database query page                   Yes
Stop a database                            Yes
Rebalance a database                       Yes
Drop a database                            Yes
Start, replace, add, remove nodes          Yes
Modify database settings                   Yes
  • 306. See Also l About MC Users l About MC Privileges and Roles l MC Configuration Privileges ADMIN Role (db) ADMIN is a superuser with full privileges to monitor MC-managed database activity and messages. Other database privileges (such as stop or drop the database) are governed by the user account on the HP Vertica database that this ADMIN (db) user is mapped to. ADMIN is the most permissive role and is a superset of privileges granted to the IT and USER roles. The ADMIN user has the following database privileges by default: l View and delete database messages l Mark messages read or unread l View the database overview (grid) page l View the database activity page l Start the database l View database cluster node state l View database settings The following MC-managed database operations depend on the database user's role that you mapped this ADMIN user to: l View license information l Install a new license l View Workload Analyzer tuning recommendations l View query activity and loads l Stop the database l Rebalance the database l Add, stop, replace, or remove nodes l Manage database settings Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 306 of 997
  • 307. Note: Database access granted through Management Console never overrides roles granted on a specific HP Vertica database. About the ADMIN (MC configuration) Role There is also an MC configuration administrator role that defines what the user can change on the MC itself. The two ADMIN roles are similar, but they are not the same. Unlike the MC configuration role of ADMIN, which can manage all MC users and all databases imported into the UI, the MC database ADMIN role has privileges only on the databases you map this user to. The following table summarizes the primary difference between them, but see ADMIN Role (mc) for additional details. MC database ADMIN role MC configuration ADMIN role Perform database-specific activities, such as stop and start the database, and monitor query and user activity and resources. Other database operations depend on that user's privileges on the specific database. This ADMIN role cannot configure MC. Perform all administrative operations on the MC itself, including restarting the MC process. Privileges extend to monitoring all MC-created and imported databases but anything database-related beyond that scope depends on the user's privileges granted on the database through GRANT statements. See Also l About MC Privileges and Roles l ADMIN Role (mc) IT Role (db) IT can view most details about an MC-managed database, such as messages (and mark them read/unread), the database overall health and activity/resources, cluster and node state, and MC settings. You grant and manage user role assignments through the MC Settings > User management page on the MC. About the IT (MC configuration) Role There is also an IT role at the MC configuration access level. The two IT roles are similar, but they are not the same. If you grant an MC user both IT roles, it means the user can perform some configuration on MC and also has access to one or more MC-managed databases. 
The following table summarizes the primary difference between them, but see IT Role (mc) for additional details. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 307 of 997
MC database IT: Monitor databases on which the user has privileges; view the database overview and activity pages; monitor the node state; view messages and mark them read/unread; view database settings.

MC configuration IT: Monitor MC-managed databases, view non-database messages, and manage user access.

See Also

• About MC Privileges and Roles
• IT Role (mc)
• Mapping an MC User to a Database user's Privileges

USER Role (db)

USER has limited database privileges, such as viewing database cluster health, activity/resources, and messages. MC users granted the USER database role might have higher levels of permission on the MC itself, such as the IT Role (mc). Alternatively, USER users might have no (NONE) privileges to configure MC. How you combine the two levels is up to you.

See Also

• About MC Privileges and Roles
• MC Configuration Privileges
• Mapping an MC User to a Database user's Privileges

Granting Database Access to MC Users

If you did not grant an MC user a database-level role when you created the user account, this procedure describes how to do so. Granting the user an MC database-level role associates the MC user with a database user's privileges and ensures that the MC user cannot do or see anything that is not allowed by the privileges set up for the user account on the server database. When that MC user logs in to MC, his or her MC privileges for database-related activities are compared to that user's privileges on the database itself. Only when the user has both MC privileges and corresponding database privileges will the operations be exposed in the MC interface.

See Mapping an MC User to a Database user's Privileges for examples.

Prerequisites

Before you grant database access to an MC user, make sure you have read the prerequisites in Creating an MC User.
Grant a Database-Level Role to an MC user:

1. Log in to Management Console as an administrator and navigate to MC Settings > User management.
2. Select an MC user and click Edit.
3. Verify the MC Configuration Privileges are what you want them to be. NONE is the default.
4. Next to the DB access levels section, click Add and provide the following database access credentials:
   i. Choose a database. Select a database from the list of MC-discovered databases (those that were created on or imported into the MC interface).
   ii. Database username. Enter an existing database user name or, if the database is running, click the ellipses [...] to browse for a list of database users, and select a name from the list.
   iii. Database password. Enter the password to the database user account (not this username's password).
   iv. Restricted access. Choose a database level (ADMIN, IT, or USER) for this user.
   v. Click OK to close the Add permissions dialog box.
5. Optionally change the user's Status (enabled is the default).
6. Click Save.

See Mapping an MC User to a Database user's Privileges for a graphical illustration of how easy it is to map the two user accounts.

How MC Validates New Users

After you click OK to close the Add permissions dialog box, MC tries to validate the database username and password entered against the selected MC-managed database or against your organization's LDAP directory. If the credentials are found to be invalid, you are asked to re-enter them.

If the database is not available at the time you create the new user, MC saves the username/password and prompts for validation when the user accesses the Database and Clusters page later.
  • 310. See Also l About MC Users l About MC Privileges and Roles l Creating an MC User l Creating a Database User l Adding Multiple Users to MC-managed Databases Mapping an MC User to a Database user's Privileges Database mapping occurs when you link one or more MC user accounts to a database user account. After you map users, the MC user inherits privileges granted to the database user, up to the limitations of the user's database access level on MC. This topic presents the same mapping information as in Granting Database Access to MC Users but with graphics. See also MC Database Privileges for an introduction to database mapping through the MC interface and details about the different database access roles you can grant to an MC user. How to Map an MC User to a Database User The following series of images shows you how easy it is to map an MC user to a database user account from the MC Settings > User management page. You view the list of MC users so you can see who has what privileges. You notice that user alice has no database privileges, which would appear under the Resources column. To give alice database privileges, click to highlight her MC username, click Edit, and the Edit existing user page displays with no resources (databases) assigned to MC user alice. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 310 of 997
  • 311. Click Add, and when the Add permissions dialog box opens, choose a database from the menu. In the same Add permissions dialog box, after you select a database, you need to enter the user name of the database user account that you want to map alice to. To see a list of database user names, click the ellipses […] and select a name from the list. In this example, you already know that database user carol has privileges to stop and start the database, but the alice database account can only view certain tables. On MC, you want alice to have similar privileges to carol, so you map MC alice to database carol. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 311 of 997
  • 312. After you click OK, remember to assign MC user alice an MC database level. In this case, choose IT, a role that has permissions to start and stop the selected database. Enter the database password, click OK , close the confirmation dialog box, and click Save. That's it. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 312 of 997
What If You Map the Wrong Permissions

In the following mapping example, if you had granted alice an MC database access level of ADMIN but mapped her to a database account with only USER-type privileges, Alice's access to that database would be limited to USER privileges. This is by design. When Alice logs in to MC using her own user name and password, MC privileges for her ADMIN-assigned role are compared to the user privileges on the database itself. Only when the user has both MC privileges and corresponding database privileges will the appropriate operations be exposed in the MC interface.

Adding Multiple MC Users to a Database

In addition to creating or editing MC users and mapping them to a selected database, you can also select a database and add users to that database on the MC Settings > Resource access page. Choose a database from the list, click Add, and select an MC user name, one at a time. Map the MC user to the database user account, and then grant each MC user the database level you want him or her to have. It is possible you will grant the same database access to several MC users.
  • 314. See Granting Database Access to MC Users and Mapping an MC User to a Database user's Privileges for details. How to Find Out an MC user's Database Role On the User management page, the Resources column lists all of the databases a user is mapped to. It does not, however, display the user's database access level (role). Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 314 of 997
  • 315. You can retrieve that information by highlighting a user and clicking Edit. In the dialog box that opens (shown in example below), Bob's role on the mcdb database is ADMIN. You can change Bob's role from this dialog box by clicking Edit and assigning a different database-access role. Adding Multiple Users to MC-managed Databases If you are administering one or more MC-managed databases, and several MC users need access to it, you have two options on the MC Settings page: l From the User management option, select each user and grant database access, one user at a time l From the Resource access option, select a database first and add users to it This procedure describes how to add several users to one database at once. If you want to add users one at a time, see Creating an MC User. Before You Start Read the prerequisites in Creating an MC User. Administrator's Guide Managing Users and Privileges HP Vertica Analytic Database (7.0.x) Page 315 of 997
How to Add Multiple Users to a Database

1. Log in to MC as an administrator and navigate to MC Settings > Resource access.
2. Choose a database from the list of discovered databases. Selecting the database populates a table with users who already have privileges on the selected database.
3. To add new users, click Add and select the MC username you want to add to the database from the drop-down list.
4. Enter an existing Database username on the selected database or click the ellipses button [...] to browse for names. (This is the database account you want to map the selected user to.)
5. Enter the database password (not this username's password).

   Note: The database password is generally the dbadmin superuser's password.

6. Choose a database-access role (ADMIN or IT or USER) for this user.
7. Click OK to close the Add access to resource dialog box.
8. Perform steps 3-7 for each user you want to add to the selected database, and then click Save.

See Also

• About MC Users
• About MC Privileges and Roles
• Mapping an MC User to a Database user's Privileges

MC Mapping Matrix

The following table shows the three different MC configuration roles, ADMIN, IT, and NONE, combined with the type of privileges a user granted that role inherits when mapped to a specific database-level role.
MC configuration level: ADMIN Role (mc)
MC database level: All (implicit)
The combination lets the user:
• Perform all administrative operations on Management Console, including configure and restart the MC process.
• Maximum access to all databases created and/or imported into the MC interface, governed by the privileges associated with the database user account used to set up the database on the MC.

MC configuration level: IT Role (mc)
MC database level: ADMIN Role (db)
The combination lets the user:
• Monitor MC-managed database activity.
• View non-database messages.
• Manage user access (enable/disable).
• Monitor MC-managed database activity and messages.
• Other database privileges (such as stop or drop the database) are governed by the mapped user account on the database itself.
• Automatically inherits all privileges granted to the NONE:IT combination.

MC configuration level: IT Role (mc)
MC database level: IT Role (db)
The combination lets the user:
• Monitor MC-managed database activity.
• View non-database messages.
• Manage user access (edit/enable/disable).
• On databases where granted privileges, monitor database overview and activity, monitor node state, view messages and mark them read/unread, and view database settings.
• Automatically inherits all privileges granted to the IT:USER combination.

MC configuration level: IT Role (mc)
MC database level: USER Role (db)
The combination lets the user:
• Monitor MC-managed database activity.
• View non-database messages.
• Manage user access (enable/disable).
• View database cluster health, activity/resources, and messages and alerts.

MC configuration level: NONE Role (mc)
MC database level: ADMIN Role (db)
The combination lets the user:
• No privileges to monitor/modify anything related to the MC itself.
• Monitor MC-managed database activity, node state, and messages.
• Other database privileges (such as stop or drop the database) are governed by the mapped user account on the database itself.
• Automatically inherits all privileges granted to the NONE:IT combination.

MC configuration level: NONE Role (mc)
MC database level: IT Role (db)
The combination lets the user:
• No privileges to monitor/modify anything related to the MC itself.
• Monitor MC-managed database activity, node state, and settings.
• View the database overview and activity pages.
• View messages and mark them read/unread.
• Automatically inherits all privileges granted to the NONE:USER combination.

MC configuration level: NONE Role (mc)
MC database level: USER Role (db)
The combination lets the user:
• No privileges to monitor/modify anything related to the MC itself.
• View database cluster health, activity/resources, and messages and alerts.
• 319. Using the Administration Tools

HP Vertica provides a set of tools that allows you to perform administrative tasks quickly and easily. Most database administration tasks in HP Vertica can be done using the Administration Tools.

Always run the Administration Tools using the Database Administrator account on the Administration host, if possible. Make sure that no other Administration Tools processes are running. If the Administration host is unresponsive, run the Administration Tools on a different node in the cluster. That node permanently takes over the role of Administration host.

A man page is available for admintools. If you are running as the dbadmin user, simply type: man admintools. If you are running as a different user, type: man -M /opt/vertica/man admintools.

Running the Administration Tools

At the Linux command line:

$ /opt/vertica/bin/admintools [ -t | --tool ] toolname [ options ]

toolname is one of the tools described in the Administration Tools Reference.

options:
-h | --help      Shows a brief help message and exits.
-a | --help_all  Lists all command-line subcommands and options, as described in Writing Administration Tools Scripts.

If you omit the toolname and options parameters, the Main Menu dialog box appears inside your console or terminal window with a dark blue background and a title on top. The screen captures used in this documentation set are cropped down to the dialog box itself, as shown below.

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 319 of 997
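For example, a quick way to discover every available tool and its options from the command line is the help_all option described above; invoking a specific tool then follows the same pattern. A brief sketch (the tool name view_cluster is assumed from the Administration Tools Reference, and the database name VMart is illustrative — verify names on your installation with -a):

```
$ /opt/vertica/bin/admintools -a
$ /opt/vertica/bin/admintools -t view_cluster -d VMart
```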
  • 320. If you are unfamiliar with this type of interface, read Using the Administration Tools Interface before you do anything else. First Time Only The first time you log in as the Database Administrator and run the Administration Tools, the user interface displays. 1. In the EULA (end-user license agreement) window, type accept to proceed. A window displays, requesting the location of the license key file you downloaded from the HP Web site. The default path is /tmp/vlicense.dat. 2. Type the absolute path to your license key (for example, /tmp/vlicense.dat) and click OK. Between Dialogs While the Administration Tools are working, you see the command line processing in a window similar to the one shown below. Do not interrupt the processing. Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 320 of 997
• 321. Using the Administration Tools Interface

The HP Vertica Administration Tools are implemented using Dialog, a graphical user interface that works in terminal (character-cell) windows. The interface responds to mouse clicks in some terminal windows, particularly local Linux windows, but you might find that it responds only to keystrokes. Thus, this section describes how to use the Administration Tools using only keystrokes.

Note: This section does not describe every possible combination of keystrokes you can use to accomplish a particular task. Feel free to experiment and to use whatever keystrokes you prefer.

Enter [Return]

In all dialogs, when you are ready to run a command, select a file, or cancel the dialog, press the Enter key. The command descriptions in this section do not explicitly instruct you to press Enter.

OK - Cancel - Help

The OK, Cancel, and Help buttons are present on virtually all dialogs. Use the tab, space bar, or right and left arrow keys to select an option and then press Enter. The same keystrokes apply to dialogs that present a choice of Yes or No.

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 321 of 997
  • 322. Menu Dialogs Some dialogs require that you choose one command from a menu. Type the alphanumeric character shown or use the up and down arrow keys to select a command and then press Enter. List Dialogs In a list dialog, use the up and down arrow keys to highlight items, then use the space bar to select the items (which marks them with an X). Some list dialogs allow you to select multiple items. When you have finished selecting items, press Enter. Form Dialogs In a form dialog (also referred to as a dialog box), use the tab key to cycle between OK, Cancel, Help, and the form field area. Once the cursor is in the form field area, use the up and down arrow keys to select an individual field (highlighted) and enter information. When you have finished entering information in all fields, press Enter. Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 322 of 997
• 323. Help Buttons

Online help is provided in the form of text dialogs. If you have trouble viewing the help, see Notes for Remote Terminal Users in this document.

K-Safety Support in Administration Tools

The Administration Tools allow certain operations on a K-Safe database, even if some nodes are unresponsive. The database must have been marked as K-Safe using the MARK_DESIGN_KSAFE function.

The following management functions within the Administration Tools are operational when some nodes are unresponsive.

Note: HP Vertica users can perform much of this functionality using the Management Console interface. See Management Console and Administration Tools for details.

- View database cluster state
- Connect to database
- Start database (including manual recovery)
- Stop database
- Replace node (assuming the node that is down is the one being replaced)
- View database parameters
- Upgrade license key

The following operations work with unresponsive nodes; however, you might have to repeat the operation on the failed nodes after they are back in operation:

- Edit authentication
- Distribute config files
- Install external procedure
- Set database parameters

The following management functions within the Administration Tools require that all nodes be UP in order to be operational:

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 323 of 997
  • 324. l Create database l Run the Database Designer l Drop database l Set restart policy l Roll back database to Last Good Epoch Notes for Remote Terminal Users The appearance of the graphical interface depends on the color and font settings used by your terminal window. The screen captures in this document were made using the default color and font settings in a PuTTy terminal application running on a Windows platform. Note: If you are using a remote terminal application, such as PuTTy or a Cygwin bash shell, make sure your window is at least 81 characters wide and 23 characters high. If you are using PuTTY, you can make the Administration Tools look like the screen captures in this document: 1. In a PuTTY window, right click the title area and select Change Settings. 2. Create or load a saved session. 3. In the Category dialog, click Window > Appearance. 4. In the Font settings, click the Change... button. 5. Select Font: Courier New: Regular Size: 10 6. Click Apply. Repeat these steps for each existing session that you use to run the Administration Tools. You can also change the translation to support UTF-8: 1. In a PuTTY window, right click the title area and select Change Settings. 2. Create or load a saved session. 3. In the Category dialog, click Window > Translation. 4. In the "Received data assumed to be in which character set" drop-down menu, select UTF-8. 5. Click Apply. Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 324 of 997
  • 325. Using the Administration Tools Help The Help on Using the Administration Tools command displays a help screen about using the Administration Tools. Most of the online help in the Administration Tools is context-sensitive. For example, if you use up/down arrows to select a command, press tab to move to the Help button, and press return, you get help on the selected command. In a Menu Dialog 1. Use the up and down arrow keys to choose the command for which you want help. 2. Use the Tab key to move the cursor to the Help button. 3. Press Enter (Return). Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 325 of 997
  • 326. In a Dialog Box 1. Use the up and down arrow keys to choose the field on which you want help. 2. Use the Tab key to move the cursor to the Help button. 3. Press Enter (Return). Scrolling Some help files are too long for a single screen. Use the up and down arrow keys to scroll through the text. Password Authentication When you create a new user with the CREATE USER command, you can configure the password or leave it empty. You cannot bypass the password if the user was created with a password configured. You can change a user's password using the ALTER USER command. See Implementing Security for more information about controlling database authorization through passwords. Tip: Unless the database is used solely for evaluation purposes, HP recommends that all database users have encrypted passwords. Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 326 of 997
• 327. Distributing Changes Made to the Administration Tools Metadata

Administration Tools-specific metadata for a failed node will fall out of synchronization with other cluster nodes if you make the following changes:

- Modify the restart policy
- Add one or more nodes
- Drop one or more nodes

When you restore the node to the database cluster, you can use the Administration Tools to update the node with the latest Administration Tools metadata:

1. Log on to a host that contains the metadata you want to transfer and start the Administration Tools. (See Using the Administration Tools.)
2. On the Main Menu in the Administration Tools, select Configuration Menu and click OK.
3. On the Configuration Menu, select Distribute Config Files and click OK.
4. Select AdminTools Meta-Data. The Administration Tools metadata is distributed to every host in the cluster.
5. Restart the database.

Administration Tools and Management Console

You can perform most database administration tasks using the Administration Tools, but you have the additional option of using the more visual and dynamic Management Console. The following table compares the functionality available in both interfaces. Continue to use Administration Tools and the command line to perform actions not yet supported by Management Console.

HP Vertica Functionality | Management Console | Administration Tools
Use a Web interface for the administration of HP Vertica | Yes | No
Manage/monitor one or more databases and clusters through a UI | Yes | No
Manage multiple databases on different clusters | Yes | Yes

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 327 of 997
• 328. HP Vertica Functionality | Management Console | Administration Tools
View database cluster state | Yes | Yes
View multiple cluster states | Yes | No
Connect to the database | Yes | Yes
Start/stop an existing database | Yes | Yes
Stop/restart HP Vertica on host | Yes | Yes
Kill an HP Vertica process on host | No | Yes
Create one or more databases | Yes | Yes
View databases | Yes | Yes
Remove a database from view | Yes | No
Drop a database | Yes | Yes
Create a physical schema design (Database Designer) | Yes | Yes
Modify a physical schema design (Database Designer) | Yes | Yes
Set the restart policy | No | Yes
Roll back database to the Last Good Epoch | No | Yes
Manage clusters (add, replace, remove hosts) | Yes | Yes
Rebalance data across nodes in the database | Yes | Yes
Configure database parameters dynamically | Yes | No
View database activity in relation to physical resource usage | Yes | No
View alerts and messages dynamically | Yes | No
View current database size usage statistics | Yes | No
View database size usage statistics over time | Yes | No
Upload/upgrade a license file | Yes | Yes
Warn users about license violation on login | Yes | Yes
Create, edit, manage, and delete users/user information | Yes | No

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 328 of 997
• 329. HP Vertica Functionality | Management Console | Administration Tools
Use LDAP to authenticate users with company credentials | Yes | Yes
Manage user access to MC through roles | Yes | No
Map Management Console users to an HP Vertica database | Yes | No
Enable and disable user access to MC and/or the database | Yes | No
Audit user activity on database | Yes | No
Hide features unavailable to a user through roles | Yes | No
Generate new user (non-LDAP) passwords | Yes | No

Management Console provides some, but not all, of the functionality provided by the Administration Tools. MC also provides functionality not available in the Administration Tools.

See Also

- Monitoring HP Vertica Using Management Console

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 329 of 997
  • 330. Administration Tools Reference Viewing Database Cluster State This tool shows the current state of the nodes in the database. 1. On the Main Menu, select View Database Cluster State, and click OK. The normal state of a running database is ALL UP. The normal state of a stopped database is ALL DOWN. 2. If some hosts are UP and some DOWN, restart the specific host that is down using Restart HP Vertica on Host from the Administration Tools, or you can start the database as described in Starting and Stopping the Database (unless you have a known node failure and want to continue in that state.) Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 330 of 997
• 331. Nodes shown as INITIALIZING or RECOVERING indicate that Failure Recovery is in progress. Nodes in other states (such as NEEDS_CATCHUP) are transitional and can be ignored unless they persist.

See Also
- Advanced Menu Options
- Startup Problems
- Shutdown Problems

Connecting to the Database

This tool connects to a running database with vsql. You can use the Administration Tools to connect to a database from any node within the database while logged in to any user account with access privileges. You cannot use the Administration Tools to connect from a host that is not a database node. To connect from other hosts, run vsql as described in Connecting From the Command Line in the Programmer's Guide.

1. On the Main Menu, click Connect to Database, and then click OK.
2. Supply the database password if asked: Password:

When you create a new user with the CREATE USER command, you can configure the password or leave it empty. You cannot bypass the password if the user was created with a password configured. You can change a user's password using the ALTER USER command.

The Administration Tools connect to the database and transfer control to vsql.

Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
=>

See Using vsql for more information.

Note: After entering your password, you may be prompted to change your password if it has expired. See Implementing Client Authentication for details of password security.

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 331 of 997
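To connect from a host that is not a database node, invoke vsql directly. A minimal sketch, using the standard vsql connection options (-h host, -d database, -U user); the host and database names here are illustrative:

```
$ /opt/vertica/bin/vsql -h node01.example.com -d VMart -U dbadmin
Password:
```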
• 332. See Also
- CREATE USER
- ALTER USER

Starting the Database

Starting a K-safe database is supported when up to K nodes are down or unavailable. See Failure Recovery for a discussion of various scenarios encountered during database shutdown, startup, and recovery.

You can start a database using any of these methods:
- The Management Console
- The Administration Tools interface
- The command line

Starting the Database Using MC

On MC's Databases and Clusters page, click a database to select it, and click Start within the dialog box that displays.

Starting the Database Using the Administration Tools

1. Open the Administration Tools and select View Database Cluster State to make sure that all nodes are down and that no other database is running. If all nodes are not down, see Shutdown Problems.
2. Open the Administration Tools. See Using the Administration Tools for information about accessing the Administration Tools.
3. On the Main Menu, select Start Database, and then select OK.
4. Select the database to start, and then click OK.

Caution: HP strongly recommends that you start only one database at a time. If you start more than one database at any time, the results are unpredictable. Users could encounter resource conflicts or perform operations in the wrong database.

5. Enter the database password, and then click OK.
6. When prompted that the database started successfully, click OK.
7. Check the log files to make sure that no startup problems occurred.

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 332 of 997
• 333. If the database does not start successfully, see Startup Problems.

Starting the Database At the Command Line

If you use the admintools command line option, start_db, to start a database, the -p password argument is required only during database creation, when you install a new license. As long as the license is valid, the -p argument is not required to start the database and is silently ignored, even if you introduce a typo or prematurely press the enter key. This is by design, as the database can only be started by the user who (as part of the verticadba UNIX user group) initially created the database or who has root or su privileges. If the license were to become invalid, HP Vertica would use the -p password argument to attempt to upgrade the license with the license file stored in /opt/vertica/config/share/license.key.

Following is an example of using start_db on a standalone node:

[dbadmin@localhost ~]$ /opt/vertica/bin/admintools -t start_db -d VMart
Info: no password specified, using none
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (DOWN)
Node Status: v_vmart_node0001: (UP)
Database VMart started successfully

Stopping a Database

To stop a running database, take these steps:

1. Use View Database Cluster State to make sure that all nodes are up. If all nodes are not up, see Restarting HP Vertica on Host.
2. On the Main Menu, select Stop Database, and click OK.
3. Select the database you want to stop, and click OK.
4. Enter the password if asked, and click OK.
5. A message confirms that the database has been successfully stopped. Click OK.

Error

If users are connected during shutdown operations, you cannot stop a database. The Administration Tools display a message similar to the following:

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 333 of 997
• 334. Unable to shut down database VMart.
Error: NOTICE 2519: Cannot shut down while users are connected
This may be because other users still have active sessions or the Management Console is still active. You can force the sessions to terminate and shut down the database, but any work done in the other sessions may be lost. Do you want to try a forced shutdown?

Description

The message indicates that there are active user connections (sessions). For example, Database Designer may be building or deploying a design. See Managing Sessions in the Administrator's Guide for more information.

Resolution

The following examples were taken from a different database.

1. To see which users are connected, connect to the database and query the SESSIONS system table described in the SQL Reference Manual. For example:

=> \pset expanded
Expanded display is on.
=> SELECT * FROM SESSIONS;
-[ RECORD 1 ]
node_name | site01
user_name | dbadmin
client_hostname | 127.0.0.1:57141
login_timestamp | 2009-06-07 14:41:26
session_id | rhel5-1-30361:0xd7e3e:994462853
transaction_start | 2009-06-07 14:48:54
transaction_id | 45035996273741092
transaction_description | user dbadmin (select * from session;)
statement_start | 2009-06-07 14:53:31
statement_id | 0
last_statement_duration | 1
current_statement | select * from sessions;
ssl_state | None
authentication_method | Trust
-[ RECORD 2 ]
node_name | site01
user_name | dbadmin
client_hostname | 127.0.0.1:57142
login_timestamp | 2009-06-07 14:52:55
session_id | rhel5-1-30361:0xd83ac:1017578618
transaction_start | 2009-06-07 14:53:26
transaction_id | 45035996273741096
transaction_description | user dbadmin (COPY ClickStream_Fact FROM '/data/clickstream/1g/ClickStream_Fact.tbl' DELIMITER '|' NULL '\n' DIRECT;)
statement_start | 2009-06-07 14:53:26
statement_id | 17179869528
last_statement_duration | 0
current_statement | COPY ClickStream_Fact FROM '/data/clickstream/1g/ClickStream_Fact.tbl' DELIMITER '|' NULL '\n' DIRECT;
ssl_state | None
authentication_method | Trust

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 334 of 997

The current_statement column of RECORD 1 shows that the session is the one you are using to query the system table. RECORD 2 shows the session that must end before the database can be shut down.

2. If a statement is running in a session, that session must be closed. Use the function CLOSE_SESSION or CLOSE_ALL_SESSIONS described in the SQL Reference Manual.

Note: CLOSE_ALL_SESSIONS is the more common command because it forcefully disconnects all user sessions.

=> SELECT * FROM SESSIONS;
-[ RECORD 1 ]
node_name | site01
user_name | dbadmin
client_hostname | 127.0.0.1:57141
client_pid | 17838
login_timestamp | 2009-06-07 14:41:26
session_id | rhel5-1-30361:0xd7e3e:994462853
client_label |
transaction_start | 2009-06-07 14:48:54
transaction_id | 45035996273741092
transaction_description | user dbadmin (select * from sessions;)
statement_start | 2009-06-07 14:53:31
statement_id | 0
last_statement_duration_us | 1
current_statement | select * from sessions;
ssl_state | None
authentication_method | Trust
-[ RECORD 2 ]
node_name | site01
user_name | dbadmin
client_hostname | 127.0.0.1:57142
client_pid | 17839
login_timestamp | 2009-06-07 14:52:55
session_id | rhel5-1-30361:0xd83ac:1017578618
client_label |
transaction_start | 2009-06-07 14:53:26
transaction_id | 45035996273741096
transaction_description | user dbadmin (COPY ClickStream_Fact FROM '/data/clickstream/1g/ClickStream_Fact.tbl' DELIMITER '|' NULL '\n' DIRECT;)
statement_start | 2009-06-07 14:53:26
statement_id | 17179869528
last_statement_duration_us | 0
current_statement | COPY ClickStream_Fact FROM '/data/clickstream/1g/ClickStream_Fact.tbl' DELIMITER '|' NULL '\n' DIRECT;
ssl_state | None
authentication_method | Trust

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 335 of 997

=> SELECT CLOSE_SESSION('rhel5-1-30361:0xd83ac:1017578618');
-[ RECORD 1 ]
close_session | Session close command sent. Check sessions for progress.

=> SELECT * FROM SESSIONS;
-[ RECORD 1 ]
node_name | site01
user_name | dbadmin
client_hostname | 127.0.0.1:57141
client_pid | 17838
login_timestamp | 2009-06-07 14:41:26
session_id | rhel5-1-30361:0xd7e3e:994462853
client_label |
transaction_start | 2009-06-07 14:48:54
transaction_id | 45035996273741092
transaction_description | user dbadmin (select * from sessions;)
statement_start | 2009-06-07 14:54:11
statement_id | 0
last_statement_duration_us | 98
current_statement | select * from sessions;
ssl_state | None
authentication_method | Trust

3. Query the SESSIONS table again. For example, two columns have changed:
- statement_id is now 0, indicating that no statement is in progress.
- last_statement_duration_us now indicates how long the statement ran, in microseconds, before being interrupted.

The SELECT statements that call these functions return when the interrupt or close message has been delivered to all nodes, not after the interrupt or close has completed.

4. Query the SESSIONS table again. When the session no longer appears in the SESSIONS table, disconnect and run the Stop Database command.

Controlling Sessions

The database administrator must be able to disallow new incoming connections in order to shut down the database. On a busy system, database shutdown is prevented if new sessions connect after the CLOSE_SESSION or CLOSE_ALL_SESSIONS() command is invoked—and before the database actually shuts down.

One option is for the administrator to issue the SHUTDOWN('true') command, which forces the database to shut down and disallow new connections. See SHUTDOWN in the SQL Reference Manual.
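The forced-shutdown option described above reduces to a single vsql call; the 'true' argument tells SHUTDOWN to close any remaining user sessions rather than wait for them:

```
=> SELECT SHUTDOWN('true');
```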
Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 336 of 997
• 337. Another option is to modify the MaxClientSessions parameter from its original value to 0, in order to prevent new non-dbadmin users from connecting to the database.

1. Determine the original value for the MaxClientSessions parameter by querying the V_MONITOR.CONFIGURATION_PARAMETERS system table:

=> SELECT CURRENT_VALUE FROM CONFIGURATION_PARAMETERS WHERE parameter_name='MaxClientSessions';
CURRENT_VALUE
---------------
50
(1 row)

2. Set the MaxClientSessions parameter to 0 to prevent new non-dbadmin connections:

=> SELECT SET_CONFIG_PARAMETER('MaxClientSessions', 0);

Note: Even with MaxClientSessions set to 0, up to five administrators can still log in.

3. Issue the CLOSE_ALL_SESSIONS() command to remove existing sessions:

=> SELECT CLOSE_ALL_SESSIONS();

4. Query the SESSIONS table:

=> SELECT * FROM SESSIONS;

When the session no longer appears in the SESSIONS table, disconnect and run the Stop Database command.

5. Restart the database.

6. Restore the MaxClientSessions parameter to its original value:

=> SELECT SET_CONFIG_PARAMETER('MaxClientSessions', 50);

Notes

If the database does not stop successfully, see Shutdown Problems.

You cannot stop databases if your password has expired. The Administration Tools display an error message if you attempt to do so. You need to change your expired password using vsql before you can shut down a database.

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 337 of 997
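Once sessions are closed, the database can also be stopped from the command line; the stop_db tool mirrors the start_db invocation shown earlier (the tool name is assumed from admintools' tool list, and the database name is illustrative — confirm with admintools -a):

```
$ /opt/vertica/bin/admintools -t stop_db -d VMart
```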
• 338. Restarting HP Vertica on Host

This tool restarts the HP Vertica process on one or more nodes in a running database. Use this tool when a cluster host reboots while the database is running. The spread daemon starts automatically, but the HP Vertica process does not, so the node does not automatically rejoin the cluster.

1. On the Main Menu, select View Database Cluster State, and click OK.
2. If one or more nodes are down, select Restart HP Vertica on Host, and click OK.
3. Select the database that contains the host that you want to restart, and click OK.
4. Select the host that you want to restart, and click OK.
5. Select View Database Cluster State again to make sure that all nodes are up.

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 338 of 997
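A command-line counterpart exists as well. A sketch assuming the restart_node tool and its -s hosts option (both assumptions — verify the exact tool name and options with admintools -a on your installation; the database and host names are illustrative):

```
$ /opt/vertica/bin/admintools -t restart_node -d VMart -s host01
```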
• 339. Configuration Menu Item

The Configuration Menu allows you to:

- Create, drop, and view databases
- Use the Database Designer to create or modify a physical schema design

1. On the Main Menu, click Configuration Menu, and then click OK.

Creating a Database

1. On the Configuration Menu, click Create Database and then click OK.
2. Enter the name of the database and an optional comment. Click OK.
3. Enter a password. If you do not enter a password, you are prompted to indicate whether you want to enter one. Click Yes to enter a password or No to create a database without a superuser password.

Caution: If you do not enter a password at this point, the superuser password is set to empty. Unless the database is for evaluation or academic purposes, HP strongly recommends that you enter a superuser password.

4. If you entered a password, enter the password again.
5. Select the hosts to include in the database. The hosts in this list are the ones that were specified at installation time (install_vertica -s).

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 339 of 997
• 340. 6. Specify the directories in which to store the catalog and data files.

Note: Catalog and data paths must contain only alphanumeric characters and cannot have leading space characters. Failure to comply with these restrictions could result in database creation failure.

Note: Do not use a shared directory for more than one node. Data and catalog directories must be distinct for each node. Multiple nodes must not be allowed to write to the same data or catalog directory.

7. Check the current database definition for correctness, and click Yes to proceed.

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 340 of 997
  • 341. 8. A message indicates that you have successfully created a database. Click OK. Note: If you get an error message, see Startup Problems Dropping a Database This tool drops an existing database. Only the Database Administrator is allowed to drop a database. 1. Stop the database as described in Stopping a Database. 2. On the Configuration Menu, click Drop Database and then click OK. 3. Select the database to drop and click OK. 4. Click Yes to confirm that you want to drop the database. 5. Type yes and click OK to reconfirm that you really want to drop the database. 6. A message indicates that you have successfully dropped the database. Click OK. Notes In addition to dropping the database, HP Vertica automatically drops the node definitions that refer to the database unless: l Another database uses a node definition. If another database refers to any of these node definitions, none of the node definitions are dropped. Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 341 of 997
  • 342. l A node definition is the only node defined for the host. (HP Vertica uses node definitions to locate hosts that are available for database creation, so removing the only node defined for a host would make the host unavailable for new databases.) Viewing a Database This tool displays the characteristics of an existing database. 1. On the Configuration Menu, select View Database and click OK. 2. Select the database to view. 3. HP Vertica displays the following information about the database: n The name of the database. n The name and location of the log file for the database. n The hosts within the database cluster. n The value of the restart policy setting. Note: This setting determines whether nodes within a K-Safe database are restarted when they are rebooted. See Setting the Restart Policy. n The database port. n The name and location of the catalog directory. Setting the Restart Policy The Restart Policy enables you to determine whether or not nodes in a K-Safe database are automatically restarted when they are rebooted. Since this feature does not automatically restart nodes if the entire database is DOWN, it is not useful for databases that are not K-Safe. To set the Restart Policy for a database: Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 342 of 997
• 343. 1. Open the Administration Tools.
2. On the Main Menu, select Configuration Menu, and click OK.
3. In the Configuration Menu, select Set Restart Policy, and click OK.
4. Select the database for which you want to set the Restart Policy, and click OK.
5. Select one of the following policies for the database:
- Never — Nodes are never restarted automatically.
- K-Safe — Nodes are automatically restarted if the database cluster is still UP. This is the default setting.
- Always — The node of a single-node database is restarted automatically.
Note: Always does not work if a single-node database was not shut down cleanly or crashed.
6. Click OK.

Best Practice for Restoring Failed Hardware

Following this procedure will prevent HP Vertica from misdiagnosing missing disks or bad mounts as data corruption, which would result in a time-consuming, full-node recovery.

If a server fails due to hardware issues, for example a bad disk or a failed controller, upon repairing the hardware:

1. Reboot the machine into runlevel 1, which is a root and console-only mode. Runlevel 1 prevents network connectivity and keeps HP Vertica from attempting to reconnect to the cluster.
2. In runlevel 1, validate that the hardware has been repaired, the controllers are online, and any RAID recovery is able to proceed.

Note: You do not need to initiate RAID recovery in runlevel 1; simply validate that it can recover.

3. Once the hardware is confirmed consistent, only then reboot to runlevel 3 or higher. At this point, the network activates, and HP Vertica rejoins the cluster and automatically recovers any missing data.

Note that, on a single-node database, if any files that were associated with a projection have been deleted or corrupted, HP Vertica will delete all files associated with that projection, which could result in data loss.

Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 343 of 997
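On a SysV-init Linux host, the runlevel changes above can be sketched with telinit, run as root (SysV-style runlevels are an assumption here; systemd-based hosts use rescue and multi-user targets instead):

```
telinit 1    # drop to runlevel 1: single-user, console-only, no network
# ...verify disks, controllers, and RAID state here...
telinit 3    # return to runlevel 3: multi-user with networking
```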
Installing External Procedure Executable Files 1. Run the Administration Tools. $ /opt/vertica/bin/adminTools 2. On the AdminTools Main Menu, click Configuration Menu, and then click OK. 3. On the Configuration Menu, click Install External Procedure and then click OK. 4. Select the database on which you want to install the external procedure. 5. Either select the file to install or manually type the complete file path, and then click OK. 6. If you are not the superuser, you are prompted to enter your password and click OK. The Administration Tools automatically creates the <database_catalog_path>/procedures directory on each node in the database and installs the external procedure in these directories for you. 7. Click OK in the dialog that indicates that the installation was successful.
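The same installation can be scripted with the install_procedure tool (its options appear in the command-line reference later in this section). A hedged sketch; the database name, file path, and password below are placeholders:

```shell
# Sketch: install an external procedure without the menus.
# ADMINTOOLS can be overridden (e.g., with a stub) for testing.
ADMINTOOLS=${ADMINTOOLS:-/opt/vertica/bin/admintools}

install_procedure() {
  local db=$1 procfile=$2 ownerpass=$3
  "$ADMINTOOLS" -t install_procedure -d "$db" -f "$procfile" -p "$ownerpass"
}

# Example (hypothetical database and path):
# install_procedure mydb /home/dbadmin/helloplanet.sh 'ownerpassword'
```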
Advanced Menu Options The Advanced Menu provides interactive recovery and repair commands. 1. On the Main Menu, click Advanced Menu and then OK. Rolling Back Database to the Last Good Epoch HP Vertica provides the ability to roll the entire database back to a specific epoch, primarily to assist in the correction of human errors during data loads or other accidental corruptions. For example, suppose that you have been performing a bulk load and the cluster went down during a particular COPY command. You might want to discard all epochs back to the point at which the previous COPY command committed and rerun the one that did not finish. You can determine that point by examining the log files (see Monitoring the Log Files). 1. On the Advanced Menu, select Roll Back Database to Last Good Epoch. 2. Select the database to roll back. The database must be stopped. 3. Accept the suggested restart epoch or specify a different one. 4. Confirm that you want to discard the changes after the specified epoch. The database restarts successfully. Important note: In HP Vertica 4.1, the default for the HistoryRetentionTime configuration parameter changed to 0, which means that HP Vertica only keeps historical data when nodes are down. This setting effectively prevents the use of the Administration Tools 'Roll Back Database to Last Good Epoch' option because the AHM remains close to the current epoch and a rollback is not permitted to an epoch prior to the AHM. If you rely on the Roll Back option to remove recently loaded data, consider setting a day-wide window for removing loaded data; for example:
=> SELECT SET_CONFIG_PARAMETER ('HistoryRetentionTime', '86400'); Stopping HP Vertica on Host This command attempts to gracefully shut down the HP Vertica process on a single node. Caution: Do not use this command if you intend to shut down the entire cluster. Use Stop Database instead, which performs a clean shutdown to minimize data loss. 1. On the Advanced Menu, select Stop HP Vertica on Host and click OK. 2. Select the hosts to stop. 3. Confirm that you want to stop the hosts. If the command succeeds, View Database Cluster State shows that the selected hosts are DOWN.
If the command fails to stop any selected nodes, proceed to Killing the HP Vertica Process on Host. Killing the HP Vertica Process on Host This command sends a kill signal to the HP Vertica process on a node. Caution: Do not use this command unless you have already tried Stop Database and Stop HP Vertica on Host and both were unsuccessful. 1. On the Advanced menu, select Kill HP Vertica Process on Host and click OK. 2. Select the hosts on which to kill the HP Vertica process. 3. Confirm that you want to stop the processes.
4. If the command succeeds, View Database Cluster State shows that the selected hosts are DOWN. 5. If the command fails to stop any selected processes, see Shutdown Problems. Upgrading an Enterprise or Evaluation License Key The following steps are for HP Vertica Enterprise Edition or evaluation licensed users only. This command copies a license key file into the database. See Managing Licenses for more information. 1. On the Advanced menu, select Upgrade License Key and click OK. 2. Select the database for which to upgrade the license key. 3. Enter the absolute pathname of your downloaded license key file (for example, /tmp/vlicense.dat) and click OK. 4. Click OK when you see a message that indicates that the upgrade succeeded. Note: If you are using HP Vertica Community Edition, see HP Vertica License Renewals or Upgrades for instructions on upgrading to an HP Vertica Enterprise Edition or evaluation license key.
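The stop-then-kill escalation described in the preceding sections — try a clean Stop Database first, fall back to stopping individual hosts, and kill the process only as a last resort — can be sketched as a script using the stop_db, stop_host, and kill_host tools from the command-line reference later in this section. This sketch assumes a nonzero exit status from each tool indicates failure; verify that behavior for your version before relying on it:

```shell
# Sketch: graceful shutdown with per-host fallback.
# ADMINTOOLS can be overridden (e.g., with a stub) for testing.
ADMINTOOLS=${ADMINTOOLS:-/opt/vertica/bin/admintools}

stop_or_kill() {
  local db=$1 pass=$2 hosts=$3
  # Prefer a clean, cluster-wide shutdown (minimizes data loss).
  if "$ADMINTOOLS" -t stop_db -d "$db" -p "$pass" -i; then
    echo "clean shutdown of $db"
    return 0
  fi
  # Fall back to per-host SIGTERM (stop_host), then SIGKILL (kill_host).
  "$ADMINTOOLS" -t stop_host -s "$hosts" ||
    "$ADMINTOOLS" -t kill_host -s "$hosts"
}
```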
Managing Clusters Cluster Management lets you add, replace, or remove hosts from a database cluster. These processes are usually part of a larger process of adding, removing, or replacing a database node. Note: View the database state to verify that it is running. See View Database Cluster State. If the database isn't running, restart it. See Starting the Database. Using Cluster Management To use Cluster Management: 1. From the Main Menu, select Advanced Menu, and then click OK. 2. In the Advanced Menu, select Cluster Management, and then click OK. 3. Select one of the following, and then click OK. n Add Hosts to Database: See Adding Hosts to a Database. n Re-balance Data: See Rebalancing Data. n Replace Host: See Replacing Hosts. n Remove Host from Database: See Removing Hosts from a Database. Using the Administration Tools The Help Using the Administration Tools command displays a help screen about using the Administration Tools. Most of the online help in the Administration Tools is context-sensitive. For example, if you use the up/down arrows to select a command, press Tab to move to the Help button, and press Return, you get help on the selected command. Administration Tools Metadata The Administration Tools configuration data (metadata) contains information that databases need to start, such as the hostname/IP address of each participating host in the database cluster. To facilitate hostname resolution within the Administration Tools, at the command line, and inside the installation utility, HP Vertica converts all hostnames you provide through the Administration Tools to IP addresses:
l During installation HP Vertica immediately converts any hostname you provide through the command-line options --hosts, --add-hosts, or --remove-hosts to its IP address equivalent. n If you provide a hostname during installation that resolves to multiple IP addresses (such as in multi-homed systems), the installer prompts you to choose one IP address. n HP Vertica retains the name you give for messages and prompts only; internally it stores these hostnames as IP addresses. l Within the Administration Tools All hosts are in IP form to allow for direct comparisons (for example db = database = database.verticacorp.com). l At the command line HP Vertica converts any hostname value to an IP address that it uses to look up the host in the configuration metadata. If a hostname resolves to multiple IP addresses, HP Vertica tests each IP address to see if it resides in the metadata, choosing the first match. No match indicates that the host is not part of the database cluster. Metadata is more portable because HP Vertica does not require the names of the hosts in the cluster to be exactly the same when you install or upgrade your database. Writing Administration Tools Scripts You can invoke most of the Administration Tools from the command line or a shell script. Syntax > /opt/vertica/bin/admintools [ -t | --tool ] toolname [ options ] Note: For convenience, you can add /opt/vertica/bin to your search path. Parameters [ --tool | -t ] Instructs the Administration Tools to run the specified tool. Note: If you use the --no-log option to run the Administration Tools silently, --no-log must appear before the --tool option. toolname Name of one of the tools described in the help output below. [ options ] -h, --help Shows a brief help message and exits. -a, --help_all Lists all command-line subcommands and options as shown in the Tools section below.
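For example, a cron-able wrapper that silently records the cluster state might look like the following sketch. The output formats of these tools are not guaranteed, so treat the log as informational; the log path is a placeholder, and the ADMINTOOLS variable exists only so the wrapper can be tested with a stub:

```shell
# Sketch: append the cluster state and list of running databases to a log.
# --no-log must precede the --tool option when running silently (see above).
ADMINTOOLS=${ADMINTOOLS:-/opt/vertica/bin/admintools}

log_cluster_state() {
  local logfile=$1
  {
    date
    "$ADMINTOOLS" --no-log -t view_cluster
    "$ADMINTOOLS" --no-log -t db_status -s UP
  } >> "$logfile" 2>&1
}

# Example (hypothetical path):
# log_cluster_state /var/log/vertica_state.log
```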
  • 351. Tools To return a description of the tools you can access, issue the following command at a command prompt: $ admintools -a Usage: adminTools [-t | --tool] toolName [options] Valid tools are: command_host config_nodes connect_db create_db database_parameters db_add_node db_remove_node db_replace_node db_status drop_db edit_auth host_to_node install_package install_procedure kill_host kill_node list_allnodes list_db list_host list_node list_packages logrotate node_map rebalance_data restart_db restart_node return_epoch set_restart_policy show_active_db start_db stop_db stop_host stop_node uninstall_package upgrade_license_key view_cluster ------------------------------------------------------------------------- Usage: command_host [options] Options: -h, --help show this help message and exit Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 351 of 997
  • 352. -c CMD, --command=CMD Command to run ------------------------------------------------------------------------- Usage: config_nodes [options] Options: -h, --help show this help message and exit -f NODEHOSTFILE, --file=NODEHOSTFILE File containing list of nodes, hostnames, catalog path, and datapath (node<whitespace>host<whitespace>ca talogPath<whitespace>dataPath one per line) -c, --check Check all nodes to make sure they can interconnect -s SKIPANALYZENODE, --skipanalyzenode=SKIPANALYZENODE skipanalyzenode ------------------------------------------------------------------------- Usage: connect_db [options] Options: -h, --help show this help message and exit -d DB, --database=DB Name of database to connect -p DBPASSWORD, --password=DBPASSWORD Database password in single quotes ------------------------------------------------------------------------- Usage: create_db [options] Options: -h, --help show this help message and exit -s NODES, --hosts=NODES comma-separated list of hosts to participate in database -d DB, --database=DB Name of database to be created -c CATALOG, --catalog_path=CATALOG Path of catalog directory[optional] if not using compat21 -D DATA, --data_path=DATA Path of data directory[optional] if not using compat21 -p DBPASSWORD, --password=DBPASSWORD Database password in single quotes [optional] -l LICENSEFILE, --license=LICENSEFILE Database license [optional] -P POLICY, --policy=POLICY Database restart policy [optional] Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 352 of 997
  • 353. --compat21 Use Vertica 2.1 method using node names instead of hostnames ------------------------------------------------------------------------- Usage: database_parameters [options] Options: -h, --help show this help message and exit -d DB, --database=DB Name of database -P PARAMETER, --parameter=PARAMETER Database parameter -c COMPONENT, --component=COMPONENT Component[optional] -s SUBCOMPONENT, --subcomponent=SUBCOMPONENT Sub Component[optional] -p PASSWORD, --password=PASSWORD Database password[optional] ------------------------------------------------------------------------- Usage: db_add_node [options] Options: -h, --help show this help message and exit -d DB, --database=DB Name of database to be restarted -s HOSTS, --hosts=HOSTS Comma separated list of hosts to add to database -p DBPASSWORD, --password=DBPASSWORD Database password in single quotes -a AHOSTS, --add=AHOSTS Comma separated list of hosts to add to database -i, --noprompts do not stop and wait for user input(default false) --compat21 Use Vertica 2.1 method using node names instead of hostnames ------------------------------------------------------------------------- Usage: db_remove_node [options] Options: -h, --help show this help message and exit -d DB, --database=DB Name of database to be modified -s HOSTS, --hosts=HOSTS Name of the host to remove from the db -p DBPASSWORD, --password=DBPASSWORD Database password in single quotes -i, --noprompts do not stop and wait for user input(default false) Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 353 of 997
  • 354. --compat21 Use Vertica 2.1 method using node names instead of hostnames ------------------------------------------------------------------------- Usage: db_replace_node [options] Options: -h, --help show this help message and exit -d DB, --database=DB Name of database to be restarted -o ORIGINAL, --original=ORIGINAL Name of host you wish to replace -n NEWHOST, --new=NEWHOST Name of the replacement host -p DBPASSWORD, --password=DBPASSWORD Database password in single quotes -i, --noprompts do not stop and wait for user input(default false) ------------------------------------------------------------------------- Usage: db_status [options] Options: -h, --help show this help message and exit -s STATUS, --status=STATUS Database status UP,DOWN or ALL(list running dbs - UP,list down dbs - DOWN list all dbs - ALL ------------------------------------------------------------------------- Usage: drop_db [options] Options: -h, --help show this help message and exit -d DB, --database=DB Database to be dropped ------------------------------------------------------------------------- Usage: edit_auth [options] Options: -h, --help show this help message and exit -d DATABASE, --database=DATABASE database to edit authentication parameters for ------------------------------------------------------------------------- Usage: host_to_node [options] Options: -h, --help show this help message and exit -s HOST, --host=HOST comma separated list of hostnames which is to be converted into its corresponding nodenames -d DB, --database=DB show only node/host mapping for this database. Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 354 of 997
  • 355. ------------------------------------------------------------------------- Usage: install_package [options] Options: -h, --help show this help message and exit -d DBNAME, --dbname=DBNAME database name -p PASSWORD, --password=PASSWORD database admin password -P PACKAGE, --package=PACKAGE specify package or 'all' or 'default' ------------------------------------------------------------------------- Usage: install_procedure [options] Options: -h, --help show this help message and exit -d DBNAME, --database=DBNAME Name of database for installed procedure -f PROCPATH, --file=PROCPATH Path of procedure file to install -p OWNERPASSWORD, --password=OWNERPASSWORD Password of procedure file onwer ------------------------------------------------------------------------- Usage: kill_host [options] Options: -h, --help show this help message and exit -s HOSTS, --hosts=HOSTS comma-separated list of hosts on which the vertica process is to be killed using a SIGKILL signal --compat21 Use Vertica 2.1 method using node names instead of hostnames ------------------------------------------------------------------------- Usage: kill_node [options] Options: -h, --help show this help message and exit -s HOSTS, --hosts=HOSTS comma-separated list of hosts on which the vertica process is to be killed using a SIGKILL signal --compat21 Use Vertica 2.1 method using node names instead of hostnames ------------------------------------------------------------------------- Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 355 of 997
  • 356. Usage: list_allnodes [options] Options: -h, --help show this help message and exit ------------------------------------------------------------------------- Usage: list_db [options] Options: -h, --help show this help message and exit -d DB, --database=DB Name of database to be listed ------------------------------------------------------------------------- Usage: list_host [options] Options: -h, --help show this help message and exit ------------------------------------------------------------------------- Usage: list_node [options] Options: -h, --help show this help message and exit -n NODENAME, --node=NODENAME Name of the node to be listed ------------------------------------------------------------------------- Usage: list_packages [options] Options: -h, --help show this help message and exit -d DBNAME, --dbname=DBNAME database name -p PASSWORD, --password=PASSWORD database admin password -P PACKAGE, --package=PACKAGE specify package or 'all' or 'default' ------------------------------------------------------------------------- Usage: logrotateconfig [options] Options: -h, --help show this help message and exit -d DBNAME, --dbname=DBNAME database name -r ROTATION, --rotation=ROTATION set how often the log is rotated.[ daily|weekly|monthly ] -s MAXLOGSZ, --maxsize=MAXLOGSZ set maximum log size before rotation is forced. Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 356 of 997
  • 357. -k KEEP, --keep=KEEP set # of old logs to keep ------------------------------------------------------------------------- Usage: node_map [options] Options: -h, --help show this help message and exit -d DB, --database=DB List only data for this database. ------------------------------------------------------------------------- Usage: rebalance_data [options] Options: -h, --help show this help message and exit -d DBNAME, --dbname=DBNAME database name -k KSAFETY, --ksafety=KSAFETY specify the new k value to use -p PASSWORD, --password=PASSWORD --script Don't re-balance the data, just provide a script for later use. ------------------------------------------------------------------------- Usage: restart_db [options] Options: -h, --help show this help message and exit -d DB, --database=DB Name of database to be restarted -e EPOCH, --epoch=EPOCH Epoch at which the database is to be restarted. If 'last' is given as argument the db is restarted from the last good epoch. -p DBPASSWORD, --password=DBPASSWORD Database password in single quotes -i, --noprompts do not stop and wait for user input(default false) ------------------------------------------------------------------------- Usage: restart_node [options] Options: -h, --help show this help message and exit -s NODES, --hosts=NODES comma-separated list of hosts to be restarted -d DB, --database=DB Name of database whose node is to be restarted -p DBPASSWORD, --password=DBPASSWORD Database password in single quotes -i, --noprompts do not stop and wait for user input(default false) Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 357 of 997
  • 358. -F, --force force the node to start and auto recover if necessary --compat21 Use Vertica 2.1 method using node names instead of hostnames ------------------------------------------------------------------------- Usage: return_epoch [options] Options: -h, --help show this help message and exit -d DB, --database=DB Name of database ------------------------------------------------------------------------- Usage: set_restart_policy [options] Options: -h, --help show this help message and exit -d DB, --database=DB Name of database for which to set policy -p POLICY, --policy=POLICY Restart policy: ('never', 'ksafe', 'always') ------------------------------------------------------------------------- Usage: show_active_db [options] Options: -h, --help show this help message and exit ------------------------------------------------------------------------- Usage: start_db [options] Options: -h, --help show this help message and exit -d DB, --database=DB Name of database to be started -p DBPASSWORD, --password=DBPASSWORD Database password in single quotes -i, --noprompts do not stop and wait for user input(default false) -F, --force force the database to start at an epoch before data consistency problems were detected. ------------------------------------------------------------------------- Usage: stop_db [options] Options: -h, --help show this help message and exit -d DB, --database=DB Name of database to be stopped -p DBPASSWORD, --password=DBPASSWORD Database password in single quotes -F, --force Force the databases to shutdown, even if users are connected. -i, --noprompts do not stop and wait for user input(default false) Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 358 of 997
  • 359. ------------------------------------------------------------------------- Usage: stop_host [options] Options: -h, --help show this help message and exit -s HOSTS, --hosts=HOSTS comma-separated list of hosts on which the vertica process is to be killed using a SIGTERM signal --compat21 Use Vertica 2.1 method using node names instead of hostnames ------------------------------------------------------------------------- Usage: stop_node [options] Options: -h, --help show this help message and exit -s HOSTS, --hosts=HOSTS comma-separated list of hosts on which the vertica process is to be killed using a SIGTERM signal --compat21 Use Vertica 2.1 method using node names instead of hostnames ------------------------------------------------------------------------- Usage: uninstall_package [options] Options: -h, --help show this help message and exit -d DBNAME, --dbname=DBNAME database name -p PASSWORD, --password=PASSWORD database admin password -P PACKAGE, --package=PACKAGE specify package or 'all' or 'default' ------------------------------------------------------------------------- Usage: upgrade_license_key [options] Options: -h, --help show this help message and exit -d DB, --database=DB Name of database [required if databases exist] -l LICENSE, --license=LICENSE Database license -i INSTALL, --install=INSTALL argument '-i install' to Install license else without '-i install' Upgrade license -p PASSWORD, --password=PASSWORD Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 359 of 997
  • 360. Database password[optional] ------------------------------------------------------------------------- Usage: view_cluster [options] Options: -h, --help show this help message and exit -x, --xpand show the full cluster state, node by node -d DB, --database=DB filter the output for a single database Administrator's Guide Using the Administration Tools HP Vertica Analytic Database (7.0.x) Page 360 of 997
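As one worked example of the tool reference above, the menu-driven Roll Back Database to Last Good Epoch operation corresponds to restart_db with -e last ("If 'last' is given as argument the db is restarted from the last good epoch"). A sketch with placeholder database name and password:

```shell
# Sketch: scripted equivalent of "Roll Back Database to Last Good Epoch".
# ADMINTOOLS can be overridden (e.g., with a stub) for testing.
ADMINTOOLS=${ADMINTOOLS:-/opt/vertica/bin/admintools}

rollback_to_last_good_epoch() {
  local db=$1 pass=$2
  # -e last restarts from the last good epoch; -i suppresses prompts.
  "$ADMINTOOLS" -t restart_db -d "$db" -e last -p "$pass" -i
}
```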
Using Management Console Most of the information you need to use MC is available on the MC interface. The topics in this section augment some areas of the MC interface and provide examples. For an introduction to MC functionality, architecture, and security, see Management Console in the Concepts Guide. Management Console provides some, but not all, of the functionality that the Administration Tools provides. In addition, MC provides extended functionality not available in the Administration Tools, such as a graphical view of your HP Vertica database and detailed monitoring charts and graphs, described in Monitoring HP Vertica Using MC. See Administration Tools and Management Console in the Administrator's Guide. If you have not yet installed MC, see Installing and Configuring Management Console in the Installation Guide. Connecting to MC To connect to Management Console: 1. Open an HTML5-compliant browser. 2. Enter the IP address or host name of the host on which you installed MC (or any cluster node if you installed HP Vertica first), followed by the MC port you assigned when you configured MC (default 5450). For example, enter one of: https://00.00.00.00:5450/ or https://hostname:5450/ 3. When the MC logon dialog appears, enter your MC username and password and click Log in. Note: When MC users log in to the MC interface, MC checks their privileges on HP Vertica Data Collector (DC) tables on MC-monitored databases. Based on DC table privileges, along with the role assigned to the MC user, each user's access to the MC's Overview, Activity, and Node details pages could be limited. See About MC Privileges and Roles for more information. If you do not have an MC username/password, contact your MC administrator.
  • 362. Managing Client Connections on MC Each client session on MC uses a connection from MaxClientSessions, a database configuration parameter that determines the maximum number of sessions that can run on a single database cluster node. If multiple MC users are mapped to the same database account and are concurrently monitoring the Overview and Activity pages, graphs could be slow to update while MC waits for a connection from the pool. Tip: You can increase the value for MaxClientSessions on an MC-monitored database to take extra sessions into account. See Managing Sessions for details. See Also l Monitoring HP Vertica Using MC Administrator's Guide Using Management Console HP Vertica Analytic Database (7.0.x) Page 362 of 997
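Raising the limit uses the same SET_CONFIG_PARAMETER call shown earlier for HistoryRetentionTime, here wrapped in a vsql invocation. This is a sketch: the value 50 is an arbitrary example, and any connection options vsql needs in your environment (database, user, password) are assumed to be supplied separately:

```shell
# Sketch: raise MaxClientSessions so concurrent MC sessions do not
# exhaust the per-node session pool. VSQL can be overridden for testing.
VSQL=${VSQL:-/opt/vertica/bin/vsql}

raise_max_client_sessions() {
  local newmax=$1
  "$VSQL" -c "SELECT SET_CONFIG_PARAMETER('MaxClientSessions', '$newmax');"
}

# Example (arbitrary value):
# raise_max_client_sessions 50
```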
Managing Database Clusters on MC To perform database/cluster-specific tasks on one or more MC-managed clusters, navigate to the Databases and Clusters page. MC administrators see the Import/Create Database Cluster options, while non-administrative MC users see only the databases on which they have been assigned the appropriate access levels. Depending on your access level, the database-related operations you can perform on the MC interface include: l Create a new database/cluster. l Import an existing database/cluster into the MC interface. l Start the database, unless it is already running (green). l Stop the database, but only if no users are connected. l Remove the database from the MC interface. Note: Remove does not drop the database; it leaves it in the cluster, hidden from the UI. To add the database back to the MC interface, import it using the IP address of any cluster node. A Remove operation also stops metrics gathering on that database, but statistics gathering automatically resumes after you re-import. l Drop the database after you ensure no users are connected. Drop is a permanent action that drops the database from the cluster. l View Database to open the Overview page, a layout that provides a dashboard view into the health of your database cluster (node state, storage, performance, CPU/memory, and query concurrency). From this page you can drill down into more detailed database-specific information by clicking data points in the graphs. l View Cluster to open the Manage page, which shows all nodes in the cluster, as well as each node's state. You can also see a list of monitored databases on the selected cluster and their states; for example, a green arrow indicates a database in an UP state. For node-specific information, click any node to open the Node Details page.
For more information about what users can see and do on MC, see the following topics: See Also l About MC Users l About MC Privileges and Roles Administrator's Guide Using Management Console HP Vertica Analytic Database (7.0.x) Page 363 of 997
  • 364. Create an Empty Database Using MC You can create a new database on an existing HP Vertica cluster through the Management Console interface. Database creation can be a long-running process, lasting from minutes to hours, depending on the size of the target database. You can close the web browser during the process and sign back in to MC later; the creation process continues unless an unexpected error occurs. See the Notes section below the procedure on this page. You currently need to use command line scripts to define the database schema and load data. Refer to the topics in Configuration Procedure. You should also run the Database Designer, which you access through the Administration Tools, to create either a comprehensive or incremental design. Consider using the Tutorial in the Getting Started Guide to create a sample database you can start monitoring immediately. How to Create an Empty Database on an MC-managed Cluster 1. If you are already on the Databases and Clusters page, skip to the next step; otherwise: a. Connect to MC and sign in as an MC administrator. b. On the Home page, click the Databases and Clusters task. 2. If no databases exist on the cluster, continue to the next step; otherwise: a. If a database is running on the cluster on which you want to add a new database, select the database and click Stop. b. Wait for the running database to have a status of Stopped. 3. Click the cluster on which you want to create the new database and click Create Database. 4. The Create Database wizard opens. Provide the following information: n Database name and password. See Creating a Database Name and Password for rules. n Optionally click Advanced to open the advanced settings and change the port and catalog, data, and temporary data paths. By default the MC application/web server port is 5450 and paths are /home/dbadmin, or whatever you defined for the paths when you ran the Cluster Creation Wizard or the install_vertica script. 
Do not use the default agent port 5444 as a new setting for the MC port. See MC Settings > Configuration for port values. 5. Click Continue. 6. Select nodes to include in the database. Administrator's Guide Using Management Console HP Vertica Analytic Database (7.0.x) Page 364 of 997
The Database Configuration window opens with the options you provided and a graphical representation of the nodes appears on the page. By default, all nodes are selected to be part of this database (denoted by a green check mark). You can optionally click each node and clear Include host in new database to exclude that node from the database. Excluded nodes are gray. If you change your mind, click the node and select the Include check box. 7. Click Create in the Database Configuration window to create the database on the nodes. The creation process takes a few moments, after which the database starts and a Success message appears on the interface. 8. Click OK to close the success message. MC's Manage page opens and displays the database nodes. Nodes not included in the database are colored gray, which means they are standby nodes you can include later. To add nodes to or remove nodes from your HP Vertica cluster itself (as opposed to including or excluding the standby nodes shown), you must run the install_vertica script. Notes l If warnings occur during database creation, nodes will be marked on the UI with an Alert icon and a message. n Warnings do not prevent the database from being created, but you should address warnings after the database creation process completes by viewing the database Message Center from the MC Home page. n Failure messages display on the database Manage page with a link to more detailed information and a hint with an actionable task that you must complete before you can continue. Problem nodes are colored red for quick identification. n To view more detailed information about a node in the cluster, double-click the node from the Manage page, which opens the Node Details page. l To create MC users and grant them access to an MC-managed database, see About MC Users and Creating an MC User.
See Also l Creating a Cluster Using MC l Troubleshooting Management Console l Restarting MC Administrator's Guide Using Management Console HP Vertica Analytic Database (7.0.x) Page 365 of 997
Import an Existing Database Into MC If you have already upgraded your database to the current version of HP Vertica, MC automatically discovers the cluster and any databases installed on it, regardless of whether those databases are currently running or are down. Note: If you haven't created a database and want to create one through the MC, see Create an Empty Database Using MC. How to Import an Existing Database on the Cluster The following procedure describes how to import an MC-discovered existing database into the MC interface so you can monitor it. 1. Connect to Management Console and sign in as an MC administrator. 2. On the MC Home page, click Databases and Clusters. 3. On the Databases and Clusters page, click the cluster cube and click View in the dialog box that opens. 4. On the left side of the page, look under the Databases heading and click Import Discovered. Tip: A running MC-discovered database appears as Monitored, and any non-running databases appear as Discovered. MC supports only one running database on a single cluster at a time. For example, to monitor a discovered MYDB database while a DATABASE2 database is running, you would need to shut down DATABASE2 first. 5. In the Import Database dialog box: a. Select the database you want to import. b. Optionally clear auto-discovered databases you don't want to import. c. Supply the database username and password and click Import. After Management Console connects to the database it opens the Manage page, which provides a view of the cluster nodes. See Monitoring Cluster Status for more information. You perform the import process once per existing database. The next time you connect to Management Console, you'll see your database under the Recent Databases section on the Home page, as well as on the Databases and Clusters page. Note: The system clocks in your cluster must be synchronized with the system that is running Management Console to allow automatic discovery of local clusters.
  • 367. Using MC on an AWS Cluster If you are running an Amazon Web Services (AWS) cluster on HP Vertica 6.1.2, you can install and run MC to monitor and manage your database. You cannot, however, use the MC interface to create or import an HP Vertica cluster. Managing MC Settings The MC Settings page allows you to configure properties specific to Management Console. You can: l Change the MC and agent default port assignments l Upload a new SSL certificate l Use LDAP for user authentication l Create new MC users and map them to an MC-managed database using user credentials on the HP Vertica server l Install HP Vertica on a cluster of hosts through the MC interface l Customize the look and feel of MC with themes Modifying Database-Specific Settings To inspect or modify settings related to an MC-managed database, go to the Databases and Clusters page. On this page, view a running database, and access that database's Settings page from a tab at the bottom of the page. Administrator's Guide Using Management Console HP Vertica Analytic Database (7.0.x) Page 367 of 997
  • 368. Changing MC or Agent Ports When you configure MC, the Configuration Wizard sets up the following default ports: l 5450—Used to connect a web browser session to MC and allows communication from HP Vertica cluster nodes to the MC application/web server l 5444—Provides MC-to-node and node-to-node (agent) communications for database create/import and monitoring activities If You Need to Change the MC Default Ports A scenario might arise where you need to change the default port assignments for MC or its agents. For example, perhaps one of the default ports is not available on your HP Vertica cluster, or you encounter connection problems between MC and the agents. The following topics describe how to change port assignments for MC or its agents. See Also l Ensure Ports Are Available How to Change the Agent Port Changing the agent port takes place in two steps: at the command line, where you modify the config.py file and through a browser, where you modify MC settings. Change the Agent Port in config.py 1. Log in as root on any cluster node and change to the agent directory: # cd /opt/vertica/agent 2. Use any text editor to open config.py. 3. Scroll down to the agent_port = 5444 entry and replace 5444 with a different port number. 4. Save and close the file. 5. Copy config.py to the /opt/vertica/agent directory on all nodes in the cluster. 6. Restart the agent process by running the following command: Administrator's Guide Using Management Console HP Vertica Analytic Database (7.0.x) Page 368 of 997
  • 369. # /etc/init.d/vertica_agent restart 7. Repeat (as root) Step 6 on each cluster node where you copied the config.py file. Change the Agent Port on MC 1. Open a web browser and connect to MC as a user with MC ADMIN privileges. 2. Navigate to MC Settings > Configuration. 3. Change Default HP Vertica agent port from 5444 to the new value you specified in the config.py file. 4. Click Apply and click Done. 5. Restart MC so MC can connect to the agent at its new port. How to Change the MC Port Use this procedure to change the default port for MC's application server from 5450 to a different value. 1. Open a web browser and connect to MC as a user with MC ADMIN privileges. 2. On the MC Home page, navigate to MC Settings > Configuration and change the Application server running port value from 5450 to a new value. 3. In the change-port dialog, click OK. 4. Restart MC. 5. Reconnect your browser session using the new port. For example, if you changed the port from 5450 to 5555, use one of the following formats: https://00.00.00.00:5555/ OR https://hostname:5555/ Backing Up MC Before you upgrade MC, HP recommends that you back up your MC metadata (configuration and user settings) on a storage location external to the server on which you installed MC. Administrator's Guide Using Management Console HP Vertica Analytic Database (7.0.x) Page 369 of 997
  • 370. 1. On the target server (where you want to store MC metadata), log on as root or a user with sudo privileges. 2. Create a backup directory; for example: # mkdir /backups/mc/mc-backup-20130425 3. Copy the /opt/vconsole directory to the new backup folder: # cp -r /opt/vconsole /backups/mc/mc-backup-20130425 Administrator's Guide Using Management Console HP Vertica Analytic Database (7.0.x) Page 370 of 997
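The three backup steps above can be combined into a small script. The following sketch uses temporary directories in place of /opt/vconsole and the /backups/mc target so it can run anywhere for demonstration; only the paths differ from the documented procedure:

```shell
# Stand-ins for /opt/vconsole and /backups/mc (temporary, for demonstration only).
vconsole=$(mktemp -d)
echo "mc settings" > "$vconsole/mc.conf"
backup_root=$(mktemp -d)

# Step 2: create a dated backup directory.
backup_dir="$backup_root/mc-backup-$(date +%Y%m%d)"
mkdir -p "$backup_dir"

# Step 3: recursively copy the MC metadata directory into the backup.
cp -r "$vconsole" "$backup_dir/vconsole"

ls "$backup_dir/vconsole"
```

On a real installation, substitute /opt/vconsole for the source directory and a storage location external to the MC server for the backup root, as the procedure above recommends.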
  • 371. Troubleshooting Management Console The Management Console Diagnostics page, which you access from the Home page, helps you resolve issues within the MC process, not the database. What You Can Diagnose: l View Management Console logs, which you can sort by column headings (such as type, component, or message). l Search within messages for key words or phrases and search for log entries within a specific time frame. l Export database messages to a file. l Reset console parameters to their original configuration. Caution: Reset removes all data (monitoring and configuration information) from storage and forces you to reconfigure MC as if it were the first time. l Restart the Management Console process. When the process completes, you are directed back to the login page. Viewing the MC Log If you want to browse MC logs (not database logs), navigate to the Diagnostics > MC Log page. This page provides a tabular view of the contents at /opt/vconsole/log/mc/mconsole.log, letting you more easily identify and troubleshoot issues related to MC. You can sort log entries by clicking the column header and search within messages for key words, phrases, and log entries within a specific time frame. You can also export log messages to a file. Administrator's Guide Using Management Console HP Vertica Analytic Database (7.0.x) Page 371 of 997
  • 372. See Also l Exporting MC-managed Database Messages and Logs Exporting the User Audit Log When an MC user makes changes on Management Console, whether to an MC-managed database or to the MC itself, their action generates a log entry that contains data you can export to a file. If you perform an MC factory reset (restore MC to its pre-configured state), you automatically have the opportunity to export audit records before the reset occurs. To Manually Export MC User Activity 1. From the MC Home page, click Diagnostics and then click Audit Log. 2. On the Audit log viewer page, click Export and save the file to a location on the server. To see what types of user operations the audit logger records, see Monitoring MC User Activity. Administrator's Guide Using Management Console HP Vertica Analytic Database (7.0.x) Page 372 of 997
  • 373. Restarting MC You might need to restart the MC web/application server for a number of reasons, such as after you change port assignments, use the MC interface to import a new SSL certificate, or if the MC interface or HP Vertica-related tasks become unresponsive. Restarting MC requires ADMIN Role (mc) or SUPER Role (mc) privileges. How to Restart MC Through the MC Interface (Using Your Browser) 1. Open a web browser and connect to MC as an administrator. 2. On MC's Home page, click Diagnostics. 3. Click Restart Console and then click OK to continue or Cancel to return to the Diagnostics page. The MC process shuts down for a few seconds and automatically restarts. After the process completes, you are directed back to the sign-in page. How to Restart MC At the Command Line If you are unable to connect to MC through a web browser for any reason, such as if the MC interface or HP Vertica-related tasks become unresponsive, you can run the vertica-consoled script with start, stop, or restart arguments. Follow these steps to start, stop, or restart MC. 1. As root, open a terminal window on the server on which MC is installed. 2. Run the vertica-consoled script: # /etc/init.d/vertica-consoled { stop | start | restart } stop Stops the MC application/web server. start Starts the MC application/web server. Caution: Use start only if you are certain MC is not already running. As a best practice, stop MC before you issue the start command. restart Restarts the MC application/web server. This process reports that the stop didn't work if MC is not already running. Administrator's Guide Using Management Console HP Vertica Analytic Database (7.0.x) Page 373 of 997
  • 374. Starting Over If you need to return MC to its original state (a "factory reset"), see Resetting MC to Pre-Configured State. Resetting MC to Pre-Configured State If you decide to reset MC to its original, preconfigured state, you can do so on the Diagnostics page by clicking Factory Reset. Tip: Consider trying one of the options described in Restarting MC first. A factory reset removes all metadata (about a week's worth of database monitoring/configuration information and MC users) from storage and forces you to reconfigure MC again, as described in Configuring MC in the Installation Guide. After you click Factory Reset, you have the chance to export audit records to a file by clicking Yes. If you click No (do not export audit records), the process begins. There is no undo. Keep the following in mind concerning user accounts and the MC. l When you first configure MC, during the configuration process you create an MC super user (a Linux account). Issuing a Factory Reset on the MC does not create a new MC super user, nor does it delete the existing MC super user. When initializing after a Factory Reset, you must log on using the original MC super user account. l Note that, once MC is configured, you can add users that are specific to MC. Users created through the MC interface are MC specific. When you subsequently change a password through the MC, you only change the password for the specific MC user. Passwords external to MC (i.e., system Linux users and HP Vertica database passwords) remain unchanged. For information on MC users, refer to the sections Creating an MC User and MC Configuration Privileges. Avoiding MC Self-Signed Certificate Expiration When you connect to MC through a client browser, HP Vertica assigns each HTTPS request a self-signed certificate, which includes a timestamp. To increase security and protect against password replay attacks, the timestamp is valid for several seconds only, after which it expires. 
To avoid being locked out of MC, synchronize the time on the hosts in your HP Vertica cluster, as well as on the MC host if it resides on a dedicated server. To recover from loss or lack of synchronization, resynchronize the system time using the Network Time Protocol (NTP). See Set Up Time Synchronization in the Installation Guide. If you want to generate your own certificates and keys for MC, see Generating Certificates and Keys for MC. Administrator's Guide Using Management Console HP Vertica Analytic Database (7.0.x) Page 374 of 997
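Clock drift between the MC host and cluster hosts is what invalidates the short-lived certificate timestamps described above. A quick way to check for drift is to compare epoch timestamps taken on each host; this sketch hard-codes two sample timestamps for demonstration (in practice they would come from something like `ssh host date +%s`):

```shell
# Compare two epoch timestamps and report the drift in seconds.
# t_mc and t_node are sample values standing in for the MC host's
# and a cluster node's clocks.
t_mc=1700000005
t_node=1700000002

drift=$((t_mc - t_node))
[ "$drift" -lt 0 ] && drift=$((-drift))
echo "clock drift: ${drift}s"
```

If the reported drift exceeds a few seconds, resynchronize with NTP as described above before reconnecting to MC.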
  • 375. Operating the Database Starting and Stopping the Database This section describes how to start and stop the HP Vertica database using the Administration Tools, Management Console, or the command line. Starting the Database Starting a K-safe database is supported when up to K nodes are down or unavailable. See Failure Recovery for a discussion of various scenarios encountered during database shutdown, startup, and recovery. You can start a database using any of these methods: l The Management Console l The Administration Tools interface l The command line Starting the Database Using MC On MC's Databases and Clusters page, click a database to select it, and click Start within the dialog box that displays. Starting the Database Using the Administration Tools 1. Open the Administration Tools and select View Database Cluster State to make sure that all nodes are down and that no other database is running. If all nodes are not down, see Shutdown Problems. 2. Open the Administration Tools. See Using the Administration Tools for information about accessing the Administration Tools. 3. On the Main Menu, select Start Database, and then select OK. 4. Select the database to start, and then click OK. Caution: HP strongly recommends that you start only one database at a time. If you start more than one database at any time, the results are unpredictable. Users could encounter resource conflicts or perform operations in the wrong database. 5. Enter the database password, and then click OK. Administrator's Guide Operating the Database HP Vertica Analytic Database (7.0.x) Page 375 of 997
  • 376. 6. When prompted that the database started successfully, click OK. 7. Check the log files to make sure that no startup problems occurred. If the database does not start successfully, see Startup Problems. Starting the Database At the Command Line If you use the admintools command line option, start_db(), to start a database, the -p password argument is only required during database creation, when you install a new license. As long as the license is valid, the -p argument is not required to start the database and is silently ignored, even if you introduce a typo or prematurely press the Enter key. This is by design, as the database can only be started by the user who (as part of the verticadba UNIX user group) initially created the database or who has root or su privileges. If the license were to become invalid, HP Vertica would use the -p password argument to attempt to upgrade the license with the license file stored in /opt/vertica/config/share/license.key. Following is an example of using start_db on a standalone node: [dbadmin@localhost ~]$ /opt/vertica/bin/admintools -t start_db -d VMart Info: no password specified, using none Node Status: v_vmart_node0001: (DOWN) Node Status: v_vmart_node0001: (DOWN) Node Status: v_vmart_node0001: (DOWN) Node Status: v_vmart_node0001: (DOWN) Node Status: v_vmart_node0001: (DOWN) Node Status: v_vmart_node0001: (DOWN) Node Status: v_vmart_node0001: (DOWN) Node Status: v_vmart_node0001: (DOWN) Node Status: v_vmart_node0001: (UP) Database VMart started successfully Stopping the Database Stopping a K-safe database is supported when up to K nodes are down or unavailable. See Failure Recovery for a discussion of various scenarios encountered during database shutdown, startup, and recovery. 
You can stop a running database using either of these methods: l The Management Console l The Administration Tools interface Note: You cannot stop a running database if any users are connected or Database Designer is building or deploying a database design. Administrator's Guide Operating the Database HP Vertica Analytic Database (7.0.x) Page 376 of 997
  • 377. Stopping a Running Database Using MC 1. Log in to MC as an MC administrator and navigate to the Manage page to make sure all nodes are up. If a node is down, click that node and select Start node in the Node list dialog box. 2. Inform all users that have open connections that the database is going to shut down and instruct them to close their sessions. Tip: To check for open sessions, query the V_MONITOR.SESSIONS table. The client_label column returns a value of MC for users who are connected to MC. 3. Still on the Manage page, click Stop in the toolbar. Stopping a Running Database Using the Administration Tools 1. Use View Database Cluster State to make sure that all nodes are up. If all nodes are not up, see Restarting a Node. 2. Inform all users that have open connections that the database is going to shut down and instruct them to close their sessions. Tip: A simple way to prevent new client sessions from being opened while you are shutting down the database is to set the MaxClientSessions configuration parameter to 0. Be sure to restore the parameter to its original setting once you've restarted the database. => SELECT SET_CONFIG_PARAMETER ('MaxClientSessions', 0); 3. Close any remaining user sessions. (Use the CLOSE_SESSION and CLOSE_ALL_SESSIONS functions.) 4. Open the Administration Tools. See Using the Administration Tools for information about accessing the Administration Tools. 5. On the Main Menu, select Stop Database, and then click OK. 6. Select the database you want to stop, and click OK. 7. Enter the password if asked, and click OK. 8. When prompted that the database has been successfully stopped, click OK. Stopping a Running Database At the Command Line Use the admintools command line option, stop_db(), to stop a database as follows: Administrator's Guide Operating the Database HP Vertica Analytic Database (7.0.x) Page 377 of 997
  • 378. [dbadmin@localhost ~]$ /opt/vertica/bin/admintools -t stop_db -d VMart Info: no password specified, using none Issuing shutdown command to database Database VMart stopped successfully As long as the license is valid, the -p argument is not required to stop the database and is silently ignored, even if you introduce a typo or press the Enter key prematurely. This is by design, as the database can only be stopped by the user who (as part of the verticadba UNIX user group) initially created the database or who has root or su privileges. If the license were to become invalid, HP Vertica would use the -p password argument to attempt to upgrade the license with the license file stored in /opt/vertica/config/share/license.key. Administrator's Guide Operating the Database HP Vertica Analytic Database (7.0.x) Page 378 of 997
  • 379. Working with the HP Vertica Index Tool Use the HP Vertica Reindex option only if you have upgraded to HP Vertica 6.0 from an earlier version. As of HP Vertica 6.0, there are three Index tool options: l Reindex l CheckCRC l Checksort Note: The Checksort option is available as of Version 6.0.1. Following an upgrade to 6.0 or later, any new ROSes (including those that the TM generates) use the new index format. New installations use the improved index and maintain CRC automatically. If you choose not to run the Reindex option, the Mergeout function rewrites ROS containers as they pass through Mergeout; however, older ROS containers might never go through Mergeout and, as a result, will not use the new index format unless you run the Reindex option. You can run each of the HP Vertica Index tool options when the database cluster is down. You can run the CheckCRC (-v) and Checksort (-I) options with the cluster up or down, as follows: Index Option Cluster down (per node) Cluster up (per node or all nodes) Reindex vertica -D catalog_path -i N/A. The cluster must be down to reindex. CheckCRC vertica -D catalog_path -v select run_index_tool ('checkcrc') select run_index_tool ('checkcrc', 'true') Checksort vertica -D catalog_path -I select run_index_tool ('checksort') select run_index_tool ('checksort', 'true') The HP Vertica Index tool options are accessed through the vertica binary, located in the /opt/vertica/bin directory on most installations. Note: Running the Index tool options from the command line outputs progress about tool activity, but not its results. To review the detailed messages after running the Index tool options, see the indextool.log file, located in the database catalog directory as described below. Syntax /opt/vertica/bin/vertica -D catalog_path [-i | -I | -v] Administrator's Guide Operating the Database HP Vertica Analytic Database (7.0.x) Page 379 of 997
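The command-line forms in the table above differ only in the flag passed to the vertica binary. The mapping can be captured in a small wrapper function; the wrapper itself is hypothetical (not shipped with the product) and only restates the flag table:

```shell
# Hypothetical helper mapping Index tool option names to vertica flags,
# per the syntax table above: reindex -> -i, checkcrc -> -v, checksort -> -I.
index_flag() {
  case "$1" in
    reindex)   printf '%s\n' "-i" ;;
    checkcrc)  printf '%s\n' "-v" ;;
    checksort) printf '%s\n' "-I" ;;
    *)         printf 'unknown option: %s\n' "$1" >&2; return 1 ;;
  esac
}

index_flag reindex
index_flag checkcrc
index_flag checksort
```

A wrapper like this could then build the full command, for example `/opt/vertica/bin/vertica -D "$catalog_path" "$(index_flag checksort)"`, with the cluster state caveats from the table still applying.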
  • 380. Parameters Parameter Description -D catalog_path Specifies the catalog directory (-D) on which to run each option. -i Reindexes the ROSes. -I Checks the node's ROSes for correct sort order. -v Calculates a per-block cyclic redundancy check (CRC) on existing data storage. Note: You must run the reindex option on each cluster node, with the cluster down. You can run the Index tool in parallel on different nodes. Permissions You must be a superuser to run the Index tool with any option. Controlling Expression Analysis The expression analysis that occurs as part of the HP Vertica database indexing techniques improves overall performance. You can turn off such analysis, but doing so is not recommended. To control expression analysis: 1. Use add_vertica_options to turn off EE expression analysis: select add_vertica_options('EE', 'DISABLE_EXPR_ANALYSIS'); 2. Display the current value by selecting it as follows: select show_current_vertica_options(); 3. Use clr_vertica_options to enable the expression analysis option: select clr_vertica_options('EE', 'DISABLE_EXPR_ANALYSIS'); Performance and CRC HP Vertica recognizes that CRC can affect overall database performance. You can turn off the CRC facility, but doing so is not recommended. Administrator's Guide Operating the Database HP Vertica Analytic Database (7.0.x) Page 380 of 997
  • 381. To control CRC: 1. Change the value of the configuration parameter to zero (0): select set_config_parameter('CheckCRC', '0'); 2. Display the value: select * from configuration_parameters; 3. To enable CRC, set the parameter to one (1): select set_config_parameter('CheckCRC', '1'); The following sections describe each of the HP Vertica Index tool options and how and when to use them. Running the Reindex Option Run the HP Vertica Reindex option to update each ROS index after upgrading to 6.0 or higher. Note: If you are currently running HP Vertica 6.0 or higher, you do not need to run the Reindex option. Only run the Reindex option if you are upgrading from a version prior to 6.0. See Working with the HP Vertica Index Tool for more information. This option scans all local storage and reindexes the data in all ROSes on the node from which you invoke the tool, adding several new fields to the ROS data blocks. These fields include the data block's minimum and maximum values (min_value and max_value), and the total number of nulls stored in the block (null_count). This option also calculates cyclic redundancy check (CRC) values, and populates the corresponding data block field with the CRC value. The new data block fields are required before using the CheckCRC or Checksort options. Once ROS data has been reindexed, you can use either of the other HP Vertica Index tool options. To reindex ROSes with the database cluster DOWN: 1. From the command line, start the Index tool with -D to specify the catalog directory path, and -i to reindex: [dbadmin@localhost bin]$ /opt/vertica/bin/vertica -D /home/dbadmin/VMart/v_vmart_node0001_catalog -i 2. The Index tool outputs some general data to the terminal, and writes detailed information to the indextool.log file: Administrator's Guide Operating the Database HP Vertica Analytic Database (7.0.x) Page 381 of 997
  • 382. Setting up logger and sessions... Loading catalog... Collecting storage containers... Scanning data on disk... Storages 219/219, bytes 302134347/302134347 100% Committing catalog and SAL changes... [dbadmin@localhost bin]$ 3. The indextool.log file is located in the database catalog directory: /home/dbadmin/VMart/v_vmart_node0001_catalog/indextool.log Running the CheckCRC Option The CheckCRC option initially calculates a cyclic redundancy check (CRC) on each block of the existing data storage. You can run this option only after the ROS has been reindexed. This Index tool option populates the corresponding ROS data block field with a CRC value. Running CheckCRC after its first invocation checks for data corruption. To run CheckCRC when the database cluster is down: 1. From the command line, use the Index tool with -D to specify the catalog directory path, and -v to specify the CheckCRC option. 2. CheckCRC outputs general data such as the following to the terminal, and writes detailed information to the indextool.log file: [dbadmin@localhost bin]$ /opt/vertica/bin/vertica -D /home/dbadmin/VMart/v_vmart_node0001_catalog -v Setting up logger and sessions... Loading catalog... Collecting storage containers... Scanning data on disk... Storages 272/272, bytes 302135743/302135743 100% [dbadmin@localhost bin]$ 3. The indextool.log file is located in the database catalog directory: /home/dbadmin/VMart/v_vmart_node0001_catalog/indextool.log To run CheckCRC when the database is running: 1. From vsql, enter this query to run the check on the initiator node: select run_index_tool ('checkcrc'); -or- Administrator's Guide Operating the Database HP Vertica Analytic Database (7.0.x) Page 382 of 997
  • 383. select run_index_tool ('checkcrc', 'false'); 2. Enter this query to run the check on all nodes: select run_index_tool ('checkcrc', 'true'); Handling CheckCRC Errors Once CRC values exist in each ROS data block, HP Vertica calculates and compares the existing CRC each time data is fetched from disk as part of query processing. If CRC errors occur while fetching data, the following information is written to the vertica.log file: CRC Check Failure Details: File Name: File Offset: Compressed size in file: Memory Address of Read Buffer: Pointer to Compressed Data: Memory Contents: The Event Manager is also notified upon CRC errors, so you can use an SNMP trap to capture CRC errors: "CRC mismatch detected on file <file_path>. File may be corrupted. Please check hardware and drivers." If you are running a query from vsql, ODBC, or JDBC, the query returns a FileColumnReader ERROR, indicating that a specific block's CRC does not match a given record, with the following hint: hint: Data file may be corrupt. Ensure that all hardware (disk and memory) is working properly. Possible solutions are to delete the file <pathname> while the node is down, and then allow the node to recover, or truncate the table data. code: ERRCODE_DATA_CORRUPTED Running the Checksort Option If ROS data is not sorted correctly in the projection's order, queries that rely on sorted data will be incorrect. Use the Checksort option to check the ROS sort order if you suspect or detect incorrect queries. The Index tool Checksort option (-I) evaluates each ROS row to determine if the row is sorted correctly. If the check locates a row that is not in order, it writes an error message to the log file indicating the row number and contents of the unsorted row. Note: Running Checksort from the command line does not report any defects that the tool discovers, only the amount of scanned data. Administrator's Guide Operating the Database HP Vertica Analytic Database (7.0.x) Page 383 of 997
  • 384. The Checksort option checks only the ROSes of the host from which you initiate the Index tool. For a comprehensive check of all ROSes in the HP Vertica cluster, run Checksort on each cluster node to ensure that all ROS data is sorted. To run Checksort when the database cluster is down: 1. From the command line, start the Index tool with -D to specify the catalog directory path, and -I to check the sort order: [dbadmin@localhost bin]$ /opt/vertica/bin/vertica -D /home/dbadmin/VMart/v_vmart_node0001_catalog -I 2. The Index tool outputs some general data to the terminal, and detailed information to the indextool.log file: Setting up logger and sessions... Loading catalog... Collecting storage containers... Scanning data on disk... Storages 17/17, bytes 1739030582/1739030582 100% 3. The indextool.log file is located in the database catalog directory: /home/dbadmin/VMart/v_vmart_node0001_catalog/indextool.log To run Checksort when the database is running: 1. From vsql, enter this query to check the ROS sort order on the initiator node: select run_index_tool ('checksort'); -or- select run_index_tool ('checksort', 'false'); 2. Enter this query to run the sort check on all nodes: select run_index_tool ('checksort', 'true'); Viewing Details of Index Tool Results When running the HP Vertica Index tool options from the command line, the tool writes minimal output to STDOUT, and detailed information to the indextool.log file in the database catalog directory. When running CheckCRC and Checksort from vsql, results are written to the vertica.log file on the node from which you run the query. To view the results in the indextool.log file: Administrator's Guide Operating the Database HP Vertica Analytic Database (7.0.x) Page 384 of 997
  • 385. 1. From the command line, navigate to the indextool.log file, located in the database catalog directory. [15:07:55][vertica-s1]: cd /my_host/databases/check/v_check_node0001_catalog 2. For Checksort, all error messages include an OID number and the string 'Sort Order Violation' as follows: <INFO> ...on oid 45035996273723545: Sort Order Violation: 3. You can use grep on the indextool.log file to search for the Sort Order Violation string with a command such as this, which returns the line before each string (-B1), and the four lines that follow (-A4): [15:07:55][vertica-s1]: grep -B1 -A4 'Sort Order Violation:' /my_host/databases/check/v_check_node0001_catalog/indextool.log 2012-06-14 14:07:13.686 unknown:0x7fe1da7a1950 [EE] <INFO> An error occurred when running index tool thread on oid 45035996273723537: Sort Order Violation: Row Position: 624 Column Index: 0 Last Row: 2576000 This Row: 2575000 -- 2012-06-14 14:07:13.687 unknown:0x7fe1dafa2950 [EE] <INFO> An error occurred when running index tool thread on oid 45035996273723545: Sort Order Violation: Row Position: 3 Column Index: 0 Last Row: 4 This Row: 2 -- To find the relevant projection where the sort violation was found: 1. Query the storage_containers system table using a storage_oid equal to the OID value listed in the indextool.log file. 2. Use a query such as this: => select * from storage_containers where storage_oid = 45035996273723545; Administrator's Guide Operating the Database HP Vertica Analytic Database (7.0.x) Page 385 of 997
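The grep technique above can be tried against a small sample log. The log entry below is fabricated to match the documented format (the OID and row numbers are illustrative, not from a real database):

```shell
# Build a sample indextool.log fragment containing one Sort Order Violation entry.
log=$(mktemp)
cat > "$log" <<'EOF'
2012-06-14 14:07:13.686 unknown:0x7fe1da7a1950 [EE] <INFO> An error occurred when running index tool thread on oid 45035996273723537: Sort Order Violation:
Row Position: 624
Column Index: 0
Last Row: 2576000
This Row: 2575000
EOF

# -B1 prints the line before each match; -A4 prints the four lines after it,
# which together cover one full violation record.
grep -B1 -A4 'Sort Order Violation:' "$log"
```

The OID printed on the matched line is the value to use in the storage_containers query shown above to locate the affected projection.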
  • 387. Working with Tables You can create two types of tables in HP Vertica, columnar and flexible. Additionally, you can create either type as persistent or temporary. You can also create views to capture a specific set of table columns that you query frequently. This section describes how to: l Create base and temporary tables l Alter tables l Use named sequences in tables l Merge contents from one table into another l Drop and truncate tables Creating Base Tables The CREATE TABLE statement creates a table in the HP Vertica logical schema. The example database described in the Getting Started Guide includes sample SQL scripts that demonstrate this procedure. For example: CREATE TABLE vendor_dimension ( vendor_key INTEGER NOT NULL PRIMARY KEY, vendor_name VARCHAR(64), vendor_address VARCHAR(64), vendor_city VARCHAR(64), vendor_state CHAR(2), vendor_region VARCHAR(32), deal_size INTEGER, last_deal_update DATE ); Note: Each table can have a maximum of 1,600 columns. Creating Tables Using the /*+direct*/ Clause You can use the /*+direct*/ clause to create a table or temporary table, saving the table directly to disk (ROS), bypassing memory (WOS). For example, following is an existing table called states: VMart=> select * from states; State | Bird | Tree | Tax -------+----------+-------+----- MA | Robin | Maple | 5.7 Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 387 of 997
  • 388. NH | Thrush | Elm | 0 NY | Cardinal | Oak | 7.2 (3 rows) Create a new table, StateBird, with the /*+direct*/ clause in the statement, placing the clause directly before the query (select State, Bird from states): VMart=> create table StateBird as /*+direct*/ select State, Bird from states; CREATE TABLE VMart=> select * from StateBird; State | Bird -------+---------- MA | Robin NH | Thrush NY | Cardinal (3 rows) The following example creates a temporary table using the /*+direct*/ clause, along with the ON COMMIT PRESERVE ROWS directive: VMart=> create temp table StateTax ON COMMIT PRESERVE ROWS as /*+direct*/ select State, Tax from states; CREATE TABLE VMart=> select * from StateTax; State | Tax -------+----- MA | 5.7 NH | 0 NY | 7.2 (3 rows) Automatic Projection Creation To get your database up and running quickly, HP Vertica automatically creates a default projection for each table created through the CREATE TABLE and CREATE TEMPORARY TABLE statements. Each projection created automatically (or manually) includes a base projection name prefix. You must use the projection prefix when altering or dropping a projection (ALTER PROJECTION RENAME, DROP PROJECTION). How you use the CREATE TABLE statement determines when the projection is created: l If you create a table without providing the projection-related clauses, HP Vertica automatically creates a superprojection for the table when you use an INSERT INTO or COPY statement to load data into the table for the first time. The projection is created in the same schema as the table. Once HP Vertica has created the projection, it loads the data. l If you use CREATE TABLE AS SELECT to create a table from the results of a query, the table is created first and a projection is created immediately after, using some of the properties of the underlying SELECT query. Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 388 of 997
l (Advanced users only) If you use any of the following parameters, the default projection is created immediately upon table creation using the specified properties:
n column-definition (ENCODING encoding-type and ACCESSRANK integer)
n ORDER BY table-column
n hash-segmentation-clause
n UNSEGMENTED { NODE node | ALL NODES }
n KSAFE

Note: Before you define a superprojection in the above manner, read Creating Custom Designs in the Administrator's Guide.

Characteristics of Default Automatic Projections

A default auto-projection has the following characteristics:

l It is a superprojection.
l It uses the default encoding-type AUTO.
l If created as a result of a CREATE TABLE AS SELECT statement, it uses the encoding specified in the query table.
l Auto-projections use hash segmentation.
l The number of table columns used in the segmentation expression can be configured, using the MaxAutoSegColumns configuration parameter. See General Parameters in the Administrator's Guide. Columns are segmented in this order:
n Short (< 8 bytes) data type columns first
n Larger (> 8 bytes) data type columns
n Up to 32 columns (the default for the MaxAutoSegColumns configuration parameter)
n If segmenting more than 32 columns, HP Vertica uses a nested hash function

Auto-projections are defined by the table properties and creation methods, as follows:

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 389 of 997
If the table is created from an input stream (COPY or INSERT INTO):
n Sort order: same as the input stream, if sorted.
n Segmentation: on the PK column (if any), on all FK columns (if any), then on the first 31 configurable columns of the table.

If the table is created from a CREATE TABLE AS SELECT query:
n Sort order: same as the input stream, if sorted. If not sorted, sorted using the rules below.
n Segmentation: the same segmentation columns, if the query output is segmented; the same as the load, if the query output is unsegmented or unknown.

If the table has FK and PK constraints:
n Sort order: FK columns first, then PK columns.
n Segmentation: PK columns.

If the table has FK constraints only (no PK):
n Sort order: FK columns first, then the remaining columns.
n Segmentation: small data type (< 8 byte) columns first, then large data type columns.

If the table has PK constraints only (no FK):
n Sort order: PK columns.
n Segmentation: PK columns.

If the table has no FK or PK constraints:
n Sort order: on all columns.
n Segmentation: small data type (< 8 byte) columns first, then large data type columns.

Default automatic projections and segmentation get your database up and running quickly. HP recommends that you start with these projections and then use the Database Designer to optimize your database further. The Database Designer creates projections that optimize your database based on the characteristics of the data and, optionally, the queries you use.

See Also
l Creating External Tables
l Projection Concepts
l CREATE TABLE

Creating a Table Like Another

You can create a new table based on an existing table using the CREATE TABLE statement with the LIKE existing_table clause, optionally including the projections of the existing table. Creating a new table with the LIKE option replicates the table definition and any storage policy associated with the existing table. The statement does not copy any data. The main purpose of this function is to create an intermediate table into which you can move partition data, and eventually, archive the data and drop the intermediate table.

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 390 of 997
Note: Invoking CREATE TABLE with its LIKE clause before calling the function to move partitions for archiving requires first dropping pre-join projections or refreshing out-of-date Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 390 of 997
projections.

You can optionally use the including projections clause to create a table that will have the existing table's current and non-pre-join projection definitions whenever you populate the table. Replicated projections are named automatically to avoid conflict with any existing objects, and follow the same naming conventions as auto-projections. You cannot create a new table like another if the source table has pre-join or out-of-date projections. The statement displays a warning message.

Note: HP Vertica does not support using CREATE TABLE new_t LIKE exist_t INCLUDING PROJECTIONS if exist_t is a temporary table.

Epochs and Node Recovery

The checkpoint epoch (CPE) for both the source and target projections is updated as ROSes are moved. The start and end epochs of all storage containers, such as ROSes, are modified to the agreed move epoch. When this occurs, the epochs of all columns without an actual data file rewrite advance the CPE to the move epoch. If any nodes are down during the TM moveout, they will detect that there is storage to recover, and will recover from other nodes with the correct epoch upon rejoining the cluster.

Storage Location and Policies for New Tables

When you use the CREATE TABLE...LIKE statement, any storage policy objects associated with the table are also copied. Data added to the new table will use the same labeled storage location as the source table, unless you change the storage policy. For more information, see Working With Storage Locations.

Simple Example

This example shows how to use the statement for a table that already exists, and suggests a naming convention that describes the contents of the new table:

Create a new schema in which to create an intermediate table with projections. This is the table into which you will move partitions.
Then, create a table identical to the source table from which to move partitions:

VMART=> create schema partn_backup;
CREATE SCHEMA
VMART=> create table partn_backup.trades_200801 like prod.trades including projections;
CREATE TABLE

Once the schema and table exist, you can move one or more of the existing table partitions to the new intermediate table.

Using CREATE TABLE LIKE

For this example, create a table, states:

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 391 of 997
VMART=> create table states
VMART-> (state char(2) not null,
VMART(> bird varchar(20),
VMART(> tree char(20),
VMART(> tax float,
VMART(> stateDate date)
VMART-> partition by state;
CREATE TABLE

Populate the table with some data on New England:

insert into states values ('MA', 'chickadee', 'american_elm', 5.675, '07-04-1620');
insert into states values ('VT', 'Hermit_Thrasher', 'Sugar_Maple', 6.0, '07-04-1610');
. . .

Select the states table to see its content:

VMART=> select * from states;
 State |        bird         |     tree     |  tax  | stateDate
-------+---------------------+--------------+-------+------------
 MA    | chickadee           | american_elm | 5.675 | 1620-07-04
 NH    | Purple_Finch        | White_Birch  |     0 | 1615-07-04
 VT    | Hermit_Thrasher     | Sugar_maple  |     6 | 1618-07-04
 ME    | Black_Cap_Chickadee | Pine_Tree    |     5 | 1615-07-04
 CT    | American_Robin      | White_Oak    |  6.35 | 1618-07-04
 RI    | Rhode_Island_Red    | Red_Maple    |     5 | 1619-07-04
(6 rows)

View the projections for this table:

VMART=> \dj
List of projections
 Schema |       Name        |  Owner  |       Node       | Comment
--------+-------------------+---------+------------------+---------
. . .
 public | states_b0         | dbadmin |                  |
 public | states_b1         | dbadmin |                  |
 public | states_p_node0001 | dbadmin | v_vmart_node0001 |
 public | states_p_node0002 | dbadmin | v_vmart_node0002 |
 public | states_p_node0003 | dbadmin | v_vmart_node0003 |

Now, create a table like the states table, including projections:

VMART=> create table newstates like states including projections;
CREATE TABLE
VMART=> select * from newstates;
 State | bird | tree | tax | stateDate
-------+------+------+-----+-----------
(0 rows)

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 392 of 997
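To confirm that the replicated projections exist for the new table, you can also query the projections system table. This is a sketch; the projection names in your database follow the same naming conventions as auto-projections:

```sql
=> SELECT projection_schema, projection_name
   FROM v_catalog.projections
   WHERE anchor_table_name = 'newstates';
```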
  • 393. See Also l Creating Base Tables l Creating Temporary Tables l Creating External Tables l Moving Partitions l CREATE TABLE Creating Temporary Tables You create temporary tables using the CREATE TEMPORARY TABLE statement, specifying the table as either local or global. You cannot create temporary external tables. A common use case for a temporary table is to divide complex query processing into multiple steps. Typically, a reporting tool holds intermediate results while reports are generated (for example, first get a result set, then query the result set, and so on). You can also write subqueries. Note: The default retention when creating temporary tables is ON COMMIT DELETE ROWS, which discards data at transaction completion. The non-default value is ON COMMIT PRESERVE ROWS, which discards data when the current session ends. Global Temporary Tables HP Vertica creates global temporary tables in the public schema, with the data contents private to the transaction or session through which data is inserted. Global temporary table definitions are accessible to all users and sessions, so that two (or more) users can access the same global table concurrently. However, whenever a user commits or rolls back a transaction, or ends the session, HP Vertica removes the global temporary table data automatically, so users see only data specific to their own transactions or session. Global temporary table definitions persist in the database catalogs until they are removed explicitly through a DROP TABLE statement. Local Temporary Tables Local temporary tables are created in the V_TEMP_SCHEMA namespace and inserted into the user's search path transparently. Each local temporary table is visible only to the user who creates it, and only for the duration of the session in which the table is created. When the session ends, HP Vertica automatically drops the table definition from the database catalogs. 
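A minimal sketch of the two temporary table scopes described above (table and column names are illustrative):

```sql
-- Definition visible to all users and sessions; data private to each
-- transaction or session:
=> CREATE GLOBAL TEMPORARY TABLE gtemp (id INT, note VARCHAR(20));
-- Definition and data visible only to the creating session, via the
-- V_TEMP_SCHEMA namespace:
=> CREATE LOCAL TEMPORARY TABLE ltemp (id INT, note VARCHAR(20));
```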
You cannot preserve non-empty, session-scoped temporary tables using the ON COMMIT PRESERVE ROWS statement. Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 393 of 997
Creating local temporary tables is significantly faster than creating regular tables, so you should make use of them whenever possible.

Note: You cannot add projections to non-empty, session-scoped temporary tables if you specify ON COMMIT PRESERVE ROWS. Be sure that projections exist before you load data, as described in the section Automatic Projection Creation in CREATE TABLE. Also, while you can add projections for tables created with the ON COMMIT DELETE ROWS option, be aware that you could save the projection but still lose all the data.

Creating a Temp Table Using the /*+direct*/ Clause

You can use the /*+direct*/ clause to create a table or temporary table, saving the table directly to disk (ROS), bypassing memory (WOS). For example, following is an existing table called states:

VMart=> select * from states;
 State |   Bird   | Tree  | Tax
-------+----------+-------+-----
 MA    | Robin    | Maple | 5.7
 NH    | Thrush   | Elm   | 0
 NY    | Cardinal | Oak   | 7.2
(3 rows)

Create a new table, StateBird, with the /*+direct*/ clause in the statement, placing the clause directly before the query (select State, Bird from states):

VMart=> create table StateBird as /*+direct*/ select State, Bird from states;
CREATE TABLE
VMart=> select * from StateBird;
 State |   Bird
-------+----------
 MA    | Robin
 NH    | Thrush
 NY    | Cardinal
(3 rows)

The following example creates a temporary table using the /*+direct*/ clause, along with the ON COMMIT PRESERVE ROWS directive:

VMart=> create temp table StateTax ON COMMIT PRESERVE ROWS as /*+direct*/ select State, Tax from states;
CREATE TABLE
VMart=> select * from StateTax;
 State | Tax
-------+-----
 MA    | 5.7
 NH    | 0
 NY    | 7.2
(3 rows)

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 394 of 997
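As a contrast to the ON COMMIT PRESERVE ROWS example above, the following sketch shows the default ON COMMIT DELETE ROWS retention (table name is illustrative):

```sql
=> CREATE TEMP TABLE scratch (n INT);   -- defaults to ON COMMIT DELETE ROWS
=> INSERT INTO scratch VALUES (1);
=> COMMIT;                              -- rows are discarded at transaction completion
=> SELECT COUNT(*) FROM scratch;        -- returns 0: the data is gone, the definition remains
```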
Characteristics of Default Automatic Projections

Once a local or global table exists, HP Vertica creates auto-projections for temporary tables whenever you load or insert data. The default auto-projection for a temporary table has the following characteristics:

l It is a superprojection.
l It uses the default encoding-type AUTO.
l It is automatically unsegmented on the initiator node, if you do not specify a hash-segmentation-clause.
l The projection is not pinned.
l Temp tables are not recoverable, so the superprojection is not K-Safe (K-SAFE=0), and you cannot make it so.

Auto-projections are defined by the table properties and creation methods, as follows:

If the table is created from an input stream (COPY or INSERT INTO):
n Sort order: same as the input stream, if sorted.
n Segmentation: on the PK column (if any), on all FK columns (if any), then on the first 31 configurable columns of the table.

If the table is created from a CREATE TABLE AS SELECT query:
n Sort order: same as the input stream, if sorted. If not sorted, sorted using the rules below.
n Segmentation: the same segmentation columns, if the query output is segmented; the same as the load, if the query output is unsegmented or unknown.

If the table has FK and PK constraints:
n Sort order: FK columns first, then PK columns.
n Segmentation: PK columns.

If the table has FK constraints only (no PK):
n Sort order: FK columns first, then the remaining columns.
n Segmentation: small data type (< 8 byte) columns first, then large data type columns.

If the table has PK constraints only (no FK):
n Sort order: PK columns.
n Segmentation: PK columns.

If the table has no FK or PK constraints:
n Sort order: on all columns.
n Segmentation: small data type (< 8 byte) columns first, then large data type columns.

Advanced users can modify the default projection created through the CREATE TEMPORARY TABLE statement by defining one or more of the following parameters:

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 395 of 997
  • 396. l column-definition (temp table) (ENCODING encoding-type and ACCESSRANK integer) l ORDER BY table-column l hash-segmentation-clause l UNSEGMENTED { NODE node | ALL NODES } l NO PROJECTION Note: Before you define the superprojection in this manner, read Creating Custom Designs in the Administrator's Guide. Preserving GLOBAL Temporary Table Data for a Transaction or Session You can preserve session-scoped rows in a GLOBAL temporary table for the entire session or for the current transaction only. To preserve a temporary table for the transaction, use the ON COMMIT DELETE ROWS clause: => CREATE GLOBAL TEMP TABLE temp_table1 (x NUMERIC, y NUMERIC ) ON COMMIT DELETE ROWS; To preserve temporary table data until the end of the session, use the ON COMMIT PRESERVE ROWS clause: => CREATE GLOBAL TEMP TABLE temp_table2 (x NUMERIC, y NUMERIC ) ON COMMIT PRESERVE ROWS; Specifying Column Encoding You can specify the encoding type to use per column. The following example specifies that the superprojection created for the temp table use RLE encoding for the y column: => CREATE LOCAL TEMP TABLE temp_table1 (x NUMERIC, y NUMERIC ENCODING RLE ) ON COMMIT DELETE ROWS; The following example specifies that the superprojection created for the temp table use the sort order specified by the ORDER BY clause, rather than the order of columns in the column list. => CREATE GLOBAL TEMP TABLE temp_table1 ( x NUMERIC, Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 396 of 997
  • 397. y NUMERIC ENCODING RLE, b VARCHAR(8), z VARCHAR(8) ) ORDER BY z, x; See Also l Projection Concepts l CREATE TEMPORARY TABLE l CREATE TABLE l TRUNCATE TABLE l DELETE l ANALYZE_STATISTICS Creating External Tables You create an external table using the CREATE EXTERNAL TABLE AS COPY statement. You cannot create temporary external tables. For the syntax details to create an external table, see the CREATE EXTERNAL TABLE statement in the SQL Reference Manual. Note: Each table can have a maximum of 1600 columns. Required Permissions for External Tables You must be a database superuser to create external tables, unless you create a USER-accessible storage location (see ADD_LOCATION) and grant user privileges to the location, schema, and so on. Note: Permission requirements for external tables differ from other tables. To gain full access (including SELECT) to an external table that a user has privileges to create, the database superuser must also grant READ access to the USER-accessible storage location, see GRANT (Storage Location). COPY Statement Definition When you create an external table, table data is not added to the database, and no projections are created. Instead, HP Vertica performs a syntactic check of the CREATE EXTERNAL TABLE... statement, and stores the table name and COPY statement definition in the catalog. When a SELECT query references an external table, the stored COPY statement is parsed to obtain the referenced data. Successfully returning data from the external table requires that the COPY Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 397 of 997
definition is correct, and that other dependencies, such as files, nodes, and other resources are accessible and available at query-time. For more information about checking the validity of the external table COPY definition, see Validating External Tables.

Developing User-Defined Load (UDL) Functions for External Tables

You can create external tables with your own load functions. For more information about developing user-defined load functions, see User Defined Load (UDL) and the extended COPY syntax in the SQL Reference Manual.

Examples

Examples of external table definitions:

CREATE EXTERNAL TABLE ext1 (x integer) AS COPY FROM '/tmp/ext1.dat' DELIMITER ',';
CREATE EXTERNAL TABLE ext1 (x integer) AS COPY FROM '/tmp/ext1.dat.bz2' BZIP DELIMITER ',';
CREATE EXTERNAL TABLE ext1 (x integer, y integer) AS COPY (x as '5', y) FROM '/tmp/ext1.dat.bz2' BZIP DELIMITER ',';

See Also
l COPY
l CREATE EXTERNAL TABLE AS COPY

Validating External Tables

When you create an external table, HP Vertica validates the syntax of the CREATE EXTERNAL TABLE AS COPY FROM statement. For instance, if you omit a required keyword in the statement (such as FROM), creating the external table fails, as in this example:

VMart=> create external table ext (ts timestamp, d varchar) as copy '/home/dbadmin/designer.log';
ERROR 2778: COPY requires a data source; either a FROM clause or a WITH SOURCE for a user-defined source

Checking other aspects of the COPY definition (such as path statements and node availability) does not occur until a select query references the external table. To validate that you have successfully created an external table definition, run a select query referencing the external table. Check that the returned query data is what you expect. If the query does not return data correctly, check the COPY exception and rejected data log files.

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 398 of 997
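Putting the steps above together, a sketch of the validation workflow (file path and table name are illustrative):

```sql
=> CREATE EXTERNAL TABLE applog (ts TIMESTAMP, msg VARCHAR(100))
   AS COPY FROM '/home/dbadmin/app.log' DELIMITER '|';
CREATE TABLE
-- The first query parses the stored COPY definition and exercises the
-- path and node dependencies that were not checked at creation time:
=> SELECT * FROM applog LIMIT 5;
```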
  • 399. Since the COPY definition determines what occurs when you query an external table, obtaining COPY statement errors can help reveal any underlying problems. For more information about COPY exceptions and rejections, see Capturing Load Rejections and Exceptions. Limiting the Maximum Number of Exceptions Querying external table data with an incorrect COPY FROM statement definition can potentially result in many exceptions. To limit the number of saved exceptions, HP Vertica sets the maximum number of reported rejections with the ExternalTablesExceptionsLimit configuration parameter. The default value is 100. Setting the ExternalTablesExceptionsLimit to -1 disables the limit. For more information about configuration parameters, see Configuration Parameters, and specifically, General Parameters. If COPY errors reach the maximum number of exceptions, the external table query continues, but COPY generates a warning in the vertica.log, and does not report subsequent rejections and/or exceptions. Note: Using the ExternalTablesExceptionsLimit configuration parameter differs from the COPY statement REJECTMAX clause. If COPY reaches the number of exceptions defined by REJECTMAX, COPY aborts execution, and does not generate a vertica.log warning. Working with External Tables After creating external tables, you access them as any other table. Managing Resources for External Tables External tables require minimal additional resources. When you use a select query for an external table, HP Vertica uses a small amount of memory when reading external table data, since the table contents are not part of your database and are parsed each time the external table is used. Backing Up and Restoring External Tables Since the data in external tables is managed outside of HP Vertica, only the external table definitions, not the data files, are included in database backups. 
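The ExternalTablesExceptionsLimit parameter described above can be changed with the SET_CONFIG_PARAMETER function. A sketch:

```sql
-- Raise the limit on reported exceptions per external table query:
=> SELECT SET_CONFIG_PARAMETER('ExternalTablesExceptionsLimit', 300);
-- Disable the limit entirely:
=> SELECT SET_CONFIG_PARAMETER('ExternalTablesExceptionsLimit', -1);
```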
Using Sequences and Identity Columns in External Tables

The COPY statement definition for external tables can include identity columns and sequences. Whenever a select statement queries the external table, sequences and identity columns are re-evaluated. This results in changing the external table column values, even if the underlying external table data remains the same.

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 399 of 997
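A sketch of the re-evaluation behavior described above (sequence, table, and file names are illustrative):

```sql
=> CREATE SEQUENCE rowseq;
=> CREATE EXTERNAL TABLE ext_seq (n INT, x INT) AS
   COPY (n AS rowseq.NEXTVAL, x) FROM '/tmp/data.dat' DELIMITER ',';
-- Each SELECT re-parses the COPY definition, so column n is re-evaluated
-- and its values can differ from query to query even though /tmp/data.dat
-- is unchanged.
```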
Viewing External Table Definitions

When you create an external table, HP Vertica stores the COPY definition statement in the table_definition column of the v_catalog.tables system table.

1. To list all tables, use a select * query, as shown:

select * from v_catalog.tables where table_definition <> '';

2. Use a query such as the following to list the external table definitions (table_definition):

select table_name, table_definition from v_catalog.tables;
 table_name |                    table_definition
------------+----------------------------------------------------------------------
 t1         | COPY FROM 'TMPDIR/external_table.dat' DELIMITER ','
 t1_copy    | COPY FROM 'TMPDIR/external_table.dat' DELIMITER ','
 t2         | COPY FROM 'TMPDIR/external_table2.dat' DELIMITER ','
(3 rows)

External Table DML Support

Following are examples of supported queries, and others that are not:

Supported:
SELECT * FROM external_table;
SELECT * FROM external_table where col1=4;
DELETE FROM ext WHERE id IN (SELECT x FROM external_table);
INSERT INTO ext SELECT * FROM external_table;

Unsupported:
DELETE FROM external_table WHERE x = 5;
INSERT INTO external_table SELECT * FROM ext;
SELECT * FROM external_table for update;

Using External Table Values

Following is a basic example of how you could use the values of an external table.

1. Create and display the contents of a file with some integer values:

[dbadmin@localhost ~]$ more ext.dat
1
2

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 400 of 997
3
4
5
6
7
8
10
11
12

2. Create an external table pointing at ext.dat:

VMart=> create external table ext (x integer) as copy from '/home/dbadmin/ext.dat';
CREATE TABLE

3. Select the table contents:

VMart=> select * from ext;
 x
----
  1
  2
  3
  4
  5
  6
  7
  8
 10
 11
 12
(11 rows)

4. Perform evaluation on some external table contents:

VMart=> select ext.x, ext.x + ext.x as double_x from ext where x > 5;
 x  | double_x
----+----------
  6 |       12
  7 |       14
  8 |       16
 10 |       20
 11 |       22
 12 |       24
(6 rows)

5. Create a second table (second), also with integer values:

VMart=> create table second (y integer);
CREATE TABLE

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 401 of 997
6. Populate the table with some values:

VMart=> copy second from stdin;
Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself.
>> 1
>> 1
>> 3
>> 4
>> 5
>> \.

7. Join the external table (ext) with the table created in HP Vertica, called second:

VMart=> select * from ext join second on x=y;
 x | y
---+---
 1 | 1
 1 | 1
 3 | 3
 4 | 4
 5 | 5
(5 rows)

Using External Tables

External tables let you query data stored in files accessible to the HP Vertica database, but not managed by it. Creating external tables supplies read-only access through SELECT queries. You cannot modify external tables through DML commands, such as INSERT, UPDATE, DELETE, and MERGE.

Using the CREATE EXTERNAL TABLE AS COPY Statement

You create external tables with the CREATE EXTERNAL TABLE AS COPY... statement, shown in this basic example:

CREATE EXTERNAL TABLE tbl(i INT) AS COPY (i) FROM 'path1' ON node1, 'path2' ON node2;

For more details on the supported options to create an external table, see the CREATE EXTERNAL TABLE statement in the SQL Reference Manual. The data you specify in the FROM clause of a CREATE EXTERNAL TABLE AS COPY statement can reside in one or more files or directories, and on one or more nodes. After successfully creating an external table, HP Vertica stores the table name and its COPY definition. Each time a select query references the external table, HP Vertica parses the COPY statement definition again to access the data. Here is a sample select statement:

SELECT * FROM tbl WHERE i > 10;

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 402 of 997
Storing HP Vertica Data in External Tables

While there are many reasons to use external tables, a common one is to store infrequently accessed HP Vertica data on low-cost external media. To accomplish this, export the older data to a text file, create a bzip or gzip file of the exported data, and save the compressed file on an NFS disk. You can then create an external table to access the data any time it is required.

Using External Tables with User-Defined Load (UDL) Functions

You can also use external tables in conjunction with the UDL functions that you create. For more information about using UDLs, see User Defined Load (UDL) in the Programmer's Guide.

Organizing External Table Data

If the data you store in external tables changes regularly (for instance, each month in the case of storing recent historical data), your COPY definition statement can use wildcards to make parsing the stored COPY statement definition more dynamic. For instance, if you store monthly data on an NFS mount, you could organize monthly files within a top-level directory for a calendar year, such as:

/2012/monthly_archived_data/

In this case, the external table COPY statement will include a wildcard definition such as the following:

CREATE EXTERNAL TABLE archive_data (...) AS COPY FROM 'nfs_name/2012/monthly_archived_data/*'

Whenever an HP Vertica query references the external table, and HP Vertica parses the COPY statement, all stored data files in the top-level monthly_archived_data directory are made accessible to the query.

Altering Table Definitions

Using ALTER TABLE syntax, you can respond to your evolving database schema requirements. The ability to change the definition of existing database objects facilitates ongoing maintenance.

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 403 of 997
Furthermore, most of these options are both fast and efficient for large tables, because they consume fewer resources and less storage than having to stage data in a temporary table. Here are some of the operations you can perform using the ALTER TABLE statement: l Rename a table l Add, drop, and rename columns Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 403 of 997
  • 404. l Add and drop constraints l Add table columns with a default derived expression l Change a column's data type l Change a table owner l Rename a table schema l Move a table to a new schema l Change, reorganize, and remove table partitions External Table Restrictions Not all ALTER TABLE options are applicable for external tables. For instance, you cannot add a column to an external table, but you can rename the table: => ALTER TABLE mytable RENAME TO mytable2; ALTER TABLE Exclusive ALTER TABLE Clauses The following clauses are exclusive, which means you cannot combine them with another ALTER TABLE clause: l ADD COLUMN l RENAME COLUMN l SET SCHEMA l PARTITION BY l REORGANIZE l REMOVE PARTITIONING l RENAME [ TO ] l OWNER TO Note: You can use the ADD constraints and DROP constraints clauses together. Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 404 of 997
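Because the clauses above are exclusive, each must be issued in its own statement. A sketch (table and schema names are illustrative):

```sql
-- Two exclusive clauses, so two separate statements:
=> ALTER TABLE sales RENAME TO sales_2012;
=> ALTER TABLE sales_2012 SET SCHEMA archive;
```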
Using Consecutive ALTER TABLE Commands

With the exception of renaming a table, you can issue other ALTER TABLE statements consecutively. For example, to add multiple columns to a table, issue consecutive ALTER TABLE ADD COLUMN commands, ending each statement with a semicolon. For more information about ALTER TABLE syntax and parameters, see the SQL Reference Manual.

Adding Table Columns

When you use the ADD COLUMN syntax as part of altering a table, HP Vertica takes an O lock on the table until the operation completes. The lock prevents DELETE, UPDATE, INSERT, and COPY statements from accessing the table, and blocks SELECT statements issued at SERIALIZABLE isolation level, until the operation completes. Each table can have a maximum of 1600 columns. You cannot add columns to a temporary table or to tables that have out-of-date superprojections with up-to-date buddies.

The following operations occur as part of adding columns:

l Inserts the default value for existing rows. For example, if the default expression is CURRENT_TIMESTAMP, all rows have the current timestamp.
l Automatically adds the new column with a unique projection column name to the superprojection of the table.
l Populates the column according to the column-constraint (DEFAULT, for example).

You can add a column to a table using the ALTER TABLE ADD COLUMN statement with the CASCADE keyword. When you use CASCADE, HP Vertica also adds the new table column to all pre-join projections that are created using that table. If you use the CASCADE keyword and specify a nonconstant default column value, HP Vertica does not add the column to the pre-join projections. For detailed information about how to add columns to tables, see ALTER TABLE.

Note: Adding a column to a table does not affect the K-safety of the physical schema design, and you can add columns when nodes are down.
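A sketch of the add-column behavior described above; existing rows receive the default value (table and column names are illustrative):

```sql
=> ALTER TABLE customer ADD COLUMN signup_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP;
-- Existing rows are populated with the default (here, the current timestamp),
-- and the new column is added to the table's superprojection automatically.
```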
Updating Associated Table Views Adding new columns to a table that has an associated view does not update the view's result set, even if the view uses a wildcard (*) to represent all table columns. To incorporate new columns, you must recreate the view. See CREATE VIEW in the SQL Reference Manual. Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 405 of 997
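A sketch of recreating a view so that it picks up a new column (table and view names are illustrative):

```sql
=> ALTER TABLE t ADD COLUMN c INT;
-- The existing view still reflects the old column list, so recreate it:
=> DROP VIEW t_view;
=> CREATE VIEW t_view AS SELECT * FROM t;  -- now includes column c
```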
Specifying Default Expressions

When you add a new column to a table using ALTER TABLE ADD COLUMN, the default expression for the new column can evaluate to a user-defined scalar function, a constant, or a derived expression involving other columns of the same table. The default expression of an ADD COLUMN statement disallows nested queries or aggregate functions. Instead, use the ALTER COLUMN option, described in Altering Table Columns.

About Using Volatile Functions

You cannot use a volatile function in the following two scenarios. Attempting to do so causes a rollback.

l As the default expression for an ALTER TABLE ADD COLUMN statement:

ALTER TABLE t ADD COLUMN a2 INT DEFAULT my_sequence.nextval;
ROLLBACK: VOLATILE functions are not supported in a default expression
ALTER TABLE t ADD COLUMN n2 INT DEFAULT my_sequence.currval;
ROLLBACK: VOLATILE functions are not supported in a default expression
ALTER TABLE t ADD COLUMN c2 INT DEFAULT RANDOM() + 1;
ROLLBACK: VOLATILE functions are not supported in a default expression

l As the default expression for an ALTER TABLE ALTER COLUMN statement on an external table:

ALTER TABLE mytable ADD COLUMN a2 FLOAT DEFAULT RANDOM();
ROLLBACK 5241: Unsupported access to external table
ALTER TABLE mytable ALTER COLUMN x SET DEFAULT RANDOM();
ROLLBACK 5241: Unsupported access to external table

You can specify a volatile function as a column default expression using the ALTER TABLE ALTER COLUMN statement:

ALTER TABLE t ALTER COLUMN a2 SET DEFAULT my_sequence.nextval;

Altering Table Columns

Use ALTER COLUMN syntax to alter an existing table column to change, drop, or establish a default expression for the column.

Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 406 of 997
You can also use DROP DEFAULT to remove a default expression. Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 406 of 997
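A sketch of establishing, changing, and removing a column default with ALTER COLUMN (table and column names are illustrative):

```sql
=> ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'open';
=> ALTER TABLE orders ALTER COLUMN status DROP DEFAULT;
```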
Any new data that you load after altering a column will conform to the modified table definition. For example:

- After a DROP COLUMN operation completes, data backed up from the current epoch onward will recover without the column. Data recovered from a backup prior to the current epoch will re-add the table column. Because drop operations physically purge object storage and catalog definitions (table history) from the table, AT EPOCH (historical) queries return nothing for the dropped column.
- If you change a column's data type from CHAR(8) to CHAR(16) in epoch 10 and you restore the database from epoch 5, the column will be CHAR(8) again.

Adding Columns with a Default Derived Expression

You can add one or more columns to a table and set the default value as an expression. The expression can reference another column in the same table, or be calculated with a user-defined function (see Types of UDFs in the Programmer's Guide). You can alter unstructured tables to use a derived expression as described in Altering Unstructured Tables.

HP Vertica computes a default value within a row. This flexibility is useful for adding a column to a large fact table that shows another view on the data without having to INSERT .. SELECT a large data set.

Adding columns requires an O lock on the table until the add operation completes. This lock prevents DELETE, UPDATE, INSERT, and COPY statements from affecting the table, and blocks SELECT statements issued at SERIALIZABLE isolation level, until the operation completes.

Only the new data you load after the alter operation completes is derived from the expression. You cannot include a nested query or an aggregate function as a default expression. The column must use a specific expression that involves other elements in the same row. You cannot specify a default expression that derives data from another derived column.
This means that if you already have a column with a default derived value expression, you cannot add another column whose default references the existing column.

Note: You can add a column when nodes are down.

Add a Default Column Value Derived From Another Column

1. Create a sample table called t with timestamp, integer, and varchar(10) columns:

   => CREATE TABLE t (a TIMESTAMP, b INT, c VARCHAR(10));
   CREATE TABLE
   => INSERT INTO t VALUES ('2012-05-14 10:39:25', 2, 'MA');
    OUTPUT
   --------
         1
   (1 row)

2. Use the vsql \d t meta-command to describe the table:
   => \d t
   List of Fields by Tables
    Schema | Table | Column |    Type     | Size | Default | Not Null | Primary Key | Foreign Key
   --------+-------+--------+-------------+------+---------+----------+-------------+-------------
    public | t     | a      | timestamp   |    8 |         | f        | f           |
    public | t     | b      | int         |    8 |         | f        | f           |
    public | t     | c      | varchar(10) |   10 |         | f        | f           |
   (3 rows)

3. Use ALTER TABLE to add a fourth column that extracts the month from the timestamp value in column a:

   => ALTER TABLE t ADD COLUMN d INTEGER DEFAULT EXTRACT(MONTH FROM a);
   ALTER TABLE

4. Query table t:

   => select * from t;
             a          | b | c  | d
   ---------------------+---+----+---
    2012-05-14 10:39:25 | 2 | MA | 5
   (1 row)

   Column d returns integer 5 (representing the 5th month).

5. View the table again to see the new column (d) and its default derived value:

   => \d t
   List of Fields by Tables
    Schema | Table | Column |    Type     | Size |         Default         | Not Null | Primary Key | Foreign Key
   --------+-------+--------+-------------+------+-------------------------+----------+-------------+-------------
    public | t     | a      | timestamp   |    8 |                         | f        | f           |
    public | t     | b      | int         |    8 |                         | f        | f           |
    public | t     | c      | varchar(10) |   10 |                         | f        | f           |
    public | t     | d      | int         |    8 | date_part('month', t.a) | f        | f           |
   (4 rows)

6. Drop the sample table t:
   => DROP TABLE t;
   DROP TABLE

Add a Default Column Value Derived From a UDSF

This example shows a user-defined scalar function that adds two integer values. The function is called add2ints and takes two arguments.

1. Develop and deploy the function, as described in Developing and Using User Defined Functions.

2. Create a sample table, t1, with two integer columns:

   => CREATE TABLE t1 ( x int, y int );
   CREATE TABLE

3. Insert some values into t1:

   => insert into t1 values (1,2);
    OUTPUT
   --------
         1
   (1 row)
   => insert into t1 values (3,4);
    OUTPUT
   --------
         1
   (1 row)

4. Use ALTER TABLE to add a column to t1 with the default column value derived from the UDSF, add2ints:

   => alter table t1 add column z int default add2ints(x,y);
   ALTER TABLE

5. List the new column:

   => select z from t1;
    z
   ----
     3
     7
   (2 rows)
Changing a Column's Data Type

You can change a table column's data type for any type whose conversion does not require storage reorganization. HP Vertica supports the following conversions:

- Binary types—expansion and contraction, but you cannot convert between BINARY and VARBINARY types.
- Character types—all conversions allowed, even between CHAR and VARCHAR.
- Exact numeric types—INTEGER, INT, BIGINT, TINYINT, INT8, SMALLINT, and all NUMERIC values of scale <=18 and precision 0 are interchangeable. For NUMERIC data types, you cannot alter precision, but you can change the scale in the ranges (0-18), (19-37), and so on.

HP Vertica does not allow data type conversion on types that require storage reorganization:

- Boolean type conversion to other types
- DATE/TIME type conversion
- Approximate numeric type conversions
- Between BINARY and VARBINARY types

You can expand (and shrink) columns within the same class of data type, which is useful if you want to store longer strings in a column. HP Vertica validates the data before it performs the conversion. For example, if you try to convert a column from varchar(25) to varchar(10) and that column holds a string with 20 characters, the conversion will fail. HP Vertica allows the conversion as long as that column does not have a string larger than 10 characters.

Examples

The following example expands an existing CHAR column from 5 to 10:

   => CREATE TABLE t (x CHAR, y CHAR(5));
   CREATE TABLE
   => ALTER TABLE t ALTER COLUMN y SET DATA TYPE CHAR(10);
   ALTER TABLE
   => DROP TABLE t;
   DROP TABLE

This example illustrates the behavior of a changed column's type. First you set column y's type to VARCHAR(5), and then insert one string of exactly 5 characters and one that exceeds 5 characters:

   => CREATE TABLE t (x VARCHAR, y VARCHAR);
   CREATE TABLE
   => ALTER TABLE t ALTER COLUMN y SET DATA TYPE VARCHAR(5);
   ALTER TABLE
   => INSERT INTO t VALUES ('1232344455','hello');
    OUTPUT
   --------
         1
   (1 row)
   => INSERT INTO t VALUES ('1232344455','hello1');
   ERROR 4797: String of 6 octets is too long for type Varchar(5)
   => DROP TABLE t;
   DROP TABLE

You can also contract the data type's size, as long as the altered column contains no strings longer than 5 characters:

   => CREATE TABLE t (x CHAR, y CHAR(10));
   CREATE TABLE
   => ALTER TABLE t ALTER COLUMN y SET DATA TYPE CHAR(5);
   ALTER TABLE
   => DROP TABLE t;
   DROP TABLE

You cannot convert types between binary and varbinary. For example, the table definition below contains two binary columns, so when you try to convert column y to a varbinary type, HP Vertica returns a ROLLBACK message:

   => CREATE TABLE t (x BINARY, y BINARY);
   CREATE TABLE
   => ALTER TABLE t ALTER COLUMN y SET DATA TYPE VARBINARY;
   ROLLBACK 2377: Cannot convert column "y" from "binary(1)" to type "varbinary(80)"
   => DROP TABLE t;
   DROP TABLE

How to Perform an Illegitimate Column Conversion

The SQL standard disallows an illegitimate column conversion, but you can work around this restriction if you need to convert data from a non-SQL database. The following example takes you through the process step by step, where you'll manage your own epochs.

Given a sales table with columns id (INT) and price (VARCHAR), assume you want to convert the VARCHAR column to a NUMERIC field. You'll do this by adding a temporary column whose default value is derived from the existing price column, dropping the original column, and then renaming the temporary column.

1. Create the sample table with INTEGER and VARCHAR columns and insert two rows:

   => CREATE TABLE sales(id INT, price VARCHAR) UNSEGMENTED ALL NODES;
   CREATE TABLE
   => INSERT INTO sales VALUES (1, '$50.00');
   => INSERT INTO sales VALUES (2, '$100.00');
2. Commit the transaction:

   => COMMIT;
   COMMIT

3. Query the sales table:

   => SELECT * FROM SALES;
    id | price
   ----+---------
     1 | $50.00
     2 | $100.00
   (2 rows)

4. Add column temp_price. This is your temporary column.

   => ALTER TABLE sales ADD COLUMN temp_price NUMERIC DEFAULT SUBSTR(sales.price, 2)::NUMERIC;
   ALTER TABLE

5. Query the sales table, and you'll see the new temp_price column with its derived NUMERIC values:

   => SELECT * FROM SALES;
    id | price   |     temp_price
   ----+---------+---------------------
     1 | $50.00  |  50.000000000000000
     2 | $100.00 | 100.000000000000000
   (2 rows)

6. Drop the default expression from that column:

   => ALTER TABLE sales ALTER COLUMN temp_price DROP DEFAULT;
   ALTER TABLE

7. Advance the AHM:

   => SELECT advance_epoch(1);
    advance_epoch
   ---------------
    New Epoch: 83
   (1 row)

8. Manage epochs:

   => SELECT manage_epoch();
           manage_epoch
   --------------------------------
    Current Epoch=83, AHM Epoch=82
   (1 row)
9. Drop the original price column:

   => ALTER TABLE sales DROP COLUMN price CASCADE;
   ALTER COLUMN

10. Rename the new (temporary) temp_price column back to its original name, price:

   => ALTER TABLE sales RENAME COLUMN temp_price to price;
   ALTER COLUMN

11. Query the sales table one last time:

   => SELECT * FROM SALES;
    id | price
   ----+---------------------
     1 |  50.000000000000000
     2 | 100.000000000000000
   (2 rows)

12. Clean up (drop table sales):

   => DROP TABLE sales;
   DROP TABLE

See ALTER TABLE in the SQL Reference Manual.

Adding Constraints on Columns

To add constraints on a new column:

1. Use the ALTER TABLE ADD COLUMN clause to add a new table column.
2. Use ALTER TABLE ADD CONSTRAINT to define constraints for the new column.

Adding and Removing NOT NULL Constraints

Use the [SET | DROP] NOT NULL clause to add (SET) or remove (DROP) a NOT NULL constraint on the column. When a column is a primary key and you drop the primary key constraint, the column retains the NOT NULL constraint. If you want to allow that column to now contain NULL values, use DROP NOT NULL to remove the NOT NULL constraint.

Examples

   ALTER TABLE T1 ALTER COLUMN x SET NOT NULL;
   ALTER TABLE T1 ALTER COLUMN x DROP NOT NULL;
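The two steps above for adding a constraint on a new column can be sketched as follows. This is a minimal sketch; the table, column, and constraint names (T1, email, email_unique) are hypothetical, not from the source. ANALYZE_CONSTRAINTS can then report any existing rows that violate the new constraint:

```sql
-- Hypothetical table T1: first add the column, then constrain it
ALTER TABLE T1 ADD COLUMN email VARCHAR(50);
ALTER TABLE T1 ADD CONSTRAINT email_unique UNIQUE (email);
-- Check existing rows against the table's declared constraints
SELECT ANALYZE_CONSTRAINTS('T1');
```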
Note: Using the [SET | DROP] NOT NULL clause does not validate whether the column data conforms to the NOT NULL constraint. Use ANALYZE_CONSTRAINTS to check for constraint violations in a table.

See Also

- About Constraints

Dropping a Table Column

When you use the ALTER TABLE ... DROP COLUMN statement to drop a column, HP Vertica drops both the specified column from the table and the ROS containers that correspond to the dropped column. The syntax looks like this:

   ALTER TABLE [[db-name.]schema.]table-name ... | DROP [ COLUMN ] column-name [ CASCADE | RESTRICT ]

Because drop operations physically purge object storage and catalog definitions (table history) from the table, AT EPOCH (historical) queries return nothing for the dropped column. The altered table has the same object ID.

Note: Drop column operations can be fast because these catalog-level changes do not require data reorganization, letting you quickly reclaim disk storage.

Restrictions

- At the table level, you cannot drop or alter a primary key column or a column participating in the table's partitioning clause.
- At the projection level, you cannot drop the first column in a projection's sort order or columns that participate in the segmentation expression of a projection.
- All nodes must be up for the drop operation to succeed.

Using CASCADE to Force a Drop

You can work around some of the restrictions by using the CASCADE keyword, which enforces minimal reorganization of the table's definition in order to drop the column. You can use CASCADE to drop a column if that column fits into one of the scenarios in the following table. Note that in all cases that use CASCADE, HP Vertica tries to drop the projection(s) and will roll back if K-safety is compromised:
Column to drop: Has a constraint of any kind on it.
DROP COLUMN CASCADE behavior: HP Vertica will drop the column with CASCADE specified when a FOREIGN KEY constraint depends on a UNIQUE or PRIMARY KEY constraint on the referenced columns.

Column to drop: Participates in the projection's sort order.
DROP COLUMN CASCADE behavior: If you drop a column using the CASCADE keyword, HP Vertica truncates the projection's sort order up to and including the column that has been dropped, without impact on physical storage for other columns, and then drops the specified column. For example, if a projection's columns are in sort order (a,b,c), dropping column b causes the projection's sort order to be just (a), omitting column (c).

Column to drop: Is in a pre-join projection, or participates in the projection's segmentation expression.
DROP COLUMN CASCADE behavior: In these scenarios, HP Vertica drops any non-critical, dependent projections, maintains the superprojection that contains the data, and drops the specified column. When a pre-join projection contains a column to be dropped with CASCADE, HP Vertica tries to drop the projection. Assume, for example, that you have a table with multiple projections, and one projection has the column you are trying to drop in its segmentation clause. When you specify CASCADE, the DROP COLUMN statement tries to implicitly drop the projection that has this column in the segmentation clause. If it succeeds, the transaction completes; if it violates K-safety, the transaction rolls back.

Although this is a DROP COLUMN ... CASCADE operation (not DROP PROJECTION), HP Vertica could encounter cases when it is not possible to drop a projection's column without reorganizing the projection. In these cases, CASCADE will try to drop the projection itself to maintain data integrity. If K-safety is compromised, the operation rolls back.
Examples

The following series of commands successfully drops a BYTEA data type column:

   => CREATE TABLE t (x BYTEA(65000), y BYTEA, z BYTEA(1));
   CREATE TABLE
   => ALTER TABLE t DROP COLUMN y;
   ALTER TABLE
   => SELECT y FROM t;
   ERROR 2624: Column "y" does not exist
   => ALTER TABLE t DROP COLUMN x RESTRICT;
   ALTER TABLE
   => SELECT x FROM t;
   ERROR 2624: Column "x" does not exist
   => SELECT * FROM t;
    z
   ---
   (0 rows)
   => DROP TABLE t CASCADE;
   DROP TABLE
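The sort-order scenario from the table above can be sketched as follows. This is a minimal, hypothetical example (the table, projection, and UNSEGMENTED clause are assumptions, not from the source): dropping column b with CASCADE truncates the projection's sort order at the dropped column.

```sql
-- Hypothetical table and projection sorted on (a, b, c)
CREATE TABLE t (a INT, b INT, c INT);
CREATE PROJECTION p AS SELECT a, b, c FROM t ORDER BY a, b, c UNSEGMENTED ALL NODES;
-- Dropping b truncates the projection's sort order to just (a);
-- column c remains stored but no longer participates in the sort order
ALTER TABLE t DROP COLUMN b CASCADE;
```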
The following series of commands tries to drop a FLOAT(8) column and fails because there are not enough projections to maintain K-safety:

   => CREATE TABLE t (x FLOAT(8),y FLOAT(08));
   CREATE TABLE
   => ALTER TABLE t DROP COLUMN y RESTRICT;
   ALTER TABLE
   => SELECT y FROM t;
   ERROR 2624: Column "y" does not exist
   => ALTER TABLE t DROP x CASCADE;
   ROLLBACK 2409: Cannot drop any more columns in t
   => DROP TABLE t CASCADE;

Moving a Table to Another Schema

The ALTER TABLE SET SCHEMA statement moves a table from one schema to another. Moving a table requires that you have CREATE privileges for the destination schema. You can move only one table between schemas at a time. You cannot move temporary tables between schemas.

SET SCHEMA has two options, CASCADE and RESTRICT. CASCADE, which is the default, automatically moves all projections that are anchored on the source table to the destination schema, regardless of the schema in which the projections reside. The RESTRICT option moves only projections that are anchored on the source table and which also reside in the same schema.

If a table of the same name or any of the projections that you want to move already exist in the new schema, the statement rolls back and does not move either the table or any projections. To work around name conflicts:

1. Rename any conflicting table or projections that you want to move.
2. Run the ALTER TABLE SET SCHEMA statement again.

Note: HP Vertica lets you move system tables to system schemas. Moving system tables could be necessary to support designs created through the Database Designer.

Changing a Table Owner

The ability to change table ownership is useful when moving a table from one schema to another. Ownership reassignment is also useful when a table owner leaves the company or changes job responsibilities. Because you can change the table owner, the tables won't have to be completely rewritten, and you can avoid a loss in productivity.
The syntax looks like this:

   ALTER TABLE [[db-name.]schema.]table-name OWNER TO new-owner-name

In order to alter table ownership, you must be either the table owner or a superuser.

A change in table ownership transfers just the owner and not privileges; grants made by the original owner are dropped and all existing privileges on the table are revoked from the previous owner. However, altering the table owner transfers ownership of dependent sequence objects (associated
IDENTITY/AUTO-INCREMENT sequences) but does not transfer ownership of other referenced sequences. See ALTER SEQUENCE for details on transferring sequence ownership.

Notes

- Table privileges are separate from schema privileges; therefore, a table privilege change or table owner change does not result in any schema privilege change.
- Because projections define the physical representation of the table, HP Vertica does not require separate projection owners. The ability to create or drop projections is based on the table privileges on which the projection is anchored.
- During the alter operation HP Vertica updates projections anchored on the table owned by the old owner to reflect the new owner. For pre-join projection operations, HP Vertica checks for privileges on the referenced table.

Example

In this example, user Bob connects to the database, looks up the tables, and transfers ownership of table t33 from himself to user Alice.

   => \c - Bob
   You are now connected as user "Bob".
   => \d
    Schema | Name   | Kind  | Owner   | Comment
   --------+--------+-------+---------+---------
    public | applog | table | dbadmin |
    public | t33    | table | Bob     |
   (2 rows)
   => ALTER TABLE t33 OWNER TO Alice;
   ALTER TABLE

Notice that when Bob looks up database tables again, he no longer sees table t33.

   => \d
   List of tables
    Schema | Name   | Kind  | Owner   | Comment
   --------+--------+-------+---------+---------
    public | applog | table | dbadmin |
   (1 row)

When user Alice connects to the database and looks up tables, she sees she is the owner of table t33.

   => \c - Alice
   You are now connected as user "Alice".
   => \d
   List of tables
    Schema | Name | Kind  | Owner | Comment
   --------+------+-------+-------+---------
    public | t33  | table | Alice |
   (1 row)
Either Alice or a superuser can transfer table ownership back to Bob. In the following case a superuser performs the transfer.

   => \c - dbadmin
   You are now connected as user "dbadmin".
   => ALTER TABLE t33 OWNER TO Bob;
   ALTER TABLE
   => \d
   List of tables
    Schema | Name     | Kind  | Owner   | Comment
   --------+----------+-------+---------+---------
    public | applog   | table | dbadmin |
    public | comments | table | dbadmin |
    public | t33      | table | Bob     |
    s1     | t1       | table | User1   |
   (4 rows)

You can also query the V_CATALOG.TABLES system table to view table and owner information. Note that a change in ownership does not change the table ID. In the below series of commands, the superuser changes table ownership back to Alice and queries the TABLES system table.

   => ALTER TABLE t33 OWNER TO Alice;
   ALTER TABLE
   => SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
     table_schema_id  | table_schema |     table_id      | table_name |     owner_id      | owner_name
   -------------------+--------------+-------------------+------------+-------------------+------------
    45035996273704968 | public       | 45035996273713634 | applog     | 45035996273704962 | dbadmin
    45035996273704968 | public       | 45035996273724496 | comments   | 45035996273704962 | dbadmin
    45035996273730528 | s1           | 45035996273730548 | t1         | 45035996273730516 | User1
    45035996273704968 | public       | 45035996273793876 | foo        | 45035996273724576 | Alice
    45035996273704968 | public       | 45035996273795846 | t33        | 45035996273724576 | Alice
   (5 rows)

Now the superuser changes table ownership back to Bob and queries the TABLES table again. Nothing changes but the owner_name row, from Alice to Bob.
   => ALTER TABLE t33 OWNER TO Bob;
   ALTER TABLE
   => SELECT table_schema_id, table_schema, table_id, table_name, owner_id, owner_name FROM tables;
     table_schema_id  | table_schema |     table_id      | table_name |     owner_id      | owner_name
   -------------------+--------------+-------------------+------------+-------------------+------------
    45035996273704968 | public       | 45035996273713634 | applog     | 45035996273704962 | dbadmin
    45035996273704968 | public       | 45035996273724496 | comments   | 45035996273704962 | dbadmin
    45035996273730528 | s1           | 45035996273730548 | t1         | 45035996273730516 | User1
    45035996273704968 | public       | 45035996273793876 | foo        | 45035996273724576 | Alice
    45035996273704968 | public       | 45035996273795846 | t33        | 45035996273714428 | Bob
   (5 rows)

Table Reassignment with Sequences

Altering the table owner transfers ownership of only associated IDENTITY/AUTO-INCREMENT sequences, but not other referenced sequences. For example, in the below series of commands, ownership of sequence s1 does not change:

   => CREATE USER u1;
   CREATE USER
   => CREATE USER u2;
   CREATE USER
   => CREATE SEQUENCE s1 MINVALUE 10 INCREMENT BY 2;
   CREATE SEQUENCE
   => CREATE TABLE t1 (a INT, id INT DEFAULT NEXTVAL('s1'));
   CREATE TABLE
   => CREATE TABLE t2 (a INT, id INT DEFAULT NEXTVAL('s1'));
   CREATE TABLE
   => SELECT sequence_name, owner_name FROM sequences;
    sequence_name | owner_name
   ---------------+------------
    s1            | dbadmin
   (1 row)
   => ALTER TABLE t1 OWNER TO u1;
   ALTER TABLE
   => SELECT sequence_name, owner_name FROM sequences;
    sequence_name | owner_name
   ---------------+------------
    s1            | dbadmin
   (1 row)
   => ALTER TABLE t2 OWNER TO u2;
   ALTER TABLE
   => SELECT sequence_name, owner_name FROM sequences;
    sequence_name | owner_name
   ---------------+------------
    s1            | dbadmin
   (1 row)

See Also

- Changing a Sequence Owner

Changing a Sequence Owner

The ALTER SEQUENCE command lets you change the attributes of an existing sequence. All changes take effect immediately, within the same session. Any parameters not set during an ALTER SEQUENCE statement retain their prior settings. If you need to change sequence ownership, such as if an employee who owns a sequence leaves the company, you can do so with the following ALTER SEQUENCE syntax:
   ALTER SEQUENCE sequence-name OWNER TO new-owner-name;

This operation immediately reassigns the sequence from the current owner to the specified new owner. Only the sequence owner or a superuser can change ownership, and reassignment does not transfer grants from the original owner to the new owner; grants made by the original owner are dropped.

Note: Changing a table owner transfers ownership of dependent sequence objects (associated IDENTITY/AUTO-INCREMENT sequences) but does not transfer ownership of other referenced sequences. See Changing a Table Owner.

Example

The following example reassigns sequence ownership from the current owner to user Bob:

   => ALTER SEQUENCE sequential OWNER TO Bob;

See ALTER SEQUENCE in the SQL Reference Manual for details.

Renaming Tables

The ALTER TABLE RENAME TO statement lets you rename one or more tables. The new table names must not already exist. Renaming tables does not affect existing pre-join projections because pre-join projections refer to tables by their unique numeric object IDs (OIDs). Renaming tables also does not change the table OID.

To rename two or more tables:

1. List the tables to rename with a comma-delimited list, specifying a schema-name as part of the table specification only before the RENAME TO clause:

   => ALTER TABLE S1.T1, S1.T2 RENAME TO U1, U2;

   The statement renames the listed tables to their new table names from left to right, matching them sequentially, in a one-to-one correspondence. The RENAME TO parameter is applied atomically, so that either all tables are renamed or none of them is. For example, if the number of tables to rename does not match the number of new names, none of the tables is renamed.

2. Do not specify a schema-name as part of the table specification after the RENAME TO clause, since the statement applies to only one schema. The following example generates a syntax
error:

   => ALTER TABLE S1.T1, S1.T2 RENAME TO S1.U1, S1.U2;

Note: Renaming a table referenced by a view causes the view to fail, unless you create another table with the previous name to replace the renamed table.

Using Rename to Swap Tables Within a Schema

You can use the ALTER TABLE RENAME TO statement to swap tables within a schema without actually moving data. You cannot swap tables across schemas.

To swap tables within a schema (example statement is split to explain steps):

1. Enter the names of the tables to swap, followed by a new temporary table placeholder (temps):

   => ALTER TABLE T1, T2, temps

2. Use the RENAME TO clause to swap the tables: T1 to temps, T2 to T1, and temps to T2:

   RENAME TO temps, T1, T2;
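Combining the two steps above, the swap is a single atomic statement. A minimal sketch, assuming tables T1 and T2 exist and no table named temps exists (the renames are applied left to right):

```sql
-- T1 becomes temps, T2 becomes T1, then temps (the old T1) becomes T2
ALTER TABLE T1, T2, temps RENAME TO temps, T1, T2;
```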
Using Named Sequences

Named sequences are database objects that generate unique numbers in ascending or descending sequential order. They are most often used when an application requires a unique identifier in a table or an expression. Once a named sequence returns a value, it never returns that same value again. Named sequences are independent objects, and while you can use their values in tables, they are not subordinate to them.

Types of Incrementing Value Objects

In addition to named sequences, HP Vertica supports two other kinds of sequence objects, which also increment values:

- Auto-increment column value: a sequence available only for numeric column types. Auto-increment sequences automatically assign the next incremental sequence value for that column when a new row is added to the table.
- Identity column: a sequence available only for numeric column types.

Auto-increment and Identity sequences are defined through column constraints in the CREATE TABLE statement and are incremented each time a row is added to the table. Both of these object types are table-dependent and do not persist independently. The identity value is never rolled back, even if the transaction that tries to insert a value into the table is not committed. The LAST_INSERT_ID function returns the last value generated for an auto-increment or identity column.

Each type of sequence object has a set of properties and controls. A named sequence has the most controls, and an Auto-increment sequence the least. The following table lists the major differences between the three sequence objects:

   Behavior                      | Named Sequence | Identity | Auto-increment
   ------------------------------+----------------+----------+---------------
   Default cache value 250K      | X              | X        | X
   Set initial cache             | X              | X        |
   Define start value            | X              | X        |
   Specify increment unit        | X              | X        |
   Create as standalone object   | X              |          |
   Create as column constraint   |                | X        | X
   Exists only as part of table  |                | X        | X
   Requires name                 | X              |          |
   Use in expressions            | X              |          |
   Unique across tables          | X              |          |
   Behavior                      | Named Sequence | Identity | Auto-increment
   ------------------------------+----------------+----------+---------------
   Change parameters             | X              |          |
   Move to different schema      | X              |          |
   Set to increment or decrement | X              |          |
   Grant privileges to object    | X              |          |
   Specify minimum value         | X              |          |
   Specify maximum value         | X              |          |
   Always starts at 1            |                |          | X

While sequence object values are guaranteed to be unique, they are not guaranteed to be contiguous. Since sequences are not necessarily contiguous, you may interpret the returned values as missing. For example, two nodes can increment a sequence at different rates. The node with a heavier processing load increments the sequence, but the values are not contiguous with those being incremented on the other node.

Using a Sequence with an Auto_Increment or Identity Column

Each table can contain only one auto_increment or identity column. A table can contain both an auto_increment or identity column and a named sequence, as in the next example, illustrating a table with both types of sequences:

   VMart=> CREATE TABLE test2 (id INTEGER NOT NULL UNIQUE, middle INTEGER DEFAULT NEXTVAL('my_seq'), next INT, last auto_increment);
   CREATE TABLE

Named Sequence Functions

When you create a named sequence object, you can also specify the increment or decrement value. The default is 1. Use these functions with named sequences:

- NEXTVAL — Advances the sequence and returns the next sequence value. This value is incremented for ascending sequences and decremented for descending sequences. The first time you call NEXTVAL after creating a sequence, the function sets up the cache in which to store the sequence values, and returns either the default sequence value or the start number you specified with CREATE SEQUENCE.
- CURRVAL — Returns the LAST value across all nodes returned by a previous invocation of NEXTVAL in the same session. If there were no calls to NEXTVAL after creating a sequence, the CURRVAL function returns an error:
   dbt=> create sequence seq2;
   CREATE SEQUENCE
   dbt=> select currval('seq2');
   ERROR 4700: Sequence seq2 has not been accessed in the session

You can use the NEXTVAL and CURRVAL functions in INSERT and COPY expressions.

Using DDL Commands and Functions With Named Sequences

For details, see the following related statements and functions in the SQL Reference Manual:

- CREATE SEQUENCE — Create a named sequence object.
- ALTER SEQUENCE — Alter named sequence parameters, rename a sequence within a schema, or move a named sequence between schemas.
- DROP SEQUENCE — Remove a named sequence object.
- GRANT SEQUENCE — Grant user privileges to a named sequence object. See also Sequence Privileges.

Creating Sequences

Create a sequence using the CREATE SEQUENCE statement. All of the parameters (besides a sequence name) are optional.

The following example creates an ascending sequence called my_seq, starting at 100:

   dbt=> create sequence my_seq START 100;
   CREATE SEQUENCE

After creating a sequence, you must call the NEXTVAL function at least once in a session to create a cache for the sequence and its initial value. Subsequently, use NEXTVAL to increment the sequence. Use the CURRVAL function to get the current value.

The following NEXTVAL function instantiates the newly-created my_seq sequence and sets its first number:

   => SELECT NEXTVAL('my_seq');
    nextval
   ---------
        100
   (1 row)

If you call CURRVAL before NEXTVAL, the system returns an error:

   dbt=> SELECT CURRVAL('my_seq');
   ERROR 4700: Sequence my_seq has not been accessed in the session

The following command returns the current value of this sequence. Since no other operations have been performed on the newly-created sequence, the function returns the expected value of 100:

   => SELECT CURRVAL('my_seq');
    currval
   ---------
        100
   (1 row)

The following command increments the sequence value:

   => SELECT NEXTVAL('my_seq');
    nextval
   ---------
        101
   (1 row)

Calling the CURRVAL function again returns only the current value:

   => SELECT CURRVAL('my_seq');
    currval
   ---------
        101
   (1 row)

The following example shows how to use the my_seq sequence in an INSERT statement.

   => CREATE TABLE customer (
        lname VARCHAR(25),
        fname VARCHAR(25),
        membership_card INTEGER,
        id INTEGER
      );
   => INSERT INTO customer VALUES ('Hawkins', 'John', 072753, NEXTVAL('my_seq'));

Now query the table you just created to confirm that the ID column has been incremented to 102:

   => SELECT * FROM customer;
    lname   | fname | membership_card | id
   ---------+-------+-----------------+-----
    Hawkins | John  |           72753 | 102
   (1 row)
  • 426. The following example shows how to use a sequence as the default value for an INSERT command: => CREATE TABLE customer2( id INTEGER DEFAULT NEXTVAL('my_seq'), lname VARCHAR(25), fname VARCHAR(25), membership_card INTEGER ); => INSERT INTO customer2 VALUES (default,'Carr', 'Mary', 87432); Now query the table you just created. The ID column has been incremented again to 103: => SELECT * FROM customer2; id | lname | fname | membership_card -----+-------+-------+----------------- 103 | Carr | Mary | 87432 (1 row) The following example shows how to use NEXTVAL in a SELECT statement: => SELECT NEXTVAL('my_seq'), lname FROM customer2; NEXTVAL | lname ---------+------- 104 | Carr (1 row) As you can see, each time you call NEXTVAL(), the value increments. The CURRVAL() function returns the current value. Altering Sequences The ALTER SEQUENCE statement lets you change the attributes of a previously-defined sequence. Changes take effect in the next database session. Any parameters not specifically set in the ALTER SEQUENCE command retain their previous settings. The ALTER SEQUENCE statement lets you rename an existing sequence, or the schema of a sequence, but you cannot combine either of these changes with any other optional parameters. Note: Using ALTER SEQUENCE to set a START value below the CURRVAL can result in duplicate keys. Examples The following example modifies an ascending sequence called my_seq to start at 105: ALTER SEQUENCE my_seq RESTART WITH 105; The following example moves a sequence from one schema to another: Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 426 of 997
  • 427. ALTER SEQUENCE [public.]my_seq SET SCHEMA vmart; The following example renames a sequence in the Vmart schema: ALTER SEQUENCE [vmart.]my_seq RENAME TO serial; Remember that changes occur only after you start a new database session. For example, if you create a sequence named my_sequence and start the value at 10, each time you call the NEXTVAL function, you increment by 1, as in the following series of commands: CREATE SEQUENCE my_sequence START 10;SELECT NEXTVAL('my_sequence'); nextval --------- 10 (1 row) SELECT NEXTVAL('my_sequence'); nextval --------- 11 (1 row) Now issue the ALTER SEQUENCE statement to assign a new value starting at 50: ALTER SEQUENCE my_sequence RESTART WITH 50; When you call the NEXTVAL function, the sequence is incremented again by 1 value: SELECT NEXTVAL('my_sequence'); NEXTVAL --------- 12 (1 row) The sequence starts at 50 only after restarting the database session: SELECT NEXTVAL('my_sequence'); NEXTVAL --------- 50 (1 row) Distributed Sequences When you create a sequence object, the CACHE parameter controls the sequence efficiency, by determining the number of sequence values each node maintains during a session. The default cache value is 250K, meaning that each node reserves 250,000 values for each sequence per session. HP Vertica distributes a session across all nodes. After you create a sequence, the first time a node executes a NEXTVAL() function as part of a SQL statement, the node reserves its own cache of sequence values. The node then maintains that set of values for the current session. Other Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 427 of 997
nodes executing a NEXTVAL() function will also create and maintain their own cache of sequence values. Note: If any node consumes all of its sequence values, HP Vertica must perform a catalog lock to obtain a new set of values. A catalog lock can be costly in terms of database performance, since certain activities, such as data inserts, cannot occur until HP Vertica releases the lock. During a session, one node can use its allocated set of sequence values slowly, while another node uses its values more quickly. Therefore, the value returned from NEXTVAL in one statement can differ from the values returned in another statement executed on another node. Regardless of the number of calls to NEXTVAL and CURRVAL, HP Vertica increments a sequence only once per row. If multiple calls to NEXTVAL occur within the same row, the function returns the same value. If sequences are used in join statements, HP Vertica increments a sequence once for each composite row output by the join. The current value of a sequence is calculated as follows: l At the end of every statement, the state of all sequences used in the session is returned to the initiator node. l The initiator node calculates the maximum CURRVAL of each sequence across all states on all nodes. l This maximum value is used as CURRVAL in subsequent statements until another NEXTVAL is invoked. Sequence values in cache can be lost in the following situations: l If a statement fails after NEXTVAL is called (thereby consuming a sequence value from the cache), the value is lost. l If a disconnect occurs (for example, a dropped session), any remaining values in the cache that have not been returned through NEXTVAL (unused) are lost. To recover lost sequence values, you can run an ALTER SEQUENCE command to define a new sequence number generator, which resets the counter to the correct value in the next session.
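The recovery step above can be sketched as follows. This is a minimal, hypothetical example: the table t, its id column, and the restart value are placeholders, and the restart value you choose must exceed the highest value already consumed:

```sql
-- Find the highest sequence value already stored in the target column:
SELECT MAX(id) FROM t;
-- Restart the sequence above that value; per ALTER SEQUENCE semantics,
-- the change takes effect in the next database session:
ALTER SEQUENCE s1 RESTART WITH 1001;  -- placeholder: any value > MAX(id)
```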
Note: An elastic projection (a segmented projection created when Elastic Cluster is enabled) created with a modularhash segmentation expression uses hash instead. The behavior of sequences across HP Vertica nodes is explained in the following examples. Note: IDENTITY and AUTO_INCREMENT columns behave in a similar manner. Example 1: The following example, which illustrates sequence distribution, assumes a 3-node cluster with node01 as the initiator node. First create a simple table called dist: Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 428 of 997
CREATE TABLE dist (i INT, j VARCHAR); Create a projection called oneNode and segment by column i on node01: CREATE PROJECTION oneNode AS SELECT * FROM dist SEGMENTED BY i NODES node01; Create a second projection called twoNodes and segment column i by modularhash on node02 and node03: CREATE PROJECTION twoNodes AS SELECT * FROM dist SEGMENTED BY MODULARHASH(i) NODES node02, node03; Create a third projection called threeNodes and segment column i by modularhash on all nodes (1-3): CREATE PROJECTION threeNodes AS SELECT * FROM dist SEGMENTED BY MODULARHASH(i) ALL NODES; Insert some values: COPY dist FROM STDIN; 1|ONE 2|TWO 3|THREE 4|FOUR 5|FIVE 6|SIX . Query the STORAGE_CONTAINERS table to return the projections on each node: SELECT node_name, projection_name, total_row_count FROM storage_containers; node_name | projection_name | total_row_count -----------+-----------------+----------------- node0001 | oneNode | 6 --Contains rows with i=(1,2,3,4,5,6) node0001 | threeNodes | 2 --Contains rows with i=(3,6) node0002 | twoNodes | 3 --Contains rows with i=(2,4,6) node0002 | threeNodes | 2 --Contains rows with i=(1,4) node0003 | twoNodes | 3 --Contains rows with i=(1,3,5) node0003 | threeNodes | 2 --Contains rows with i=(2,5) (6 rows) The following table shows the segmentation of rows for projection oneNode: 1 ONE Node01 2 TWO Node01 3 THREE Node01 4 FOUR Node01 5 FIVE Node01 6 SIX Node01 The following table shows the segmentation of rows for projection twoNodes:
1 ONE Node03 2 TWO Node02 3 THREE Node03 4 FOUR Node02 5 FIVE Node03 6 SIX Node02 The following table shows the segmentation of rows for projection threeNodes: 1 ONE Node02 2 TWO Node03 3 THREE Node01 4 FOUR Node02 5 FIVE Node03 6 SIX Node01 The next example creates a sequence and specifies a cache of 10, so each node caches up to 10 sequence values in memory for performance. As described for the CREATE SEQUENCE statement, the minimum cache value is 1, meaning only one value is generated at a time (that is, no caching). Example 2: Create a sequence named s1 and specify a cache of 10: CREATE SEQUENCE s1 cache 10; SELECT s1.nextval, s1.currval, s1.nextval, s1.currval, j FROM oneNode; nextval | currval | nextval | currval | j ---------+---------+---------+---------+------- 1 | 1 | 1 | 1 | ONE 2 | 2 | 2 | 2 | TWO 3 | 3 | 3 | 3 | THREE 4 | 4 | 4 | 4 | FOUR 5 | 5 | 5 | 5 | FIVE 6 | 6 | 6 | 6 | SIX (6 rows) The following table illustrates the current state of the sequence for that session. It holds the current value, the values remaining (the difference between the current value (6) and the cache (10)), and cache activity. There is no cache activity on node02 or node03. Sequence Cache State Node01 Node02 Node03 Current value 6 NO CACHE NO CACHE Remainder 4 NO CACHE NO CACHE Example 3: Return the current values from twoNodes: SELECT s1.currval, j FROM twoNodes; currval | j ---------+------- 6 | ONE 6 | THREE 6 | FIVE 6 | TWO 6 | FOUR 6 | SIX
(6 rows) Example 4: Now call NEXTVAL from threeNodes. The assumption is that node02 holds the cache before node03: SELECT s1.nextval, j from threeNodes; nextval | j ---------+------- 101 | ONE 201 | TWO 7 | THREE 102 | FOUR 202 | FIVE 8 | SIX (6 rows) The following table illustrates the sequence cache state with values on node01, node02, and node03: Sequence Cache State Node01 Node02 Node03 Current value 8 102 202 Left 2 8 8 Example 5: Return the current values from twoNodes: SELECT s1.currval, j FROM twoNodes; currval | j ---------+------- 202 | ONE 202 | TWO 202 | THREE 202 | FOUR 202 | FIVE 202 | SIX (6 rows) The following table illustrates the sequence cache state: Sequence Cache State Node01 Node02 Node03 Current value 6 102 202 Left 4 8 8 Example 6: The following command runs on node02 only: SELECT s1.nextval, j FROM twoNodes WHERE i = 2; nextval | j ---------+----- 103 | TWO (1 row) The following table illustrates the sequence cache state:
Sequence Cache State Node01 Node02 Node03 Current value 6 103 202 Left 4 7 8 Example 7: The following command calls the current value from twoNodes: SELECT s1.currval, j FROM twoNodes; currval | j ---------+------- 103 | ONE 103 | TWO 103 | THREE 103 | FOUR 103 | FIVE 103 | SIX (6 rows) Example 8: This example assumes that node02 holds the cache before node03: SELECT s1.nextval, j FROM twoNodes; nextval | j ---------+------- 203 | ONE 104 | TWO 204 | THREE 105 | FOUR 205 | FIVE 106 | SIX (6 rows) The following table illustrates the sequence cache state: Sequence Cache State Node01 Node02 Node03 Current value 6 106 205 Left 4 6 5 Example 9: The following command calls the current value from twoNodes: SELECT s1.currval, j FROM twoNodes; currval | j ---------+------- 205 | ONE 205 | TWO 205 | THREE 205 | FOUR 205 | FIVE 205 | SIX (6 rows) Example 10: This example calls the NEXTVAL function on oneNode: SELECT s1.nextval, j FROM oneNode; nextval | j
---------+------- 7 | ONE 8 | TWO 9 | THREE 10 | FOUR 301 | FIVE 302 | SIX (6 rows) The following table illustrates the sequence cache state: Sequence Cache State Node01 Node02 Node03 Current value 302 106 205 Left 8 4 5 Example 11: In this example, twoNodes is the outer table and threeNodes is the inner table to a merge join. threeNodes is resegmented as per twoNodes. SELECT s1.nextval, j FROM twoNodes JOIN threeNodes ON twoNodes.i = threeNodes.i; nextval | j ---------+------- 206 | ONE 107 | TWO 207 | THREE 108 | FOUR 208 | FIVE 109 | SIX (6 rows) The following table illustrates the sequence cache state: Sequence Cache State Node01 Node02 Node03 Current value 302 109 208 Left 8 1 2 Example 12: This next example shows how sequences work with buddy projections. --Same session DROP TABLE t CASCADE; CREATE TABLE t (i INT, j varchar(20)); CREATE PROJECTION threeNodes AS SELECT * FROM t SEGMENTED BY MODULARHASH(i) ALL NODES KSAFE 1; COPY t FROM STDIN; 1|ONE 2|TWO 3|THREE 4|FOUR
  • 434. 5|FIVE 6|SIX . SELECT node_name, projection_name, total_row_count FROM storage_containers; node_name | projection_name | total_row_count -----------+-----------------+----------------- node01 | threeNodes_b0 | 2 node03 | threeNodes_b0 | 2 node02 | threeNodes_b0 | 2 node02 | threeNodes_b1 | 2 node01 | threeNodes_b1 | 2 node03 | threeNodes_b1 | 2 (6 rows) The following function call assumes that node02 is down. It is the same session. Node03 takes up the work of node02: SELECT s1.nextval, j FROM t; nextval | j ---------+------- 401 | ONE 402 | TWO 305 | THREE 403 | FOUR 404 | FIVE 306 | SIX (6 rows) The following table illustrates the sequence cache state: Sequence Cache State Node01 Node02 Node03 Current value 306 110 404 Left 4 0 6 Example 13: This example starts a new session. DROP TABLE t CASCADE;CREATE TABLE t (i INT, j VARCHAR); CREATE PROJECTION oneNode AS SELECT * FROM t SEGMENTED BY i NODES node01; CREATE PROJECTION twoNodes AS SELECT * FROM t SEGMENTED BY MODULARHASH(i) NODES node02, n ode03; CREATE PROJECTION threeNodes AS SELECT * FROM t SEGMENTED BY MODULARHASH(i) ALL NODES; INSERT INTO t values (nextval('s1'), 'ONE'); SELECT * FROM t; i | j -----+------- 501 | ONE (1 rows) The following table illustrates the sequence cache state: Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 434 of 997
  • 435. Sequence Cache State Node01 Node02 Node03 Current value 501 NO CACHE NO CACHE Left 9 0 0 Example 14: INSERT INTO t SELECT s1.nextval, 'TWO' FROM twoNodes; SELECT * FROM t; i | j -----+------- 501 | ONE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes 601 | TWO --stored in node01 for oneNode, node03 for twoNodes, node01 for threeNodes (2 rows) The following table illustrates the sequence cache state: Sequence Cache State Node01 Node02 Node03 Current value 501 601 NO CACHE Left 9 9 0 Example 15: INSERT INTO t select s1.nextval, 'TRE' from threeNodes;SELECT * FROM t; i | j -----+------- 501 | ONE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes 601 | TWO --stored in node01 for oneNode, node03 for twoNodes, node01 for threeNodes 502 | TRE --stored in node01 for oneNode, node03 for twoNodes, node03 for threeNodes 602 | TRE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes (4 rows) The following table illustrates the sequence cache state: Sequence Cache State Node01 Node02 Node03 Current value 502 602 NO CACHE Left 9 9 0 Example 16: INSERT INTO t SELECT s1.currval, j FROM threeNodes WHERE i != 502; SELECT * FROM t; i | j -----+------- 501 | ONE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes 601 | TWO --stored in node01 for oneNode, node03 for twoNodes, node01 for threeNodes 502 | TRE --stored in node01 for oneNode, node03 for twoNodes, node03 for threeNodes 602 | TRE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes 602 | ONE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes 502 | TWO --stored in node01 for oneNode, node03 for twoNodes, node03 for threeNodes Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 435 of 997
  • 436. 602 | TRE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes (7 rows) The following table illustrates the sequence cache state: Sequence Cache State Node01 Node02 Node03 Current value 502 602 NO CACHE Left 9 9 0 Example 17: INSERT INTO t VALUES (s1.currval + 1, 'QUA'); SELECT * FROM t; i | j -----+------- 501 | ONE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes 601 | TWO --stored in node01 for oneNode, node03 for twoNodes, node01 for threeNodes 502 | TRE --stored in node01 for oneNode, node03 for twoNodes, node03 for threeNodes 602 | TRE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes 602 | ONE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes 502 | TWO --stored in node01 for oneNode, node03 for twoNodes, node03 for threeNodes 602 | TRE --stored in node01 for oneNode, node02 for twoNodes, node02 for threeNodes 603 | QUA (8 rows) The following table illustrates the sequence cache state: Sequence Cache State Node01 Node02 Node03 Current value 502 602 NO CACHE Left 9 9 0 See Also l Sequence Privileges l ALTER SEQUENCE l CREATE TABLE l Column-Constraint l CURRVAL l DROP SEQUENCE Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 436 of 997
l GRANT (Sequence) l NEXTVAL Loading Sequences You can use a sequence as part of creating a table. The sequence must already exist, and have been instantiated using the NEXTVAL function. Creating and Instantiating a Sequence The following example creates an ascending sequence called my_seq, starting at 100: dbt=> create sequence my_seq START 100; CREATE SEQUENCE After creating a sequence, you must call the NEXTVAL function at least once in a session to create a cache for the sequence and its initial value. Subsequently, use NEXTVAL to increment the sequence. Use the CURRVAL function to get the current value. The following NEXTVAL function instantiates the newly-created my_seq sequence and sets its first number: => SELECT NEXTVAL('my_seq'); nextval --------- 100 (1 row) If you call CURRVAL before NEXTVAL, the system returns an error: dbt=> SELECT CURRVAL('my_seq'); ERROR 4700: Sequence my_seq has not been accessed in the session Using a Sequence in an INSERT Command Update sequence number values by calling the NEXTVAL function, which increments/decrements the current sequence and returns the next value. Use CURRVAL to return the current value. These functions can also be used in INSERT and COPY expressions. The following example shows how to use a sequence as the default value for an INSERT command: CREATE TABLE customer2(  ID INTEGER DEFAULT NEXTVAL('my_seq'), lname VARCHAR(25), fname VARCHAR(25), membership_card INTEGER
); INSERT INTO customer2 VALUES (default,'Carr', 'Mary', 87432); Now query the table you just created. The ID column has been incremented again, to 104: SELECT * FROM customer2; ID | lname | fname | membership_card -----+-------+-------+----------------- 104 | Carr | Mary | 87432 (1 row) Dropping Sequences Use the DROP SEQUENCE statement to remove a sequence. You cannot drop a sequence: l If other objects depend on the sequence. The CASCADE keyword is not supported. l That is used in the default expression of a column until all references to the sequence are removed from the default expression. Example The following command drops the sequence named my_sequence: => DROP SEQUENCE my_sequence; Synchronizing Table Data with MERGE The most convenient way to update, delete, and/or insert table data is using the MERGE statement, where you can combine multiple operations in a single command. When you write a MERGE statement, you specify the following: l A target table—The master table that contains existing rows you want to update or insert into using rows from another (source) table. l A source table—The table that contains the new and/or changed rows you'll use to update the target table. l A search condition—The merge join columns, specified in the ON clause, that HP Vertica uses to evaluate each row in the source table to update, delete, and/or insert rows into the target table. l Additional filters that instruct HP Vertica what to do when the search condition is or is not met. For example, when you use: n WHEN MATCHED THEN UPDATE clause— HP Vertica updates and/or deletes existing rows in the target table with data from the source table.
  • 439. Note: HP Vertica assumes the values in the merge join column are unique. If more than one matching value is present in either the target or source table's join column, the MERGE statement could fail with a run-time error. See Optimized Versus Non-optimized MERGE for more information. n WHEN NOT MATCHED THEN INSERT clause—HP Vertica inserts into the target table all rows from the source table that do not match any rows in the target table. Optimized Versus Non-Optimized MERGE By default, HP Vertica prepares an optimized query plan to improve merge performance when the MERGE statement and its tables meet certain criteria. If the criteria are not met, MERGE could run without optimization or return a run-time error. This section describes scenarios for both optimized and non-optimized MERGE. Conditions for an Optimized MERGE HP Vertica prepares an optimized query plan when all of the following conditions are true: l The target table's join column has a unique or primary key constraint l UPDATE and INSERT clauses include every column in the target table l UPDATE and INSERT clause column attributes are identical When the above conditions are not met, HP Vertica prepares a non-optimized query plan, and MERGE runs with the same performance as in HP Vertica 6.1 and prior. Note: The source table's join column does not require a unique or primary key constraint to be eligible for an optimized query plan. Also, the source table can contain more columns than the target table, as long as the UPDATE and INSERT clauses use the same columns and the column attributes are the same. How to determine if a MERGE statement is eligible for optimization To determine whether a MERGE statement is eligible for optimization, prefix MERGE with the EXPLAIN keyword and examine the plan's textual output. (See Viewing the MERGE Query Plan for examples.) 
A Semi path indicates the statement is eligible for optimization, whereas a Right Outer path indicates the statement is ineligible and will run with the same performance as MERGE in previous releases unless a duplicate merge join key is encountered at query run time. About duplicate matching values in the join column Even if the MERGE statement and its tables meet the required criteria for optimization, MERGE could fail with a run-time error if there are duplicate values in the join column.
When HP Vertica prepares an optimized query plan for a merge operation, it enforces strict requirements for unique and primary key constraints in the MERGE statement's join columns. If you haven't enforced constraints, MERGE fails under the following scenarios: l Duplicates in the source table. If HP Vertica finds more than one matching value in the source join column for a corresponding value in the target table, MERGE fails with a run-time error. l Duplicates in the target table. If HP Vertica finds more than one matching value in the target join column for a corresponding value in the source table, and the target join column has a unique or primary key constraint, MERGE fails with a run-time error. If the target join column has no such constraint, the statement runs without error and without optimization. Be aware that if you run MERGE multiple times using the same target and source table, each statement run has the potential to introduce duplicate values into the join columns, such as if you use constants in the UPDATE/INSERT clauses. These duplicates could cause a run-time error the next time you run MERGE. To avoid duplicate key errors, be sure to enforce the constraints you declare to ensure unique values in the merge join column.
Examples The examples that follow use a simple schema to illustrate some of the conditions under which HP Vertica prepares or does not prepare an optimized query plan for MERGE: CREATE TABLE target(a INT PRIMARY KEY, b INT, c INT) ORDER BY b,a; CREATE TABLE source(a INT, b INT, c INT) ORDER BY b,a; INSERT INTO target VALUES(1,2,3); INSERT INTO target VALUES(2,4,7); INSERT INTO source VALUES(3,4,5); INSERT INTO source VALUES(4,6,9); COMMIT; Example of an optimized MERGE statement HP Vertica can prepare an optimized query plan for the following MERGE statement because: l The target table's join column (ON t.a=s.a) has a primary key constraint l All columns in the target table (a,b,c) are included in the UPDATE and INSERT clauses l Column attributes specified in the UPDATE and INSERT clauses are identical MERGE INTO target t USING source s ON t.a = s.a WHEN MATCHED THEN UPDATE SET a=s.a, b=s.b, c=s.c WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a,s.b,s.c); OUTPUT -------- 2 (1 row)
The output value of 2 indicates success and denotes the number of rows updated/inserted from the source into the target. Example of a non-optimized MERGE statement In the next example, the MERGE statement runs without optimization because the column attributes in the UPDATE/INSERT clauses are not identical. Specifically, the UPDATE clause includes constants for columns s.a and s.c and the INSERT clause does not: MERGE INTO target t USING source s ON t.a = s.a WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c - 1 WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a,s.b,s.c); To make the previous MERGE statement eligible for optimization, rewrite the statement as follows, so the attributes in the UPDATE and INSERT clauses are identical: MERGE INTO target t USING source s ON t.a = s.a WHEN MATCHED THEN UPDATE SET a=s.a + 1, b=s.b, c=s.c - 1 WHEN NOT MATCHED THEN INSERT(a,b,c) VALUES(s.a + 1, s.b, s.c - 1); Troubleshooting the MERGE Statement This section describes how to help HP Vertica prepare an optimized query plan for MERGE, and what to check if you encounter run-time errors after running the MERGE statement. MERGE performance considerations You can help improve the performance of MERGE operations by ensuring projections are designed for optimal use. See Projection Design for Merge Operations. You can also improve the chances that HP Vertica prepares an optimized query plan for a MERGE statement by making sure the statement and its tables meet certain requirements. See the following topics for more information: l Optimized Versus Non-optimized MERGE l Viewing the MERGE Plan Duplicate values in the merge join key HP Vertica assumes that the data you want to merge conforms with constraints you declare. To avoid duplicate key errors, be sure to enforce declared constraints to ensure unique values in the merge join column. If the MERGE statement fails with a duplicate key error, you must correct your data.
Also, be aware that if you run MERGE multiple times with the same target and source tables, you could introduce duplicate values into the join columns, such as if you use constants in the UPDATE/INSERT clauses. These duplicates could cause a run-time error. Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 441 of 997
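One way to check for the duplicate join-key values described above is with the ANALYZE_CONSTRAINTS function. The following sketch assumes the target table from the earlier examples, with the join column a declared as a primary key:

```sql
-- Report any rows in target that violate declared constraints,
-- including duplicate primary key values in the merge join column:
SELECT ANALYZE_CONSTRAINTS('target');
-- Restrict the check to the join column only:
SELECT ANALYZE_CONSTRAINTS('target', 'a');
```

If either query returns violation rows, correct the data before rerunning MERGE.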
Using MERGE with sequences If you are using named sequences, HP Vertica can perform a MERGE operation if you omit the sequence from the query. You cannot run MERGE on identity/auto-increment columns or on columns that have primary key or foreign key referential integrity constraints, as defined in CREATE TABLE column-constraint syntax. Dropping and Truncating Tables HP Vertica provides two statements to manage tables: DROP TABLE and TRUNCATE TABLE. You cannot truncate an external table. Dropping Tables Dropping a table removes its definition from the HP Vertica database. For the syntax details of this statement, see DROP TABLE in the SQL Reference Manual. To drop a table, use the statement as follows: => DROP TABLE IF EXISTS mytable; DROP TABLE => DROP TABLE IF EXISTS mytable; -- Doesn't exist NOTICE: Nothing was dropped DROP TABLE You cannot use the CASCADE option when dropping an external table; because the table is read-only, dropping it does not remove any of its associated files. Truncating Tables Truncating a table removes all storage associated with the table, but preserves the table definitions. Use TRUNCATE TABLE for testing purposes to remove all table data without having to recreate projections when you reload table data. For the syntax details of this statement, see TRUNCATE TABLE in the SQL Reference Manual. You cannot truncate an external table. The TRUNCATE TABLE statement commits the entire transaction after statement execution, even if truncating the table fails. You cannot roll back a TRUNCATE statement. If the truncated table is a large single (fact) table containing pre-join projections, the projections show zero (0) rows after the transaction completes and the table is ready for data reload. If the table to truncate is a dimension table, drop the pre-join projections before executing the TRUNCATE TABLE statement.
Otherwise, the statement returns the following error: Cannot truncate a dimension table with pre-joined projections If the truncated table has out-of-date projections, those projections are cleared and marked up-to-date after the TRUNCATE TABLE operation completes.
  • 443. TRUNCATE TABLE takes an O (Owner) lock on the table until the truncation process completes, and the savepoint is then released. Administrator's Guide Working with Tables HP Vertica Analytic Database (7.0.x) Page 443 of 997
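The truncate behavior described above can be illustrated with a minimal sketch; the table name mytable is reused from the drop example and is a placeholder:

```sql
-- Remove all storage for mytable but keep its definition and
-- projections, so data can be reloaded without redesigning them:
TRUNCATE TABLE mytable;
-- The truncation is committed immediately and cannot be rolled back.
```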
  • 445. About Constraints Constraints specify rules on data that can go into a column. Some examples of constraints are: l Primary or foreign key l Uniqueness l Not NULL l Default values l Automatically incremented values l Values that are generated by the database Use constraints when you want to ensure the integrity of your data in one or more columns, but be aware that it is your responsibility to ensure data integrity. HP Vertica can use constraints to perform optimizations (such as the optimized MERGE) that assume the data is consistent. Do not define constraints on columns unless you expect to keep the data consistent. Administrator's Guide About Constraints HP Vertica Analytic Database (7.0.x) Page 445 of 997
  • 446. Adding Constraints Add constraints on one or more table columns using the following SQL commands: l CREATE TABLE—Add a constraint on one or more columns. l ALTER TABLE—Add or drop a constraint on one or more columns. There are two syntax definitions you can use to add or change a constraint: l column-constraint—Use this syntax when you add a constraint on a column definition in a CREATE TABLE statement. l table-constraint—Use this syntax when you add a constraint after a column definition in a CREATE TABLE statement, or when you add, alter, or drop a constraint on a column using ALTER TABLE. HP Vertica recommends naming a constraint but it is optional; if you specify the CONSTRAINT keyword, you must give a name for the constraint. The examples that follow illustrate several ways of adding constraints. For additional details, see: l Primary Key Constraints l Foreign Key Constraints l Unique Constraints l Not NULL Constraints Adding Column Constraints with CREATE TABLE There are several ways to add a constraint on a column using CREATE TABLE: l On the column definition using the CONSTRAINT keyword, which requires that you assign a constraint name, in this example, dim1PK: CREATE TABLE dim1 (  c1 INTEGER CONSTRAINT dim1PK PRIMARY KEY, c2 INTEGER ); l On the column definition, omitting the CONSTRAINT keyword. When you omit the CONSTRAINT keyword, you cannot specify a constraint name: CREATE TABLE dim1 (  c1 INTEGER PRIMARY KEY, Administrator's Guide About Constraints HP Vertica Analytic Database (7.0.x) Page 446 of 997
c2 INTEGER ); l After the column definition, using the CONSTRAINT keyword and assigning a name, in this example, dim1PK: CREATE TABLE dim1 (  c1 INTEGER, c2 INTEGER, CONSTRAINT dim1pk PRIMARY KEY(c1) ); l After the column definition, omitting the CONSTRAINT keyword: CREATE TABLE dim1 (  c1 INTEGER, c2 INTEGER, PRIMARY KEY(c1) ); Adding Two Constraints on a Column To add more than one constraint on a column, specify the constraints one after another when you create the table column. For example, the following statement enforces both not NULL and unique constraints on the id column, indicating that the column values cannot be NULL and must be unique: CREATE TABLE test1 (  id INTEGER NOT NULL UNIQUE, ... ); Adding a Foreign Key Constraint on a Column There are four ways to add a foreign key constraint on a column using CREATE TABLE. The FOREIGN KEY keywords are not valid on the column definition, only after the column definition: l On the column definition, use the CONSTRAINT and REFERENCES keywords and name the constraint, in this example, fact1dim1FK. This example creates a column with a named foreign key constraint referencing the table (dim1) with the primary key (c1): CREATE TABLE fact1 (  c1 INTEGER CONSTRAINT fact1dim1FK REFERENCES dim1(c1), c2 INTEGER ); l On the column definition, omit the CONSTRAINT keyword and use the REFERENCES
  • 448. keyword with the table name and column: CREATE TABLE fact1 (  c1 INTEGER REFERENCES dim1(c1), c2 INTEGER ); l After the column definition, use the CONSTRAINT, FOREIGN KEY, and REFERENCES keywords and name the constraint: CREATE TABLE fact1 (  c1 INTEGER, c2 INTEGER, CONSTRAINT fk1 FOREIGN KEY(c1) REFERENCES dim1(c1) ); l After the column definition, omitting the CONSTRAINT keyword: CREATE TABLE fact1 (  c1 INTEGER, c2 INTEGER, FOREIGN KEY(c1) REFERENCES dim1(c1) ); Each of the following ALTER TABLE statements adds a foreign key constraint on an existing column, with and without using the CONSTRAINT keyword: ALTER TABLE fact2 ADD CONSTRAINT fk1 FOREIGN KEY (c1) REFERENCES dim2(c1); or ALTER TABLE fact2 ADD FOREIGN KEY (c1) REFERENCES dim2(c1); For additional details, see Foreign Key Constraints. Adding Multicolumn Constraints The following example defines a primary key constraint on multiple columns by first defining the table columns (c1 and c2), and then specifying both columns in a PRIMARY KEY clause: CREATE TABLE dim (  c1 INTEGER, c2 INTEGER, PRIMARY KEY (c1, c2) ); To specify multicolumn (compound) primary keys, the following example uses CREATE TABLE to define the columns. After creating the table, ALTER TABLE defines the compound primary key and names it dim2PK: Administrator's Guide About Constraints HP Vertica Analytic Database (7.0.x) Page 448 of 997
  • 449. CREATE TABLE dim2 (  c1 INTEGER, c2 INTEGER, c3 INTEGER NOT NULL, c4 INTEGER UNIQUE ); ALTER TABLE dim2 ADD CONSTRAINT dim2PK PRIMARY KEY (c1, c2); In the next example, you define a compound primary key as part of the CREATE TABLE statement. Then you specify the matching foreign key constraint to table dim2 using CREATE TABLE and ALTER TABLE: CREATE TABLE dim2 (  c1 INTEGER, c2 INTEGER, c3 INTEGER NOT NULL, c4 INTEGER UNIQUE, PRIMARY KEY (c1, c2) ); CREATE TABLE fact2 ( c1 INTEGER, c2 INTEGER, c3 INTEGER NOT NULL, c4 INTEGER UNIQUE ); ALTER TABLE fact2 ADD CONSTRAINT fact2FK FOREIGN KEY (c1, c2) REFERENCES dim2(c1, c2); Specify a foreign key constraint using a reference to the table that contains the primary key. In the ADD CONSTRAINT clause, the REFERENCES column names are optional. The following ALTER TABLE statement is equivalent to the previous ALTER TABLE statement: ALTER TABLE fact2 ADD CONSTRAINT fact2FK FOREIGN KEY (c1, c2) REFERENCES dim2; Adding Constraints on Tables with Existing Data When you add a constraint on a column with existing data, HP Vertica does not check to ensure that the column does not contain invalid values. If your data does not conform to the declared constraints, your queries could yield unexpected results. Use ANALYZE_CONSTRAINTS to check for constraint violations in your column. If you find violations, use the ALTER COLUMN SET/DROP parameters of the ALTER TABLE statement to apply or remove a constraint on an existing column. Adding and Changing Constraints on Columns Using ALTER TABLE The following example uses ALTER TABLE to add a column (b) with not NULL and default 5 constraints to a table (test6):
  • 450. CREATE TABLE test6 (a INT); ALTER TABLE test6 ADD COLUMN b INT DEFAULT 5 NOT NULL; Use ALTER TABLE with the ALTER COLUMN and SET NOT NULL clauses to add the constraint on column a in table test6: ALTER TABLE test6 ALTER COLUMN a SET NOT NULL; Adding and Dropping NOT NULL Column Constraints Use the SET NOT NULL or DROP NOT NULL clause to add or remove a not NULL column constraint. Use these clauses to ensure that the column has the proper constraints when you have added or removed a primary key constraint on a column, or any time you want to add or remove the not NULL constraint. Note: A PRIMARY KEY constraint includes a not NULL constraint, but if you drop the PRIMARY KEY constraint on a column, the not NULL constraint remains on that column. Examples ALTER TABLE T1 ALTER COLUMN x SET NOT NULL; ALTER TABLE T1 ALTER COLUMN x DROP NOT NULL; For more information, see Altering Table Definitions. Enforcing Constraints To maximize query performance, HP Vertica checks for primary key and foreign key violations when loading into the fact table of a pre-join projection. For more details, see Enforcing Primary Key and Foreign Key Constraints. HP Vertica checks for not NULL constraint violations when loading data, but it does not check for unique constraint violations. To enforce constraints, load data without committing it using the COPY with NO COMMIT option and then perform a post-load check using the ANALYZE_CONSTRAINTS function. If constraint violations are found, you can roll back the load because you have not committed it. For more details, see Detecting Constraint Violations. See Also l ALTER TABLE l CREATE TABLE Administrator's Guide About Constraints HP Vertica Analytic Database (7.0.x) Page 450 of 997
  • 451. l COPY l ANALYZE_CONSTRAINTS Primary Key Constraints A primary key (PK) is a single column or combination of columns (called a compound key) that uniquely identifies each row in a table. A primary key constraint contains unique, non-null values. When you apply the primary key constraint, the not NULL and unique constraints are added implicitly. You do not need to specify them when you create the column. However, if you remove the primary key constraint, the not NULL constraint continues to apply to the column. To remove the not NULL constraint after removing the primary key constraint, use the ALTER COLUMN DROP NOT NULL parameter of the ALTER TABLE statement (see Dropping Constraints). The following statement adds a primary key constraint on the employee_id field: CREATE TABLE employees (  employee_id INTEGER PRIMARY KEY ); Alternatively, you can add a primary key constraint after the column is created: CREATE TABLE employees (  employee_id INTEGER ); ALTER TABLE employees ADD PRIMARY KEY (employee_id); Note: If you specify a primary key constraint using ALTER TABLE, the system returns the following message, which is informational only; the primary key constraint is still added to the designated column: WARNING 2623: Column "employee_id" definition changed to NOT NULL Primary keys can also constrain more than one column: CREATE TABLE employees (  employee_id INTEGER, employee_gender CHAR(1), PRIMARY KEY (employee_id, employee_gender) ); Foreign Key Constraints A foreign key (FK) is a column that is used to join a table to other tables to ensure referential integrity of the data. A foreign key constraint requires that a column contain only values from the primary key column of a specific dimension table.
  • 452. A column with a foreign key constraint can contain NULL values if it does not also have a not NULL constraint, even though the NULL value does not appear in the PRIMARY KEY column of the dimension table. This allows rows to be inserted into the table even if the foreign key is not yet known. In HP Vertica, the fact table's join columns are required to have foreign key constraints in order to participate in pre-join projections. If the fact table join column has a foreign key constraint, outer join queries produce the same result set as inner join queries. You can add a FOREIGN KEY constraint solely by referencing the table that contains the primary key. The columns in the referenced table do not need to be specified explicitly. Examples Create a table called inventory to store inventory data: CREATE TABLE inventory ( date_key INTEGER NOT NULL, product_key INTEGER NOT NULL, warehouse_key INTEGER NOT NULL, ... ); Create a table called warehouse to store warehouse information: CREATE TABLE warehouse ( warehouse_key INTEGER NOT NULL PRIMARY KEY, warehouse_name VARCHAR(20), ... ); To ensure referential integrity between the inventory and warehouse tables, define a foreign key constraint called fk_inventory_warehouse on the inventory table that references the warehouse table: ALTER TABLE inventory ADD CONSTRAINT fk_inventory_warehouse FOREIGN KEY(warehouse_key) REFERENCES warehouse(warehouse_key); In this example, the inventory table is the referencing table and the warehouse table is the referenced table. You can also create the foreign key constraint in the CREATE TABLE statement that creates the inventory table, eliminating the need for the ALTER TABLE statement. If you do not specify one or more columns, the PRIMARY KEY of the referenced table is used: CREATE TABLE inventory (date_key INTEGER NOT NULL, product_key INTEGER NOT NULL, warehouse_key INTEGER NOT NULL REFERENCES warehouse, ...); A foreign key can also constrain and reference multiple columns. 
The following example uses CREATE TABLE to add a foreign key constraint to a pair of columns: Administrator's Guide About Constraints HP Vertica Analytic Database (7.0.x) Page 452 of 997
  • 453. CREATE TABLE t1 (  c1 INTEGER PRIMARY KEY, c2 INTEGER, c3 INTEGER, FOREIGN KEY (c2, c3) REFERENCES other_table (c1, c2) ); The following two examples use ALTER TABLE to add a foreign key constraint to a pair of columns. When you use the CONSTRAINT keyword, you must specify a constraint name: ALTER TABLE t ADD FOREIGN KEY (a, b) REFERENCES other_table(c, d); ALTER TABLE t ADD CONSTRAINT fk_cname FOREIGN KEY (a, b) REFERENCES other_table(c, d); Note: The FOREIGN KEY keywords are valid only after the column definition, not on the column definition. Unique Constraints Unique constraints ensure that the data contained in a column or a group of columns is unique with respect to all rows in the table. Note: If you add a unique constraint to a column and then insert data into that column that is not unique with respect to other values in that column, HP Vertica inserts the data anyway. If your data does not conform to the declared constraints, your queries could yield unexpected results. Use ANALYZE_CONSTRAINTS to check for constraint violations. There are several ways to add a unique constraint on a column. If you use the CONSTRAINT keyword, you must specify a constraint name. The following example adds a UNIQUE constraint on the product_key column and names it product_key_UK: CREATE TABLE product (  product_key INTEGER NOT NULL CONSTRAINT product_key_UK UNIQUE, ... ); HP Vertica recommends naming constraints, but it is optional: CREATE TABLE product (  product_key INTEGER NOT NULL UNIQUE, ... ); You can specify the constraint after the column definition, with and without naming it: CREATE TABLE product (  product_key INTEGER NOT NULL, ..., CONSTRAINT product_key_uk UNIQUE (product_key) Administrator's Guide About Constraints HP Vertica Analytic Database (7.0.x) Page 453 of 997
  • 454. ); CREATE TABLE product ( product_key INTEGER NOT NULL, ..., UNIQUE (product_key) ); You can also use ALTER TABLE to specify a unique constraint. This example names the constraint product_key_UK: ALTER TABLE product ADD CONSTRAINT product_key_UK UNIQUE (product_key); You can use CREATE TABLE and ALTER TABLE to specify unique constraints on multiple columns. If a unique constraint refers to a group of columns, separate the column names using commas. The column listing specifies that the combination of values in the indicated columns is unique across the whole table, though any one of the columns need not be (and ordinarily isn't) unique: CREATE TABLE dim1 (  c1 INTEGER, c2 INTEGER, c3 INTEGER, UNIQUE (c1, c2) ); Not NULL Constraints A not NULL constraint specifies that a column cannot contain a null value. This means that new rows cannot be inserted or updated unless you specify a value for this column. You can apply the not NULL constraint when you create a column using the CREATE TABLE statement. You can also add or drop the not NULL constraint to an existing column using, respectively: l ALTER TABLE t ALTER COLUMN x SET NOT NULL l ALTER TABLE t ALTER COLUMN x DROP NOT NULL The not NULL constraint is implicitly applied to a column when you add the PRIMARY KEY (PK) constraint. When you designate a column as a primary key, you do not need to specify the not NULL constraint. However, if you remove the primary key constraint, the not NULL constraint still applies to the column. Use the ALTER COLUMN x DROP NOT NULL parameter of the ALTER TABLE statement to drop the not NULL constraint after dropping the primary key constraint. The following statement enforces a not NULL constraint on the customer_key column, specifying that the column cannot accept NULL values. CREATE TABLE customer (  customer_key INTEGER NOT NULL, Administrator's Guide About Constraints HP Vertica Analytic Database (7.0.x) Page 454 of 997
  • 455. ... ); Administrator's Guide About Constraints HP Vertica Analytic Database (7.0.x) Page 455 of 997
  • 456. Dropping Constraints To drop named constraints, use the ALTER TABLE command. The following example drops the constraint fact2fk: => ALTER TABLE fact2 DROP CONSTRAINT fact2fk; To drop constraints that you did not assign a name to, query the system table TABLE_CONSTRAINTS, which returns both system-generated and user-named constraint names: => SELECT * FROM TABLE_CONSTRAINTS; If you do not specify a constraint name, HP Vertica assigns a constraint name that is unique to that table. In the following output, note the system-generated constraint name C_PRIMARY and the user-defined constraint name fk_inventory_date:
-[ RECORD 1 ]--------+--------------------------
constraint_id        | 45035996273707984
constraint_name      | C_PRIMARY
constraint_schema_id | 45035996273704966
constraint_key_count | 1
foreign_key_count    | 0
table_id             | 45035996273707982
foreign_table_id     | 0
constraint_type      | p
-[ ... ]-------------+--------------------------
-[ RECORD 9 ]--------+--------------------------
constraint_id        | 45035996273708016
constraint_name      | fk_inventory_date
constraint_schema_id | 0
constraint_key_count | 1
foreign_key_count    | 1
table_id             | 45035996273708014
foreign_table_id     | 45035996273707994
constraint_type      | f
Once you know the name of the constraint, you can then drop it using the ALTER TABLE command. (If you do not know the table name, use table_id to retrieve table_name from the ALL_TABLES table.) Notes l Primary key constraints cannot be dropped if there is another table with a foreign key constraint that references the primary key. l A foreign key constraint cannot be dropped if there are any pre-join projections on the table. l Dropping a primary or foreign key constraint does not automatically drop the not NULL constraint on a column. You need to manually drop this constraint if you no longer want it.
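To make the table_id lookup concrete, here is a sketch of mapping a system-generated constraint back to its table and dropping it. The id values and the table name fact2 are illustrative, taken from the sample output in this section; substitute the values your own TABLE_CONSTRAINTS query returns:

```sql
-- Find the table that owns the unnamed constraint, using the table_id
-- reported by TABLE_CONSTRAINTS (this id is from the sample output above):
SELECT table_name FROM all_tables WHERE table_id = 45035996273707982;

-- Then drop the constraint from that table by its system-generated name:
ALTER TABLE fact2 DROP CONSTRAINT C_PRIMARY;
```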
  • 457. See Also l ALTER TABLE Administrator's Guide About Constraints HP Vertica Analytic Database (7.0.x) Page 457 of 997
  • 458. Enforcing Primary Key and Foreign Key Constraints Enforcing Primary Key Constraints HP Vertica does not enforce the uniqueness of primary keys when they are loaded into a table. However, when data is loaded into a table with a pre-joined dimension, or when the table is joined to a dimension table during a query, a key enforcement error could result if there is not exactly one dimension row that matches each foreign key value. Note: Consider using sequences or auto-incrementing columns for primary key columns, which guarantees uniqueness and avoids the constraint enforcement problem and associated overhead. For more information, see Using Sequences. Enforcing Foreign Key Constraints A table's foreign key constraints are enforced during data load only if there is a pre-join projection that has that table as its anchor table. If no such pre-join projection exists, then it is possible to load data that causes a constraint violation. Subsequently, a constraint violation error can happen when: l An inner join query is processed. l An outer join is treated as an inner join due to the presence of a foreign key. l A new pre-join projection anchored on the table with the foreign key constraint is refreshed. Detecting Constraint Violations Before You Commit Data To detect constraint violations, you can load data without committing it using the COPY statement with the NO COMMIT option, and then perform a post-load check using the ANALYZE_CONSTRAINTS function. If constraint violations exist, you can roll back the load because you have not committed it. For more details, see Detecting Constraint Violations.
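As a sketch of the sequence-based approach that the note above recommends, the following statements populate a primary key column from a named sequence so that loaded values are guaranteed unique. The table and sequence names here are illustrative, not from this guide:

```sql
-- Create a named sequence, then use NEXTVAL as the primary key default:
CREATE SEQUENCE order_seq;

CREATE TABLE orders (
    order_id INTEGER DEFAULT NEXTVAL('order_seq') PRIMARY KEY,
    amount   NUMERIC(10,2)
);

-- Rows inserted without an explicit order_id receive the next sequence value,
-- so no post-load uniqueness check is needed for this column:
INSERT INTO orders (amount) VALUES (19.99);
```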
  • 459. Detecting Constraint Violations The ANALYZE_CONSTRAINTS() function analyzes and reports on constraint violations within the current schema search path. To check for constraint violations: l Pass an empty argument to check for violations on all tables within the current schema l Pass a single table argument to check for violations on the specified table l Pass two arguments, a table name and a column or list of columns, to check for violations in those columns Given the following inputs, HP Vertica returns one row, indicating one violation, because the same primary key value (10) was inserted into table t1 twice: CREATE TABLE t1(c1 INT); ALTER TABLE t1 ADD CONSTRAINT pk_t1 PRIMARY KEY (c1); CREATE PROJECTION t1_p (c1) AS SELECT * FROM t1 UNSEGMENTED ALL NODES; INSERT INTO t1 values (10); INSERT INTO t1 values (10); --Duplicate primary key value \x Expanded display is on. SELECT ANALYZE_CONSTRAINTS('t1'); -[ RECORD 1 ]---+-------- Schema Name | public Table Name | t1 Column Names | c1 Constraint Name | pk_t1 Constraint Type | PRIMARY Column Values | ('10') If the second INSERT statement above had contained any different value, the result would have been 0 rows (no violations). In the following example, create a table that contains three integer columns, one a unique key and one a primary key: CREATE TABLE table_1(  a INTEGER, b_UK INTEGER UNIQUE, c_PK INTEGER PRIMARY KEY ); Issue a command that refers to a nonexistent table and column: SELECT ANALYZE_CONSTRAINTS('a_BB'); ERROR: 'a_BB' is not a table name in the current search path Issue a command that refers to a nonexistent column:
  • 460. SELECT ANALYZE_CONSTRAINTS('table_1','x'); ERROR 41614: Nonexistent columns: 'x ' Insert some values into table table_1 and commit the changes: INSERT INTO table_1 values (1, 1, 1); COMMIT; Run ANALYZE_CONSTRAINTS on table table_1. No constraint violations are reported: SELECT ANALYZE_CONSTRAINTS('table_1'); (No rows) Insert duplicate unique and primary key values and run ANALYZE_CONSTRAINTS on table table_1 again. The system shows two violations: one against the primary key and one against the unique key: INSERT INTO table_1 VALUES (1, 1, 1); COMMIT; SELECT ANALYZE_CONSTRAINTS('table_1'); -[ RECORD 1 ]---+---------- Schema Name | public Table Name | table_1 Column Names | b_UK Constraint Name | C_UNIQUE Constraint Type | UNIQUE Column Values | ('1') -[ RECORD 2 ]---+---------- Schema Name | public Table Name | table_1 Column Names | c_PK Constraint Name | C_PRIMARY Constraint Type | PRIMARY Column Values | ('1') The following command looks for constraint violations on only the unique key in the table table_1, qualified with its schema name: => SELECT ANALYZE_CONSTRAINTS('public.table_1', 'b_UK'); -[ RECORD 1 ]---+--------- Schema Name | public Table Name | table_1 Column Names | b_UK Constraint Name | C_UNIQUE Constraint Type | UNIQUE Column Values | ('1') (1 row)
  • 461. The following example shows that you can specify the same column more than once; ANALYZE_CONSTRAINTS, however, returns the violation only once: SELECT ANALYZE_CONSTRAINTS('table_1', 'c_PK, C_PK'); -[ RECORD 1 ]---+---------- Schema Name | public Table Name | table_1 Column Names | c_PK Constraint Name | C_PRIMARY Constraint Type | PRIMARY Column Values | ('1') The following example creates a new table, table_2, and inserts a foreign key and different (character) data types: CREATE TABLE table_2 ( x VARCHAR(3), y_PK VARCHAR(4), z_FK INTEGER REFERENCES table_1(c_PK)); Alter the table to create a multicolumn unique key and multicolumn foreign key and create superprojections: ALTER TABLE table_2 ADD CONSTRAINT table_2_multiuk PRIMARY KEY (x, y_PK); WARNING 2623: Column "x" definition changed to NOT NULL WARNING 2623: Column "y_PK" definition changed to NOT NULL The following command inserts a missing foreign key (0) into table table_2 and commits the changes: INSERT INTO table_2 VALUES ('r1', 'Xpk1', 0); COMMIT; Checking for constraints on the table table_2 in the public schema detects a foreign key violation: => SELECT ANALYZE_CONSTRAINTS('public.table_2'); -[ RECORD 1 ]---+---------- Schema Name | public Table Name | table_2 Column Names | z_FK Constraint Name | C_FOREIGN Constraint Type | FOREIGN Column Values | ('0') Now add a duplicate value into the unique key and commit the changes: INSERT INTO table_2 VALUES ('r2', 'Xpk1', 1); INSERT INTO table_2 VALUES ('r1', 'Xpk1', 1);
  • 462. COMMIT; Checking for constraint violations on table table_2 detects the duplicate unique key error: SELECT ANALYZE_CONSTRAINTS('table_2'); -[ RECORD 1 ]---+---------------- Schema Name | public Table Name | table_2 Column Names | z_FK Constraint Name | C_FOREIGN Constraint Type | FOREIGN Column Values | ('0') -[ RECORD 2 ]---+---------------- Schema Name | public Table Name | table_2 Column Names | x, y_PK Constraint Name | table_2_multiuk Constraint Type | PRIMARY Column Values | ('r1', 'Xpk1') Create a table with a multicolumn foreign key and create the superprojections: CREATE TABLE table_3( z_fk1 VARCHAR(3), z_fk2 VARCHAR(4)); ALTER TABLE table_3 ADD CONSTRAINT table_3_multifk FOREIGN KEY (z_fk1, z_fk2) REFERENCES table_2(x, y_PK); Insert a foreign key that matches a key in table table_2 and commit the changes: INSERT INTO table_3 VALUES ('r1', 'Xpk1'); COMMIT; Checking for constraints on table table_3 detects no violations: SELECT ANALYZE_CONSTRAINTS('table_3'); (No rows) Add a value that does not match and commit the change: INSERT INTO table_3 VALUES ('r1', 'NONE'); COMMIT; Checking for constraints on table table_3 detects a foreign key violation: SELECT ANALYZE_CONSTRAINTS('table_3'); -[ RECORD 1 ]---+---------------- Schema Name | public Table Name | table_3
  • 463. Column Names | z_fk1, z_fk2 Constraint Name | table_3_multifk Constraint Type | FOREIGN Column Values | ('r1', 'NONE') Analyze all constraints on all tables: SELECT ANALYZE_CONSTRAINTS(''); -[ RECORD 1 ]---+---------------- Schema Name | public Table Name | table_3 Column Names | z_fk1, z_fk2 Constraint Name | table_3_multifk Constraint Type | FOREIGN Column Values | ('r1', 'NONE') -[ RECORD 2 ]---+---------------- Schema Name | public Table Name | table_2 Column Names | x, y_PK Constraint Name | table_2_multiuk Constraint Type | PRIMARY Column Values | ('r1', 'Xpk1') -[ RECORD 3 ]---+---------------- Schema Name | public Table Name | table_2 Column Names | z_FK Constraint Name | C_FOREIGN Constraint Type | FOREIGN Column Values | ('0') -[ RECORD 4 ]---+---------------- Schema Name | public Table Name | t1 Column Names | c1 Constraint Name | pk_t1 Constraint Type | PRIMARY Column Values | ('10') -[ RECORD 5 ]---+---------------- Schema Name | public Table Name | table_1 Column Names | b_UK Constraint Name | C_UNIQUE Constraint Type | UNIQUE Column Values | ('1') -[ RECORD 6 ]---+---------------- Schema Name | public Table Name | table_1 Column Names | c_PK Constraint Name | C_PRIMARY Constraint Type | PRIMARY Column Values | ('1') -[ RECORD 7 ]---+---------------- Schema Name | public Table Name | target Column Names | a Constraint Name | C_PRIMARY Administrator's Guide About Constraints HP Vertica Analytic Database (7.0.x) Page 463 of 997
  • 464. Constraint Type | PRIMARY Column Values | ('1') (7 rows) To quickly clean up your database, issue the following commands: DROP TABLE table_1 CASCADE; DROP TABLE table_2 CASCADE; DROP TABLE table_3 CASCADE; Fixing Constraint Violations When HP Vertica finds duplicate primary key or unique values at run time, use the DISABLE_DUPLICATE_KEY_ERROR function to suppress error messaging. Queries execute as though no constraints are defined on the schema and the effects are session scoped. Caution: When called, DISABLE_DUPLICATE_KEY_ERROR suppresses data integrity checking and can lead to incorrect query results. Use this function only after you insert duplicate primary keys into a dimension table in the presence of a pre-join projection. Correct the violations and reenable integrity checking with REENABLE_DUPLICATE_KEY_ERROR. The following series of commands create a table named dim and the corresponding projection: CREATE TABLE dim (pk INTEGER PRIMARY KEY, x INTEGER); CREATE PROJECTION dim_p (pk, x) AS SELECT * FROM dim ORDER BY x UNSEGMENTED ALL NODES; The next two statements create a table named fact and the pre-join projection that joins fact to dim. CREATE TABLE fact(fk INTEGER REFERENCES dim(pk)); CREATE PROJECTION prejoin_p (fk, pk, x) AS SELECT * FROM fact, dim WHERE pk=fk ORDER BY x; The following statements load values into table dim. The last statement inserts a duplicate primary key value of 1: INSERT INTO dim values (1,1); INSERT INTO dim values (2,2); INSERT INTO dim values (1,2); --Constraint violation COMMIT; Table dim now contains duplicate primary key values, but you cannot delete the violating row because of the presence of the pre-join projection. Any attempt to delete the record results in the following error message: ROLLBACK: Duplicate primary key detected in FK-PK join Hash-Join (x dim_p), value 1
  • 465. In order to remove the constraint violation (pk=1), use the following sequence of commands, which puts the database back into the state just before the duplicate primary key was added. To remove the violation: 1. Save the original dim rows that match the duplicated primary key: CREATE TEMP TABLE dim_temp(pk integer, x integer); INSERT INTO dim_temp SELECT * FROM dim WHERE pk=1 AND x=1; -- original dim row 2. Temporarily disable error messaging on duplicate constraint values: SELECT DISABLE_DUPLICATE_KEY_ERROR(); Caution: Remember that running the DISABLE_DUPLICATE_KEY_ERROR function suppresses the enforcement of data integrity checking. 3. Remove the original row that contains duplicate values: DELETE FROM dim WHERE pk=1; 4. Allow the database to resume data integrity checking: SELECT REENABLE_DUPLICATE_KEY_ERROR(); 5. Reinsert the original values back into the dimension table: INSERT INTO dim SELECT * from dim_temp;COMMIT; 6. Validate your dimension and fact tables. If you receive the following error message, it means that the duplicate records you want to delete are not identical. That is, the records contain values that differ in at least one column that is not a primary key; for example, (1,1) and (1,2). ROLLBACK: Delete: could not find a data row to delete (data integrity violation?) The difference between this message and the rollback message in the previous example is that a fact row contains a foreign key that matches the duplicated primary key, which has been inserted. A row with values from the fact and dimension table is now in the pre-join projection. In order for the DELETE statement (Step 3 in the following example) to complete successfully, extra predicates are required to identify the original dimension table values (the values that are in the pre-join). Administrator's Guide About Constraints HP Vertica Analytic Database (7.0.x) Page 465 of 997
  • 466. This example is nearly identical to the previous example, except that an additional INSERT statement joins the fact table to the dimension table by a primary key value of 1: INSERT INTO dim values (1,1); INSERT INTO dim values (2,2); INSERT INTO fact values (1); -- New insert statement joins fact with dim on primary key value=1 INSERT INTO dim values (1,2); -- Duplicate primary key value=1 COMMIT; To remove the violation: 1. Save the original dim and fact rows that match the duplicated primary key: CREATE TEMP TABLE dim_temp(pk integer, x integer); CREATE TEMP TABLE fact_temp(fk integer); INSERT INTO dim_temp SELECT * FROM dim WHERE pk=1 AND x=1; -- original dim row INSERT INTO fact_temp SELECT * FROM fact WHERE fk=1; 2. Temporarily suppress the enforcement of data integrity checking: SELECT DISABLE_DUPLICATE_KEY_ERROR(); 3. Remove the duplicate primary keys. These steps also implicitly remove all fact rows with the matching foreign key. 4. Remove the original row that contains duplicate values: DELETE FROM dim WHERE pk=1 AND x=1; Note: The extra predicate (x=1) specifies removal of the original (1,1) row, rather than the newly inserted (1,2) values that caused the violation. 5. Remove all remaining rows: DELETE FROM dim WHERE pk=1; 6. Reenable integrity checking: SELECT REENABLE_DUPLICATE_KEY_ERROR(); 7. Reinsert the original values back into the fact and dimension table:
  • 467. INSERT INTO dim SELECT * from dim_temp; INSERT INTO fact SELECT * from fact_temp; COMMIT; 8. Validate your dimension and fact tables. Reenabling Error Reporting If you ran DISABLE_DUPLICATE_KEY_ERROR to suppress error reporting while fixing duplicate key violations, you can get incorrect query results going forward. As soon as you fix the violations, run the REENABLE_DUPLICATE_KEY_ERROR function to restore the default behavior of error reporting. The effects of this function are session scoped. Administrator's Guide About Constraints HP Vertica Analytic Database (7.0.x) Page 467 of 997
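The disable/fix/reenable bracket that the procedures above describe can be summarized in a session like this (the cleanup step in the middle stands in for the DELETE/INSERT sequence shown earlier):

```sql
SELECT DISABLE_DUPLICATE_KEY_ERROR();   -- suppress duplicate-key errors (session scope)
-- ...delete the duplicate rows and reinsert the saved originals, as above...
SELECT REENABLE_DUPLICATE_KEY_ERROR();  -- restore default error reporting
```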
  • 468. Working with Table Partitions HP Vertica supports data partitioning at the table level, which divides one large table into smaller pieces. Partitions are a table property that applies to all projections for a given table. A common use for partitions is to split data by time. For instance, if a table contains decades of data, you can partition it by year; if it contains only a year of data, you can partition it by month. Partitions can improve parallelism during query execution and enable some other optimizations. Partitions segregate data on each node to facilitate dropping partitions. You can drop older data partitions to make room for newer data. Tip: When a storage container has data for a single partition, you can discard that storage location (DROP_LOCATION) after dropping the partition using the DROP_PARTITION() function. Differences Between Partitioning and Segmentation There is a distinction between partitioning at the table level and segmenting a projection (hash or range): l Partitioning—defined by the table for fast data purges and query performance. Table partitioning segregates data on each node. You can drop partitions. l Segmentation—defined by the projection for distributed computing. Segmenting distributes projection data across multiple nodes in a cluster. Different projections for the same table have identical partitioning, but can have different segmentation clauses. See Projection Segmentation in the Concepts Guide. Both methods of storing and organizing data provide opportunities for parallelism during query processing. See also Partitioning and Segmenting Data. Partition Operations The basic operations for working with partitions are as follows: l Defining Partitions l Bulk Loading Data, and engaging in other normal operations l Forcing data partitioning, if needed l Moving partitions to another table as part of archiving historical data Administrator's Guide Working with Table Partitions HP Vertica Analytic Database (7.0.x) Page 468 of 997
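For instance, dropping the oldest year of a table partitioned by year might look like the following. The table name trade and partition key 2008 are illustrative; see Dropping Partitions and the Partition Management Functions in the SQL Reference Manual for the full syntax:

```sql
-- Drop all ROS containers belonging to the 2008 partition of the trade table:
SELECT DROP_PARTITION('trade', 2008);
```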
- Dropping Partitions to drop existing partitions
- Displaying partition metadata with the PARTITIONS system table, which displays one row per partition key, per ROS container

HP Vertica provides functions that let you manage your partitions and obtain additional information about them. See Partition Management Functions in the SQL Reference Manual.

See Also

- Partitioning, Repartitioning, and Reorganizing Tables

Defining Partitions

The first step in defining data partitions is to establish the relationship between the data and the partitions. To illustrate, consider the following table called trade, which contains unpartitioned data for the trade date (tdate), ticker symbol (tsymbol), and time (ttime).

Table 1: Unpartitioned data

   tdate    | tsymbol |  ttime
------------+---------+----------
 2008-01-02 | AAA     | 13:00:00
 2009-02-04 | BBB     | 14:30:00
 2010-09-18 | AAA     | 09:55:00
 2009-05-06 | AAA     | 11:14:30
 2008-12-22 | BBB     | 15:30:00
(5 rows)

If you want to discard data once a year, a logical choice is to partition the table by year. The partition expression PARTITION BY EXTRACT(year FROM tdate) creates the partitions shown in Table 2:

Table 2: Data partitioned by year

2008:
  tdate   | tsymbol |  ttime
----------+---------+----------
 01/02/08 | AAA     | 13:00:00
 12/22/08 | BBB     | 15:30:00

2009:
  tdate   | tsymbol |  ttime
----------+---------+----------
 02/04/09 | BBB     | 14:30:00
 05/06/09 | AAA     | 11:14:30

2010:
  tdate   | tsymbol |  ttime
----------+---------+----------
 09/18/10 | AAA     | 09:55:00

Unlike some databases, which require you to explicitly define partition boundaries in the CREATE TABLE statement, HP Vertica selects a partition for each row based on the result of a partitioning expression provided in the CREATE TABLE statement. Partitions do not have explicit names associated with them. Internally, HP Vertica creates a partition for each distinct value in the PARTITION BY expression.
After you specify a partition expression, HP Vertica processes the data by applying the partition expression to each row and then assigning partitions.
The following syntax generates the partitions for this example, with the results shown in Table 3. It creates a table called trade, partitioned by year. For additional information, see CREATE TABLE in the SQL Reference Manual.

CREATE TABLE trade (
    tdate DATE NOT NULL,
    tsymbol VARCHAR(8) NOT NULL,
    ttime TIME)
PARTITION BY EXTRACT(year FROM tdate);

CREATE PROJECTION trade_p (tdate, tsymbol, ttime) AS
SELECT * FROM trade
ORDER BY tdate, tsymbol, ttime
UNSEGMENTED ALL NODES;

INSERT INTO trade VALUES ('01/02/08', 'AAA', '13:00:00');
INSERT INTO trade VALUES ('02/04/09', 'BBB', '14:30:00');
INSERT INTO trade VALUES ('09/18/10', 'AAA', '09:55:00');
INSERT INTO trade VALUES ('05/06/09', 'AAA', '11:14:30');
INSERT INTO trade VALUES ('12/22/08', 'BBB', '15:30:00');

Table 3: Partitioning Expression and Results

Partitioning By Year and Month

To partition by both year and month, you need a partition expression that pads the month out to two digits, so the partition keys appear as:

201101
201102
201103
...
201111
201112

You can use the following partition expression to partition the table by year and month:

PARTITION BY EXTRACT(year FROM tdate)*100 + EXTRACT(month FROM tdate)
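Combining the pieces above, a table partitioned by year and month might be declared as follows. This is a sketch; the table name sales and its columns are illustrative, not part of the example schema:

```sql
-- Hypothetical table partitioned by year and month.
CREATE TABLE sales (
    sdate DATE NOT NULL,
    amount INTEGER)
PARTITION BY EXTRACT(year FROM sdate)*100 + EXTRACT(month FROM sdate);
```

With this expression, a row whose sdate is '2011-03-15' receives partition key 2011*100 + 3 = 201103, matching the padded year-month keys shown above.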
Restrictions on Partitioning Expressions

- The partitioning expression can reference one or more columns from the table.
- The partitioning expression cannot evaluate to NULL for any row, so do not include columns that allow a NULL value in the CREATE TABLE...PARTITION BY expression.
- Any SQL functions in the partitioning expression must be immutable, meaning that they return the exact same value regardless of when they are invoked, and independently of session or environment settings, such as LOCALE. For example, you cannot use the TO_CHAR function in a partition expression, because it depends on locale settings, or the RANDOM function, because it produces different values at each invocation.
- HP Vertica meta-functions cannot be used in partitioning expressions.
- All projections anchored on a table must include all columns referenced in the PARTITION BY expression; this allows the partition to be calculated.
- You cannot modify partition expressions once a partitioned table is created. If you want a modified partition expression, create a new table with a new PARTITION BY clause, and then use INSERT...SELECT to move the data from the old table to the new table. Once your data is partitioned the way you want it, you can drop the old table.

Best Practices for Partitioning

- While HP Vertica supports a maximum of 1024 partitions, few, if any, organizations need to approach that maximum. Fewer partitions are likely to meet your business needs, while also ensuring maximum performance. Many customers, for example, partition their data by month, bringing their partition count to 12. HP Vertica recommends you keep the number of partitions between 10 and 20 to achieve excellent performance.
- Do not apply partitioning to tables used as dimension tables in pre-join projections. You can apply partitioning to tables used as large single (fact) tables in pre-join projections.
- For maximum performance, do not partition projections on LONG VARBINARY and LONG VARCHAR columns.
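The new-table approach to changing a partition expression can be sketched as follows. This is an assumption-laden illustration: the replacement table name trade2 is hypothetical, and the columns match the trade example from Defining Partitions:

```sql
-- Hypothetical replacement table with the new partition expression.
CREATE TABLE trade2 (
    tdate DATE NOT NULL,
    tsymbol VARCHAR(8) NOT NULL,
    ttime TIME)
PARTITION BY EXTRACT(year FROM tdate)*100 + EXTRACT(month FROM tdate);

-- Move the data, then retire the old table.
INSERT INTO trade2 SELECT * FROM trade;
COMMIT;
DROP TABLE trade;
ALTER TABLE trade2 RENAME TO trade;
```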
Dropping Partitions

Use the DROP_PARTITION function to drop a partition. Normally, this is a fast operation that discards all ROS containers that contain data for the partition.

Occasionally, a ROS container contains rows that belong to more than one partition. For example, this can happen after a MERGE_PARTITIONS operation. In this case, HP Vertica performs a split operation to avoid discarding too much data. HP Vertica tries to keep data from different partitions segregated into different ROS containers, but there are a small number of exceptions. For instance, the following operations can result in a ROS container with mixed partitions:
- MERGE_PARTITIONS, which merges ROS containers that have data belonging to partitions in a specified partition key range
- Refresh and recovery operations, which can generate ROS containers with mixed partitions under some conditions. See Auto Partitioning.

The number of partitions that contain data is restricted by the number of ROS containers that can comfortably exist in the system.

In general, if a ROS container has data that belongs to n+1 partitions and you want to drop a specific partition, the DROP_PARTITION operation:

1. Forces a split of the data into two containers, where:
   - one container holds the data that belongs to the partition that is to be dropped
   - another container holds the remaining n partitions
2. Drops the specified partition.

DROP_PARTITION forces a moveout if there is data in the WOS (the WOS is not partition aware).

DROP_PARTITION acquires an exclusive lock on the table to prevent DELETE, UPDATE, INSERT, and COPY statements from affecting the table, as well as any SELECT statements issued at the SERIALIZABLE isolation level.

Users must have USAGE privilege on the schema that contains the table.

DROP_PARTITION operations cannot be performed on tables with projections that are not up to date (have not been refreshed). DROP_PARTITION fails if you do not set the optional third parameter to true and it encounters ROS containers that do not have partition keys.

Examples

Using the example schema in Defining Partitions, the following command explicitly drops the 2009 partition key from table trade:

SELECT DROP_PARTITION('trade', 2009);
  DROP_PARTITION
-------------------
 Partition dropped
(1 row)

Here, the partition key is specified as an expression:

SELECT DROP_PARTITION('trade', EXTRACT('year' FROM '2009-01-01'::date));
  DROP_PARTITION
-------------------
 Partition dropped
(1 row)

The following example creates a table called dates and partitions the table by year and month:

CREATE TABLE dates (
    year INTEGER NOT NULL,
    month INTEGER NOT NULL)
PARTITION BY year * 12 + month;

The following statement drops the partition for October 2010, using a constant (2010*12 + 10 = 24130):

SELECT DROP_PARTITION('dates', '24130');
  DROP_PARTITION
-------------------
 Partition dropped
(1 row)

Alternatively, the expression can be written inline:

SELECT DROP_PARTITION('dates', 2010*12 + 10);

The following command first reorganizes the data if it is unpartitioned and then explicitly drops the 2009 partition key from table trade:

SELECT DROP_PARTITION('trade', 2009, false, true);
  DROP_PARTITION
-------------------
 Partition dropped
(1 row)

See Also

- DROP_PARTITION
- MERGE_PARTITIONS

Partitioning and Segmenting Data

Partitioning and segmentation have completely separate functions in HP Vertica, and opposite goals regarding data localization. Because other databases often use the terms interchangeably, it is important to know the differences:

- Segmentation defines how data is spread among cluster nodes. The goal is to distribute data evenly across multiple database nodes so that all nodes can participate in query execution.
- Partitioning specifies how data is organized within individual nodes. Partitioning attempts to introduce hot spots within the node, providing a convenient way to drop data and reclaim the disk space.
Note: Segmentation is defined by the CREATE PROJECTION statement, and partitioning is defined by the CREATE TABLE statement. Logically, the partition clause is applied after the segmentation clause. See the SQL Reference Manual for details.

To further illustrate the differences, partitioning data by year makes sense if you intend to retain and drop data at the granularity of a year. On the other hand, segmenting the data by year would be inefficient, because the node holding data for the current year would likely answer far more queries than the other nodes.

The following diagram illustrates the flow of segmentation and partitioning on a four-node database cluster:

1. Example table data
2. Data segmented by HASH(order_id)
3. Data segmented by hash across four nodes
4. Data partitioned by year on a single node

While partitioning occurs on all four nodes, the illustration shows partitioned data on one node for simplicity.
See Also

- Reclaiming Disk Space From Deleted Records
- Avoiding Resegmentation During Joins
- Projection Segmentation
- CREATE PROJECTION
- CREATE TABLE

Partitioning and Data Storage

Partitions and ROS Containers

- Data is automatically split into partitions during load, refresh, and recovery operations.
- The Tuple Mover maintains physical separation of partitions.
- Each ROS container contains data for a single partition, though there can be multiple ROS containers for a single partition.

Partition Pruning

When a query predicate includes one or more columns in the partitioning clause, queries look only at relevant ROS containers. See Partition Elimination for details.

Managing Partitions

HP Vertica provides various options to let you manage and monitor the partitions you create.

PARTITIONS system table

You can display partition metadata, one row per partition key, per ROS container, by querying the PARTITIONS system table. Given a projection with three ROS containers, the PARTITIONS table returns three rows:

=> SELECT PARTITION_KEY, PROJECTION_NAME, ROS_ID, ROS_SIZE_BYTES,
          ROS_ROW_COUNT, NODE_NAME FROM partitions;
 PARTITION_KEY | PROJECTION_NAME  |      ROS_ID       | ROS_SIZE_BYTES | ROS_ROW_COUNT | NODE_NAME
---------------+------------------+-------------------+----------------+---------------+-----------
 2008          | trade_p_node0001 | 45035996273740461 |             90 |             1 | node0001
 2007          | trade_p_node0001 | 45035996273740477 |             99 |             2 | node0001
 2006          | trade_p_node0001 | 45035996273740493 |             99 |             2 | node0001
(3 rows)

MERGE_PARTITIONS() function

The MERGE_PARTITIONS() function merges partitions between the specified values into a single ROS container and takes the following form:

MERGE_PARTITIONS ( table_name , partition_key_from , partition_key_to )

The edge values of the partition key are included in the range, and partition_key_from must be less than or equal to partition_key_to. Inclusion of partitions in the range is based on the application of the less than (<) and greater than (>) operators of the corresponding data type. If partition_key_from is the same as partition_key_to, all ROS containers of the partition key are merged into one ROS container.

Note: No restrictions are placed on a partition key's data type.

Users must have USAGE privilege on the schema that contains the table.

The following series of statements shows how to merge partitions in a table called T1:

=> SELECT MERGE_PARTITIONS('T1', '200', '400');
=> SELECT MERGE_PARTITIONS('T1', '800', '800');
=> SELECT MERGE_PARTITIONS('T1', 'CA', 'MA');
=> SELECT MERGE_PARTITIONS('T1', 'false', 'true');
=> SELECT MERGE_PARTITIONS('T1', '06/06/2008', '06/07/2008');
=> SELECT MERGE_PARTITIONS('T1', '02:01:10', '04:20:40');
=> SELECT MERGE_PARTITIONS('T1', '06/06/2008 02:01:10', '06/07/2008 02:01:10');
=> SELECT MERGE_PARTITIONS('T1', '8 hours', '1 day 4 hours 20 seconds');

PARTITION_TABLE() function

The PARTITION_TABLE() function physically separates partitions into separate containers. Only ROS containers with more than one distinct value participate in the split.
The following example creates a simple table called states and partitions data by state.
=> CREATE TABLE states (
       year INTEGER NOT NULL,
       state VARCHAR NOT NULL)
   PARTITION BY state;
=> CREATE PROJECTION states_p (state, year) AS
   SELECT * FROM states
   ORDER BY state, year
   UNSEGMENTED ALL NODES;

Now call the PARTITION_TABLE function to partition table states:

=> SELECT PARTITION_TABLE('states');
                    PARTITION_TABLE
-------------------------------------------------------
 partition operation for projection 'states_p_node0004'
 partition operation for projection 'states_p_node0003'
 partition operation for projection 'states_p_node0002'
 partition operation for projection 'states_p_node0001'
(1 row)

Notes

A few more things are worth mentioning to help you manage your partitions:

- To prevent too many ROS containers, be aware that delete operations must open all the containers; ideally, create fewer than 20 partitions and avoid creating more than 50. You can use the MERGE_PARTITIONS() function to merge old partitions into a single ROS container.
- You cannot use non-deterministic functions in a PARTITION BY expression. One example is TIMESTAMP WITH TIME ZONE, because the value depends on user settings.
- A dimension table in a pre-join projection cannot be partitioned.

Partitioning, Repartitioning, and Reorganizing Tables

Using the ALTER TABLE statement with its PARTITION BY syntax and the optional REORGANIZE keyword partitions or re-partitions a table according to the partition clause that you define in the statement. HP Vertica immediately drops any existing partition keys when you execute the statement.

You can use the PARTITION BY and REORGANIZE keywords separately or together. However, you cannot use these keywords with any other ALTER TABLE clauses.

Partition-clause expressions are limited in the following ways:
- Your partition clause must calculate a single non-null value for each row. You can reference multiple columns, but each row must return a single value.
- You can specify leaf expressions, functions, and operators in the partition clause expression.
- All leaf expressions in the partition clause must be either constants or columns of the table.
- Aggregate functions and queries are not permitted in the partition-clause expression.
- SQL functions used in the partition-clause expression must be immutable.

Partitioning or re-partitioning tables requires USAGE privilege on the schema that contains the table.

Reorganizing Data After Partitioning

Partitioning is not complete until you reorganize the data. The optional REORGANIZE keyword completes table partitioning by assigning partition keys. You can use REORGANIZE with PARTITION BY, or as the only keyword in the ALTER TABLE statement for tables that were previously altered with the PARTITION BY modifier but were not reorganized with the REORGANIZE keyword. If you specify the REORGANIZE keyword, data is partitioned immediately to the new schema as a background task.

Tip: As a best practice, HP recommends that you reorganize the data while partitioning the table, using PARTITION BY with the REORGANIZE keyword.

If you do not specify REORGANIZE, performance for queries, DROP_PARTITION() operations, and node recovery could be degraded until the data is reorganized. Also, without reorganizing existing data, new data is stored according to the new partition expression, while the existing data storage remains unchanged.

Monitoring Reorganization

When you use ALTER TABLE ... REORGANIZE, the operation reorganizes the data in the background. You can monitor details of the reorganization process by polling the following system tables:

- V_MONITOR.PARTITION_STATUS displays the fraction of each table that is partitioned correctly.
- V_MONITOR.PARTITION_REORGANIZE_ERRORS logs any errors issued by the background REORGANIZE process.
- V_MONITOR.PARTITIONS displays NULLs in the partition_key column for any ROS containers that have not been reorganized.
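Putting the statement and the monitoring tables together, a repartition-and-reorganize sequence might look like the following sketch. The table and partition expression are taken from the earlier trade example; this is an illustration, not a prescribed procedure:

```sql
-- Re-partition trade by year and month, and reorganize existing data
-- in the background.
ALTER TABLE trade
    PARTITION BY EXTRACT(year FROM tdate)*100 + EXTRACT(month FROM tdate)
    REORGANIZE;

-- Poll the background reorganization until it completes.
SELECT * FROM V_MONITOR.PARTITION_STATUS;
SELECT * FROM V_MONITOR.PARTITION_REORGANIZE_ERRORS;
```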
Note: The corresponding foreground operation to ALTER TABLE ... REORGANIZE is the PARTITION_TABLE() function.

Auto Partitioning

HP Vertica attempts to keep data from each partition stored separately. Auto partitioning occurs when data is written to disk, such as during COPY DIRECT or moveout operations.

Separate storage provides two benefits: partitions can be dropped quickly, and partition elimination can omit storage that does not need to participate in a query plan.

Note: If you use INSERT...SELECT in a partitioned table, HP Vertica sorts the data before writing it to disk, even if the source of the SELECT has the same sort order as the destination.

Examples

The examples that follow use this simple schema. First create a table named t1 and partition the data on the c2 column:

CREATE TABLE t1 (
    c1 INT NOT NULL,
    c2 INT NOT NULL)
SEGMENTED BY c1 ALL NODES
PARTITION BY c2;

Create two identically-segmented buddy projections:

CREATE PROJECTION t1_p AS SELECT * FROM t1 SEGMENTED BY HASH(c1) ALL NODES OFFSET 0;
CREATE PROJECTION t1_p1 AS SELECT * FROM t1 SEGMENTED BY HASH(c1) ALL NODES OFFSET 1;

Now insert some data:

INSERT INTO t1 VALUES (10,15);
INSERT INTO t1 VALUES (20,25);
INSERT INTO t1 VALUES (30,35);
INSERT INTO t1 VALUES (40,45);

Query the table to verify the inputs:

SELECT * FROM t1;
 c1 | c2
----+----
 10 | 15
 20 | 25
 30 | 35
 40 | 45
(4 rows)

Now perform a moveout operation on the projections in the table:
SELECT DO_TM_TASK('moveout','t1');
           do_tm_task
--------------------------------
 moveout for projection 't1_p1'
 moveout for projection 't1_p'
(1 row)

Query the PARTITIONS system table, and you'll see that the four partition keys reside on two nodes, each in its own ROS container (see the ros_id column). The PARTITION BY clause was used on column c2, so HP Vertica auto partitioned the input values during the COPY operation:

SELECT partition_key, projection_name, ros_id, ros_size_bytes, ros_row_count, node_name
    FROM PARTITIONS WHERE projection_name LIKE 't1_p1';
 partition_key | projection_name |      ros_id       | ros_size_bytes | ros_row_count | node_name
---------------+-----------------+-------------------+----------------+---------------+-----------
 15            | t1_p1           | 49539595901154617 |             78 |             1 | node0002
 25            | t1_p1           | 54043195528525081 |             78 |             1 | node0003
 35            | t1_p1           | 54043195528525069 |             78 |             1 | node0003
 45            | t1_p1           | 49539595901154605 |             79 |             1 | node0002
(4 rows)

HP Vertica does not auto partition when you refresh with the same sort order. If you create a new projection, HP Vertica returns a message telling you to refresh the projections; for example:

CREATE PROJECTION t1_p2 AS SELECT * FROM t1 SEGMENTED BY HASH(c1) ALL NODES OFFSET 2;
WARNING: Projection <public.t1_p2> is not available for query processing. Execute the
select start_refresh() function to copy data into this projection. The projection must
have a sufficient number of buddy projections and all nodes must be up before starting
a refresh.

Run the START_REFRESH function:

SELECT START_REFRESH();
             start_Refresh
----------------------------------------
 Starting refresh background process.
(1 row)

Query the PARTITIONS system table again. The partition keys now reside in two ROS containers, instead of four, which you can tell by looking at the values in the ros_id column.
The ros_row_count column holds the number of rows in the ROS container:

SELECT partition_key, projection_name, ros_id, ros_size_bytes, ros_row_count, node_name
    FROM PARTITIONS WHERE projection_name LIKE 't1_p2';
 partition_key | projection_name |      ros_id       | ros_size_bytes | ros_row_count |
 node_name
---------------+-----------------+-------------------+----------------+---------------+-----------
 15            | t1_p2           | 54043195528525121 |             80 |             2 | node0003
 25            | t1_p2           | 58546795155895541 |             77 |             2 | node0004
 35            | t1_p2           | 58546795155895541 |             77 |             2 | node0004
 45            | t1_p2           | 54043195528525121 |             80 |             2 | node0003
(4 rows)

The following command queries ROS information for the partitioned tables more specifically. In this example, the query counts two ROS containers, each on a different node, for projection t1_p2:

SELECT ros_id, node_name, COUNT(*) FROM PARTITIONS WHERE projection_name LIKE 't1_p2'
    GROUP BY ros_id, node_name;
      ros_id       | node_name | COUNT
-------------------+-----------+-------
 54043195528525121 | node0003  |     2
 58546795155895541 | node0004  |     2
(2 rows)

This command returns a result of four ROS containers on two different nodes for projection t1_p1:

SELECT ros_id, node_name, COUNT(*) FROM PARTITIONS WHERE projection_name LIKE 't1_p1'
    GROUP BY ros_id, node_name;
      ros_id       | node_name | COUNT
-------------------+-----------+-------
 49539595901154605 | node0002  |     1
 49539595901154617 | node0002  |     1
 54043195528525069 | node0003  |     1
 54043195528525081 | node0003  |     1
(4 rows)

Eliminating Partitions

If the ROS containers of partitioned tables are not needed, HP Vertica can eliminate the containers from being processed during query execution. To eliminate ROS containers, HP Vertica compares query predicates to partition-related metadata.
For each column in the partition expression, every ROS container maintains the minimum and maximum values of the data it stores, and HP Vertica uses those min/max values to potentially eliminate ROS containers from query planning. Partitions that cannot contain matching values are not scanned. For example, if a ROS container does not contain data that satisfies a given query predicate, the optimizer eliminates (prunes) that ROS container from the query plan. After non-participating ROS containers have been eliminated, queries that use partitioned tables run more quickly.

Note: Partition pruning occurs at query run time and requires a query predicate on the partitioning column.

Assume a table that is partitioned by year (2007, 2008, 2009) into three ROS containers, one for each year. Given the following series of commands, the two ROS containers that contain data for 2007 and 2008 fall outside the boundaries of the requested year (2009) and are eliminated:

=> CREATE TABLE ... PARTITION BY EXTRACT(year FROM date);
=> SELECT ... WHERE date = '12-2-2009';

On any database that has been upgraded from version 3.5 or earlier, ROS containers are ineligible for partition elimination because they do not contain the minimum/maximum partition key values required. These ROS containers need to be recreated or merged by the Tuple Mover.

Making Past Partitions Eligible for Elimination

The following procedure makes past partitions eligible for elimination. The easiest way to guarantee that all ROS containers are eligible is to:

1. Create a new fact table with the same projections as the existing table.
2. Use INSERT...SELECT to populate the new table.
3. Drop the original table and rename the new table.

If there is not enough disk space for a second copy of the fact table, an alternative is to:
1. Verify that the Tuple Mover has finished all post-upgrade work; for example, when the following command shows no mergeout activity:

   => SELECT * FROM TUPLE_MOVER_OPERATIONS;

2. Identify which partitions need to be merged to get the ROS minimum/maximum values by running the following command:

   => SELECT DISTINCT table_schema, projection_name, partition_key
      FROM partitions p LEFT OUTER JOIN vs_ros_min_max_values v
      ON p.ros_id = v.delid
      WHERE v.min_value IS NULL;

3. Insert a record into each partition that has ineligible ROS containers, and commit.

4. Delete each inserted record, and commit again.

At this point, the Tuple Mover automatically merges ROS containers from past partitions.

Verifying the ROS Merge

1. Query the TUPLE_MOVER_OPERATIONS table again:

   => SELECT * FROM TUPLE_MOVER_OPERATIONS;

2. Check again for any partitions that need to be merged:

   => SELECT DISTINCT table_schema, projection_name, partition_key
      FROM partitions p LEFT OUTER JOIN vs_ros_min_max_values v
      ON p.ros_id = v.rosid
      WHERE v.min_value IS NULL;

Examples

Assume a table that is partitioned by time and will use queries that restrict data on time:

CREATE TABLE time (
    tdate DATE NOT NULL,
    tnum INTEGER)
PARTITION BY EXTRACT(year FROM tdate);

CREATE PROJECTION time_p (tdate, tnum) AS
SELECT * FROM time
ORDER BY tdate, tnum
UNSEGMENTED ALL NODES;

Note: Projection sort order has no effect on partition elimination.
INSERT INTO time VALUES ('03/15/04', 1);
INSERT INTO time VALUES ('03/15/05', 2);
INSERT INTO time VALUES ('03/15/06', 3);
INSERT INTO time VALUES ('03/15/06', 4);

The data inserted in the previous series of commands is loaded into three ROS containers, one per year, since that is how the data is partitioned:

SELECT * FROM time ORDER BY tnum;
   tdate    | tnum
------------+------
 2004-03-15 |    1   --ROS1 (min 03/15/04, max 03/15/04)
 2005-03-15 |    2   --ROS2 (min 03/15/05, max 03/15/05)
 2006-03-15 |    3   --ROS3 (min 03/15/06, max 03/15/06)
 2006-03-15 |    4   --ROS3 (min 03/15/06, max 03/15/06)
(4 rows)

Here's what happens when you query the time table:

- In this query, HP Vertica can eliminate ROS2 and ROS3, because it is only looking for year 2004:

  => SELECT COUNT(*) FROM time WHERE tdate = '05/07/2004';

- In the next query, HP Vertica can eliminate both ROS1 and ROS3:

  => SELECT COUNT(*) FROM time WHERE tdate = '10/07/2005';

- The following query has an additional predicate on the tnum column, for which no minimum/maximum values are maintained. In addition, the use of the logical operator OR is not supported, so no ROS elimination occurs:

  => SELECT COUNT(*) FROM time WHERE tdate = '05/07/2004' OR tnum = 7;

Moving Partitions

You can move partitions from one table to another using the MOVE_PARTITIONS_TO_TABLE function. Use this function as part of creating offline archives of older partitions. By moving partitions from one table to an intermediate table, you can then create a backup of the new table and drop the partition. If you need the historical data later, you can restore the archived partitions, as described in Restoring Archived Partitions.

If the target table does not exist, the MOVE_PARTITIONS_TO_TABLE function creates a table definition using the CREATE TABLE statement with its LIKE clause. Creating a table with the LIKE clause is performed as a DDL operation.
HP Vertica does not copy any data from the source table, and the new table is not connected to its source in any way. The CREATE TABLE statement with the LIKE clause does not copy constraints, automatic values (such as sequences and identity values), or default values. Corresponding columns exist in the new table with the same types as in the source table, but the columns do not have constraints or automatic values.
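The LIKE behavior described above can also be used directly to pre-create an intermediate table before moving partitions into it. This is a sketch; the table names are illustrative:

```sql
-- Hypothetical intermediate table with the same column definitions as
-- prod_trades, but without constraints, automatic values, or defaults.
CREATE TABLE partn_backup.trades_200801 LIKE prod_trades;
```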
Archiving Steps

These are the steps required to archive partitions:

1. Prepare and move the partitions with the MOVE_PARTITIONS_TO_TABLE function.
2. Create an object-level snapshot of the intermediate table.
3. Drop the intermediate table.

The next sections describe the archiving steps.

Preparing and Moving Partitions

Before moving partitions to another table, be sure to:

- Create a separate schema for the intermediate table
- Check that the name you plan to use does not conflict with an existing table name
- Use a name that represents the partition values you are moving
- Keep each partition in a different backup table

When you have created a separate schema for the intermediate table, call the MOVE_PARTITIONS_TO_TABLE function. If you call MOVE_PARTITIONS_TO_TABLE and the destination table does not exist, the function creates the table automatically:

VMART=> SELECT MOVE_PARTITIONS_TO_TABLE (
            'prod_trades',
            '200801',
            '200801',
            'partn_backup.trades_200801');
                         MOVE_PARTITIONS_TO_TABLE
---------------------------------------------------------------------------
 1 distinct partition values moved at epoch 15. Effective move epoch: 14.
(1 row)

Creating a Snapshot of the Intermediate Table

Creating an object-level snapshot of the intermediate table containing the partitions you want to archive requires a vbr.py configuration file. These are the two steps to create an object-level snapshot of an intermediate table so you can then drop the table:
1. As a best practice, HP Vertica recommends that you create a full database snapshot first, since you can only restore object-level snapshots into the original database. However, creating a full snapshot is not a requirement.

2. Create an object-level snapshot of the intermediate table. For details of setting up backup hosts, creating a configuration file, and taking a snapshot, see Backing Up and Restoring the Database.

Copying the Config File to the Storage Location

When vbr.py creates the partition snapshot, it copies the snapshot to the archive storage location automatically. HP Vertica recommends that you also copy the configuration file for the partition snapshot to the storage location. You can do this automatically by entering y at the "Backup vertica configurations?" question when creating the configuration file for the snapshot.

Dropping the Intermediate Table

You can drop the intermediate table into which you moved the partitions to archive, as described in Dropping and Truncating Tables. Dropping the intermediate table maintains database K-safety, keeping a minimum of K+1 copies of the data, and more if additional projections exist.

Restoring Archived Partitions

You can restore partitions that you previously moved to an intermediate table, archived as an object-level snapshot, and then dropped.

Note: Restoring an archived partition requires that the original table definition has not changed since the partition was archived and dropped. If you have changed the table definition, you can only restore an archived partition using INSERT/SELECT statements, which are not described here.

These are the steps to restore archived partitions:

1. Restore the snapshot of the intermediate table you saved when you moved one or more partitions to archive (see Moving Partitions).
2. Move the restored partitions from the intermediate table to the original table.
3. Drop the intermediate table.
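Steps 2 and 3 of the restore procedure can be sketched in vsql as follows, assuming the object-level snapshot has already been restored with vbr.py. The table names reuse the earlier archive example and are illustrative:

```sql
-- Step 2: move the restored partition back into the original table.
SELECT MOVE_PARTITIONS_TO_TABLE (
    'partn_backup.trades_200801',
    '200801',
    '200801',
    'prod_trades');

-- Step 3: drop the intermediate table.
DROP TABLE partn_backup.trades_200801;
```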
Bulk Loading Data

This section describes different methods for bulk loading data into an HP Vertica database using the COPY statement. In its basic form, use COPY as follows:

COPY to_table FROM data_source

The COPY statement loads data from a file stored on the host or client (or in a data stream) into a database table. You can pass the COPY statement many different parameters to define various options, such as:

• The format of the incoming data
• Metadata about the data load
• Which parser COPY should use
• Whether to load data over parallel load streams
• How to transform data as it is loaded
• How to handle errors

HP Vertica's hybrid storage model provides a great deal of flexibility for loading and managing data.

Administrator's Guide Bulk Loading Data
HP Vertica Analytic Database (7.0.x) Page 489 of 997
See the remaining sections here for other options, and the COPY statement in the SQL Reference Manual for syntax details.

Checking Data Format Before or After Loading

HP Vertica expects all data files being loaded to be in the Unicode UTF-8 format. You can load ASCII data, which is UTF-8 compatible. Character sets like ISO 8859-1 (Latin1) are incompatible with UTF-8 and are not supported.

Before loading data from text files, you can use several UNIX tools to ensure that your data is in UTF-8 format. The file command reports the encoding of any text file. To check the type of a data file, use the file command. For example:

$ file Date_Dimension.tbl
Date_Dimension.tbl: ASCII text

The file command could indicate ASCII text even though the file contains multibyte characters. To check for multibyte characters in an ASCII file, use the wc command. For example:

$ wc Date_Dimension.tbl
 1828 5484 221822 Date_Dimension.tbl

If the wc command returns an error such as Invalid or incomplete multibyte or wide character, the data file is using an incompatible character set.

This example describes files that are not UTF-8 data files. Two text files have filenames starting with the string data. To check their format, use the file command as follows:

$ file data*
data1.txt: Little-endian UTF-16 Unicode text
data2.txt: ISO-8859 text

The results indicate that neither of the files is in UTF-8 format.

Converting Files Before Loading Data

To convert files before loading them into HP Vertica, use the iconv UNIX command. For example, to convert the data2.txt file from the previous example, use the iconv command as follows:

$ iconv -f ISO88599 -t utf-8 data2.txt > data2-utf8.txt

See the man pages for file and iconv for more information.

Checking UTF-8 Compliance After Loading Data

After loading data, use the ISUTF8 function to verify that all of the string-based data in the table is in UTF-8 format. For example, if you loaded data into a table named nametable that has a VARCHAR column named name, you can use this statement to verify that all of the strings are UTF-8 encoded:

=> SELECT name FROM nametable WHERE ISUTF8(name) = FALSE;

If all of the strings are in UTF-8 format, the query should not return any rows.

Performing the Initial Database Load

To perform the initial database load, use COPY with its DIRECT parameter from vsql.

Tip: HP Vertica supports multiple schema types. If you have a star schema, load the smaller tables before you load the largest tables.
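The detect-and-convert workflow described above can be sketched end to end. The file name legacy.txt and its Latin-1 encoding are assumptions for illustration:

```shell
# Create a small Latin-1 sample file: \351 (0xE9) is e-acute in ISO-8859-1.
printf 'caf\351\n' > legacy.txt

# file(1) typically reports this as ISO-8859 text.
file legacy.txt

# Convert to UTF-8 before loading with COPY.
iconv -f ISO-8859-1 -t UTF-8 legacy.txt > legacy-utf8.txt

# The converted file now contains valid UTF-8 bytes.
file legacy-utf8.txt
```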
Only a superuser can use the COPY statement to bulk load data. Two exceptions to the superuser requirement are to:

1. Run COPY to load from a stream on the host (such as STDIN) rather than a file (see Streaming Data via JDBC).
2. Use the COPY statement with the FROM LOCAL option.

A non-superuser can also perform a standard batch insert using a prepared statement, which invokes COPY to load data as a background task.

Extracting Data From an Existing Database

If possible, export the data in text form to a local file or attached disk. When working with large amounts of load data (> 500GB), HP recommends that you test the load process using smaller load files as described in Configuration Procedure to avoid compatibility or file formatting issues.

ETL products typically use ODBC or JDBC to extract data, which gives them program-level access to modify load file column values, as needed. Database systems typically provide a variety of export methods.

Tip: To export data from an Oracle database, run a SELECT query in Oracle's SQL*Plus command line query tool using a specified column delimiter, suppressed headers, and so forth. Redirect the output to a local file.

Smaller tables generally fit into a single load file. Split any large tables into 250-500GB load files. For example, a 10 TB fact table will require 20-40 load files to maintain performance.

Checking for Delimiter Characters in Load Data

The default delimiter for the COPY statement is a vertical bar (|). Before loading your data, make sure that no CHAR(N) or VARCHAR(N) data values include the delimiter character.

To test for the existence of a specific character in a column, use a query such as this:

SELECT COUNT(*) FROM T WHERE X LIKE '%|%'

If only a few rows contain |, you can eliminate them from the load file using a WHERE clause and load them separately using a different delimiter.
Tip: For loading data from an Oracle database, use a WHERE clause to avoid problem rows in the main load file, and the negated WHERE clause with REGEXP_REPLACE for problem rows.
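A quick file-level variant of the same check runs grep on the load file itself rather than querying the column. The file name data.tbl and its contents are assumptions for illustration:

```shell
# Sample load file that uses comma as its field separator.
printf 'a,b|c\nx,y,z\n' > data.tbl

# Count lines containing a literal vertical bar; these rows would
# collide with the default COPY delimiter and need separate handling.
grep -c '|' data.tbl
```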
Moving Data From an Existing Database to HP Vertica Nodes

To move data from an existing database to HP Vertica, consider using:

• USB 2.0 (or possibly SATA) disks
• A fast local network connection

Deliver chunks of data to the different HP Vertica nodes by connecting the transport disk or by writing files from network copy.

Loading From a Local Hard Disk

USB 2.0 disks can deliver data at about 30 MB per second, or 108 GB per hour. USB 2.0 disks are easy to use for transporting data from Linux to Linux. Set up an ext3 filesystem on the disk and write large files there. Linux 2.6 has USB plug-and-play support, so a USB 2.0 disk is instantly usable on various Linux systems.

For other UNIX variants, if there is no common filesystem format available, use the disk without a filesystem to copy a single large file. For example:

$ cp bigfile /dev/sdc1

Even without a filesystem on the disk, plug-and-play support still works on Linux to provide a device node for the disk. To find out the assigned device, plug in the disk and enter:

$ dmesg | tail -40

SATA disks are usually internal, but can also be used externally, or safely unmounted and moved if they are internal.

Loading Over the Network

A 1Gbps (gigabits per second) network can deliver about 50 MB/s, or 180GB/hr. HP Vertica can load about 30-50GB/hour/node for a 1-Ksafe projection design. Therefore, you should use a dedicated 1Gbps LAN; a slower LAN is proportionally slower. HP Vertica recommends not loading data across an external network, because the delays over distance slow down the TCP protocol to a small fraction of its available bandwidth, even without competing traffic.

Note: The actual load rates you obtain can be higher or lower depending on the properties of the data, number of columns, number of projections, and hardware and network speeds. Load speeds can be further improved by using multiple parallel streams.
Loading From Windows

Use NTFS for loading files directly from Windows to Linux. Although Red Hat Linux as originally installed can read Windows FAT32 file systems, this is not recommended.

Using Load Scripts

You can write and run a load script for the COPY statement using a simple text-delimited file format. For information about other load formats, see Specifying a COPY Parser. HP Vertica recommends that you load the smaller tables before the largest tables. To check data formats before loading, see Checking Data Format Before or After Loading.

Using Absolute Paths in a Load Script

Unless you are using the COPY FROM LOCAL statement, using COPY on a remote client requires an absolute path for a data file. You cannot use relative paths on a remote client. For a load script, you can use vsql variables to specify the locations of data files relative to your Linux working directory.

To use vsql variables to specify data file locations:

1. Create a vsql variable containing your Linux current directory.

   \set t_pwd `pwd`

2. Create another vsql variable that uses a path relative to the Linux current directory variable for a specific data file.

   \set input_file '\'':t_pwd'/Date_Dimension.tbl\''

3. Use the second variable in the COPY statement:

   COPY Date_Dimension FROM :input_file DELIMITER '|';

4. Repeat steps 2 and 3 to load all data files.

Note: COPY FROM LOCAL does not require an absolute path for data files. You can use paths that are relative to the client's running directory.

Running a Load Script

You can run a load script on any host, as long as the data files are on that host.
1. Change your Linux working directory to the location of the data files.

   $ cd /opt/vertica/doc/retail_example_database

2. Run the Administration Tools.

   $ /opt/vertica/bin/admintools

3. Connect to the database.

4. Run the load script.

Using COPY and COPY LOCAL

The COPY statement bulk loads data into an HP Vertica database. You can initiate loading one or more files or pipes on a cluster host. You can load directly from a client system, too, using the COPY statement with its FROM LOCAL option.

COPY lets you load parsed or computed data. Parsed data is from a table or schema using one or more columns, and computed data is calculated with a column expression on one or more column values.

COPY invokes different parsers depending on the format you specify:

• Delimited text (the default parser format, but not specified)
• Native binary (NATIVE) (not supported with COPY LOCAL)
• Native varchar (NATIVE VARCHAR) (not supported with COPY LOCAL)
• Fixed-width data (FIXEDWIDTH)

COPY has many options, which you combine to make importing data flexible. For detailed syntax for the various options, see the SQL Reference Manual. For example:

For this option...                                                   See this section...
Read uncompressed data, or data compressed with GZIP or BZIP.        Specifying COPY FROM Options
Insert data into the WOS (memory) or directly into the ROS (disk).   Choosing a Load Method
Set parameters such as data delimiters and quote characters for      Loading UTF-8 Format Data
the entire load operation or for specific columns.
Transform data before inserting it into the database.                Transforming Data During Loads
Copying Data From an HP Vertica Client

Use COPY LOCAL to load files on a client to the HP Vertica database. For example, to copy a GZIP file from your local client, use a command such as this:

=> COPY store.store_dimension FROM LOCAL '/usr/files/my_data/input_file' GZIP;

You can use a comma-separated list to load multiple files of the same compression type. COPY LOCAL then concatenates the files into a single file, so you cannot combine files with different compression types in the list. When listing multiple files, be sure to specify the type of every input file, such as BZIP, as shown:

COPY simple_table FROM LOCAL 'input_file.bz' BZIP, 'input_file.bz' BZIP;

You can load on a client (LOCAL) from STDIN, as follows:

COPY simple_table FROM LOCAL STDIN;

Transforming Data During Loads

To promote a consistent database and reduce the need for scripts to transform data at the source, HP Vertica lets you transform data as part of loading it into the target database. Transforming data during loads is useful for computing values to insert into a target database column from other columns in the source database.

To transform data during load, use the following syntax to specify the target column for which you want to compute values, as an expression:

COPY [ [database-name.]schema-name.]table [(
   [ Column AS Expression ] / column [ FORMAT 'format' ]
   [, ...]
)]
FROM ...

Understanding Transformation Requirements

When transforming data during loads, the COPY statement must contain at least one parsed column. The parsed column can be a FILLER column. (See Ignoring Columns and Fields in the Load File for more information about using fillers.)

Specify only RAW data in the parsed column source data. If you specify nulls in that RAW data, the columns are evaluated with the same rules as for SQL statement expressions.

You can intersperse parsed and computed columns in a COPY statement.
Loading FLOAT Values

HP Vertica parses floating-point values internally. COPY does not require you to cast floats explicitly, unless you need to transform the values for another reason. For more information, see DOUBLE PRECISION (FLOAT).

Using Expressions in COPY Statements

The expression you use in a COPY statement can be as simple as a single column or as complex as a case expression for multiple columns. You can specify multiple columns in a COPY expression, and have multiple COPY expressions refer to the same parsed column. You can specify COPY expressions for columns of all supported data types.

COPY expressions can use many HP Vertica-supported SQL functions, operators, constants, NULLs, and comments, including these functions:

• Date/time
• Formatting Functions
• String
• Null-handling
• System information

COPY expressions cannot use SQL meta functions (HP Vertica-specific), analytic functions, aggregate functions, or computed columns. For computed columns, all parsed columns in the expression must be listed in the COPY statement. Do not specify FORMAT or RAW in the source data for a computed column.

Expressions used in a COPY statement can contain only constants. The return data type of the expression must be coercible to that of the target column. Parsed column parameters are also coerced to match the expression.

Handling Expression Errors

Errors that occur in COPY expressions are treated as SQL errors, not parse errors. When a parse error occurs, COPY rejects the row and adds a copy of the row to the rejected data file. COPY also adds a message to the exceptions file describing why the row was rejected. For example, HP Vertica does not implicitly cast data types during parsing. If a type mismatch occurs between the data being loaded and a column type (such as loading a text value for a FLOAT column), COPY rejects the row, but continues processing.

COPY expression errors are treated as SQL errors and cause the entire load to roll back.
For example, if the COPY statement has an expression with a transform function, and a syntax error occurs in the function, the entire load is rolled back. The HP Vertica-specific log file will include the SQL error message, but the reason for the rollback is not obvious without researching the log.
Transformation Example

Following is a small transformation example.

1. Create a table and corresponding projection.

   CREATE TABLE t (
      year VARCHAR(10),
      month VARCHAR(10),
      day VARCHAR(10),
      k timestamp
   );
   CREATE PROJECTION tp (year, month, day, k) AS SELECT * FROM t;

2. Use COPY to copy the table, computing values for the year, month, and day columns in the target database, based on the timestamp column in the source table.

3. Load the parsed column, timestamp, from the source data to the target database.

   COPY t(year AS TO_CHAR(k, 'YYYY'),
          month AS TO_CHAR(k, 'Month'),
          day AS TO_CHAR(k, 'DD'),
          k FORMAT 'YYYY-MM-DD') FROM STDIN NO COMMIT;
   2009-06-17
   1979-06-30
   2007-11-26
   \.

4. Select the table contents to see the results:

   SELECT * FROM t;
    year |   month   | day |          k
   ------+-----------+-----+---------------------
    2009 | June      | 17  | 2009-06-17 00:00:00
    1979 | June      | 30  | 1979-06-30 00:00:00
    2007 | November  | 26  | 2007-11-26 00:00:00
   (3 rows)

Deriving Table Columns From Data File Columns

You can use COPY to derive a table column from the data file to load. The next example illustrates how to use the year, month, and day columns from the source input to derive and load the value for the TIMESTAMP column in the target database.
1. Create a table and corresponding projection:

   => CREATE TABLE t (k TIMESTAMP);
   => CREATE PROJECTION tp (k) AS SELECT * FROM t;

2. Use COPY with the FILLER keyword to skip the year, month, and day columns from the source file.

   => COPY t(year FILLER VARCHAR(10),
             month FILLER VARCHAR(10),
             day FILLER VARCHAR(10),
             k AS TO_DATE(YEAR || MONTH || DAY, 'YYYYMMDD'))
      FROM STDIN NO COMMIT;
   >> 2009|06|17
   >> 1979|06|30
   >> 2007|11|26
   >> \.

3. Select from the copied table to see the results:

   => SELECT * FROM t;
            k
   ---------------------
    2009-06-17 00:00:00
    1979-06-30 00:00:00
    2007-11-26 00:00:00
   (3 rows)

See also Using Sequences for how to generate an auto-incrementing value for columns. See the COPY statement in the SQL Reference Manual for further information.

Specifying COPY FROM Options

Each COPY statement requires a FROM option to indicate the location of the file or files being loaded. This syntax snippet shows the available FROM keywords, and their associated file format options:

FROM { STDIN [ BZIP | GZIP | UNCOMPRESSED ]
     | 'pathToData' [ ON nodename | ON ANY NODE ]
                    [ BZIP | GZIP | UNCOMPRESSED ] [, ...]
     | LOCAL STDIN | 'pathToData'
                    [ BZIP | GZIP | UNCOMPRESSED ] [, ...]
     }

Each of the FROM keywords lets you optionally specify the format of the load file as UNCOMPRESSED, BZIP, or GZIP.
Note: When using COPY in conjunction with a CREATE EXTERNAL TABLE statement, you cannot use the COPY FROM STDIN or LOCAL options.

Loading From STDIN

Using STDIN for the FROM option lets you load uncompressed data, bzip, or gzip files.

Loading From a Specific Path

Use the 'pathToData' option to indicate the location of the load file, optionally indicating a node name or ON ANY NODE to indicate which node (or nodes) should parse the load file. You can load one or more files in the supported formats: UNCOMPRESSED, BZIP, or GZIP.

Note: Using the ON ANY NODE clause indicates that the source file to load is on all of the nodes, so COPY opens the file and parses it from any node in the cluster. Be sure that the source file you specify is available and accessible on each cluster node.

If pathToData resolves to a storage location, and the user invoking COPY is not a superuser, these are the required permissions:

• The storage location must have been created with the USER option (see ADD_LOCATION)
• The user must already have been granted READ access to the storage location where the file(s) exist, as described in GRANT (Storage Location)

Further, if a non-superuser invokes COPY from a storage location to which she has privileges, HP Vertica also checks any symbolic links (symlinks) the user has to ensure no symlink can access an area to which the user has not been granted privileges.

Loading BZIP and GZIP Files

You can load compressed files (BZIP and GZIP). You must indicate the BZIP or GZIP format for each file when loading multiple files. For example, this statement copies a BZIP file into the flex table twitter, using the fjsonparser:

VMART=> copy twitter from '/server1/TWITTER/tweets1.json.bz2' BZIP parser fjsonparser() direct;
 Rows Loaded
-------------
      172094
(1 row)

Loading with Wildcards (glob) ON ANY NODE

COPY fully supports the ON ANY NODE clause with a wildcard (glob).
You can invoke COPY for a large number of files in a shared directory with a single statement such as this:
COPY myTable FROM '/mydirectory/ofmanyfiles/*.dat' ON ANY NODE

Using a wildcard with the ON ANY NODE clause expands the file list on the initiator node, and then distributes the individual files among all nodes, evenly distributing the COPY workload across the entire cluster.

Loading From a Local Client

To bulk load data from a client, and without requiring database superuser privileges, use the COPY FROM LOCAL option. You can load from either STDIN, or a specific path, but not from a specific node (or ON ANY NODE), since you are loading from the client. All local files are loaded and parsed serially with each COPY statement, so you cannot perform parallel loads with the LOCAL option. See Using Parallel Load Streams.

You can load one or more files in the supported formats: UNCOMPRESSED, BZIP, or GZIP.

For specific information about saving rejected data and exceptions files when using COPY from LOCAL, see Capturing Load Rejections and Exceptions.

Choosing a Load Method

Depending on what data you are loading, the COPY statement has these load method options:

Load Method   Description and Use
AUTO          Loads data into WOS. Use the default COPY load method for smaller bulk loads.
DIRECT        Loads data directly into ROS containers. Use the DIRECT load method for large bulk loads (100MB or more).
TRICKLE       Loads only into WOS. Use for frequent incremental loads, after the initial bulk load is complete.

Note: COPY ignores any load method you specify as part of creating an external table.

Loading Directly into WOS (AUTO)

This is the default load method. If you do not specify a load option, COPY uses the AUTO method to load data into WOS (Write Optimized Store in memory). The default method is good for smaller bulk loads (< 100MB). Once WOS is full, COPY continues loading directly to ROS (Read Optimized Store on disk) containers.
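The three load methods in the table above might look like the following sketch in practice; the table name sales and the file paths are assumptions for illustration:

```sql
COPY sales FROM '/data/initial_load.dat' DELIMITER '|';          -- AUTO (default): WOS first
COPY sales FROM '/data/history.dat' DELIMITER '|' DIRECT;        -- large bulk load, straight to ROS
COPY sales FROM '/data/hourly_delta.dat' DELIMITER '|' TRICKLE;  -- small incremental load, WOS only
```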
Loading Directly to ROS (DIRECT)

Use the DIRECT keyword in the COPY statement to bypass loading data into WOS, and instead, load data directly into ROS containers. The DIRECT option is best suited for loading large amounts of data (100MB or more) at a time. Using DIRECT for many loads of smaller data sets results in many ROS containers, which have to be combined later.

COPY a FROM stdin DIRECT;
COPY b FROM LOCAL STDIN DIRECT;

Note: A large initial bulk load can temporarily affect query performance while HP Vertica organizes the data.

Loading Data Incrementally (TRICKLE)

Use the TRICKLE load option to load data incrementally after the initial bulk load is complete. Trickle loading loads data only into the WOS. If the WOS becomes full, an error occurs and the entire data load is rolled back. Use this option only when you have a finely-tuned load and moveout process so that you are sure there is room in the WOS for the data you are loading. This option is more efficient than AUTO when loading data into partitioned tables.

For other details on trickle-loading data and WOS overflow into the ROS, see Trickle Loading.

Loading Data Without Committing Results (NO COMMIT)

Use the NO COMMIT option with COPY (unless the tables are temp tables) to perform a bulk load transaction without automatically committing the results. This option is useful for executing multiple COPY commands in a single transaction.

For example, the following set of COPY ... NO COMMIT statements performs several copy statements sequentially, and then commits them all. In this way, all of the copied data is either committed or rolled back as a single transaction.

COPY ... NO COMMIT;
COPY ... NO COMMIT;
COPY ... NO COMMIT;
COPY X FROM LOCAL NO COMMIT;
COMMIT;

Using a single transaction for multiple COPY statements also allows HP Vertica to load the data more efficiently, since it can combine the larger amounts of data from multiple COPY statements into fewer ROS containers.
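As a sketch, the single-transaction pattern combines naturally with a constraint check before committing. The table name orders, the file paths, and the decision to COMMIT are assumptions for illustration; ANALYZE_CONSTRAINTS is the HP Vertica function for finding violations:

```sql
COPY orders FROM '/data/day1.dat' NO COMMIT;
COPY orders FROM '/data/day2.dat' NO COMMIT;
-- Inspect the uncommitted rows; issue ROLLBACK instead if violations appear.
SELECT ANALYZE_CONSTRAINTS('orders');
COMMIT;
```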
HP recommends that you COMMIT or ROLLBACK the current transaction before you use COPY. You can combine NO COMMIT with most other existing COPY options, but not the REJECTED DATA AS TABLE option. The standard transaction semantics apply. If a transaction is in progress that was initiated by a statement other than COPY (such as INSERT), using COPY with NO
COMMIT adds rows to the existing transaction, rather than starting a new one. The previous statements are NOT committed.

Note: NO COMMIT is ignored when COPY is part of the CREATE EXTERNAL TABLE FROM COPY statement.

Using NO COMMIT to Detect Constraint Violations

You can use the NO COMMIT option to detect constraint violations as part of the load process. HP Vertica checks for constraint violations when running a query, but not when loading data. To detect constraint violations, load data with the NO COMMIT keyword and then test the load using ANALYZE_CONSTRAINTS. If you find any constraint violations, you can roll back the load because you have not committed it. See Detecting Constraint Violations for detailed instructions.

Using COPY Interactively

HP Vertica recommends using the COPY statement in one or more script files, as described in Using Load Scripts. You can also use commands such as the following interactively by piping a text file to vsql and executing a COPY (or COPY FROM LOCAL) statement with the standard input stream as the input file. For example:

$ cat fact_table.tbl | vsql -c "COPY FACT_TABLE FROM STDIN DELIMITER '|' DIRECT";
$ cat fact_table.tbl | vsql -c "COPY FACT_TABLE FROM LOCAL STDIN DELIMITER '|' DIRECT";

Canceling a COPY Statement

If you cancel a bulk data load, the COPY statement rolls back all rows that it attempted to load.

Specifying a COPY Parser

By default, COPY uses the DELIMITER parser to load raw data into the database. Raw input data must be in UTF-8, delimited text format. Data is compressed and encoded for efficient storage. If your raw data does not consist primarily of delimited text, specify the parser COPY should use to align most closely with the load data:

• NATIVE
• NATIVE VARCHAR
• FIXEDWIDTH

Note: You do not specify the DELIMITER parser directly; absence of a specific parser indicates the default.
Two other parsers are available specifically to load unstructured data into flex tables, as described in the Using Flex Table Parsers section of the Flex Tables Guide:

• FJSONPARSER
• FDELIMITEDPARSER

While these two parsers are for use with flex tables, you can also use them with columnar tables, to make loading data more flexible. For instance, you can load JSON data into a columnar table in one load, and delimited data into the same table in another. The Flex Tables Guide describes this use case and presents an example.

Using a different parser for your data can improve load performance. If delimited input data includes binary data types, COPY translates the data on input. See Using Load Scripts and Loading Binary (Native) Data for examples. You can also load binary data, but only if it adheres to the COPY format requirements, described in Creating Native Binary Format Files. You cannot mix raw data types that require different parsers (such as NATIVE and FIXEDWIDTH) in a single bulk load COPY statement.

To check data formats before (or after) loading, see Checking Data Format Before or After Loading.

Specifying Load Metadata

In addition to choosing a parser option, COPY supports other options to determine how to handle the raw load data. These options are considered load metadata, and you can specify metadata options at different parts of the COPY statement as follows:

Metadata Option     As a Column or Expression Option   As a COLUMN OPTION   As a FROM Level Option
DELIMITER           X                                  X                    X
ENCLOSED BY         X                                  X                    X
ESCAPE AS           X                                  X                    X
NULL                X                                  X                    X
TRIM                X                                  X
RECORD TERMINATOR                                                           X
SKIP                                                                        X
SKIP BYTES                                                                  X (Fixed-width only)
TRAILING NULLCOLS                                                           X

The following precedence rules apply to all data loads:
• All column-level parameters override statement-level parameters.
• COPY uses the statement-level parameter if you do not specify a column-level parameter.
• COPY uses the default metadata values for the DELIMITER, ENCLOSED BY, ESCAPE AS, and NULL options if you do not specify them at either the statement- or column-level.

When you specify any metadata options, COPY uses the parser to produce the best results and stores the raw data and its corresponding metadata in the following formats:

Raw Data Format   Metadata Format   Parser
UTF-8             UTF-8             DELIMITER
Binary            Binary            NATIVE
UTF-8             Binary            NATIVE VARCHAR
UTF-8             UTF-8             FIXEDWIDTH

See Also

• COPY

Interpreting Last Column End of Row Values

When bulk-loading delimited text data using the default parser (DELIMITED), the last column end of row value can be any of the following:

• Record terminator
• EOF designator
• Delimiter and a record terminator

Note: The FIXEDWIDTH parser always requires exactly a record terminator. No other permutations work.

For example, given a three-column table, the following input rows for a COPY statement using a comma (,) delimiter are each valid:

1,1,1
1,1,1,
1,1,
1,1,,

The following examples illustrate how COPY can interpret different last column end of data row values.
Using a Single End of Row Definition

To see how COPY interprets a single end of row definition:

1. Create a two-column table two_col, specifying column b with a default value of 5:

   VMart=> create table two_col (a int, b int DEFAULT 5);
   CREATE TABLE

2. COPY the two_col table using a comma (,) delimiter, and enter values for only one column (as a single, multi-line entry):

   VMart=> copy two_col from stdin delimiter ',';
   Enter data to be copied followed by a newline.
   End with a backslash and a period on a line by itself.
   >> 1,
   >> 1,
   >> \.

   The COPY statement completes successfully.

3. Query table two_col to display the two NULL values for column b as blank:

   VMart=> select * from two_col;
    a | b
   ---+---
    1 |
    1 |
   (2 rows)

Here, COPY expects values for two columns, but gets only one. Each input value is followed by a delimiter (,), and an implicit record terminator (a newline character, \n). You supply a record terminator with the ENTER or RETURN key. This character is not represented on the screen.

In this case, the delimiter (,) and record terminator (\n) are handled independently. COPY interprets the delimiter (,) to indicate the end of one value, and the record terminator (\n) to specify the end of the column row. Since no value follows the delimiter, COPY supplies an empty string before the record terminator. By default, the empty string signifies a NULL, which is a valid column value.

Using a Delimiter and Record Terminator End of Row Definition

To use a delimiter and record terminator together as an end of row definition:
1. Copy column a (a) of the two_col table, using a comma delimiter again, and enter two values:

   VMart=> copy two_col (a) from stdin delimiter ',';
   Enter data to be copied followed by a newline.
   End with a backslash and a period on a line by itself.
   >> 2,
   >> 2,
   >> \.

   The COPY statement again completes successfully.

2. Query table two_col to see that column b now includes two rows with its default value (5):

   VMart=> select * from two_col;
    a | b
   ---+---
    1 |
    1 |
    2 | 5
    2 | 5
   (4 rows)

In this example, COPY expects values for only one column, because of the column (a) directive. As such, COPY interprets the delimiter and record terminator together as a single, valid, last column end of row definition. Before parsing incoming data, COPY populates column b with its default value, because the table definition has two columns and the COPY statement supplies only one. This example populates the second column with its default column list value, while the previous example used the supplied input data.

Loading UTF-8 Format Data

You can specify these parameters at either a statement or column basis:

• ENCLOSED BY
• ESCAPE AS
• NULL
• DELIMITER

Loading Special Characters As Literals

The default COPY statement escape character is a backslash (\). By preceding any special character with an escape character, COPY interprets the character that follows literally, and copies it into the database. These are the special characters that you escape to load them as literals:

Special Character                        COPY Statement Usage
Vertical bar (|)                         Default COPY ... DELIMITER character
  • 508. Special Character COPY Statement Usage Empty string ('') Default COPY ... NULL string. Backslash () Default COPY ... ESC character. Newline and other control characters Various To use a special character as a literal, prefix it with an escape character. For example, to include a literal backslash () in the loaded data (such as when including a file path), use two backslashes (). COPY removes the escape character from the input when it loads escaped characters. Using a Custom Column Separator (DELIMITER) The default COPY delimiter is a vertical bar (|). The DELIMITER is a single ASCII character used to separate columns within each record of a file. Between two delimiters, COPY interprets all string data in load files as characters. Do not enclose character strings in quotes, since quote characters are also treated as literals between delimiters. You can define a different delimiter using any ASCII value in the range E'000' to E'177' inclusive. For instance, if you are loading CSV data files, and the files use a comma (,) character as a delimiter, you can change the default delimiter to a comma. You cannot use the same character for both the DELIMITER and NULL options. If the delimiter character is among a string of data values, use the ESCAPE AS character ( by default) to indicate that the delimiter should be treated as a literal. The COPY statement accepts empty values (two consecutive delimiters) as valid input data for CHAR and VARCHAR data types. COPY stores empty columns as an empty string (''). An empty string is not equivalent to a NULL string. To indicate a non-printing delimiter character (such as a tab), specify the character in extended string syntax (E'...'). If your database has StandardConformingStrings enabled, use a Unicode string literal (U&'...'). For example, use either E't' or U&'0009' to specify tab as the delimiter. Using a Custom Column Option DELIMITER This example, redefines the default delimiter through the COLUMN OPTION parameter. 
1. Create a simple table:

   => CREATE TABLE t (
        pk INT,
        col1 VARCHAR(10),
        col2 VARCHAR(10),
        col3 VARCHAR(10),
        col4 TIMESTAMP);

2. Use the COLUMN OPTION parameter to change the col1 default delimiter to a tilde (~):

   => COPY t COLUMN OPTION (col1 DELIMITER '~') FROM STDIN NO COMMIT;
   >> 1|ee~gg|yy|1999-12-12
   >> \.
   => SELECT * FROM t;
    pk | col1 | col2 | col3 |        col4
   ----+------+------+------+---------------------
     1 | ee   | gg   | yy   | 1999-12-12 00:00:00
   (1 row)

Defining a Null Value (NULL)

The default NULL value for COPY is an empty string (''). You can specify a NULL as any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NUL: E'\000'). You cannot use the same character for both the DELIMITER and NULL options.

When NULL is an empty string (''), use quotes to insert an empty string instead of a NULL. For example, using NULL '' ENCLOSED BY '"':

• 1||3 inserts a NULL in the second column.
• 1|""|3 inserts an empty string instead of a NULL in the second column.

To input an empty or literal string, use quotes (ENCLOSED BY); for example:

   NULL ''
   NULL 'literal'

A NULL is case-insensitive and must be the only value between the data field delimiters. For example, if the null string is NULL and the delimiter is the default vertical bar (|):

   |NULL| indicates a null value.
   | NULL | does not indicate a null value.

When you use the COPY command in a script, you must substitute a double backslash for each null string that includes a backslash. For example, the scripts used to load the example database contain:

   COPY ... NULL E'\\n' ...

Loading NULL Values

You can specify NULL by entering fields without content into a data file, using a field delimiter. For example, given the default delimiter (|) and default NULL (empty string) definition, COPY inserts the following input data:

   | | 1
   | 2 | 3
   4 | | 5
   6 | |

into the table as follows:

   (null, null, 1)
   (null, 2, 3)
   (4, null, 5)
   (6, null, null)

If NULL is set as a literal ('null'), COPY inserts the following inputs:

   null | null | 1
   null | 2 | 3
   4 | null | 5
   6 | null | null

as follows:

   (null, null, 1)
   (null, 2, 3)
   (4, null, 5)
   (6, null, null)

Filling Columns with Trailing Nulls (TRAILING NULLCOLS)

Loading data using the TRAILING NULLCOLS option inserts NULL values into any columns without data. Before inserting TRAILING NULLCOLS, HP Vertica verifies that the column does not have a NOT NULL constraint.

To use the TRAILING NULLCOLS parameter to handle inserts with fewer values than data columns:

1. Create a table:

   => CREATE TABLE z (
        a INT,
        b INT,
        c INT );

2. Insert some values into the table:

   => INSERT INTO z VALUES (1, 2, 3);

3. Query table z to see the inputs:

   => SELECT * FROM z;
    a | b | c
   ---+---+---
    1 | 2 | 3
   (1 row)

4. Insert two rows of data from STDIN, using TRAILING NULLCOLS:

   => COPY z FROM STDIN TRAILING NULLCOLS;
   >> 4 | 5 | 6
   >> 7 | 8
   >> \.

5. Query table z again to see the results. Using TRAILING NULLCOLS, the COPY statement correctly handled the third row, which had no value for column c:

   => SELECT * FROM z;
    a | b | c
   ---+---+---
    1 | 2 | 3
    4 | 5 | 6
    7 | 8 |
   (3 rows)

Attempting to Fill a NOT NULL Column with TRAILING NULLCOLS

You cannot use TRAILING NULLCOLS on a column that has a NOT NULL constraint. For instance:

1. Create a table n, declaring column b with a NOT NULL constraint:

   => CREATE TABLE n (
        a INT,
        b INT NOT NULL,
        c INT );

2. Insert some table values:

   => INSERT INTO n VALUES (1, 2, 3);
   => SELECT * FROM n;
    a | b | c
   ---+---+---
    1 | 2 | 3
   (1 row)

3. Use COPY with TRAILING NULLCOLS on table n to see the COPY error due to the column constraint:

   => COPY n FROM STDIN TRAILING NULLCOLS ABORT ON ERROR;
   Enter data to be copied followed by a newline.
   End with a backslash and a period on a line by itself.
   >> 4 | 5 | 6
   >> 7 | 8
   >> 9
   >> \.
   ERROR: COPY: Input record 3 has been rejected (Cannot set trailing column to NULL as column 2 (b) is NOT NULL)

4. Query the table to see that the COPY statement values were rejected:

   => SELECT * FROM n;
    a | b | c
   ---+---+---
    1 | 2 | 3
   (1 row)

Changing the Default Escape Character (ESCAPE AS)

The default escape character is a backslash (\). To change the default to a different character, use the ESCAPE AS option. To use an alternative escape character:

   => COPY mytable FROM '/data/input.txt' ESCAPE AS E'\001';

You can set the escape character to any ASCII value in the range E'\001' to E'\177' inclusive.

Eliminating Escape Character Handling

If you do not want any escape character and want to prevent any characters from being interpreted as escape sequences, use the NO ESCAPE option as part of the COPY statement.

Delimiting Characters (ENCLOSED BY)

The COPY ENCLOSED BY parameter lets you set an ASCII character to delimit characters to embed in string values. You can use any ASCII value in the range E'\001' to E'\177' inclusive (any ASCII character except NUL: E'\000') for the ENCLOSED BY value. Double quotation marks (") are the most commonly used quotation characters. For instance, the following parameter specifies that input data to the COPY statement is enclosed within double quotes:

   ENCLOSED BY '"'

With the following input (using the default DELIMITER (|) character):

   "vertica | value"

the results are:

• Column 1 containing "vertica
• Column 2 containing value"

Notice the double quotes (") before vertica and after value.

Using the following sample input data, columns are distributed as shown:

   "1", "vertica,value", ",", "'"

    col1 |     col2      | col3 | col4
   ------+---------------+------+------
    1    | vertica,value | ,    | '
   (1 row)

Alternatively, write the above example using any ASCII character of your choosing:

   ~1~, ~vertica,value~, ~,~, ~'~

If you use a single quote ('), rather than double quotes ("), as the ENCLOSED BY character, you must escape it using extended string syntax, a Unicode literal string (if StandardConformingStrings is enabled), or four single quotes:

   ENCLOSED BY E'\''
   ENCLOSED BY U&'\0027'
   ENCLOSED BY ''''

Using any of these definitions means the following input is properly parsed:

   '1', 'vertica,value', ',', '\''

See String Literals (Character) for an explanation of the string literal formats you can use to specify the ENCLOSED BY parameter.

Use the ESCAPE AS character to embed the ENCLOSED BY delimiter within character string values. For example, using the default ESCAPE AS character (\) and double quote as the ENCLOSED BY character, the following input returns "vertica":

   "\"vertica\""

Using ENCLOSED BY for a Single Column

The following example uses double quotes to enclose a single column (rather than the entire row). The COPY statement also specifies a comma (,) as the delimiter.

   => COPY Retail.Dim (Dno, Dname ENCLOSED BY '"', Dstore)
      FROM '/home/dbadmin/dim3.txt'
      DELIMITER ','
      EXCEPTIONS '/home/dbadmin/exp.txt';

This example correctly loads data such as:
   123,"Smith, John",9832

Specifying a Custom End of Record String (RECORD TERMINATOR)

To specify the literal character string that indicates the end of a data file record, use the RECORD TERMINATOR parameter, followed by the string to use. If you do not specify a value, then HP Vertica attempts to determine the correct line ending, accepting either just a linefeed (E'\n'), common on UNIX systems, or a carriage return and linefeed (E'\r\n'), common on Windows platforms.

For example, if your file contains comma-separated values terminated by line feeds that you want to maintain, use the RECORD TERMINATOR option to specify an alternative value:

   => COPY mytable FROM STDIN DELIMITER ',' RECORD TERMINATOR E'\n';

To specify the RECORD TERMINATOR as non-printing characters, use either the extended string syntax or Unicode string literals. The following table lists some common record terminator characters. See String Literals for an explanation of the literal string formats.

   Extended String Syntax   Unicode Literal String   Description       ASCII Decimal
   E'\b'                    U&'\0008'                Backspace         8
   E'\t'                    U&'\0009'                Horizontal tab    9
   E'\n'                    U&'\000a'                Linefeed          10
   E'\f'                    U&'\000c'                Formfeed          12
   E'\r'                    U&'\000d'                Carriage return   13
   E'\\'                    U&'\005c'                Backslash         92

If you use the RECORD TERMINATOR option to specify a custom value, be sure the input file matches the value. Otherwise, you may get inconsistent data loads.

Note: The record terminator cannot be the same as DELIMITER, NULL, ESCAPE, or ENCLOSED BY.

If using JDBC, HP recommends that you use the following value for the RECORD TERMINATOR:

   System.getProperty("line.separator")

Examples

The following examples use a comma (,) as the DELIMITER for readability.

   ,1,2,3,
   ,1,2,3
   1,2,3,

Leading (,1) and trailing (3,) delimiters are ignored. Thus, these rows all have three columns.

   123,'\n',\n,456

Using a non-default null string, the row is interpreted as:

   123
   newline
   \n
   456

   123,this\, that\, or the other,something else,456

The row is interpreted as:

   123
   this, that, or the other
   something else
   456

Loading Native Varchar Data

Use the NATIVE VARCHAR parser option when the raw data consists primarily of CHAR or VARCHAR data. COPY performs the conversion to the actual table data types on the database server. This option is not supported with COPY LOCAL.

Using NATIVE VARCHAR does not provide the same efficiency as NATIVE. However, NATIVE VARCHAR precludes the need to use delimiters or to escape special characters, such as quotes, which can make working with client applications easier.

Note: NATIVE VARCHAR does not support concatenated BZIP and GZIP files.

Batch data inserts performed through the HP Vertica ODBC and JDBC drivers automatically use the NATIVE VARCHAR format.

Loading Binary (Native) Data

You can load binary data using the NATIVE parser option, except with COPY LOCAL, which does not support this option. Since binary-format data does not require the use and processing of delimiters, it precludes the need to convert integers, dates, and timestamps from text to their native storage format, and improves load performance over delimited data. All binary-format files must adhere to the formatting specifications described in Appendix: Binary File Formats.

Native binary-format data files are typically larger than their delimited text counterparts, so compress the data with GZIP or BZIP before loading it. NATIVE BINARY does not support concatenated BZIP and GZIP files. You can load native (binary) format files when developing plug-ins to ETL applications.

There is no copy format to load binary data byte-for-byte, because the column and record separators in the data would have to be escaped.
Binary data type values are padded and translated on input, and also in the functions, operators, and casts supported.
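To make the performance argument above concrete, the following Python sketch contrasts a delimited text record with a fixed-layout binary record. This is only an illustration of the general idea; it does not reproduce Vertica's actual NATIVE file layout, which is specified in Appendix: Binary File Formats.

```python
import struct

# Hypothetical three-column row: an INT, a FLOAT, and a date held as an
# integer day count. This is NOT Vertica's NATIVE format -- just a sketch of
# fixed-width binary encoding versus delimited text.
row = (1002, 3.14, 17301)

# Text form needs delimiters, and the server must parse each value from text.
text_record = "|".join(str(v) for v in row) + "\n"

# Binary form: little-endian 8-byte int, 8-byte float, 4-byte int. Field
# widths are known up front, so no delimiter scanning or text-to-number
# conversion is needed at load time.
binary_record = struct.pack("<qdi", *row)

# Decoding the binary record is a single fixed-layout unpack.
decoded = struct.unpack("<qdi", binary_record)
assert decoded == row
```

The trade-off described in the text is visible here: the binary record has a fixed 20-byte layout regardless of value, which can be larger than short text values, hence the advice to compress native files with GZIP or BZIP.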
Loading Hexadecimal, Octal, and Bitstring Data

You can use hexadecimal, octal, and bitstring formats only to load binary columns. To specify these column formats, use the COPY statement's FORMAT options:

• Hexadecimal
• Octal
• Bitstring

The following examples illustrate how to use the FORMAT option.

1. Create a table:

   => CREATE TABLE t (
        oct VARBINARY(5),
        hex VARBINARY(5),
        bitstring VARBINARY(5) );

2. Create the projection:

   => CREATE PROJECTION t_p (oct, hex, bitstring) AS SELECT * FROM t;

3. Use the COPY command from STDIN (not a file), specifying each of the formats:

   => COPY t (oct FORMAT 'octal', hex FORMAT 'hex', bitstring FORMAT 'bitstring')
      FROM STDIN DELIMITER ',';

4. Enter the data to load, ending the statement with a backslash (\) and a period (.) on a separate line:

   >> \141\142\143\144\145,0x6162636465,0110000101100010011000110110010001100101
   >> \.

5. Use a select query on table t to view the input values results:

   => SELECT * FROM t;
    oct   | hex   | bitstring
   -------+-------+-----------
    abcde | abcde | abcde
   (1 row)

COPY uses the same default format to load binary data as is used to input binary data. Since the backslash character (\) is the default escape character, you must escape octal input values. For example, enter the byte '\141' as '\\141'.

Note: If you enter an escape character followed by an invalid octal digit, or an escape character being escaped, COPY returns an error.

On input, COPY translates string data as follows:

• Uses the HEX_TO_BINARY function to translate from hexadecimal representation to binary.
• Uses the BITSTRING_TO_BINARY function to translate from bitstring representation to binary.

Both functions take a VARCHAR argument and return a VARBINARY value.

You can also use the escape character to represent the (decimal) byte 92 by escaping it twice; for example, '\\\\'. Note that vsql inputs the escaped backslash as four backslashes. Equivalent inputs are the hex value '0x5c' and the octal value '\134' (134 = 1 x 8^2 + 3 x 8^1 + 4 x 8^0 = 92).

You can load a delimiter value if you escape it with a backslash. For example, given delimiter '|', '\001\|\002' is loaded as {1,124,2}, which can also be represented in octal format as '\001\174\002'.

If you insert a value with more bytes than fit into the target column, COPY returns an error. For example, if column c1 is VARBINARY(1):

   => INSERT INTO t (c1) values ('ab');
   ERROR: 2-byte value too long for type Varbinary(1)

If you implicitly or explicitly cast a value with more bytes than fit the target data type, COPY silently truncates the data. For example:

   => SELECT 'abcd'::binary(2);
    binary
   --------
    ab
   (1 row)

Hexadecimal Data

The optional '0x' prefix indicates that a value is hexadecimal, not decimal, although not all hexadecimal values use A-F; for example, 5396. COPY ignores the 0x prefix when loading the input data.

If there are an odd number of characters in the hexadecimal value, the first character is treated as the low nibble of the first (furthest to the left) byte.

Octal Data

Loading octal-format data requires that each byte be represented by a three-digit octal code. The first digit must be in the range [0,3], and the second and third digits must both be in the range [0,7].

If the length of an octal value is not a multiple of three, or if one of the three digits is not in the proper range, the value is invalid and COPY rejects the row in which the value appears. If you supply an invalid octal value, COPY returns an error. For example:

   => SELECT '\000\387'::binary(8);
   ERROR: invalid input syntax for type binary

Rows that contain binary values with invalid octal representations are also rejected. For example, COPY rejects '\008' because '\008' is not a valid octal number.

BitString Data

Loading bitstring data requires that each character be zero (0) or one (1), in multiples of eight characters. If the bitstring value is not a multiple of eight characters, COPY treats the first n characters as the low bits of the first byte (furthest to the left), where n is the remainder of the value's length, divided by eight.

Examples

The following example shows VARBINARY HEX_TO_BINARY(VARCHAR) and VARCHAR TO_HEX(VARBINARY) usage.

1. Create table t and its projection with binary columns:

   => CREATE TABLE t (c BINARY(1));
   => CREATE PROJECTION t_p (c) AS SELECT c FROM t;

2. Insert minimum and maximum byte values, including an IP address represented as a character string:

   => INSERT INTO t values (HEX_TO_BINARY('0x00'));
   => INSERT INTO t values (HEX_TO_BINARY('0xFF'));
   => INSERT INTO t values (V6_ATON('2001:DB8::8:800:200C:417A'));

3. Use the TO_HEX function to format binary values in hexadecimal on output:

   => SELECT TO_HEX(c) FROM t;
    to_hex
   --------
    00
    ff
    20
   (3 rows)

See Also

• COPY
• Binary Data Types
• Formatting Functions
• ASCII

Loading Fixed-Width Format Data

Use the FIXEDWIDTH parser option to bulk load fixed-width data. Use the COLSIZES option to specify the number of bytes for each input column; the definition of the table you are loading (COPY table f (x, y, z)) determines the number of COLSIZES values to declare. If any records do not have values, COPY inserts one or more null characters to equal the specified number of bytes. The last record in a fixed-width data file must include a record terminator to determine the end of the load data.

Supported Options for Fixed-Width Data Loads

Loading fixed-width data supports the options listed in the COPY Option Summary. These options are not supported:

• DELIMITER
• ENCLOSED BY
• ESCAPE AS
• TRAILING NULLCOLS

Using Nulls in Fixed-Width Data

The default NULL string for a fixed-width load cannot be an empty string; instead, it consists of all spaces. The number of spaces depends on the column width declared with the COLSIZES (integer [,...]) option.

For fixed-width loads, the NULL definition depends on whether you specify NULL at the column or statement level, as follows:

• Statement level: NULL must be defined as a single character. The default (or custom) NULL character is repeated for the entire width of the column.
• Column level: NULL must be defined as a string whose length matches the column width.

For fixed-width loads, if the input data column has fewer values than the specified column size, COPY inserts NULL characters. The number of NULLs must match the declared column width. If you specify a NULL string at the column level, COPY matches the string with the column width.

Note: To turn off NULLs, use the NULL AS option and specify NULL AS ''.
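The statement-level NULL rule can be sketched outside the database. The following Python function is an illustration only (not Vertica's parser): it splits a record according to COLSIZES and loads a field as NULL when it consists entirely of the null character, which is a space by default.

```python
# Minimal sketch of fixed-width parsing with a statement-level NULL character.
# Not Vertica's parser -- it only illustrates COLSIZES plus the rule that the
# NULL character, repeated for the entire column width, loads as NULL.
def parse_fixed_width(record, colsizes, null_char=" "):
    fields, pos = [], 0
    for size in colsizes:
        raw = record[pos:pos + size]
        pos += size
        # A field made up entirely of the null character loads as NULL (None).
        fields.append(None if raw == null_char * size else raw)
    return fields

# With COLSIZES (2,2) and NULL AS 'N', mirroring the fw example that follows:
print(parse_fixed_width("NN12", (2, 2), null_char="N"))  # [None, '12']
print(parse_fixed_width("23NN", (2, 2), null_char="N"))  # ['23', None]
```

Note that with the default null character, "  12" (two leading spaces) would load a NULL into the first 2-byte column, which is why the default NULL string for fixed-width loads is "all spaces" rather than an empty string.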
Defining a Null Character (Statement Level)

1. Create a two-column table (fw):

   VMart=> create table fw(co int, ci int);
   CREATE TABLE

2. Copy the table, specifying null as 'N', and enter some data:

   VMart=> copy fw from STDIN fixedwidth colsizes(2,2) null as 'N' no commit;
   Enter data to be copied followed by a newline.
   End with a backslash and a period on a line by itself.
   >> NN12
   >> 23NN
   >> NNNN
   >> nnnn
   >> \.

3. Select all (*) from the table:

   VMart=> select * from fw;
    co | ci
   ----+----
       | 12
    23 |
       |
       |
       |
   (5 rows)

Defining a Custom Record Terminator

To define a record terminator other than the COPY default when loading fixed-width data, take these steps:

1. Create a table, fw, with two columns, co and ci:

   VMart=> create table fw(co int, ci int);
   CREATE TABLE

2. Copy table fw, specifying two 2-byte column sizes, and specifying a comma (,) as the record terminator:

   VMart=> copy fw from STDIN fixedwidth colsizes(2,2) record terminator ',';
   Enter data to be copied followed by a newline.
   End with a backslash and a period on a line by itself.
   >> 1234,1444,6666
   >> \.

3. Select all (*) from the table:

   VMart=> select * from fw;
    co | ci
   ----+----
    12 | 34
    14 | 44
   (2 rows)

The SELECT output indicates only two values. COPY rejected the third value (6666) because it was not followed by a comma (,) record terminator. Fixed-width data requires a trailing record terminator only if you explicitly specify a record terminator.

Copying Fixed-Width Data

Use COPY FIXEDWIDTH COLSIZES (n [,...]) to load files into an HP Vertica database. By default, all spaces are NULLs. For example, in the simple case:

   => create table mytest(co int, ci int);
   => create projection mytest_p1 as select * from mytest segmented by hash(co) all nodes;
   => create projection mytest_p2 as select * from mytest segmented by hash(co) all nodes offset 1;
   => copy mytest(co,ci) from STDIN fixedwidth colsizes(6,4) no commit;
   => select * from mytest order by co;
    co | ci
   ----+----
   (0 rows)

Skipping Content in Fixed-Width Data

The COPY statement has two options to skip input data. The SKIP BYTES option is only for fixed-width data loads:

   SKIP BYTES total   Skips the total number (integer) of bytes from the input data.
   SKIP records       Skips the number (integer) of records you specify.

This example uses SKIP BYTES to skip 10 bytes when loading a fixed-width table with two columns (4 and 6 bytes):

1. Copy a table, using SKIP BYTES to skip 10 bytes of input data:

   VMart=> copy fw from stdin fixedwidth colsizes (4,6) SKIP BYTES 10;
   Enter data to be copied followed by a newline.
   End with a backslash and a period on a line by itself.
   >> 2222666666
   >> 1111999999
   >> 1632641282
   >> \.

2. Select all (*) from the table:

   VMart=> select * from fw order by co;
     co  |   ci
   ------+--------
    1111 | 999999
    1632 | 641282
   (2 rows)

The SELECT output indicates that COPY skipped the first 10 bytes of load data, as directed.

This example uses SKIP when loading a fixed-width (4,6) table to skip one (1) record of input data:

1. Copy a table, using SKIP to skip 1 record of input data:

   VMart=> copy fw from stdin fixedwidth colsizes (4,6) SKIP 1;
   Enter data to be copied followed by a newline.
   End with a backslash and a period on a line by itself.
   >> 2222666666
   >> 1111999999
   >> 1632641282
   >> \.

2. Select all (*) from the table:

   VMart=> select * from fw order by co;
     co  |   ci
   ------+--------
    1111 | 999999
    1632 | 641282
   (2 rows)

The SELECT output indicates that COPY skipped the first record of load data, as directed.

Trimming Characters in Fixed-Width Data Loads

Use the TRIM option to trim a character. TRIM accepts a single-byte character, which is trimmed at the beginning and end of the data. For fixed-width data loads, when you specify a TRIM character, COPY first checks to see whether the row is NULL. If the row is not null, COPY trims the character(s). The next example instructs COPY to trim the character A, and shows the results. Only the last two lines entered comply with the specified (4, 6) fixed width:

1. Copy table fw, specifying the TRIM character, A:

   VMart=> copy fw from stdin fixedwidth colsizes(4,6) TRIM 'A';
   Enter data to be copied followed by a newline.
   End with a backslash and a period on a line by itself.
   >> A2222A444444
   >> 2222A444444
   >> A22A444444
   >> A22AA4444A
   >> \.

2. Select all (*) from the table:

   VMart=> select * from fw order by co;
    co |   ci
   ----+--------
    22 | 4444
    22 | 444444
   (2 rows)

Using Padding in Fixed-Width Data Loads

By default, the padding character is ' ' (a single space). The padding behavior for fixed-width data loads is similar to how a space is treated in other formats, differing by data type as follows:

   Data Type                     Padding
   Integer                       Leading and trailing spaces
   Bool                          Leading and trailing spaces
   Float                         Leading and trailing spaces
   [Var]Binary                   None. All characters are significant.
   [Var]Char                     Trailing spaces if string is too large
   Date, Interval, Time,         None. All characters are significant. The COPY
   Timestamp, TimestampTZ,       statement uses an internal algorithm to parse
   TimeTZ                        these data types.
   Date (formatted)              Use the COPY FORMAT option string to match the
                                 expected column length.
   Numerics                      Leading and trailing spaces
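The TRIM behavior above can be sketched as follows. This Python function is an illustration, not Vertica's parser; it assumes a record whose length does not match the declared column sizes is rejected, and that repeated occurrences of the trim character at either end are all removed (like str.strip).

```python
# Illustrative sketch (not Vertica's parser) of fixed-width parsing with a
# TRIM character, mirroring the colsizes (4,6) TRIM 'A' example above.
def parse_with_trim(record, colsizes, trim_char):
    if len(record) != sum(colsizes):
        return None  # wrong width: the record is rejected
    fields, pos = [], 0
    for size in colsizes:
        raw = record[pos:pos + size]
        pos += size
        # Trim the character from both ends of the field.
        fields.append(raw.strip(trim_char))
    return fields

print(parse_with_trim("A22AA4444A", (4, 6), "A"))    # ['22', '4444']
print(parse_with_trim("A22A444444", (4, 6), "A"))    # ['22', '444444']
print(parse_with_trim("A2222A444444", (4, 6), "A"))  # None (12 bytes, rejected)
```

This reproduces the example's result: only the two 10-byte records load, yielding (22, 4444) and (22, 444444).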
Ignoring Columns and Fields in the Load File

When bulk loading data, your source data may contain a column that does not exist in the destination table. Use the FILLER option to have COPY ignore an input column and the fields it contains when a corresponding column does not exist in the destination table. You can also use FILLER to transform data through derivation from the source into the destination. Use FILLER for:

• Omitting columns that you do not want to transfer into a table.
• Transforming data from a source column and then loading the transformed data to the destination table, without loading the original, untransformed source column (parsed column). (See Transforming Data During Loads.)

Using the FILLER Parameter

Your COPY statement expressions can contain one or more filler columns. You can use any number of filler columns in the expression. The only restriction is that at least one column must not be a filler column. You cannot specify target table columns as filler, regardless of whether they are in the column list. A data file can consist entirely of filler columns, indicating that all data in a file can be loaded into filler columns and then transformed and loaded into table columns.

The filler column must be a parsed column, not a computed column. Also, the name of the filler column must be unique within both the source file and the destination table. You must specify the data type of the filler column as part of the FILLER parameter.

FILLER Parameter Examples

You can specify all parser parameters for filler columns, and all statement-level parser parameters apply to filler columns. To ignore a column, use the COPY statement FILLER parameter, followed by its data type. The next example creates a table with one column, and then copies it using two filler parameters. Since the second filler column is not part of any expression, it is discarded:

   create table t (k timestamp);
   copy t(y FILLER date FORMAT 'YYYY-MM-DD',
          t FILLER varchar(10),
          k as y) from STDIN no commit;
   2009-06-17|2009-06-17
   \.

The following example derives and loads the value for the timestamp column in the target database from the year, month, and day columns in the source input. The year, month, and day columns are not loaded, because the FILLER parameter specifies to skip each of those columns:

   CREATE TABLE t (k TIMESTAMP);
   CREATE PROJECTION tp (k) AS SELECT * FROM t;
   COPY t(year FILLER VARCHAR(10),
          month FILLER VARCHAR(10),
          day FILLER VARCHAR(10),
          k AS TO_DATE(YEAR || MONTH || DAY, 'YYYYMMDD'))
      FROM STDIN NO COMMIT;
   2009|06|17
   1979|06|30
   2007|11|26
   \.
   SELECT * FROM t;
            k
   ---------------------
    2009-06-17 00:00:00
    1979-06-30 00:00:00
    2007-11-26 00:00:00
   (3 rows)

See the COPY statement in the SQL Reference Manual for more information about syntax and usage.

Loading Data into Pre-Join Projections

A pre-join projection stores rows of a fact table joined with rows of dimension tables. Storing pre-join projections improves query performance, since the join is already stored and does not occur when you query the data. To insert a row into the fact table of a pre-join projection, the associated values of the dimension table's columns must be looked up. Thus, an insert into a pre-join projection shares some of the qualities of a query. The following sections describe the behaviors associated with loading data into pre-join projections.

Foreign and Primary Key Constraints

To ensure referential integrity, foreign and primary key constraints are enforced on inserts into fact tables of pre-join projections. If a fact row attempts to reference a row that does not exist in the dimension table, the load is automatically rolled back. The load is also rolled back if a fact row references more than one dimension row.

Note: Unless it also has a NOT NULL constraint, a column with a FOREIGN KEY constraint can contain a NULL value, even though the dimension table's primary key column does not contain a NULL value. This allows records to be inserted into the fact table even though the foreign key in the dimension table is not yet known.

The following tables and SQL examples highlight these concepts.

• Fact table: Employees
• Dimension table: HealthPlans
• Pre-join projection: Joins Employees to HealthPlans using the PlanID column

   CREATE PROJECTION EMP_HEALTH (EmployeeID, FirstName, LastName, Type)
   AS (SELECT EmployeeID, FirstName, LastName, Type
       FROM Employees, HealthPlans
       WHERE Employees.HealthPlanID = HealthPlans.PlanID)

Employees (fact table)

   EmployeeID(PK) | FirstName | LastName | PlanID(FK)
   ---------------+-----------+----------+-----------
   1000           | David     | Taylor   | 01
   1001           | Sunil     | Ray      | 02
   1002           | Janet     | Hildreth | 02
   1003           | Pequan    | Lee      | 01

HealthPlans (dimension table)

   PlanID(PK) | Description | Type
   -----------+-------------+------
   01         | PlanOne     | HMO
   02         | PlanTwo     | PPO

The following command generates a missing foreign key error that results in a rollback, because the reference is to a non-existent dimension row:

   INSERT INTO Employees (EmployeeID, First, Last, PlanID)
      VALUES (1004, 'Ben', 'Smith', 04);

The following sequence of commands generates a foreign key error that results in a rollback, because a duplicate row in the HealthPlans dimension table is referenced by an insert in the Employees fact table. The error occurs when the Employees fact table references the HealthPlans dimension table:

   INSERT INTO HealthPlan VALUES(02, 'MyPlan', 'PPO');
   INSERT INTO Employee VALUES(1005, 'Juan', 'Hernandez', 02);
Rows that are added to a dimension table after the start of an insert or load into a fact table are not available for joining, because they are not visible to the fact table. The client is responsible for ensuring that all values in dimension tables are present before issuing the insert or load statement.

The following examples illustrate these behaviors.
- Fact Table: Sales
- Dimension Table: Employees
- Pre-join Projection: sales join employees on sales.seller=employees.empno

Success

Session A: INSERT INTO EMPLOYEES (EMPNO, NAME) VALUES (1, 'Bob');

Session B: COPY INTO SALES (AMT, SELLER)
           5000 | 1
           3500 | 1
           . . .
           Records loaded by this COPY command all refer to Bob's sales (SELLER = 1).

Session A: INSERT INTO EMPLOYEES (EMPNO, NAME) VALUES (2, 'Frank');

Session B: 7234 | 1
           COPY INTO SALES (AMT, SELLER)
           50 | 2
           75 | 2
           . . .
           Records loaded by this COPY command all refer to Frank's sales (SELLER = 2).

Session A: COMMIT;

Session B: COMMIT;

Both transactions are successful.

Failure

Session A: INSERT INTO EMPLOYEES (EMPNO, NAME)
           1 | Bob
           2 | Terry

Session B: COPY INTO SALES (AMT, SELLER)
           5000 | 1

The transaction in Session B fails because the value inserted into the dimension table in Session A was not visible before the COPY into the pre-join projection in Session B was initiated.
Using Parallel Load Streams

You can use COPY with multiple parallel load streams to load an HP Vertica database. COPY LOCAL parses files serially and does not support parallel load streams. These are the options for parallel load streams:

- Issue multiple separate COPY statements to load different files from different nodes. This option lets you use vsql, ODBC, ADO.net, or JDBC. You can load server-side files, or client-side files using the COPY from LOCAL statement.
- Issue a single multi-node COPY command that loads different files from different nodes, specifying the nodename option for each file.
- Issue a single multi-node COPY command that loads different files from any node, using the ON ANY NODE option.
- Use the COPY x WITH SOURCE PloadDelimitedSource option to parallel load using all cores on the server node on which the file resides. Files can be of different formats, such as BZIP, GZIP, and others.

The multi-node option is not available with the COPY from LOCAL parameter. The single multi-node COPY options (nodename | ON ANY NODE) are possible only using the vsql command, and not all COPY options support this behavior. However, using this method to copy data can result in significantly higher performance and efficient resource usage.

See COPY in the SQL Reference Manual for syntax details.

While there is no restriction on the number of files you can load, the optimal number of load streams depends on several factors, including the number of nodes, the physical and logical schemas, host processors, memory, disk space, and so forth. Too many load streams can cause systems to run out of memory. See Best Practices for Managing Workload Resources for advice on configuring load streams.
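For illustration, a single multi-node COPY of the kind described above might be sketched as follows. The table name, file paths, and node names here are hypothetical placeholders, not part of the documented examples:

```sql
-- Hypothetical sketch of a multi-node load; the table, paths, and
-- node names are placeholders.

-- One COPY statement with two load streams, one on each named node:
COPY sales_fact FROM '/data/sales_01.dat' ON v_vmart_node0001,
                     '/data/sales_02.dat' ON v_vmart_node0002
DELIMITER '|' DIRECT;

-- Or let HP Vertica choose which nodes load the files:
COPY sales_fact FROM '/data/sales_01.dat' ON ANY NODE,
                     '/data/sales_02.dat' ON ANY NODE
DELIMITER '|';
```

As the text notes, these multi-node forms must be issued through vsql; they are not available with COPY from LOCAL.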
Monitoring COPY Loads and Metrics

You can check COPY loads using:

- HP Vertica functions
- The LOAD_STREAMS system table

Using HP Vertica Functions

Two meta-functions return COPY metrics for the number of accepted or rejected rows from a COPY statement:
1. To get the number of accepted rows, use the GET_NUM_ACCEPTED_ROWS function:

   VMart=> select get_num_accepted_rows();
    get_num_accepted_rows
   -----------------------
    11
   (1 row)

2. To check the number of rejected rows, use the GET_NUM_REJECTED_ROWS function:

   VMart=> select get_num_rejected_rows();
    get_num_rejected_rows
   -----------------------
    0
   (1 row)

Using the CURRENT_LOAD_SOURCE() Function

When you include the CURRENT_LOAD_SOURCE function as part of the COPY statement, the input file name, or a value computed from it, can be inserted into a column. To insert the file names into a column from multiple source files:

=> COPY t (c1, c2, c3 as CURRENT_LOAD_SOURCE()) FROM '/home/load_file_1' ON exampledb_node02,
   '/home/load_file_2' ON exampledb_node03 DELIMITER ',';

Using the LOAD_STREAMS System Table

HP Vertica includes a set of system tables that include monitoring information, as described in Using System Tables. The LOAD_STREAMS system table includes information about load stream metrics from COPY and COPY FROM VERTICA statements, so you can query table values to get COPY metrics.

To see all table columns:

VMart=> select * from load_streams;

Using the STREAM NAME Parameter

Using the STREAM NAME parameter as part of the COPY statement labels COPY streams explicitly, so they are easier to identify in the LOAD_STREAMS system table.

To use the STREAM NAME parameter:

=> COPY mytable FROM myfile DELIMITER '|' DIRECT STREAM NAME 'My stream name';
The LOAD_STREAMS system table includes stream names for every COPY statement that takes more than one second to run. The one-second duration includes the time to plan and execute the statement.

HP Vertica maintains system table metrics until they reach a designated size quota (in kilobytes). The quota is set through internal processes and cannot be set or viewed directly.

Other LOAD_STREAMS Columns for COPY Metrics

These LOAD_STREAMS table column values depend on the load status:

- ACCEPTED_ROW_COUNT
- REJECTED_ROW_COUNT
- PARSE_COMPLETE_PERCENT
- SORT_COMPLETE_PERCENT

When a COPY statement using the DIRECT option is in progress, the ACCEPTED_ROW_COUNT field can increase to the maximum number of rows in the input file as the rows are being parsed.

If COPY reads input data from multiple named pipes, the PARSE_COMPLETE_PERCENT field remains at zero (0) until all named pipes return an EOF. While COPY awaits an EOF from multiple pipes, it can appear to be hung. Before canceling the COPY statement, however, check your system CPU and disk accesses to see if any activity is in progress.

In a typical load, PARSE_COMPLETE_PERCENT can either increase slowly to 100%, or jump to 100% quickly if you are loading from named pipes or STDIN, while SORT_COMPLETE_PERCENT is at 0. Once PARSE_COMPLETE_PERCENT reaches 100%, SORT_COMPLETE_PERCENT increases to 100%. Depending on the data sizes, a significant lag can occur between the time PARSE_COMPLETE_PERCENT reaches 100% and the time SORT_COMPLETE_PERCENT begins to increase.

This example sets the vsql expanded display, and then selects various columns of data from the LOAD_STREAMS system table:

=> \pset expanded
Expanded display is on.
=> SELECT stream_name, table_name, load_start, accepted_row_count,
   rejected_row_count, read_bytes, input_file_size_bytes,
   parse_complete_percent, unsorted_row_count, sorted_row_count,
   sort_complete_percent FROM load_streams;
-[ RECORD 1 ]----------+---------------------------
stream_name            | fact-13
table_name             | fact
load_start             | 2010-12-28 15:07:41.132053
accepted_row_count     | 900
rejected_row_count     | 100
read_bytes             | 11975
input_file_size_bytes  | 0
parse_complete_percent | 0
unsorted_row_count     | 3600
sorted_row_count       | 3600
sort_complete_percent  | 100
See the SQL Reference Manual for other meta-function details. See the Programmer's Guide for client-specific documentation.

Capturing Load Rejections and Exceptions

Rejected data occurs during the parsing phase of loading data. A COPY operation can encounter other problems and failures during different load phases, but rejections consist only of parse errors. Following are a few examples of parsing errors that cause a rejected row:

- Unsupported parsing options
- Incorrect data types for the table into which data is being loaded
- Malformed context for the parser in use
- Missing delimiters

The COPY statement automatically saves copies of rejected rows in a rejected-data file, and an explanation for each rejected row in an exceptions file. By default, HP Vertica saves both files in the database catalog subdirectory, CopyErrorLogs:

v_mart_node003_catalog/CopyErrorLogs/trans-STDIN-copy-from-rejected-data.1
v_mart_node003_catalog/CopyErrorLogs/trans-STDIN-copy-from-exceptions.1

You can specify different files in which to save COPY rejections and exceptions. Use the REJECTED DATA and EXCEPTIONS parameters to save one, or both, files to a file location of your choice. You can also save rejected rows and their explanations in a table, using the REJECTED DATA AS TABLE clause.

Using COPY Parameters To Handle Rejections and Exceptions

Several optional parameters let you determine how strictly the COPY statement handles the rejections it encounters when loading data. For example, you can have COPY fail when it encounters a single rejection, or permit a specific number of rejected rows before the load fails. Two COPY arguments, REJECTED DATA and EXCEPTIONS, are designed to work together for these purposes.

This section describes the parameters you use to specify the rejections file or table of your choice, and to control load exception handling.
Specifying Where to Save Rejected Data Rows

When you use the REJECTED DATA path argument, you specify one or both of the following:
- The path and file in which to save rejected data. The rejected data file includes only the rejected records themselves. If you want to see the reason each record was rejected, you must also specify the EXCEPTIONS path option. The reasons for rejection are written to a separate file.
- The name of the table into which rejected data rows are saved. Including the AS TABLE clause creates reject_table and populates it with both the rejected record and the reason for rejection. You can then query the table to access rejected data information. For more information, see Saving Load Rejections.

For COPY LOCAL operations, the rejected data file must reside on the client. If path resolves to a storage location and the user invoking COPY is not a superuser, the following permissions are required:

- The storage location must have been created with the USER usage type (see ADD_LOCATION).
- The user must already have been granted access to the storage location where the files exist, as described in GRANT (Storage Location).

Saving the Reason for the Rejected Data Row

An EXCEPTIONS file stores the reason each rejected record was rejected during the parsing phase. To save this file, include the EXCEPTIONS clause in the COPY statement and specify the file name or absolute path for the file.

Note: You cannot specify an EXCEPTIONS file if you are using the REJECTED DATA..AS TABLE clause. See Saving Load Exceptions (EXCEPTIONS).

If you are running COPY LOCAL operations, the file must reside on the client. If path resolves to a storage location and the user invoking COPY is not a superuser, the following permissions are required:

- The storage location must have been created with the USER usage type (see ADD_LOCATION).
- The user must already have been granted access to the storage location where the files exist, as described in GRANT (Storage Location).
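A brief sketch of the clauses just described may help; the table name and file paths here are hypothetical placeholders:

```sql
-- Hypothetical sketch; the table name and file paths are placeholders.

-- Save rejected rows to one file and the reasons for rejection to another:
COPY loader FROM '/home/dbadmin/load.dat' DELIMITER '|'
    REJECTED DATA '/tmp/load.rejected'
    EXCEPTIONS '/tmp/load.exceptions';

-- Or save rejected rows and their reasons together in a table
-- (in which case an EXCEPTIONS file cannot also be specified):
COPY loader FROM '/home/dbadmin/load.dat' DELIMITER '|'
    REJECTED DATA AS TABLE loader_rejects;
```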
Enforcing Truncating or Rejecting Rows (ENFORCELENGTH)

The ENFORCELENGTH parameter determines whether COPY truncates or rejects rows of data type CHAR, VARCHAR, BINARY, and VARBINARY when they do not fit the target table. By default, COPY truncates offending rows of these data types without rejecting them.
For example, if you omit the ENFORCELENGTH argument and load 'abc' into a table column specified as VARCHAR(2), COPY truncates the value to 'ab' and loads it. If you load the same value with the ENFORCELENGTH parameter, COPY rejects the 'abc' value, rather than truncating it.

Note: HP Vertica supports NATIVE and NATIVE VARCHAR values up to 65K. If any value exceeds this limit, COPY rejects the row, even when ENFORCELENGTH is not in use.

Specifying Maximum Rejections Before a Load Fails (REJECTMAX)

The REJECTMAX parameter specifies the maximum number of logical records that can be rejected before a load fails. A rejected row consists of the data that caused an exception and could not be parsed into the corresponding data type during a bulk load. Rejected data does not include referential constraint violations.

When the number of rejected records exceeds the REJECTMAX value (at REJECTMAX+1), the load fails. If you do not specify a value for REJECTMAX, or if the value is 0, COPY allows an unlimited number of exceptions to occur.

Note: COPY does not accumulate rejected records across files or nodes while data is loading. If the rejections for one data file exceed the maximum reject number, the entire load fails.

Aborting Data Loads for Any Error (ABORT ON ERROR)

Using the ABORT ON ERROR argument is the most restrictive way to load data, because no exceptions or rejections are allowed. A COPY operation stops if any row is rejected. No data is loaded, and HP Vertica rolls back the command.

If you use ABORT ON ERROR as part of a CREATE EXTERNAL TABLE AS COPY FROM statement, the option is used whenever a query references the external table. The offending error is saved in the COPY exceptions or rejected data file.
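The three parameters above can be sketched together as follows; the table and file names are hypothetical placeholders:

```sql
-- Hypothetical sketches; the table and file names are placeholders.

-- Reject (rather than truncate) over-length CHAR/VARCHAR values,
-- and fail the load once an 11th row is rejected:
COPY customer FROM '/data/customer.dat' DELIMITER '|'
    ENFORCELENGTH REJECTMAX 10;

-- Most restrictive: roll back the entire load on the first bad row:
COPY customer FROM '/data/customer.dat' DELIMITER '|'
    ABORT ON ERROR;
```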
Understanding Row Rejections and Rollback Errors

Depending on the type of error that COPY encounters, HP Vertica does one of the following:

- Rejects the offending row and loads the other rows into the table
- Rolls back the entire COPY statement without loading any data

Note: If you specify ABORT ON ERROR with the COPY command, the load automatically rolls back if any row causes an exception. The offending row or error is written to the applicable exceptions or rejected data file, or to a rejections table, if specified.
The following summarizes the difference between a rejected row and a rollback. See also the example that follows.

Rejects offending row:
When HP Vertica encounters an error parsing records in the input file, it rejects the offending row and continues the load. Rows are rejected if they contain any of the following:

- Incompatible data types
- Missing fields
- Missing delimiters

Rolls back entire load:
When HP Vertica rolls back a COPY statement, the command fails without loading data. The following conditions cause a load rollback:

- Server-side errors, such as lack of memory
- Violations of primary key or foreign key constraints
- Loading NULL data into a NOT NULL column

Example: Rejection versus Rollback

This example illustrates what happens when HP Vertica can't parse a row to the requested data type. For example, in the following COPY statement, "a::INT + b::INT" is a SQL expression in which "a" and "b" are derived values:

=> CREATE TABLE t (i INT);
=> COPY t (a FILLER VARCHAR, b FILLER VARCHAR, i AS a::INT + b::INT) FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> cat|dog
>> .

Vertica Analytic Database cannot parse the row to the requested data type and rejects the row:

ERROR 2827: Could not convert "cat" from column "*FILLER*".a to an int8

If "a" were 'cat' and "b" 'dog', the expression 'cat'::INT + 'dog'::INT would return an error, and the COPY statement would fail (roll back) without loading any data:

=> SELECT 'cat'::INT + 'dog'::INT;
ERROR 3681: Invalid input syntax for integer: "cat"

The following statement would also roll back, because Vertica Analytic Database can't parse the row to the requested data type:

=> COPY t (a FILLER VARCHAR, i AS a::INT) FROM STDIN;

However, in the following COPY statement, Vertica Analytic Database rejects only the offending row without rolling back the statement.
Instead of evaluating 'cat' as a VARCHAR value, the load parser tries to parse 'cat' directly as an INTEGER:
=> COPY t (a FILLER INT, i AS a) FROM STDIN;

In the case of the above statement, the load parser (unlike the expression evaluator) rejects the row if it contains a field that can't be parsed to the requested data type.

Saving Load Exceptions (EXCEPTIONS)

COPY exceptions consist of informational messages describing why a row of data could not be parsed. The optional EXCEPTIONS parameter lets you specify a file to which COPY writes exceptions. If you do not use this parameter, COPY saves exception files to the following default location:

catalog_dir/CopyErrorLogs/tablename-filename-of-source-copy-from-exceptions

catalog_dir                   The database catalog files directory
tablename-filename-of-source  The names of the table and data file
-copy-from-exceptions         The file suffix appended to the table and source file name

Note: Using the REJECTED DATA parameter with the AS TABLE clause is mutually exclusive with specifying a load exceptions file. Since the exceptions messages are included in the rejected data table, COPY does not permit both options and displays this error if you try to use them:

ERROR 0: Cannot specify both an exceptions file and a rejected table in the same statement

The optional EXCEPTIONS parameter lets you specify a file of your choice to which COPY writes load exceptions. The EXCEPTIONS file indicates the input line number and the reason for each data record exception in this format:

COPY: Input record <recordnum> in <path of input file> has been rejected (<reason>). Please see <path to reject file>, record <recordnum> for the rejected record.

If copying from STDIN, the filename-of-source is STDIN.

Note: You can use specific rejected data and exceptions files with one or more of the files you are loading. Separate consecutive rejected data and exception file names with a comma (,) in the COPY statement.

You must specify a filename in the path to load multiple input files.
Keep in mind that long table names combined with long data file names can exceed the operating system's maximum length (typically 255 characters). To work around file names exceeding the maximum length, use a path for the exceptions file that differs from the default path; for example, /tmp/<shorter-file-name>.

For all data loads (except COPY LOCAL), COPY behaves as follows:
No exceptions file specified:

- For one data source file (pathToData or STDIN), all information is stored as one file in the default directory.
- For multiple data files, all information is stored as separate files, one for each data file, in the default directory.

Exceptions file specified:

- For one data file, the path is treated as a file, and COPY stores all information in this file. If the path is not a file, COPY returns an error.
- For multiple data files, the path is treated as a directory, and COPY stores all information in separate files, one for each data file. If the path is not a directory, COPY returns an error.
- Exceptions files are not stored on the initiator node.
- You can specify only one path per node. If you specify more than one path per node, COPY returns an error.

Saving Load Rejections (REJECTED DATA)

COPY rejections are the data rows from the load file that caused a parser exception and did not load. The optional REJECTED DATA parameter lets you specify either a file or a table to which COPY writes rejected data records. If you omit this parameter, COPY saves rejected data files to the following location, without saving a table:

catalog_dir/CopyErrorLogs/tablename-filename-of-source-copy-from-rejections

catalog_dir                   The database catalog files directory
tablename-filename-of-source  The names of the table and data file
-copy-from-rejections         The file suffix appended to the table and source file name

Once a rejections file exists, you can review its contents to resolve any load problems and reload the data. If you save rejected data to a table, you can query the table to see both exceptions and the rejected data.

If copying from STDIN, the filename-of-source is STDIN.

Note: You can use specific rejected data and exceptions files with one or more of the files you are loading. Separate consecutive rejected data and exception file names with a comma (,) in the COPY statement.
Do not use the ON ANY NODE option with rejected data and exceptions files, because ON ANY NODE is applicable only to the load file.

When you load multiple input files, you must specify a file name in the path. Keep in mind that long input file names, combined with rejected data file names, can exceed the operating system's maximum length (typically 255 characters). To work around file names exceeding the maximum length, use a path for the rejected data file that differs from the default path; for example, /tmp/<shorter-file-name>.

For all data loads (except COPY LOCAL), COPY behaves as follows:
No rejected data file specified:

- For one data source file (pathToData or STDIN), all information is stored as one file in the default directory.
- For multiple data files, all information is stored as separate files, one for each data file, in the default directory.

Rejected data file specified:

- For one data file, the path is interpreted as a file, and COPY stores all rejected data in this file. If the path is not a file, COPY returns an error.
- For multiple data files, the path is treated as a directory, and COPY stores all information in separate files, one for each data file. If the path is not a directory, COPY returns an error.
- Rejected data files are not shipped to the initiator node.
- Only one path per node is accepted. If more than one is provided, COPY returns an error.

Saving Rejected Data to a Table

Using the REJECTED DATA parameter with the AS TABLE clause lets you specify a table in which to save the data. The capability to save rejected data is on by default. If you want to keep saving rejected data to a file, do not use the AS TABLE clause.

When you use the AS TABLE clause, HP Vertica creates a new table if one does not exist. If no parsing rejections occur during a load, the table exists but is empty. The next time you load data, HP Vertica appends any rejected rows to the existing table.

The load rejection tables are a special type of table with the following capabilities and limitations:

- Support SELECT statements
- Can use DROP TABLE
- Cannot be created outside of a COPY statement
- Do not support DML and DDL activities
- Are not K-safe

To make the data in a rejected table K-safe, you can do one of the following:

- Write a CREATE TABLE..AS statement, such as this example:

  CREATE TABLE new_table AS SELECT * FROM rejected_table;

- Create a table to store rejected records, and run INSERT..SELECT operations into the new table

If you use COPY NO COMMIT
If you include the NO COMMIT and REJECTED DATA AS TABLE clauses in your COPY statement and the reject_table does not already exist, Vertica Analytic Database saves the rejected-data table as a LOCAL TEMP table and returns a message that a LOCAL TEMP table is being created.

Rejected-data tables are especially useful for Extract-Load-Transform workflows (where you'll likely use temporary tables more frequently), letting you quickly load data and identify which records failed to load due to a parsing error. If you load data into a temporary table that you created using the ON COMMIT DELETE clause, the COPY operation won't commit, even if you specify the ON COMMIT DELETE clause.

Rejection Records for Table Files

In the current implementation, the rejected records used to populate a table are also saved to a file in the default rejected data file directory:

catalog_dir/CopyErrorLogs/CopyRejectedRecords

The file contents differ from the rejected data files that COPY saves by default. The rejected records for table files contain the rejected data and the reason for the rejection (exceptions), along with other data columns, described next. HP Vertica suggests that you periodically log in to each server and drop the rejections tables that you no longer need.

Querying a Rejection Records Table

To use the AS TABLE clause:

1. Create a sample table in which to load data:

   dbs=> CREATE TABLE loader(a INT)
   CREATE TABLE

2. Use the COPY statement to load values, and save any rejected rows to a table called loader_rejects:

   dbs=> COPY loader FROM STDIN REJECTED DATA AS table "loader_rejects";
   Enter data to be copied followed by a newline.
   End with a backslash and a period on a line by itself.
   >> 1
   >> 2
   >> 3
   >> 4
   >> a
   >> b
   >> c
   >> .

3. Query the loader table after loading data into it directly. Notice that only four (4) values exist:

   dbs=> SELECT * FROM loader;
    Value
   -------
    1
    2
    3
    4
   (4 rows)

4. Query the loader_rejects table to see its column rows. The following example lists one record from the table (although more exist):

   dbs=> \x
   Expanded display is on.
   dbs=> SELECT * FROM loader_rejects;
   -[ RECORD 1 ]-------------+--------------------------------------------
   node_name                 | v_dbs_node0001
   file_name                 |
   session_id                | engvmqa2401-12617:0x1cb0
   statement_id              | 1
   batch_number              | 0
   row_number                | 5
   rejected_data             | a
   rejected_data_orig_length | 1
   rejected_reason           | Invalid integer format 'a' for column 1 (a)
   -[ RECORD 2 ]-------------+--------------------------------------------

The rejection data table has the following columns:

node_name (VARCHAR)
The name of the HP Vertica node on which the input load file was located.

file_name (VARCHAR)
The name of the file being loaded, which applies if you loaded a file (as opposed to using STDIN).

session_id (VARCHAR)
The session ID number in which the COPY statement occurred.

transaction_id (INTEGER)
Identifier for the transaction within the session, if any; otherwise NULL.
statement_id (INTEGER)
The unique identification number of the statement within the transaction that included the rejected data.

Tip: You can use the session_id, transaction_id, and statement_id columns to create joins with many system tables. For example, if you join against the QUERY_REQUESTS table using those three columns, the QUERY_REQUESTS.REQUEST column contains the actual COPY statement (as a string) used to load this data.

batch_number (INTEGER)
INTERNAL USE. Represents which batch (chunk) the data comes from.

row_number (INTEGER)
The rejected row number from the input file.

rejected_data (LONG VARCHAR)
The actual load data.

rejected_data_orig_length (INTEGER)
The length of the rejected data.

rejected_reason (VARCHAR)
The error that caused the rejected row. This column returns the same message that would exist in the load exceptions file, if you saved exceptions to a file rather than to a table.

Exporting the Rejected Records Table

You can export the contents of the rejected_data column to a file, correct the rejected rows, and load the corrected rows from the updated file. To export rejected records:

1. Create a sample table:

   dbs=> create table t (i int);
   CREATE TABLE

2. Copy data directly into the table, using a table to store rejected data:

   dbs=> copy t from stdin rejected data as table "t_rejects";
   Enter data to be copied followed by a newline.
   End with a backslash and a period on a line by itself.
   >> 1
   >> 2
   >> 3
   >> 4
   >> a
   >> b
   >> c
   >> .

3. Show only tuples and set the output format:

   dbs=> \t
   Showing only tuples.
   dbs=> \a
   Output format is unaligned.

4. Output to a file (rejected.txt):

   dbs=> \o rejected.txt
   dbs=> select rejected_data from t_rejects;
   dbs=> \o

5. Use the \! cat command on the saved file:

   dbs=> \! cat rejected.txt
   a
   b
   c
   dbs=>

After the file exists, you can fix the load errors and use the corrected file as load input to the COPY statement.

COPY Rejected Data and Exception Files

When executing a multi-node COPY statement, each node processes part of the load data. If the load succeeds, all parser rejections that occur during the node's load processing are written to that node's specific rejected data and exceptions files. If the load fails, the file contents can be incomplete, or empty. Both rejected data and exceptions files are saved and stored on a per-node basis.

This example uses multiple files as COPY inputs. Since the statement does not include either the REJECTED DATA or EXCEPTIONS parameters, rejected data and exceptions files are written to the default location, the database catalog subdirectory, CopyErrorLogs, on each node:

\set dir `pwd`/data/
\set remote_dir /vertica/test_dev/tmp_ms/
\set file1 '''':dir'C1_large_tbl.dat'''
\set file2 '''':dir'C2_large_tbl.dat'''
\set file3 '''':remote_dir'C3_large_tbl.dat'''
\set file4 '''':remote_dir'C4_large_tbl.dat'''
COPY large_tbl FROM :file1 ON site01,
               :file2 ON site01,
               :file3 ON site02,
               :file4 ON site02
DELIMITER '|';

Note: Always use the COPY statement REJECTED DATA and EXCEPTIONS parameters to save load rejections. Using the RETURNREJECTED parameter is supported only for internal use by the JDBC and ODBC drivers. HP Vertica's internal-use options can change without notice.

Specifying Rejected Data and Exceptions Files

The optional COPY REJECTED DATA and EXCEPTIONS parameters' 'path' element lets you specify a non-default path in which to store the files.

If path resolves to a storage location, and the user invoking COPY is not a superuser, these are the required permissions:

- The storage location must have been created (or altered) with the USER option (see ADD_LOCATION and ALTER_LOCATION_USE)
- The user must already have been granted READ access to the storage location where the file(s) exist, as described in GRANT (Storage Location)

Both parameters also have an optional ON nodename clause that uses the specified path:

...[ EXCEPTIONS 'path' [ ON nodename ] [, ...] ]...[ REJECTED DATA 'path' [ ON nodename ] [, ...] ]

While 'path' specifies the location of the rejected data and exceptions files (with their corresponding parameters), the optional ON nodename clause moves any existing rejected data and exceptions files on the node to the specified path on the same node.

Saving Rejected Data and Exceptions Files to a Single Server

The COPY statement does not have a facility to merge exceptions and rejected data files after COPY processing is complete. To see the contents of exceptions and rejected data files, you must log on to each node and view its files.

Note: If you want to save all exceptions and rejected data files on a network host, be sure to give each node's files unique names, so that different cluster nodes do not overwrite other nodes' files.
For instance, if you set up a server with two directories, such as /vertica/exceptions and /vertica/rejections, be sure the corresponding file names for each HP Vertica cluster node identify each node, such as node01_exceptions.txt, node02_exceptions.txt, and so on. In this way, each cluster node's files reside in one directory and are easily distinguishable.
Using VSQL Variables for Rejected Data and Exceptions Files

This example uses vsql variables to specify the path and file names to use with the EXCEPTIONS and REJECTED DATA parameters (except_s1 and reject_s1). The COPY statement specifies a single input file (large_tbl) on the initiator node:

\set dir `pwd`/data/
\set file1 '''':dir'C1_large_tbl.dat'''
\set except_s1 '''':dir'exceptions'''
\set reject_s1 '''':dir'rejections'''

COPY large_tbl FROM :file1 ON site01 DELIMITER '|'
REJECTED DATA :reject_s1 ON site01
EXCEPTIONS :except_s1 ON site01;

This example uses variables to specify exceptions and rejected data files (except_s2 and reject_s2) on a remote node. The COPY statement consists of a single input file on a remote node (site02):

\set remote_dir /vertica/test_dev/tmp_ms/
\set except_s2 '''':remote_dir'exceptions'''
\set reject_s2 '''':remote_dir'rejections'''

COPY large_tbl FROM :file1 ON site02 DELIMITER '|'
REJECTED DATA :reject_s2 ON site02
EXCEPTIONS :except_s2 ON site02;

This example uses variables to specify that the exceptions and rejected data files are on a remote node (indicated by :remote_dir). The inputs to the COPY statement consist of multiple data files on two nodes (site01 and site02). The EXCEPTIONS and REJECTED DATA options use the ON nodename clause with the vsql variables to indicate where the files reside (site01 and site02):

\set dir `pwd`/data/
\set remote_dir /vertica/test_dev/tmp_ms/
\set except_s1 '''':dir''''
\set reject_s1 '''':dir''''
\set except_s2 '''':remote_dir''''
\set reject_s2 '''':remote_dir''''

COPY large_tbl FROM :file1 ON site01,
               :file2 ON site01,
               :file3 ON site02,
               :file4 ON site02
DELIMITER '|'
REJECTED DATA :reject_s1 ON site01, :reject_s2 ON site02
EXCEPTIONS :except_s1 ON site01, :except_s2 ON site02;

COPY LOCAL Rejection and Exception Files

Invoking COPY LOCAL (or COPY LOCAL FROM STDIN) does not automatically create rejected data and exceptions files.
This behavior differs from using COPY, which saves both files automatically, regardless of whether you use the optional REJECTED DATA and EXCEPTIONS parameters to specify either file explicitly.
Use the REJECTED DATA and EXCEPTIONS parameters with COPY LOCAL and COPY LOCAL FROM STDIN to save the corresponding output files on the client. If you do not use these options, rejected data parsing events (and the exceptions that describe them) are not retained, even if they occur.

You can load multiple input files using COPY LOCAL (or COPY LOCAL FROM STDIN). If you also use the REJECTED DATA and EXCEPTIONS options, the statement writes rejected rows and exceptions to separate files. Each file contains all rejected rows or all corresponding exceptions, respectively, regardless of how many input files were loaded.

Note: Because COPY LOCAL (and COPY LOCAL FROM STDIN) must write any rejected rows and exceptions to the client, you cannot use the [ON nodename] clause with either the rejected data or exceptions options.

Specifying Rejected Data and Exceptions Files

To save any rejected data and their exceptions to files:

1. In the COPY LOCAL (and COPY LOCAL FROM STDIN) statement, use the REJECTED DATA 'path' and the EXCEPTIONS 'path' parameters, respectively.
2. Specify two different file names for the two options. You cannot use one file for both REJECTED DATA and EXCEPTIONS.
3. When you invoke COPY LOCAL or COPY LOCAL FROM STDIN, the files you specify need not pre-exist. If they do, COPY LOCAL must be able to overwrite them.

You can specify the path and file names with vsql variables:

\set rejected ../except_reject/copyLocal.rejected
\set exceptions ../except_reject/copyLocal.exceptions

Note: COPY LOCAL does not support storing rejected data in a table, as you can when using the COPY statement.
When you use the COPY LOCAL or COPY LOCAL FROM STDIN statement, specify the variable names for the files with their corresponding parameters:

COPY large_tbl FROM LOCAL REJECTED DATA :rejected EXCEPTIONS :exceptions;
COPY large_tbl FROM LOCAL STDIN REJECTED DATA :rejected EXCEPTIONS :exceptions;

Referential Integrity Load Violation

HP Vertica checks for constraint violations when queries are executed, not when loading data.
If you have a pre-joined projection defined on the table being loaded, HP Vertica checks for constraint violations (duplicate primary keys or non-existent foreign keys) during the join operation and reports errors. If there are no pre-joined projections, HP Vertica performs no such checks.

To avoid constraint violations, you can load data without committing it and then use the ANALYZE_CONSTRAINTS function to perform a post-load check of your data. If the function finds constraint violations, you can roll back the bulk load because you have not committed it.

See Also

• Detecting Constraint Violations
• COPY
• ANALYZE_CONSTRAINTS
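The load-then-check pattern described above can be sketched as follows. The table name and data file are hypothetical; COPY's NO COMMIT option and ANALYZE_CONSTRAINTS are documented in the SQL Reference Manual.

```sql
-- Load without committing; table and file names are placeholders.
COPY fact_sales FROM '/data/fact_sales.dat' DELIMITER '|' NO COMMIT;

-- Check the uncommitted load for duplicate primary keys
-- or non-existent foreign keys.
SELECT ANALYZE_CONSTRAINTS('fact_sales');

-- If the function reports violations, discard the load:
ROLLBACK;
-- Otherwise, make it permanent:
-- COMMIT;
```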
Trickle Loading Data

Once you have a working database and have bulk loaded your initial data, you can use trickle loading to load additional data on an ongoing basis. By default, HP Vertica uses the transaction isolation level of READ COMMITTED, which allows users to see the most recently committed data without holding any locks. This allows new data to be loaded while concurrent queries are running. See Change Transaction Isolation Levels.

Using INSERT, UPDATE, and DELETE

The SQL data manipulation language (DML) commands INSERT, UPDATE, and DELETE perform the same functions that they do in any ACID-compliant database. The INSERT statement is typically used to load individual rows into physical memory, or to load a table using INSERT AS SELECT. UPDATE and DELETE are used to modify the data.

You can intermix the INSERT, UPDATE, and DELETE commands. HP Vertica follows the SQL-92 transaction model. In other words, you do not have to explicitly start a transaction, but you must use a COMMIT or ROLLBACK command (or COPY) to end a transaction. Canceling a DML statement causes the effect of the statement to be rolled back.

HP Vertica differs from traditional databases in two ways:

• DELETE does not actually delete data from disk storage; it marks rows as deleted so that they can be found by historical queries.
• UPDATE writes two rows: one with new data and one marked for deletion.

Like COPY, by default, the INSERT, UPDATE, and DELETE commands write the data to the WOS and, on overflow, write to the ROS. For large INSERTs or UPDATEs, you can use the DIRECT keyword to force HP Vertica to write rows directly to the ROS. Loading a large number of rows as single-row inserts is not recommended for performance reasons. Use COPY instead.

WOS Overflow

The WOS exists to allow HP Vertica to efficiently batch small loads into larger ones for I/O purposes.
Loading to the WOS is fast because the work of sorting, encoding, and writing to disk is deferred and performed in the background by the Tuple Mover's moveout process. Since the WOS has a finite amount of available space, it can fill up and force HP Vertica to spill small loads directly to disk. While no data is lost or rejected when the WOS gets full, the spill can result in wasted I/O bandwidth. Thus, follow the Tuning the Tuple Mover guidelines to avoid WOS overflow.
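A minimal trickle-load session following the rules above might look like this. The table, columns, and values are hypothetical, and the /*+ direct */ hint is one way to request a direct-to-ROS write for a large statement; verify the hint syntax for your release in the SQL Reference Manual.

```sql
-- Single-row trickle inserts go to the WOS by default.
INSERT INTO clickstream (user_id, url, ts) VALUES (42, '/home', now());
INSERT INTO clickstream (user_id, url, ts) VALUES (42, '/cart', now());
COMMIT;  -- SQL-92 model: the transaction must be ended explicitly

-- A large INSERT...SELECT can bypass the WOS and write directly to ROS.
INSERT /*+ direct */ INTO clickstream_archive SELECT * FROM clickstream;
COMMIT;

-- UPDATE writes a new row and marks the old one for deletion;
-- DELETE only marks rows as deleted (visible to historical queries).
DELETE FROM clickstream WHERE ts < now() - INTERVAL '30 days';
COMMIT;
```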
Copying and Exporting Data

HP Vertica can easily import data from and export data to other HP Vertica databases. Importing and exporting data is useful for common tasks such as moving data back and forth between a development or test database and a production database, or between databases that have different purposes but need to share data on a regular basis.

Moving Data Directly Between Databases

Two statements move data to and from another HP Vertica database:

• COPY FROM VERTICA
• EXPORT TO VERTICA

Executing either of these statements requires first creating a connection to the other HP Vertica database.

Creating SQL Scripts to Export Data

Three functions return a SQL script you can use to recreate exported database objects elsewhere:

• EXPORT_CATALOG
• EXPORT_OBJECTS
• EXPORT_TABLES

Copying and exporting data is similar to Backing Up and Restoring the Database. However, the two serve different purposes, outlined below:

Task                                                              Backup and Restore   COPY and EXPORT Statements
Back up or restore an entire database, or incremental changes     YES                  NO
Manage database objects (a single table or selected table rows)   YES                  YES
Use external locations to back up and restore your database       YES                  NO
Use direct connections between two databases                      NO                   YES
Use external shell scripts to back up and restore your database   YES                  NO
Task                                                                       Backup and Restore   COPY and EXPORT Statements
Use SQL commands to incorporate copy and export tasks into DB operations   NO                   YES

The following sections explain how you import and export data between HP Vertica databases.

When importing from or exporting to an HP Vertica database, you can connect only to a database that uses trusted (username-only) or password-based authentication. Neither LDAP nor SSL authentication is supported. For more information, see the Implementing Security section of the Vertica documentation.

Exporting Data

You can export data from an earlier HP Vertica release, as long as the earlier release is a version of the last major release. For instance, for Version 6.x, you can export data from any version of 5.x, but not from 4.x.

You can export a table, specific columns in a table, or the results of a SELECT statement to another HP Vertica database. The table in the target database receiving the exported data must already exist, and must have columns that match (or can be coerced into) the data types of the columns you are exporting.

Exported data is always written in AUTO mode. When loading data using AUTO mode, HP Vertica inserts the data first into the WOS. If the WOS is full, HP Vertica inserts the data directly into the ROS. See the COPY statement for more details.

Exporting data fails if either side of the connection is a single-node cluster installed to localhost, or if you do not specify a host name or IP address.

Exporting is a three-step process:

1. Use the CONNECT SQL statement to connect to the target database that will receive your exported data.
2. Use the EXPORT SQL statement to export the data. If you want to export multiple tables or the results of multiple SELECT statements, use multiple EXPORT statements. All statements use the same connection to the target database.
3. When you are finished exporting data, use the DISCONNECT SQL statement to disconnect from the target database.
See the entries for CONNECT, EXPORT, and DISCONNECT statements in the SQL Reference Manual for syntax details.
Exporting Identity Columns

When you use the EXPORT TO VERTICA statement, HP Vertica exports Identity (and Auto-increment) columns as they are defined in the source data. The Identity column value does not increment automatically, and requires that you use ALTER SEQUENCE to make updates.

Export Identity (and Auto-increment) columns as follows:

• If the source and destination tables both have an Identity column, you do not need to list them.
• If the source table has an Identity column, but the destination does not, specify both the source and destination columns.

Note: In earlier releases, Identity columns were ignored. Now, failure to list which Identity columns to export can cause an error, because the Identity column is no longer ignored and will be interpreted as missing in the destination table.

The default behavior for EXPORT TO VERTICA is to let you export Identity columns by specifying them directly in the source table. To disable this behavior globally, set the CopyFromVerticaWithIdentity configuration parameter, described in Configuration Parameters.

Examples of Exporting Data

The following example demonstrates using the three-step process listed above to export data. First, open the connection to the other database, then perform a simple export of an entire table to an identical table in the target database.

=> CONNECT TO VERTICA testdb USER dbadmin PASSWORD '' ON 'VertTest01',5433;
CONNECT
=> EXPORT TO VERTICA testdb.customer_dimension FROM customer_dimension;
 Rows Exported
---------------
         23416
(1 row)

The following statement demonstrates exporting a portion of a table using a simple SELECT statement.

=> EXPORT TO VERTICA testdb.ma_customers AS SELECT customer_key, customer_name, annual_income
-> FROM customer_dimension WHERE customer_state = 'MA';
 Rows Exported
---------------
          3429
(1 row)
This statement exports several columns from one table to several different columns in the target database table using column lists. Remember that when supplying both a source and destination column list, the number of columns must match.

=> EXPORT TO VERTICA testdb.people (name, gender, age) FROM customer_dimension
-> (customer_name, customer_gender, customer_age);
 Rows Exported
---------------
         23416
(1 row)

You can export tables (or columns) containing Identity and Auto-increment values, but the sequence values are not incremented automatically at their destination.

You can also use the EXPORT TO VERTICA statement with a SELECT AT EPOCH LATEST expression to include data from the latest committed DML transaction.

Disconnect from the database when the export is complete:

=> DISCONNECT testdb;
DISCONNECT

Note: Closing your session also closes the database connection. However, it is a good practice to explicitly close the connection to the other database, both to free up resources and to prevent issues with other SQL scripts you may run in your session. Always closing the connection prevents potential errors if you run a script in the same session that attempts to open a connection to the same database, since each session can have only one connection to a particular database at a time.

Copying Data

You can import a table or specific columns in a table from another HP Vertica database. The table receiving the copied data must already exist, and must have columns that match (or can be coerced into) the data types of the columns you are copying from the other database.

You can import data from an earlier HP Vertica release, as long as the earlier release is a version of the last major release. For instance, for Version 6.x, you can import data from any version of 5.x, but not from 4.x.

Importing and exporting data fails if either side of the connection is a single-node cluster installed to localhost, or if you do not specify a host name or IP address.
Importing is a three-step process:

1. Use the CONNECT SQL statement to connect to the source database containing the data you want to import.
2. Use the COPY FROM VERTICA SQL statement to import the data. If you want to import multiple tables, use multiple COPY FROM VERTICA statements. They all use the same connection to the source database.
3. When you are finished importing data, use the DISCONNECT SQL statement to disconnect from the source database.

See the entries for CONNECT, COPY FROM VERTICA, and DISCONNECT statements in the SQL Reference Manual for syntax details.

Importing Identity Columns

You can import Identity (and Auto-increment) columns as follows:

• If the source and destination tables both have an Identity column, you do not need to list them.
• If the source table has an Identity column, but the destination does not, specify both the source and destination columns.

Note: In earlier releases, Identity columns were ignored. Now, failure to list which Identity columns to import can cause an error, because the Identity column is no longer ignored and will be interpreted as missing in the destination table.

After importing the columns, the Identity column values do not increment automatically. Use ALTER SEQUENCE to make updates.

The default behavior for this statement is to import Identity (and Auto-increment) columns by specifying them directly in the source table. To disable this behavior globally, set the CopyFromVerticaWithIdentity configuration parameter, described in Configuration Parameters.

Examples

This example demonstrates connecting to another database, copying the contents of an entire table from the source database to an identically defined table in the current database directly into the ROS, and then closing the connection.

=> CONNECT TO VERTICA vmart USER dbadmin PASSWORD '' ON 'VertTest01',5433;
CONNECT
=> COPY customer_dimension FROM VERTICA vmart.customer_dimension DIRECT;
 Rows Loaded
-------------
      500000
(1 row)
=> DISCONNECT vmart;
DISCONNECT

This example demonstrates copying several columns from a table in the source database into a table in the local database.

=> CONNECT TO VERTICA vmart USER dbadmin PASSWORD '' ON 'VertTest01',5433;
CONNECT
=> COPY people (name, gender, age) FROM VERTICA
-> vmart.customer_dimension (customer_name, customer_gender,
-> customer_age);
 Rows Loaded
-------------
      500000
(1 row)
=> DISCONNECT vmart;
DISCONNECT

You can copy tables (or columns) containing Identity and Auto-increment values, but the sequence values are not incremented automatically at their destination.
Using Public and Private IP Networks

In many configurations, HP Vertica cluster hosts use two network IP addresses as follows:

• A private address for communication between the cluster hosts.
• A public IP address for client connections.

By default, importing from and exporting to another HP Vertica database uses the private network. To use the public network address for copy and export activities, configure the system to use the public network to support exporting to or importing from another HP Vertica cluster:

• Identify the Public Network to HP Vertica
• Identify the Database or Nodes Used for Import/Export

Identify the Public Network to HP Vertica

To be able to import to or export from a public network, HP Vertica needs to be aware of the IP addresses of the nodes or clusters on the public network that will be used for import/export activities. Your public network might be configured in either of these ways:

• Public network IP addresses reside on the same subnet (create a subnet).
• Public network IP addresses are on multiple subnets (create a network interface).

To identify public network IP addresses residing on the same subnet:

• Use the CREATE SUBNET statement to provide your subnet with a name and to identify the subnet routing prefix.

To identify public network IP addresses residing on multiple subnets:

• Use the CREATE NETWORK INTERFACE statement to configure import/export from specific nodes in the HP Vertica cluster.

After you've identified the subnet or network interface to be used for import/export, you must Identify the Database or Nodes Used for Import/Export.

See Also

• CREATE SUBNET
• ALTER SUBNET
• DROP SUBNET
• CREATE NETWORK INTERFACE
• ALTER NETWORK INTERFACE
• DROP NETWORK INTERFACE

Identify the Database or Nodes Used for Import/Export

Once you've identified the public network to HP Vertica, you can configure databases and nodes to use the public network for import/export. You can configure by:

• Specifying a subnet for the database.
• Specifying a network interface for each node in the database.

To configure a database to import/export on the public network:

• Use the ALTER DATABASE statement to specify the subnet name of the public network. When you do so, all nodes in the database automatically use the network interface on the subnet for import/export operations.

To configure each individual node to import/export on a public network:

• Use the ALTER NODE statement to specify the network interface of the public network on each individual node.

See Also

• ALTER DATABASE
• CREATE SUBNET
• CREATE NETWORK INTERFACE
• V_MONITOR.NETWORK_INTERFACES
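The two-step configuration described above might look like the following sketch. The subnet name, interface name, IP addresses, database name, and node name are all hypothetical; check CREATE SUBNET, CREATE NETWORK INTERFACE, ALTER DATABASE, and ALTER NODE in the SQL Reference Manual for the exact syntax on your release.

```sql
-- Step 1: identify the public network.
-- Case A: all public addresses share one subnet.
CREATE SUBNET public_net WITH '192.168.1.0';

-- Case B: public addresses span multiple subnets, so name the
-- interface on each node individually.
CREATE NETWORK INTERFACE eth1_node01 ON v_mydb_node0001 WITH '10.20.30.1';

-- Step 2: use it for import/export.
-- Database-wide (all nodes use the subnet's interface):
ALTER DATABASE mydb EXPORT ON public_net;

-- Or per node:
ALTER NODE v_mydb_node0001 EXPORT ON eth1_node01;
```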
Using EXPORT Functions

HP Vertica provides several EXPORT_ functions that let you recreate a database, or specific schemas and tables, in a target database. For example, you can use the EXPORT_ functions to transfer some or all of the designs and objects you create in a development or test environment to a production database.

The EXPORT_ functions create SQL scripts that you can run to generate the exported database designs or objects. These functions serve different purposes from the export statements, COPY FROM VERTICA (pull data) and EXPORT TO VERTICA (push data). Those statements transfer data directly from the source to the target database across a network connection between the two. They are dynamic actions and do not generate SQL scripts.

The EXPORT_ functions appear in the following table. Depending on what you need to export, you can use one or more of the functions. EXPORT_CATALOG creates the most comprehensive SQL script, while EXPORT_TABLES and EXPORT_OBJECTS are subsets of that function that narrow the export scope.

Use this function...   To recreate...
EXPORT_CATALOG         These catalog items:
                       • An existing schema design, tables, projections, constraints, and views
                       • The Database Designer-created schema design, tables, projections, constraints, and views
                       • A design on a different cluster.
EXPORT_TABLES          Non-virtual objects up to, and including, the schema of one or more tables.
EXPORT_OBJECTS         Catalog objects in order dependency for replication.

The designs and object definitions that the script creates depend on the EXPORT_ function scope you specify. The following sections give examples of the commands and output for each function and the scopes it supports.

Saving Scripts for Export Functions

All of the examples in this section were generated using the standard HP Vertica VMart database, with some additional test objects and tables.
One output directory was created for all SQL scripts that the functions created: /home/dbadmin/xtest
If you specify the destination argument as an empty string (''), the function writes the export results to STDOUT.

Note: A superuser can export all available database output to a file with the EXPORT_ functions. For a non-superuser, the EXPORT_ functions generate a script containing only the objects to which the user has access.

Exporting the Catalog

Exporting the catalog is useful to quickly move a database design to another cluster. The EXPORT_CATALOG function generates a SQL script to run on a different cluster to replicate the physical schema design of the source database. You choose what to export by specifying the export scope:

To export...                                                       Enter this scope...
Schemas, tables, constraints, views, and projections.              DESIGN (This is the default scope.)
All design objects and system objects created in Database          DESIGN_ALL
Designer, such as design contexts and their tables.
All tables, constraints, and projections.                          TABLES

Function Summary

Here is the function syntax, described in EXPORT_CATALOG in the SQL Reference Manual:

EXPORT_CATALOG ( [ 'destination' ] , [ 'scope' ] )

Exporting All Catalog Objects

Use the DESIGN scope to export all design elements of a source database in order dependency. This scope exports all catalog objects in their OID (unique object ID) order, including schemas, tables, constraints, views, and projections. This is the most comprehensive export scope, without the Database Designer elements, if they exist.

Note: The result of this function yields the same SQL script as EXPORT_OBJECTS used with an empty string ('') as its scope.

VMart=> select export_catalog('/home/dbadmin/xtest/sql_cat_design.sql','DESIGN');
           export_catalog
-------------------------------------
 Catalog data exported successfully
(1 row)
The SQL script includes the following types of statements, each needed to provision a new database:

• CREATE SCHEMA
• CREATE TABLE
• CREATE VIEW
• CREATE SEQUENCE
• CREATE PROJECTION (with ORDER BY and SEGMENTED BY)
• ALTER TABLE (to add constraints)
• PARTITION BY

Projection Considerations

If a projection to export was created with no ORDER BY clause, the SQL script reflects the default behavior for projections. HP Vertica implicitly creates projections using a sort order based on the SELECT columns in the projection definition. The EXPORT_CATALOG script reflects this behavior.

The EXPORT_CATALOG script is portable as long as all projections were generated using UNSEGMENTED ALL NODES or SEGMENTED ALL NODES.

Exporting Database Designer Schema and Designs

Use the DESIGN_ALL scope to generate a script to recreate all design elements of a source database and the design and system objects that were created by the Database Designer:

VMart=> select export_catalog('/home/dbadmin/xtest/sql_cat_design_all.sql','DESIGN_ALL');
           export_catalog
-------------------------------------
 Catalog data exported successfully
(1 row)

Exporting Table Objects

Use the TABLES scope to generate a script to recreate all schemas, tables, constraints, and sequences:

VMart=> select export_catalog('/home/dbadmin/xtest/sql_cat_tables.sql','TABLES');
           export_catalog
-------------------------------------
 Catalog data exported successfully
(1 row)

The SQL script includes the following types of statements:
• CREATE SCHEMA
• CREATE TABLE
• ALTER TABLE (to add constraints)
• CREATE SEQUENCE

See Also

• EXPORT_CATALOG
• EXPORT_OBJECTS
• EXPORT_TABLES
• Exporting Tables
• Exporting Objects

Exporting Tables

Use the EXPORT_TABLES function to recreate one or more tables, and related objects, on a different cluster. Specify one of the following options to determine the scope:

To export...                                                       Use this scope...
All non-virtual objects to which the user has access, including    An empty string ('')
constraints.
One or more named objects, such as tables or sequences in one      A comma-delimited list of items:
or more schemas. You can optionally qualify the schema with a      'myschema.newtable, yourschema.oldtable'
database prefix, myvertica.myschema.newtable.
A named database object in the current search path. You can        A single object, 'myschema'
specify a schema, table, or sequence. If the object is a schema,
the script includes non-virtual objects to which the user has
access.

The SQL script includes only the non-virtual objects to which the current user has access.

Note: You cannot export a view with this function, even if a list includes the view relations. Specifying a view name does not issue a warning, but the view will not exist in the SQL script. Use EXPORT_OBJECTS instead.
Function Syntax

EXPORT_TABLES ( [ 'destination' ] , [ 'scope' ] )

For more information, see EXPORT_TABLES in the SQL Reference Manual.

Exporting All Tables and Related Objects

Specify an empty string ('') for the scope to export all tables and their related objects:

VMart=> select export_tables('/home/dbadmin/xtest/sql_tables_empty.sql','');
            export_tables
-------------------------------------
 Catalog data exported successfully
(1 row)

The SQL script includes the following types of statements, depending on what is required to recreate the tables and any related objects (such as sequences):

• CREATE SCHEMA
• CREATE TABLE
• ALTER TABLE (to add constraints)
• CREATE SEQUENCE
• PARTITION BY

Exporting a List of Tables

Use EXPORT_TABLES with a comma-separated list of objects, including tables, views, or schemas:

VMart=> select export_tables('/home/dbadmin/xtest/sql_tables_del.sql','public.student, public.test7');
            export_tables
-------------------------------------
 Catalog data exported successfully
(1 row)

The SQL script includes the following types of statements, depending on what is required to create the list of objects:

• CREATE SCHEMA
• CREATE TABLE
• ALTER TABLE (to add constraints)
• CREATE SEQUENCE

Exporting a Single Table or Object

Use the EXPORT_TABLES function to export one or more database table objects. This example exports a named sequence, my_seq, qualifying the sequence with the schema name (public):

VMart=> select export_tables('/home/dbadmin/xtest/export_one_sequence.sql', 'public.my_seq');
            export_tables
-------------------------------------
 Catalog data exported successfully
(1 row)

Following are the contents of the export_one_sequence.sql output file, viewed with the more command:

[dbadmin@node01 xtest]$ more export_one_sequence.sql
CREATE SEQUENCE public.my_seq ;

Exporting Objects

Use the EXPORT_OBJECTS function to recreate the exported objects. Specify one of the following options to determine the scope:

To export...                                                       Use this scope...
All non-virtual objects to which the user has access, including    An empty string ('')
constraints.
One or more named objects, such as tables or views in one or       A comma-delimited list of items:
more schemas. You can optionally qualify the schema with a         'myschema.newtable, yourschema.oldtable'
database prefix, myvertica.myschema.newtable.
A named database object in the current search path. You can        A single object, 'myschema'
specify a schema, table, or view. If the object is a schema,
the script includes non-virtual objects to which the user has
access.

The SQL script includes only the non-virtual objects to which the current user has access.

The EXPORT_OBJECTS function always attempts to recreate projection statements with the KSAFE clauses that existed in the original definitions, or with OFFSET clauses, if they did not.
Function Syntax

EXPORT_OBJECTS ( [ 'destination' ] , [ 'scope' ] , [ 'ksafe' ] )

For more information, see EXPORT_OBJECTS in the SQL Reference Manual.

Exporting All Objects

Specify an empty string ('') for the scope to export all non-virtual objects from the source database in order dependency. Running the generated SQL script on another cluster creates all referenced objects and their dependent objects.

By default, this function includes the KSAFE argument as true, so the script includes the MARK_DESIGN_KSAFE statement. This is useful when you run the generated SQL script in a new database, so that it inherits the K-safety value of the original database.

Note: The result of this function yields the same SQL script as EXPORT_CATALOG with a DESIGN scope.

VMart=> select export_objects('/home/dbadmin/xtest/sql_objects_all.sql','', 'true');
           export_objects
-------------------------------------
 Catalog data exported successfully
(1 row)

The SQL script includes the following types of statements:

• CREATE SCHEMA
• CREATE TABLE
• CREATE VIEW
• CREATE SEQUENCE
• CREATE PROJECTION (with ORDER BY and SEGMENTED BY)
• ALTER TABLE (to add constraints)
• PARTITION BY

Here is a snippet from the start of the output SQL file, and the end, showing the KSAFE statement:

CREATE SCHEMA store;
CREATE SCHEMA online_sales;
CREATE SEQUENCE public.my_seq ;
CREATE TABLE public.customer_dimension
(
    customer_key int NOT NULL,
    customer_type varchar(16),
    customer_name varchar(256),
    customer_gender varchar(8),
    title varchar(8),
    household_id int,
    ...
);
...
SELECT MARK_DESIGN_KSAFE(0);

Exporting a List of Objects

Use a comma-separated list of objects as the function scope. The list can include one or more tables, sequences, and views in the same, or different, schemas, depending on how you qualify the object name. For instance, specify a table from one schema, and a view from another (schema2.view1).

The SQL script includes the following types of statements, depending on what objects you include in the list:

• CREATE SCHEMA
• CREATE TABLE
• ALTER TABLE (to add constraints)
• CREATE VIEW
• CREATE SEQUENCE

If you specify a view without its dependencies, the function displays a WARNING. The SQL script includes a CREATE statement for the dependent object, but will be unable to create it without the necessary relations:

VMart=> select export_objects('nameObjectsList', 'test2, tt, my_seq, v2' );
WARNING 0:  View public.v2 depends on other relations
           export_objects
-------------------------------------
 Catalog data exported successfully
(1 row)

This example includes the KSAFE argument explicitly:

VMart=> select export_objects('/home/dbadmin/xtest/sql_objects_table_view_KSAFE.sql','v1,test7', 'true');
           export_objects
-------------------------------------
 Catalog data exported successfully
(1 row)

Here are the contents of the output file of the example, showing the sample table test7 and the v1 view:

CREATE TABLE public.test7
(
    a int,
    c int NOT NULL DEFAULT 4,
    bb int
);

CREATE VIEW public.v1 AS
 SELECT tt.a
 FROM public.tt;

SELECT MARK_DESIGN_KSAFE(0);

Exporting a Single Object

Specify a single database object as the function scope. The object can be a schema, table, sequence, or view. The function exports all non-virtual objects associated with the one you specify.

VMart=> select export_objects('/home/dbadmin/xtest/sql_objects_viewobject_KSAFE.sql','v1', 'KSAFE');
           export_objects
-------------------------------------
 Catalog data exported successfully
(1 row)

The output file contains the v1 view:

CREATE VIEW public.v1 AS
 SELECT tt.a
 FROM public.tt;

SELECT MARK_DESIGN_KSAFE(0);
  • 567. Bulk Deleting and Purging Data HP Vertica provides multiple techniques to remove data from the database in bulk.

Command            Description
DROP TABLE         Permanently removes a table and its definition. Optionally removes associated views and projections as well.
DELETE FROM TABLE  Marks rows with delete vectors and stores them so data can be rolled back to a previous epoch. The data must eventually be purged before the database can reclaim disk space. See Purging Deleted Data.
TRUNCATE TABLE     Removes all storage and history associated with a table. The table structure is preserved for future use. The results of this command cannot be rolled back.
DROP_PARTITION     Removes one partition from a partitioned table. Each partition contains a related subset of data in the table. Partitioned data can be dropped efficiently, and provides query performance benefits. See Working with Table Partitions.

The following table provides a quick reference for the different delete operations you can use. The "Saves History" column indicates whether data can be rolled back to an earlier epoch and queried at a later time.

Syntax                                                            Performance  Commits Tx  Saves History
DELETE FROM base_table                                            Normal       No          Yes
DELETE FROM temp_table                                            High         No          No
DELETE FROM base_table WHERE                                      Normal       No          Yes
DELETE FROM temp_table WHERE                                      Normal       No          Yes
DELETE FROM temp_table WHERE, temp_table ON COMMIT PRESERVE ROWS  Normal       No          Yes
DELETE FROM temp_table WHERE, temp_table ON COMMIT DELETE ROWS    High         Yes         No
DROP base_table                                                   High         Yes         No
TRUNCATE base_table                                               High         Yes         No
TRUNCATE temp_table                                               High         Yes         No
DROP PARTITION                                                    High         Yes         No

Administrator's Guide Bulk Deleting and Purging Data HP Vertica Analytic Database (7.0.x) Page 567 of 997
  • 568. Choosing the Right Technique for Deleting Data l To delete both table data and definitions and start from scratch, use the DROP TABLE [CASCADE] command. l To drop data, while preserving table definitions so that you can quickly and easily reload data, use TRUNCATE TABLE. Note that unlike DELETE, TRUNCATE does not have to mark each row with delete vectors, so it runs much more quickly. l To perform bulk delete operations on a regular basis, HP Vertica recommends using Partitioning. l To perform occasional small deletes or updates with the option to roll back or review history, use DELETE FROM TABLE. See Best Practices for DELETE and UPDATE. For details on syntax and usage, see DELETE, DROP TABLE, TRUNCATE TABLE, CREATE TABLE and DROP_PARTITION in the SQL Reference Manual. Administrator's Guide Bulk Deleting and Purging Data HP Vertica Analytic Database (7.0.x) Page 568 of 997
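The decision points above can be sketched as a small helper. This is illustrative pseudologic only — the function name and flags are hypothetical, and the returned strings merely stand in for the SQL commands described above:

```python
# Hypothetical sketch of the guidance above; not a Vertica API.

def choose_delete_technique(drop_definition: bool,
                            regular_bulk_deletes: bool,
                            need_history: bool) -> str:
    """Mirror the decision points in the bullets above."""
    if drop_definition:
        # Remove both the data and the table definition, start from scratch.
        return "DROP TABLE ... CASCADE"
    if regular_bulk_deletes:
        # Partitioned tables give the best bulk-delete performance.
        return "DROP_PARTITION"
    if not need_history:
        # Fast: no per-row delete vectors, but the result cannot be rolled back.
        return "TRUNCATE TABLE"
    # Occasional small deletes with the option to roll back or review history.
    return "DELETE FROM TABLE"

print(choose_delete_technique(False, False, True))  # DELETE FROM TABLE
```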
  • 569. Best Practices for DELETE and UPDATE HP Vertica is optimized for query-intensive workloads, so DELETE and UPDATE queries might not achieve the same level of performance as other queries. DELETE and UPDATE operations go to the WOS by default, but if the data is sufficiently large and would not fit in memory, HP Vertica automatically switches to using the ROS. See Using INSERT, UPDATE, and DELETE. The topics that follow discuss best practices when using DELETE and UPDATE operations in HP Vertica. Performance Considerations for DELETE and UPDATE Queries To improve the performance of your DELETE and UPDATE queries, consider the following issues: l Query performance after large deletes—A large number of (unpurged) deleted rows can negatively affect query performance. To eliminate rows that have been deleted from the result, a query must do extra processing. If 10% or more of the total rows in a table have been deleted, the performance of a query on the table degrades. However, your experience may vary depending on the size of the table, the table definition, and the query. If a table has a large number of deleted rows, consider purging those rows to improve performance. For more information on purging, see Purging Deleted Data. l Recovery performance—Recovery is the action required for a cluster to restore K-safety after a crash. Large numbers of deleted records can degrade the performance of a recovery. To improve recovery performance, purge the deleted rows. For more information on purging, see Purging Deleted Data. l Concurrency—DELETE and UPDATE take exclusive locks on the table. Only one DELETE or UPDATE transaction on a table can be in progress at a time and only when no loads (or INSERTs) are in progress. DELETEs and UPDATEs on different tables can be run concurrently. l Pre-join projections—Avoid pre-joining dimension tables that are frequently updated. 
DELETE and UPDATE operations on pre-join projections cascade to the fact table, causing large DELETE or UPDATE operations. For detailed tips about improving DELETE and UPDATE performance, see Optimizing DELETEs and UPDATEs for Performance. Caution: HP Vertica does not remove deleted data immediately but keeps it as history for the purposes of historical query. A large amount of history can result in slower query performance. For information about how to configure the appropriate amount of history to retain, see Purging Deleted Data. Administrator's Guide Bulk Deleting and Purging Data HP Vertica Analytic Database (7.0.x) Page 569 of 997
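The 10% guideline above can be expressed as a simple check. This is a conceptual sketch, not a Vertica API; the function name is hypothetical, and the threshold is the rule-of-thumb figure quoted above:

```python
# Hypothetical helper illustrating the 10% guideline above.

def should_purge(total_rows: int, deleted_rows: int,
                 threshold: float = 0.10) -> bool:
    """Recommend a purge when the unpurged deleted-row fraction reaches the threshold."""
    if total_rows == 0:
        return False
    return deleted_rows / total_rows >= threshold

# 1,200 deleted of 10,000 total rows is 12% deleted: purge is recommended.
print(should_purge(10_000, 1_200))  # True
```

In practice you would obtain the row counts from the database's system tables before deciding whether to purge.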
  • 570. Optimizing DELETEs and UPDATEs for Performance The process of optimizing DELETE and UPDATE queries is the same for both operations. Some simple steps can increase the query performance by tens to hundreds of times. The following sections describe several ways to improve projection design and improve DELETE and UPDATE queries to significantly increase DELETE and UPDATE performance. Note: For large bulk deletion, HP Vertica recommends using Partitioned Tables where possible because they provide the best DELETE performance and improve query performance. Projection Column Requirements for Optimized Deletes When all columns required by the DELETE or UPDATE predicate are present in a projection, the projection is optimized for DELETEs and UPDATEs. DELETE and UPDATE operations on such projections are significantly faster than on non-optimized projections. Both simple and pre-join projections can be optimized. For example, consider the following table and projections: CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER); CREATE PROJECTION p1 (a, b, c) AS SELECT * FROM t ORDER BY a; CREATE PROJECTION p2 (a, c) AS SELECT a, c FROM t ORDER BY c, a; In the following query, both p1 and p2 are eligible for DELETE and UPDATE optimization because column a is available: DELETE from t WHERE a = 1; In the following example, only projection p1 is eligible for DELETE and UPDATE optimization because the b column is not available in p2: DELETE from t WHERE b = 1; Optimized Deletes in Subqueries To be eligible for DELETE optimization, all target table columns referenced in a DELETE or UPDATE statement's WHERE clause must be in the projection definition. 
For example, the following simple schema has two tables and three projections: CREATE TABLE tb1 (a INT, b INT, c INT, d INT); CREATE TABLE tb2 (g INT, h INT, i INT, j INT); The first projection references all columns in tb1 and sorts on column a: Administrator's Guide Bulk Deleting and Purging Data HP Vertica Analytic Database (7.0.x) Page 570 of 997
  • 571. CREATE PROJECTION tb1_p AS SELECT a, b, c, d FROM tb1 ORDER BY a; The buddy projection references and sorts on column a in tb1: CREATE PROJECTION tb1_p_2 AS SELECT a FROM tb1 ORDER BY a; This projection references all columns in tb2 and sorts on column i: CREATE PROJECTION tb2_p AS SELECT g, h, i, j FROM tb2 ORDER BY i; Consider the following DML statement, which references tb1.a in its WHERE clause. Since both projections on tb1 contain column a, both are eligible for the optimized DELETE: DELETE FROM tb1 WHERE tb1.a IN (SELECT tb2.i FROM tb2); Restrictions Optimized DELETEs are not supported under the following conditions: l With pre-join projections on nodes that are down l With replicated and pre-join projections if subqueries reference the target table. For example, the following syntax is not supported: DELETE FROM tb1 WHERE tb1.a IN (SELECT e FROM tb2, tb2 WHERE tb2.e = tb1.e); l With subqueries that do not return multiple rows. For example, the following syntax is not supported: DELETE FROM tb1 WHERE tb1.a = (SELECT k from tb2); Projection Sort Order for Optimizing Deletes Design your projections so that frequently-used DELETE or UPDATE predicate columns appear in the sort order of all projections for large DELETEs and UPDATEs. For example, suppose most of the DELETE queries you perform on a projection look like the following: DELETE from t where time_key < '1-1-2007' To optimize the DELETEs, make time_key appear in the ORDER BY clause of all your projections. This schema design results in better performance of the DELETE operation. Administrator's Guide Bulk Deleting and Purging Data HP Vertica Analytic Database (7.0.x) Page 571 of 997
  • 572. In addition, include additional sort columns in the sort order such that each combination of the sort key values uniquely identifies a row or a small set of rows. For more information, see Choosing Sort Order: Best Practices. To analyze projections for sort order issues, use the EVALUATE_DELETE_PERFORMANCE function. Administrator's Guide Bulk Deleting and Purging Data HP Vertica Analytic Database (7.0.x) Page 572 of 997
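The eligibility rule described above — a projection is optimized for a DELETE or UPDATE only when it contains every column referenced in the predicate — can be sketched as a set-containment check. Vertica performs this evaluation internally; the function below is purely illustrative, using the p1/p2 example from the preceding pages:

```python
# Illustrative sketch of the projection eligibility rule above (not a Vertica API).

def optimized_projections(projections: dict, predicate_columns: set) -> list:
    """Return the projections whose column set covers every predicate column."""
    return [name for name, cols in projections.items()
            if predicate_columns <= set(cols)]

# Matches the earlier example: CREATE PROJECTION p1 (a, b, c) and p2 (a, c).
projections = {"p1": ["a", "b", "c"], "p2": ["a", "c"]}
print(optimized_projections(projections, {"a"}))  # ['p1', 'p2']
print(optimized_projections(projections, {"b"}))  # ['p1']
```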
  • 573. Purging Deleted Data In HP Vertica, delete operations do not remove rows from physical storage. Unlike most databases, the DELETE command in HP Vertica marks rows as deleted so that they remain available to historical queries. These deleted rows are called historical data. Retention of historical data also applies to the UPDATE command, which is actually a combined DELETE and INSERT operation. The cost of retaining deleted data in physical storage can be measured in terms of: l Disk space for the deleted rows and delete markers l A performance penalty for reading and skipping over deleted data A purge operation permanently removes deleted data from physical storage so that the disk space can be reused. HP Vertica lets you control how much deleted data is retained in the physical storage used by your database by performing a purge operation using one of the following techniques: l Setting a Purge Policy l Manually Purging Data Both methods set the Ancient History Mark (AHM), which is an epoch that represents the time until which history is retained. History older than the AHM is eligible for purge. Note: Large delete and purge operations in HP Vertica can take a long time to complete, so use them sparingly. If your application requires deleting data on a regular basis, such as by month or year, HP recommends that you design tables that take advantage of table partitioning. If partitioning tables is not suitable, consider the procedure described in Rebuilding a Table. The ALTER TABLE..RENAME command lets you build a new table from the old table, drop the old table, and rename the new table in its place. Setting a Purge Policy The preferred method for purging data is to establish a policy that determines which deleted data is eligible to be purged. Eligible data is automatically purged when the Tuple Mover performs mergeout operations. 
HP Vertica provides two methods for determining when deleted data is eligible to be purged: l Specifying the time for which delete data is saved l Specifying the number of epochs that are saved Administrator's Guide Bulk Deleting and Purging Data HP Vertica Analytic Database (7.0.x) Page 573 of 997
  • 574. Specifying the Time for Which Delete Data Is Saved Specifying the time for which delete data is saved is the preferred method for determining which deleted data can be purged. By default, HP Vertica saves historical data only when nodes are down. To change the specified time for saving deleted data, use the HistoryRetentionTime configuration parameter: => SELECT SET_CONFIG_PARAMETER('HistoryRetentionTime', '{ <seconds> | -1 }' ); In the above syntax: l seconds is the amount of time (in seconds) for which to save deleted data. l -1 indicates that you do not want to use the HistoryRetentionTime configuration parameter to determine which deleted data is eligible to be purged. Use this setting if you prefer to use the other method (HistoryRetentionEpochs) for determining which deleted data can be purged. The following example sets the history epoch retention level to 240 seconds: => SELECT SET_CONFIG_PARAMETER('HistoryRetentionTime', '240'); Specifying the Number of Epochs That Are Saved Unless you have a reason to limit the number of epochs, HP recommends that you specify the time over which delete data is saved. To specify the number of historical epochs to save through the HistoryRetentionEpochs configuration parameter: 1. Turn off the HistoryRetentionTime configuration parameter: => SELECT SET_CONFIG_PARAMETER('HistoryRetentionTime', '-1'); 2. Set the history epoch retention level through the HistoryRetentionEpochs configuration parameter: => SELECT SET_CONFIG_PARAMETER('HistoryRetentionEpochs', '{<num_epochs>|-1}'); n num_epochs is the number of historical epochs to save. n -1 indicates that you do not want to use the HistoryRetentionEpochs configuration parameter to trim historical epochs from the epoch map. By default, HistoryRetentionEpochs is set to -1. The following example sets the number of historical epochs to save to 40: Administrator's Guide Bulk Deleting and Purging Data HP Vertica Analytic Database (7.0.x) Page 574 of 997
  • 575. => SELECT SET_CONFIG_PARAMETER('HistoryRetentionEpochs', '40'); Modifications are immediately implemented across all nodes within the database cluster. You do not need to restart the database. Note: If both HistoryRetentionTime and HistoryRetentionEpochs are specified, HistoryRetentionTime takes precedence. See Epoch Management Parameters for additional details. Disabling Purge If you want to preserve all historical data, set the value of both historical epoch retention parameters to -1, as follows: => SELECT SET_CONFIG_PARAMETER('HistoryRetentionTime', '-1'); => SELECT SET_CONFIG_PARAMETER('HistoryRetentionEpochs', '-1'); Manually Purging Data Manually purging deleted data consists of the following series of steps: 1. Determine the point in time to which you want to purge deleted data. 2. Set the Ancient History Mark (AHM) to this point in time using one of the following SQL functions (described in the SQL Reference Manual): n SET_AHM_TIME() sets the AHM to the epoch that includes the specified TIMESTAMP value on the initiator node. n SET_AHM_EPOCH() sets the AHM to the specified epoch. n GET_AHM_TIME() returns a TIMESTAMP value representing the AHM. n GET_AHM_EPOCH() returns the number of the epoch in which the AHM is located. n MAKE_AHM_NOW() sets the AHM to the greatest allowable value (now), and lets you drop pre-existing projections. This purges all deleted data. When you use SET_AHM_TIME or GET_AHM_TIME, keep in mind that the timestamp you specify is mapped to an epoch, which has (by default) a three-minute granularity. Thus, if you specify an AHM time of '2008-01-01 00:00:00.00' the resulting purge could permanently remove as much as the first three minutes of 2008, or could fail to remove the last three minutes of 2007. Administrator's Guide Bulk Deleting and Purging Data HP Vertica Analytic Database (7.0.x) Page 575 of 997
  • 576. Note: The system prevents you from setting the AHM beyond the point at which it would prevent recovery in the event of a node failure. 3. Manually initiate a purge using one of the following SQL functions (described in the SQL Reference Manual): n PURGE_PROJECTION() purges a specified projection. n PURGE_TABLE() purges all projections on the specified table. n PURGE() purges all projections in the physical schema. 4. The Tuple Mover performs a mergeout operation to purge the data. Note: Manual purge operations can take a long time. Administrator's Guide Bulk Deleting and Purging Data HP Vertica Analytic Database (7.0.x) Page 576 of 997
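The three-minute epoch granularity that affects SET_AHM_TIME (step 2 above) amounts to bucketing a timestamp into a fixed-width interval. The sketch below is a conceptual model only; the 180-second interval and the epoch origin are assumptions chosen for illustration, not values read from a database:

```python
# Conceptual model of the epoch granularity described above; the interval and
# origin here are assumptions, not real database state.
from datetime import datetime, timedelta

EPOCH_SECONDS = 180  # assumed default epoch advancement interval (3 minutes)

def epoch_bounds(ts: datetime, origin: datetime):
    """Return the start and end of the fixed-width epoch that contains ts."""
    elapsed = int((ts - origin).total_seconds())
    start = origin + timedelta(seconds=(elapsed // EPOCH_SECONDS) * EPOCH_SECONDS)
    return start, start + timedelta(seconds=EPOCH_SECONDS)

# Requesting an AHM of 2008-01-01 00:01:30 maps to the whole epoch covering
# 00:00:00-00:03:00, so the effective purge boundary can land up to three
# minutes before or after the requested timestamp.
start, end = epoch_bounds(datetime(2008, 1, 1, 0, 1, 30), datetime(2007, 1, 1))
print(start, end)
```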
  • 577. Managing the Database This section describes how to manage the HP Vertica database, including: l Connection Load Balancing l Managing Nodes l Adding Disk Space to a Node l Managing Tuple Mover Operations l Managing Workloads Connection Load Balancing Each client connection to a host in the HP Vertica cluster requires a small overhead in memory and processor time. If many clients connect to a single host, this overhead can begin to affect the performance of the database. You can attempt to spread the overhead of client connections by dictating that certain clients connect to specific hosts in the cluster. However, this manual balancing becomes difficult as new clients and hosts are added to your environment. Connection load balancing helps automatically spread the overhead caused by client connections across the cluster by having hosts redirect client connections to other hosts. By redirecting connections, the overhead from client connections is spread across the cluster without having to manually assign particular hosts to individual clients. Clients can connect to a small handful of hosts, and they are naturally redirected to other hosts in the cluster. There are two ways you can implement load balancing on your HP Vertica cluster: l Native connection load balancing is a feature built into the HP Vertica server and client libraries that redirect client connections at the application level. l Internet Protocol Virtual Server (IPVS) is software that can be installed on several hosts in the HP Vertica cluster that provides connection load balancing at the network level. In most situations, you should use native connection load balancing instead of IPVS. The following sections explain each option in detail and explain their benefits. Native Connection Load Balancing Overview Native connection load balancing is a feature built into the Vertica Analytic Database server and client libraries as well as vsql. 
Both the server and the client need to enable load balancing for it to function. If connection load balancing is enabled, a host in the database cluster can redirect a client's connection attempt to another currently-active host in the cluster. This redirection is based on a load balancing policy. The redirection can only take place once, so a client is not bounced from one host to another. Administrator's Guide Managing the Database HP Vertica Analytic Database (7.0.x) Page 577 of 997
  • 578. Since native connection load balancing is incorporated into the HP Vertica client libraries, any client application that connects to HP Vertica transparently takes advantage of it simply by setting a connection parameter. For more about native connection load balancing, see About Native Connection Load Balancing. IPVS Overview IPVS is a feature of the Linux kernel that lets a single host act as a gateway to an entire cluster of hosts. The load balancing host creates a virtual IP address on the network. When a client connects to the virtual IP address, the IPVS load balancer transparently redirects the connection to one of the hosts in the cluster. For more on IPVS, see Connection Load Balancing Using IPVS. Choosing Whether to Use Native Connection Load Balancing or IPVS Native connection load balancing has several advantages over IPVS. Native connection load balancing is: l Easy to set up. All that is required is that the client and server enable connection load balancing. IPVS requires the configuration of several additional software packages and the network. l Easily overridden in cases where you need to connect to a specific host (for example, when using the COPY statement to load a file from a specific host's filesystem). l Less at risk of host failures affecting connection load balancing. IPVS is limited to a master and a backup server. All hosts in HP Vertica are capable of redirecting connections for connection load balancing. As long as HP Vertica is running, it can perform connection load balancing. l Less memory and CPU intensive than running a separate IPVS process on several database hosts. l Better for load performance. IPVS adds a network hop from the IPVS server to a host in the database; with native connection load balancing, clients connect directly to hosts in the Vertica Analytic Database cluster. l Supported by HP. Currently, IPVS is not being actively supported. l Supported on all Linux platforms supported by HP Vertica. 
HP only supplies IPVS installation packages on Red Hat. IPVS has known compatibility issues with Debian-based Linux distributions. There are a few cases where IPVS may be more suitable: l Restrictive firewalls between the clients and the hosts in the database cluster can interfere with native connection load balancing. In order for native connection load balancing to work, clients must be able to access every host in the HP Vertica cluster. Any firewall between the hosts and the clients must be configured to allow connections to the HP Vertica database. Otherwise, a Administrator's Guide Managing the Database HP Vertica Analytic Database (7.0.x) Page 578 of 997
  • 579. client could be redirected to a host that it cannot access. IPVS can be configured so that the virtual IP address clients use to connect is outside of the firewall while the hosts can be behind the firewall. l Since native connection load balancing works at the application level, clients must opt in to have their connections load balanced. Because it works at the network protocol level, IPVS can force connections to be balanced. How you choose to implement connection load balancing depends on your network environment. Since native connection load balancing is easier to implement, you should use it unless your network configuration requires that clients be separated from the hosts in the HP Vertica database by a firewall. About Native Connection Load Balancing Native connection load balancing is a feature built into the HP Vertica server and client libraries that helps spread the CPU and memory overhead caused by client connections across the hosts in the database. It can prevent some hosts from becoming burdened by many client connections while other hosts in the cluster handle few. Native connection load balancing only has an effect when it is enabled by both the server and the client. When both have enabled native connection load balancing, the following process takes place whenever the client attempts to open a connection to HP Vertica: 1. The client connects to a host in the database cluster, with a connection parameter indicating that it is requesting a load-balanced connection. 2. The host chooses a host from the list of currently up hosts in the database based on the current load balancing scheme (see below). 3. The host tells the client which host it has selected to handle the client's connection. 4. If the host chose another host in the database to handle the client connection, the client disconnects from the initial host. Otherwise, the client jumps to step 6. 5. The client establishes a connection to the host that will handle its connection. 
The client sets this second connection request so that the second host does not interpret the connection as a request for load balancing. 6. The client connection proceeds as usual (negotiating encryption if the connection has SSL enabled, and authenticating the user). This entire process is transparent to the client application. The client driver automatically disconnects from the initial host and reconnects to the host selected for load balancing. Administrator's Guide Managing the Database HP Vertica Analytic Database (7.0.x) Page 579 of 997
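The redirect handshake above can be sketched as follows. This is not the real client library — the host names are hypothetical and the selection uses a simple RANDOM scheme for brevity — but it shows why the redirect happens at most once:

```python
# Conceptual sketch of the handshake above; not the real HP Vertica client
# library, and host names are hypothetical.
import random

def connect_load_balanced(initial_host, up_hosts, rng=random):
    """Return the host that ends up servicing the client connection."""
    # Step 2: the initial host picks a host per its load balancing scheme
    # (RANDOM here for simplicity).
    chosen = rng.choice(up_hosts)
    if chosen == initial_host:
        return initial_host  # Step 4: no redirect needed.
    # Steps 5-6: the client disconnects and reconnects to the chosen host; the
    # second request is flagged so the new host does not redirect it again.
    return chosen

hosts = ["node01", "node02", "node03"]
print(connect_load_balanced("node01", hosts) in hosts)  # True
```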
  • 580. Notes l Native connection load balancing works with the ADO.NET driver's connection pooling. The connection the client makes to the initial host and the final connection to the load-balanced host use pooled connections if they are available. l For client applications using the JDBC and ODBC driver in conjunction with third-party connection pooling solutions, the initial connection is not pooled since it is not a full client connection. The final connection is pooled, since it is a standard client connection. l The client libraries include a failover feature that allows them to connect to backup hosts if the host specified in the connection properties is unreachable. When using native connection load balancing, this failover feature is only used for the initial connection to the database. If the host to which the client was redirected does not respond to the client's connection request, the client does not attempt to connect to a backup host and instead returns a connection error to the user. Since clients are only redirected to hosts that are known to be up, this sort of connection failure should only occur if the targeted host happened to go down at the same moment the client is redirected to it. See ADO.NET Connection Failover, JDBC Connection Failover, and ODBC Connection Failover in the Programmer's Guide for more information. Load Balancing Schemes The load balancing scheme controls how a host selects which host to handle a client connection. There are three available schemes: l NONE: Disables native connection load balancing. This is the default setting. l ROUNDROBIN: Chooses the next host from a circular list of currently up hosts in the database (i.e. node #1, node #2, node #3, etc. until it wraps back to node #1 again). Each host in the cluster maintains its own pointer to the next host in the circular list, rather than there being a single cluster-wide state. l RANDOM: Chooses a host at random from the list of currently up hosts in the cluster. 
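The ROUNDROBIN scheme's per-host pointer can be sketched as a small class. This is illustrative only; each host keeping its own pointer (rather than there being a single cluster-wide state) means that successive requests arriving at the same host cycle through the list of up hosts:

```python
# Illustrative model of the ROUNDROBIN scheme above; not Vertica internals.

class RoundRobinPolicy:
    def __init__(self, up_hosts):
        self.up_hosts = list(up_hosts)
        self.next_index = 0  # this host's private pointer into the circular list

    def choose(self):
        """Pick the next host in the circular list and advance the pointer."""
        host = self.up_hosts[self.next_index % len(self.up_hosts)]
        self.next_index += 1
        return host

policy = RoundRobinPolicy(["node01", "node02", "node03"])
print([policy.choose() for _ in range(4)])
# ['node01', 'node02', 'node03', 'node01']
```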
You set the native connection load balancing scheme using the SET_LOAD_BALANCE_POLICY function. See Enabling and Disabling Native Connection Load Balancing for instructions. Related Tasks Enabling and Disabling Native Connection Load Balancing 580 Monitoring Native Connection Load Balancing 581 Enabling and Disabling Native Connection Load Balancing Only a database superuser can enable or disable native connection load balancing. To enable or disable load balancing, use the SET_LOAD_BALANCE_POLICY function to set the load balance policy. Setting the load balance policy to anything other than 'NONE' enables load balancing on the Administrator's Guide Managing the Database HP Vertica Analytic Database (7.0.x) Page 580 of 997
  • 581. server. The following example enables native connection load balancing by setting the load balancing policy to ROUNDROBIN. => SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN'); SET_LOAD_BALANCE_POLICY -------------------------------------------------------------------------------- Successfully changed the client initiator load balancing policy to: roundrobin (1 row) To disable native connection load balancing, use SET_LOAD_BALANCE_POLICY to set the policy to 'NONE': => SELECT SET_LOAD_BALANCE_POLICY('NONE'); SET_LOAD_BALANCE_POLICY -------------------------------------------------------------------------- Successfully changed the client initiator load balancing policy to: none (1 row) Note: By default, client connections are not load balanced, even when connection load balancing is enabled on the server. Clients must set a connection parameter to indicate that they are willing to have their connection request load balanced. See Enabling Native Connection Load Balancing in ADO.NET, Enabling Native Connection Load Balancing in JDBC and Enabling Native Connection Load Balancing in ODBC in the Programmer's Guide for more information. Resetting the Load Balancing State When the load balancing policy is ROUNDROBIN, each host in the HP Vertica cluster maintains its own state of which host it will select to handle the next client connection. You can reset this state to its initial value (usually, the host with the lowest node ID) using the RESET_LOAD_BALANCE_POLICY function: => SELECT RESET_LOAD_BALANCE_POLICY(); RESET_LOAD_BALANCE_POLICY ------------------------------------------------------------------------- Successfully reset stateful client load balance policies: "roundrobin". 
(1 row) Related Information About Native Connection Load Balancing 579 Related Tasks Monitoring Native Connection Load Balancing 581 Monitoring Native Connection Load Balancing Query the LOAD_BALANCE_POLICY column of the V_CATALOG.DATABASES system table to determine the state of native connection load balancing on your server: Administrator's Guide Managing the Database HP Vertica Analytic Database (7.0.x) Page 581 of 997
  • 582. => SELECT LOAD_BALANCE_POLICY FROM V_CATALOG.DATABASES; LOAD_BALANCE_POLICY --------------------- roundrobin (1 row) Determining to Which Node a Client Has Connected A client can determine the node to which it has connected by querying the NODE_NAME column of the V_MONITOR.CURRENT_SESSION table: => SELECT NODE_NAME FROM V_MONITOR.CURRENT_SESSION; NODE_NAME ------------------ v_vmart_node0002 (1 row) Related Information About Native Connection Load Balancing 579 Related Tasks Enabling and Disabling Native Connection Load Balancing 580 Administrator's Guide Managing the Database HP Vertica Analytic Database (7.0.x) Page 582 of 997
  • 583. Connection Load Balancing Using IPVS The IP Virtual Server (IPVS) provides network-protocol-level connection load balancing. When used with an HP Vertica database cluster, it is installed on two database hosts. IPVS is made up of the following components: l The Virtual IP (VIP): The IP address that is accessed by all client connections. l Real server IPs (RIP): The IP addresses of client network interfaces used for connecting database clients to the database engine. l Cluster: A cluster of real HP Vertica servers (nodes). l Virtual server: The single point of entry that provides access to a cluster, based on dynamic node selection. The IPVS load balancer provides two-node standby redundancy only. The IPVS redundancy model is different from that of the HP Vertica Analytic Database. See Failure Recovery for details on HP Vertica redundancy. Client connections made through the Virtual IP (VIP) are managed by a primary (master) director node, which is one of the real server nodes (RIP). The master director handles the routing of requests by determining which node has the fewest connections and sending connections to that node. If the director node fails for any reason, a failover (slave) director takes over request routing until the primary (master) director comes back online. For example, if a user connects to node03 in a three-node cluster and node03 fails, the current transaction rolls back, the client connection fails, and a connection must be reestablished on another node. The following graphic illustrates a three-node database cluster where all nodes share a single VIP. The cluster contains a master director (node01), a slave director (node02), and an additional host (node03) that together provide the minimum configuration for high availability (K-safety). In this setup (and in the configuration and examples that follow), node01 and node02 play dual roles as IPVS directors and HP Vertica nodes. 
Administrator's Guide Managing the Database HP Vertica Analytic Database (7.0.x) Page 583 of 997
  • 584. Notes l Load balancing on a VIP is supported for Red Hat Enterprise Linux 5 and 6, 64-bit. l HP Vertica must be installed on each node in the cluster; the database can be installed on any node, but only one database can be running on an HP Vertica cluster at a time. l Although a 0 K-safety (two-node) design is supported, HP strongly recommends that you create the load-balancing network using a minimum three-node cluster with K-safety set to 1. This way, if one node fails, the database stays up. See Designing for K-Safety for details. l When K-safety is set to 1, locate the IPVS master and slave on HP Vertica database nodes that comprise a buddy projections pair. This is the best way to ensure high-availability load balancing. See High Availability Through Projections for details on buddy projections. l If the node that is the IPVS master fails completely, the slave IPVS takes over load balancing. However, if the master only partially fails (that is, it loses some of its processes but the node is still up), you may have to modify IP addresses to direct network traffic to the slave node. Alternatively, you can try to restart the processes on the master. l Subsequent topics in this section describe how to set up two directors (master and slave), but you can set up more than two directors. See the Keepalived User Guide for details. See also the Linux Virtual Server Web site. Administrator's Guide Managing the Database HP Vertica Analytic Database (7.0.x) Page 584 of 997
Configuring HP Vertica Nodes

This section describes how to configure an HP Vertica cluster of nodes for load balancing. You'll set up two directors in a master/slave configuration and include a third node for K-safety.

An HP Vertica cluster designed for load balancing uses the following configuration:

• Real IP (RIP) addresses are on the public interface and include:
  - The master director/node, which handles the routing of requests. The master is collocated with one of the database cluster nodes.
  - The slave director/node, which communicates with the master and takes over routing requests in the event of a master node failure. The slave is collocated with another database cluster node.
  - The remaining database cluster nodes, such as at least one failover node, to provide the minimum configuration for high availability (K-safety).
• The Virtual IP (VIP) address (generally assigned to eth0 in Linux) is the public network interface over which database clients connect.

Note: The VIP must be public so that clients outside the cluster can contact it.

Once you have set up an HP Vertica cluster and created a database, you can choose the nodes that will be directors. To achieve the best high-availability load-balancing result when K-safety is set to 1, ensure that the IPVS master node and the slave node are located on HP Vertica database nodes that form a buddy projection pair. (See High Availability Through Projections for information on buddy projections.)

The instructions in this section use the following node configuration:

  Pre-configured IP            Node assignment   Public IPs     Private IPs
  VIP shared among all nodes                     10.10.51.180
  RIP master director          node01            10.10.51.55    192.168.51.1
  RIP slave director           node02            10.10.51.56    192.168.51.2
  RIP failover node            node03            10.10.51.57    192.168.51.3

Notes

• In the above table, the private IPs determine which node to send a request to. They are not the same as the RIPs.
• The VIP must be on the same subnet as the nodes in the HP Vertica cluster.
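For reference, an /etc/hosts file matching the example layout in the table above might contain entries like the following. The values are the illustrative addresses from the table, and the hostnames with a -private suffix are assumptions for this sketch, not names the installer requires:

```
# Illustrative /etc/hosts entries for the example cluster above
10.10.51.55    node01            # public RIP, master director
10.10.51.56    node02            # public RIP, slave director
10.10.51.57    node03            # public RIP, failover node
192.168.51.1   node01-private    # private (spread) interface, assumed name
192.168.51.2   node02-private
192.168.51.3   node03-private
```

You can confirm the actual entries on each host with $ cat /etc/hosts.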
• Both the master and slave nodes (node01 and node02 in this section) require additional installation and configuration, as described in Configuring the Directors.
• Use the command $ cat /etc/hosts to display a list of all hosts in your cluster.

The following external web sites might be useful. The links worked at the last date of publication, but HP Vertica does not manage this content.

See Also
• Linux Virtual Server Web site
• LVS-HOWTO Page
• Keepalived.conf(5) man page
• ipvsadm man page

Set Up the Loopback Interface

This procedure sets up the loopback (lo) interface with an alias on each node.

1. Log in as root on the master director (node01):
   $ su - root
2. Use the text editor of your choice to open ifcfg-lo:
   [root@node01]# vi /etc/sysconfig/network-scripts/ifcfg-lo
3. Set up the loopback adapter with an alias for the VIP by adding the following block to the end of the file:
   ## vip device
   DEVICE=lo:0
   IPADDR=10.10.51.180
   NETMASK=255.255.255.255
   ONBOOT=yes
   NAME=loopback
   Note: When you add the above block to your file, be careful not to overwrite the 127.0.0.1 parameter, which is required for proper system operations.
4. Start the device:
   [root@node01]# ifup lo:0
5. Repeat steps 1-4 on each node in the HP Vertica cluster.

Disable Address Resolution Protocol (ARP)

This procedure disables ARP (Address Resolution Protocol) for the VIP.

1. On the master director (node01), log in as root:
   $ su - root
2. Use the text editor of your choice to open the sysctl configuration file:
   [root@node01]# vi /etc/sysctl.conf
3. Add the following block to the end of the file:
   # LVS
   net.ipv4.conf.eth0.arp_ignore = 1
   net.ipv4.conf.eth0.arp_announce = 2
   # Enables packet forwarding
   net.ipv4.ip_forward = 1
   Note: For additional details, refer to the LVS-HOWTO Page. You might also refer to the Linux Virtual Server Wiki page for information on using arp_announce/arp_ignore to disable the Address Resolution Protocol.
4. Use ifconfig to verify that the interface is on the same subnet as the VIP:
   [root@node01]# /sbin/ifconfig
   In the following output, the eth0 inet addr is on the same subnet (51) as the VIP, and that subnet matches the private RIP under the eth1 heading:

   eth0  Link encap:Ethernet  HWaddr 84:2B:2B:55:4B:BE
         inet addr:10.10.51.55  Bcast:10.10.51.255  Mask:255.255.255.0
         inet6 addr: fe80::862b:2bff:fe55:4bbe/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:91694543 errors:0 dropped:0 overruns:0 frame:0
         TX packets:373212 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:49294294011 (45.9 GiB)  TX bytes:66149943 (63.0 MiB)
         Interrupt:15 Memory:da000000-da012800

   eth1  Link encap:Ethernet  HWaddr 84:2B:2B:55:4B:BF
         inet addr:192.168.51.55  Bcast:192.168.51.255  Mask:255.255.255.0
         inet6 addr: fe80::862b:2bff:fe55:4bbf/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:937079543 errors:0 dropped:2780 overruns:0 frame:0
         TX packets:477401433 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:449050544237 (418.2 GiB)  TX bytes:46302821625 (43.1 GiB)
         Interrupt:14 Memory:dc000000-dc012800

   lo    Link encap:Local Loopback
         inet addr:127.0.0.1  Mask:255.0.0.0
         inet6 addr: ::1/128 Scope:Host
         UP LOOPBACK RUNNING  MTU:16436  Metric:1
         RX packets:6604 errors:0 dropped:0 overruns:0 frame:0
         TX packets:6604 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:21956498 (20.9 MiB)  TX bytes:21956498 (20.9 MiB)

   lo:0  Link encap:Local Loopback
         inet addr:10.10.51.180  Mask:255.255.255.255
         UP LOOPBACK RUNNING  MTU:16436  Metric:1

5. Use ifconfig to verify that the loopback interface is up:
   [root@node01]# /sbin/ifconfig lo:0
   You should see output similar to the following:
   lo:0  Link encap:Local Loopback
         inet addr:10.10.51.180  Mask:255.255.255.255
         UP LOOPBACK RUNNING  MTU:16436  Metric:1
   If you do not see UP LOOPBACK RUNNING, bring up the loopback interface:
   [root@node01]# /sbin/ifup lo
6. Issue the following command to commit changes to the kernel from the configuration file:
   [root@node01]# /sbin/sysctl -p
7. Repeat steps 1-6 on all nodes in the HP Vertica cluster.
Configuring the Directors

Now you are ready to install the HP Vertica IPVS Load Balancer package and configure the master (node01) and slave (node02) directors.

Install the HP Vertica IPVS Load Balancer Package

The following instructions describe how to download and install the HP Vertica IPVS Load Balancer package for Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6.

Note: For illustrative purposes only, this procedure uses node01 for the master director and node02 for the slave director.

Before You Begin

Before installing IPVS, you must:
• Install HP Vertica
• Create a database cluster

If You Are Using Red Hat Enterprise Linux 5.x:

Make sure you have downloaded and installed the HP Vertica Analytics Database RPM and the IPVS Load Balancer package for Red Hat Enterprise Linux 5.

1. On the master director (node01), log in as root:
   $ su - root
2. Download the IPVS Load Balancer package for Red Hat Enterprise Linux 5 from the my.vertica.com website to a location on the master server, such as /tmp.
3. Change directory to the location of the downloaded file:
   # cd /tmp
4. Install (or upgrade) the Load Balancer package using the rpm -Uvh command. The following is an example for the Red Hat Linux package only; package names could change between releases:
   # rpm -Uvh vertica-ipvs-load-balancer-<current-version>.x86_64.RHEL5.rpm
5. Repeat steps 1-4 on the slave director (node02).

If You Are Using Red Hat Enterprise Linux 6.x:

Make sure you have downloaded and installed the HP Vertica Analytics Database RPM and the IPVS Load Balancer package for Red Hat Enterprise Linux 6.

1. On the master director (node01), log in as root:
   $ su - root
2. Download the IPVS Load Balancer package for Red Hat Enterprise Linux 6 from the my.vertica.com website to a location on the master server, such as /tmp.
3. Change directory to the location of the downloaded file:
   # cd /tmp
4. Run this command as root:
   /sbin/modprobe ip_vs
5. Verify that ip_vs is loaded correctly using this command:
   lsmod | grep ip_vs
6. Install (or upgrade) the Load Balancer package using the rpm -Uvh command. The following is an example for the Red Hat Linux package only; package names could change between releases:
   # rpm -Uvh vertica-ipvs-load-balancer-<current-version>.x86_64.RHEL6.rpm
7. Repeat steps 1-6 on the slave director (node02).

Configure the HP Vertica IPVS Load Balancer

HP Vertica provides a script called configure-keepalived.pl in the IPVS Load Balancer package. The script is located in /sbin, and if you run it with no options it prints a usage summary:
--ripips     | Comma-separated list of HP Vertica nodes; public IPs (e.g., 10.10.50.116, etc.)
--priv_ips   | Comma-separated list of HP Vertica nodes; private IPs (e.g., 192.168.51.116, etc.)
--ripport    | Port on which HP Vertica runs. Default is 5433
--iface      | Public ethernet interface HP Vertica is configured to use (e.g., eth0)
--emailto    | Address that should get alerts (e.g., user@server.com)
--emailfrom  | Address that mail should come from (e.g., user@server.com)
--mailserver | E-mail server IP or hostname (e.g., mail.server.com)
--master     | If this director is the master (default), specify --master
--slave      | If this director is the slave, specify --slave
--authpass   | Password for keepalived
--vip        | Virtual IP address (e.g., 10.10.51.180)
--delayloop  | Seconds keepalived waits between healthchecks. Default is 2
--algo       | Sets the algorithm to use: rr, wrr, lc (default), wlc, lblc, lblcr, dh, sh, sed, nq
--kind       | Sets the routing method to use. Default is DR.
--priority   | By default, the master has priority of 100 and the backup (slave) has priority of 50

For details about each of these parameters, refer to the ipvsadm(8) Linux man page.

Public and Private IPs

If your cluster uses private interfaces for spread cluster communication, you need to use the --priv_ips switch to enter the private IP addresses that correspond to the public IP addresses (or RIPs). The IPVS keepalive daemon uses these private IPs to determine when a node has left the cluster. The IP host ID of the RIPs must correspond to the IP host ID of the private interfaces. For example, given the following IP address mappings:

Public        Private (for spread)
10.10.50.116  192.168.51.116
10.10.50.117  192.168.51.117
10.10.50.118  192.168.51.118

You need to enter the IP addresses in the following order:

--ripips 10.10.50.116,10.10.50.117,10.10.50.118
--priv_ips 192.168.51.116,192.168.51.117,192.168.51.118

You must use IP addresses, not node names, or the spread.pl script could fail.
If you do not specify private interfaces, HP Vertica uses the public RIPs for the MISC check, as shown in step 3 below.
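Because the --ripips and --priv_ips lists must line up positionally and share host IDs, a small pre-flight check can catch typos before you run configure-keepalived.pl. This helper is an illustrative sketch, not part of the HP-supplied package; it assumes the host ID is the final octet of each address:

```shell
#!/bin/bash
# check_ip_lists PUBLIC_LIST PRIVATE_LIST
# Verifies the two comma-separated lists have the same length and that
# each positional pair shares the same host ID (last octet).
check_ip_lists() {
    local rips=(${1//,/ }) privs=(${2//,/ })
    if [ "${#rips[@]}" -ne "${#privs[@]}" ]; then
        echo "count mismatch: ${#rips[@]} RIPs vs ${#privs[@]} private IPs"
        return 1
    fi
    local i
    for i in "${!rips[@]}"; do
        if [ "${rips[$i]##*.}" != "${privs[$i]##*.}" ]; then
            echo "host-ID mismatch at position $i: ${rips[$i]} vs ${privs[$i]}"
            return 1
        fi
    done
    echo "OK"
}

check_ip_lists "10.10.50.116,10.10.50.117,10.10.50.118" \
               "192.168.51.116,192.168.51.117,192.168.51.118"   # prints OK
```

Run it with the exact strings you plan to pass to the two switches; a nonzero exit status means the lists disagree.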
Set up the HP Vertica IPVS Load Balancer Configuration File

1. On the master director (node01), log in as root:
   $ su - root
2. Run the HP-supplied configuration script with the appropriate switches; for example:
   # /sbin/configure-keepalived.pl --ripips 10.10.50.116,10.10.50.117,10.10.50.118
     --priv_ips 192.168.51.116,192.168.51.117,192.168.51.118
     --ripport 5433 --iface eth0 --emailto dbadmin@companyname.com
     --emailfrom dbadmin@companyname.com --mailserver mail.server.com
     --master --authpass password --vip 10.10.51.180 --delayloop 2
     --algo lc --kind DR --priority 100
   Caution: The --authpass (password) switch must be the same on both the master and slave directors.
3. Check the keepalived.conf file to verify the private and public IP settings for the --ripips and --priv_ips switches, and make sure the real_server IP address is public.
   # cat /etc/keepalived/keepalived.conf
   An entry in the keepalived.conf file would resemble the following:
   real_server 10.10.50.116 5433 {
       MISC_CHECK {
           misc_path "/etc/keepalived/check.pl 192.168.51.116"
       }
   }
4. Start spread:
   # /etc/init.d/spread.pl start
   The spread.pl script writes to the check.txt file, which is rewritten to include only the remaining nodes in the event of a node failure. Thus, the virtual server knows to stop sending vsql requests to the failed node.
5. Start keepalived on node01:
   # /etc/init.d/keepalived start
6. If not already started, start sendmail to allow mail messages to be sent by the directors:
   # /etc/init.d/sendmail start
7. Repeat steps 1-6 on the slave director (node02), using the same switches, except (IMPORTANT) replace the --master switch with the --slave switch.
   Tip: Use a lower priority for the slave --priority switch. HP currently suggests 50.
   # /sbin/configure-keepalived.pl --ripips 10.10.50.116,10.10.50.117,10.10.50.118
     --priv_ips 192.168.51.116,192.168.51.117,192.168.51.118
     --ripport 5433 --iface eth0 --emailto dbadmin@companyname.com
     --emailfrom dbadmin@companyname.com --mailserver mail.server.com
     --slave --authpass password --vip 10.10.51.180 --delayloop 2
     --algo lc --kind DR --priority 50

See Also
• Keepalived.conf(5) Linux man page

Connecting to the Virtual IP (VIP)

To connect to the Virtual IP address using vsql, issue a command similar to the following. The IP address, which could also be a DNS address, is the VIP that is shared among all nodes in the HP Vertica cluster.

$ /opt/vertica/bin/vsql -h 10.10.51.180 -U dbadmin
To verify connection distribution over multiple nodes, repeat the following statement multiple times and observe how connections are distributed in lc (least connections) fashion.

$ vsql -h <VIP> -c "SELECT node_name FROM sessions"

Replace <VIP> in the above script with the IP address of your virtual server; for example:

$ vsql -h 10.10.51.180 -c "SELECT node_name FROM sessions"
 node_name
-----------------
 v_ipvs_node01
 v_ipvs_node02
 v_ipvs_node03
(3 rows)

Monitoring Shared Node Connections

If you want to monitor which nodes are sharing connections, view the check.txt file by issuing the following command at a shell prompt:

# watch cat /etc/keepalived/check.txt

Every 2.0s: cat /etc/keepalived/check.txt    Wed Nov 3 10:02:20 2010

N192168051057
N192168051056
N192168051055

The check.txt file is located in the /etc/keepalived/ directory and is maintained by the spread.pl script (see Configuring the Directors): in the event of a node failure, the file is rewritten to include only the remaining nodes, so the virtual server knows to stop sending vsql requests to the failed node.

You can also look for messages by issuing the following command at a shell prompt:

# tail -f /var/log/messages
Nov 3 09:21:00 p6 Keepalived: Starting Keepalived v1.1.17 (05/17,2010)
Nov 3 09:21:00 p6 Keepalived: Starting Healthcheck child process, pid=32468
Nov 3 09:21:00 p6 Keepalived: Starting VRRP child process, pid=32469
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Using LinkWatch kernel netlink reflector...
Nov 3 09:21:00 p6 Keepalived_vrrp: Using LinkWatch kernel netlink reflector...
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Netlink reflector reports IP 10.10.51.55 added
Nov 3 09:21:00 p6 Keepalived_vrrp: Netlink reflector reports IP 10.10.51.55 added
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Netlink reflector reports IP 192.168.51.55 added
Nov 3 09:21:00 p6 Keepalived_vrrp: Netlink reflector reports IP 192.168.51.55 added
Nov 3 09:21:00 p6 Keepalived_vrrp: Registering Kernel netlink reflector
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Registering Kernel netlink reflector
Nov 3 09:21:00 p6 Keepalived_vrrp: Registering Kernel netlink command channel
Nov 3 09:21:00 p6 Keepalived_vrrp: Registering gratuitous ARP shared channel
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Registering Kernel netlink command channel
Nov 3 09:21:00 p6 Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'.
Nov 3 09:21:00 p6 Keepalived_vrrp: Configuration is using : 63730 Bytes
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Configuration is using : 16211 Bytes
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Activating healthcheckers for service [10.10.51.55:5433]
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Activating healthcheckers for service [10.10.51.56:5433]
Nov 3 09:21:00 p6 Keepalived_healthcheckers: Activating healthcheckers for service [10.10.51.57:5433]
Nov 3 09:21:00 p6 Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(10,11)]
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Misc check to [10.10.51.56] for [/etc/keepalived/check.pl 192.168.51.56] failed.
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Removing service [10.10.51.56:5433] from VS [10.10.51.180:5433]
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Remote SMTP server [127.0.0.1:25] connected.
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Misc check to [10.10.51.55] for [/etc/keepalived/check.pl 192.168.51.55] failed.
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Removing service [10.10.51.55:5433] from VS [10.10.51.180:5433]
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Remote SMTP server [127.0.0.1:25] connected.
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Misc check to [10.10.51.57] for [/etc/keepalived/check.pl 192.168.51.57] failed.
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Removing service [10.10.51.57:5433] from VS [10.10.51.180:5433]
Nov 3 09:21:01 p6 Keepalived_healthcheckers: Remote SMTP server [127.0.0.1:25] connected.
Nov 3 09:21:01 p6 Keepalived_healthcheckers: SMTP alert successfully sent.
Nov 3 09:21:10 p6 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Nov 3 09:21:20 p6 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Nov 3 09:21:20 p6 Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Nov 3 09:21:20 p6 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.10.51.180
Nov 3 09:21:20 p6 Keepalived_healthcheckers: Netlink reflector reports IP 10.10.51.180 added
Nov 3 09:21:20 p6 Keepalived_vrrp: Remote SMTP server [127.0.0.1:25] connected.
Nov 3 09:21:20 p6 Keepalived_vrrp: Netlink reflector reports IP 10.10.51.180 added
Nov 3 09:21:20 p6 Keepalived_vrrp: SMTP alert successfully sent.
Nov 3 09:21:25 p6 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.10.51.1

Determining Where Connections Are Going

Ipvsadm is the user code interface to the IP Virtual Server. It is used to set up, maintain, or inspect the virtual server table in the Linux kernel. If you want to identify where user connections are going, install ipvsadm.

1. Log in to the master director (node01) as root:
   $ su - root
2. Install ipvsadm:
   [root@node01]# yum install ipvsadm
   Loading "installonlyn" plugin
   Setting up Install Process
   Setting up repositories
   Reading repository metadata in from local files
   Parsing package install arguments
   Resolving Dependencies
   --> Populating transaction set with selected packages. Please wait.
   ---> Downloading header for ipvsadm to pack into transaction set.
   ipvsadm-1.24-10.x86_64.rp 100% |=========================| 6.6 kB 00:00
   ---> Package ipvsadm.x86_64 0:1.24-10 set to be updated
   --> Running transaction check
   Dependencies Resolved
   =============================================================================
    Package      Arch      Version       Repository      Size
   =============================================================================
   Installing:
    ipvsadm      x86_64    1.24-10       base            32 k
   Transaction Summary
   =============================================================================
   Install    1 Package(s)
   Update     0 Package(s)
   Remove     0 Package(s)
   Total download size: 32 k
   Is this ok [y/N]: y
   Downloading Packages:
   (1/1): ipvsadm-1.24-10.x8 100% |=========================| 32 kB 00:00
   Running Transaction Test
   Finished Transaction Test
   Transaction Test Succeeded
   Running Transaction
   Installing: ipvsadm ######################### [1/1]
   Installed: ipvsadm.x86_64 0:1.24-10
   Complete!
3. Run ipvsadm:
   [root@node01 ~]# ipvsadm
   IP Virtual Server version 1.2.1 (size=4096)
   Prot LocalAddress:Port Scheduler Flags
     -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
   TCP  vs-wks1.verticacorp.com:pyrr lc
     -> node03.verticacorp.com:pyr  Route   1      1          8
     -> node02.verticacorp.com:pyr  Route   1      0          8
     -> node01.verticacorp.com:pyr  Local   1      0          8

See Also
• ipvsadm man page
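If you want to feed the backend list into a monitoring script, the real-server lines in ipvsadm output (those beginning with ->) can be extracted with a short awk filter. This is an illustrative sketch that assumes the column layout shown above (Forward in column 3, Weight in column 4, ActiveConn in column 5):

```shell
#!/bin/bash
# list_backends: read `ipvsadm` output on stdin and print
# "backend active-connections" for each real-server line.
# The numeric test on column 5 skips the "-> RemoteAddress:Port ..." header.
list_backends() {
    awk '$1 == "->" && $5 ~ /^[0-9]+$/ { print $2, $5 }'
}

# Example usage against a live director:
#   ipvsadm | list_backends
```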
Virtual IP Connection Problems

Issue

Users cannot connect to the database.

Resolution

Try to telnet to the VIP and port:
# telnet 10.10.51.180 5433
If telnet reports no route to host, recheck your /etc/keepalived/keepalived.conf file to make sure you entered the correct VIP and RIPs.

Errors and informational messages from the keepalived daemon are written to the /var/log/messages file, so check the messages file first:

# tail -f /var/log/messages
May 18 09:04:32 dell02 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.10.10.100
May 18 09:04:32 dell02 avahi-daemon[3191]: Registering new address record for 10.10.10.100 on eth0.
May 18 09:04:32 dell02 Keepalived_healthcheckers: Netlink reflector reports IP 10.10.10.100 added

Expected E-mail Messages From the Keepalived Daemon

• Upon startup:
  Subject: [node01] VRRP Instance VI_1 - Entering MASTER state
  => VRRP Instance is now owning VRRP VIPs <=
• When a node fails:
  Subject: [node01] Realserver 10.10.10.1:5433 - DOWN
  => MISC CHECK failed on service <=
• When a node comes back up:
  Subject: [node02] Realserver 10.10.10.1:5433 - UP
  => MISC CHECK succeed on service <=
Troubleshooting Keepalived Issues

If there are connection or other issues related to the Virtual IP server and Keepalived, try some of the following tips:

• Set KEEPALIVED_OPTIONS="-D -d" in the /etc/sysconfig/keepalived file to enable both debug mode and dump configuration.
• Monitor the system log in /var/log/messages. If keepalived.conf is incorrect, the only indication is in the messages log file. For example:
  $ tail /var/log/messages
  Errors and informational messages from the keepalived daemon are also written to the /var/log/messages files.
• Type ip addr list and see the configured VIP addresses for eth0. For example:
  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8 scope host lo
     inet 10.10.51.180/32 brd 127.255.255.255 scope global lo:0
     inet6 ::1/128 scope host
        valid_lft forever preferred_lft forever
  2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
     link/ether 84:2b:2b:55:4b:be brd ff:ff:ff:ff:ff:ff
     inet 10.10.51.55/24 brd 10.10.51.255 scope global eth0
     inet6 fe80::862b:2bff:fe55:4bbe/64 scope link
        valid_lft forever preferred_lft forever
  3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
     link/ether 84:2b:2b:55:4b:bf brd ff:ff:ff:ff:ff:ff
     inet 192.168.51.55/24 brd 192.168.51.255 scope global eth1
     inet6 fe80::862b:2bff:fe55:4bbf/64 scope link
        valid_lft forever preferred_lft forever
  4: sit0: <NOARP> mtu 1480 qdisc noop
     link/sit 0.0.0.0 brd 0.0.0.0
• Check iptables and notice the PREROUTING rule on the BACKUP (slave) director. Even though ipvsadm has a complete list of real servers to manage, it does not route anything, because the prerouting rule redirects packets to the loopback interface.
  # /sbin/iptables -t nat -n -L
  Chain PREROUTING (policy ACCEPT)
  target     prot opt source        destination
  REDIRECT   tcp  --  0.0.0.0/0     10.10.51.180
  Chain POSTROUTING (policy ACCEPT)
  target     prot opt source        destination
  Chain OUTPUT (policy ACCEPT)
  target     prot opt source        destination
  Note: The master and the third node do not have the REDIRECT entry when you run the same command on those nodes. Only the slave does. On some kernels, the nat table does not show by default without the -t parameter, and -n is used to avoid long DNS lookups. See the iptables(8) Linux man page for details.
• During failover, it is normal to see a delay in new connection establishment until the slave node takes control. The delay could be several minutes, depending on the load on the cluster. If you cannot connect to the database, try to telnet to the VIP and port:
  # telnet 10.10.51.180 5433
  If telnet reports no route to host, recheck the keepalived configuration file (/etc/keepalived/keepalived.conf) to make sure you entered the correct VIP and RIPs.
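If telnet is not installed on the host you are troubleshooting from, bash's built-in /dev/tcp pseudo-device gives an equivalent reachability check. This is an illustrative helper, not an HP-supplied tool; the VIP and port below are the example values used throughout this section:

```shell
#!/bin/bash
# probe_port HOST PORT [TIMEOUT] -- succeed if a TCP connection can be opened.
# Uses bash's /dev/tcp redirection, so it works even where telnet is absent.
probe_port() {
    timeout "${3:-3}" bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if probe_port 10.10.51.180 5433; then
    echo "VIP reachable on 5433"
else
    echo "cannot reach VIP; recheck the VIP and RIPs in keepalived.conf"
fi
```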
Managing Nodes

HP Vertica provides the ability to add, remove, and replace nodes on a live cluster that is actively processing queries. This ability lets you scale the database without interrupting users.

Stop HP Vertica on a Node

In some cases, you need to take down a node to address one of the following scenarios:
• Perform maintenance
• Upgrade hardware

Note: Before taking a node down, make sure to back up your database. See Backing Up and Restoring the Database.

1. Check the K-safety level of your cluster. In vsql, enter the following query:
   SELECT current_fault_tolerance FROM system;
   The query returns the K-safety level of the cluster.
   Important: HP Vertica does not recommend a K-safety level of 0. In this case, you should back up the database before shutting down a node. See Lowering the K-Safety Level to Allow for Node Removal for more information.
2. Run Administration Tools, select Advanced Menu, and click OK.
3. Select Stop Vertica on Host and click OK.
4. Choose the host that you want to stop and click OK.
5. Return to the Main Menu, select View Database Cluster State, and click OK. The host you previously stopped now appears DOWN.
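The K-safety check in step 1 can be scripted so that an automated maintenance job refuses to stop a node when the cluster cannot tolerate the loss. The query is the one shown above; the wrapper itself is an illustrative sketch, with the vsql invocation shown as a comment so the decision logic stands alone:

```shell
#!/bin/bash
# Refuse to proceed when K-safety is below 1.
# In practice, fetch the value first, for example:
#   k=$(vsql -t -A -c "SELECT current_fault_tolerance FROM system;")
check_ksafety() {
    local k=$1
    if [ "$k" -lt 1 ]; then
        echo "WARNING: K-safety is $k; back up the database before stopping a node"
        return 1
    fi
    echo "K-safety is $k; the database can tolerate losing a node"
}

check_ksafety 1   # prints: K-safety is 1; the database can tolerate losing a node
```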
You can now perform maintenance. See Restart HP Vertica on a Node for details about restarting HP Vertica on a node.

Restart HP Vertica on a Node

After stopping a node to perform maintenance, upgrade the hardware, or complete another similar task, you can bring the node back up. This process reconnects the node with the database.

Restarting HP Vertica on a node

1. Run Administration Tools. From the Main Menu, select Restart Vertica on Host and click OK.
2. Select the database and click OK.
3. Select the host that you want to restart and click OK.
   Note: This process may take a few moments.
4. Return to the Main Menu, select View Database Cluster State, and click OK. The host you restarted now appears as UP.

Fault Groups

Fault groups let you configure HP Vertica for your physical cluster layout to minimize the risk of correlated failures inherent in your environment, usually caused by shared resources.

HP Vertica automatically creates fault groups around control nodes (servers that run spread) in large cluster arrangements, placing nodes that share a control node in the same fault group. Consider defining your own fault groups specific to your cluster's physical layout if you want to:

• Reduce the risk of correlated failures. For example, by defining your rack layout, HP Vertica could tolerate a rack failure.
• Influence the placement of control nodes in the cluster.

HP Vertica supports complex, hierarchical fault groups of different shapes and sizes and provides a fault group script (DDL generator), SQL statements, system tables, and other monitoring tools.
See High Availability With Fault Groups in the Concepts Guide for an overview of fault groups with a cluster topology example.

About the Fault Group Script

To help you define fault groups on your cluster, HP Vertica provides a script called fault_group_ddl_generator.py in the /opt/vertica/scripts directory. This script generates the SQL statements you run to create fault groups.

Note: The fault_group_ddl_generator.py script does not create fault groups for you. However, you can copy the output to a file and then use \i or vsql -f commands to pass the cluster layout to HP Vertica when you run the helper script.

Create a fault group input file

Use any text editor to create a fault group input file for the targeted cluster. Then pass the input file to the fault_group_ddl_generator.py script, which uses it to return a list of commands you run.

The fault group input file must adhere to the following format:

First line in the template

The first line is for the parent (top-level) fault groups only, delimited by spaces; for example:
rack1 rack2 rack3 rack4

Remaining lines in template

The format for each subsequent line includes a parent fault group, followed by an equals sign (=), and then any combination of one or more nodes, fault groups, virtual machine servers, or virtual machine hosts. The objects that go into a parent fault group are delimited by spaces; for example:
<parent> = <child_1> <child_2> <child_n...>

After the first row of parent fault groups, the order in which you write the group descriptions does not matter. All fault groups that you define in this file must refer back to a parent fault group, either directly or by being the child of a fault group that is the child of a parent fault group. For example, to place B into A and both C and D into B, the format is as follows:

A = B
B = C D

In the above example, B is a child of A and a parent of both C and D.
Example input file
The following input file is an example only. In your file, the node* names must represent nodes in the cluster, such as v_vmartdb_node0001, v_vmartdb_node0002, and so on.

rack1 rack2 rack3 vm3_1 vm4_1 vm4_2 rack4
rack1 = node1 node2 node3 node4
rack2 = node5 node6 node7 node8
rack3 = node9 vm3_1
vm3_1 = node10 node11 node12
vm4_1 = node13 node14
vm4_2 = node15 node16
rack4 = vm4_1 vm4_2

The fault_group_ddl_generator.py script returns output like the following. Consider piping the output to a file so you can edit and reuse the DDL statements later, such as if you want to add nodes to an existing fault group.

ALTER DATABASE vmartd DROP ALL FAULT GROUP;
CREATE FAULT GROUP rack1;
ALTER FAULT GROUP rack1 ADD NODE node1;
ALTER FAULT GROUP rack1 ADD NODE node2;
ALTER FAULT GROUP rack1 ADD NODE node3;
ALTER FAULT GROUP rack1 ADD NODE node4;
CREATE FAULT GROUP rack2;
ALTER FAULT GROUP rack2 ADD NODE node5;
ALTER FAULT GROUP rack2 ADD NODE node6;
ALTER FAULT GROUP rack2 ADD NODE node7;
ALTER FAULT GROUP rack2 ADD NODE node8;
CREATE FAULT GROUP rack3;
ALTER FAULT GROUP rack3 ADD NODE node9;
CREATE FAULT GROUP vm3_1;
ALTER FAULT GROUP vm3_1 ADD NODE node10;
ALTER FAULT GROUP vm3_1 ADD NODE node11;
ALTER FAULT GROUP vm3_1 ADD NODE node12;
ALTER FAULT GROUP rack3 ADD FAULT GROUP vm3_1;
CREATE FAULT GROUP rack4;
CREATE FAULT GROUP vm4_1;
ALTER FAULT GROUP vm4_1 ADD NODE node13;
ALTER FAULT GROUP vm4_1 ADD NODE node14;
ALTER FAULT GROUP rack4 ADD FAULT GROUP vm4_1;
CREATE FAULT GROUP vm4_2;
ALTER FAULT GROUP vm4_2 ADD NODE node15;
ALTER FAULT GROUP vm4_2 ADD NODE node16;
ALTER FAULT GROUP rack4 ADD FAULT GROUP vm4_2;

For details, see Creating Fault Groups.

About Automatic Fault Groups

If you do not define your own fault groups, HP Vertica creates automatic fault groups in order to create the correct control node assignments for the cluster layout. In this scenario, all nodes that share the same control node reside in the same automatic fault group.
Ephemeral nodes are not included in automatic or user-defined fault groups because they hold no data.
Creating Fault Groups

When you define fault groups, HP Vertica distributes data segments across the cluster so the cluster can tolerate correlated failures inherent in your environment, such as a rack failure. For an overview, see High Availability With Fault Groups in the Concepts Guide.

Fault group prerequisites

- Defining fault groups requires careful and thorough network planning. You must have a solid understanding of your network topology.
- HP Vertica must first be installed or upgraded to the latest version.
- The user who creates fault groups must be a superuser.
- A database must already exist.

How to create a fault group

The following procedure assumes that someone in your organization has planned your fault group hierarchy and created an input file to pass to the fault_group_ddl_generator.py script. See About the Fault Group Script for details.

Tip: Pipe the output from the script (in step 2) to a <filename>.sql file so you can run a single SQL script instead of multiple DDL statements. Also consider saving your input file so you can more easily modify fault groups later, such as after you expand the cluster or change the distribution of control nodes.

1. As the database administrator, or a user with sudo privileges, log in to one of the target hosts in the cluster.
2. Run the fault_group_ddl_generator.py script and include the following arguments:
   - The database name
   - The fault group input file name
   - [Optional] The file name of the <filename>.sql script into which you want to write the DDL statements

   Example:

   $ python /opt/vertica/scripts/fault_group_ddl_generator.py vmartdb faultGroupDescription.txt > faultgroupddl.sql

   This command writes the SQL statements to the file called faultgroupddl.sql.
3. Run vsql and log in to the target database as the database administrator (dbadmin by default).
4. Run the .sql output script you created in step 2; for example:

   => \i faultgroupddl.sql

5. If you do not have large cluster enabled (fewer than 120 nodes), proceed to the next step; otherwise, realign the control nodes by calling the following function:

   => SELECT realign_control_nodes();

6. Save cluster changes to the spread configuration file:

   => SELECT reload_spread(true);

7. Use the Administration Tools to restart the database.
8. Save changes to the cluster's data layout by calling the REBALANCE_CLUSTER() function:

   => SELECT rebalance_cluster();

See Also

For syntax for the fault group statements and functions, see the following topics in the SQL Reference Manual:

- Cluster Management Functions
- CREATE FAULT GROUP
- ALTER FAULT GROUP
- DROP FAULT GROUP
- ALTER DATABASE

Modifying Fault Groups

Modify fault groups when you need to:
- Add a fault group to another fault group
- Remove a fault group from another fault group
- Add one or more nodes to a fault group
- Remove one or more nodes from a fault group (in which case the node is left without a parent fault group)
- Rename a fault group

How to modify a fault group

Before you modify existing fault groups, carefully plan the new layout and modify the input template file you created for the targeted cluster. See About the Fault Group Script and Creating Fault Groups.

1. As the database administrator or a user with sudo privileges, log in to any host in the cluster.
2. Run the fault_group_ddl_generator.py script and supply the following:
   - The database name
   - The fault group input file name
   - [Optional] The file name of the <filename>.sql script you want to create

   Example:

   $ python /opt/vertica/scripts/fault_group_ddl_generator.py vmartdb ModifiedFaultGroupTemp.txt > faultgroupddl.sql

   This command writes the SQL statements to the file faultgroupddl.sql.
3. Run vsql and log in to the specified database as the database administrator (dbadmin by default).
4. Run the <filename>.sql script you created in step 2; for example:

   => \i faultgroupddl.sql

5. If you do not have large cluster enabled, skip this step; otherwise, realign the control nodes by calling the following function:

   => SELECT realign_control_nodes();

6. Restart the spread process to write changes to the spread configuration file:

   => SELECT reload_spread(true);

7. Use the Administration Tools to restart the database.
8. Save changes to the cluster's data layout by calling the REBALANCE_CLUSTER() function:

   => SELECT rebalance_cluster();

See Also

See the following topics in the SQL Reference Manual:

- ALTER FAULT GROUP
- ALTER DATABASE
- Cluster Management Functions

Dropping Fault Groups

When you remove a fault group from the cluster, the drop operation removes the specified fault group and its child fault groups, placing all nodes under the parent of the dropped fault group. To see the current fault group hierarchy in the cluster, query the FAULT_GROUPS system table.

How to drop a fault group

Use the DROP FAULT GROUP statement to remove a fault group from the cluster. The following example drops the group2 fault group:

vmartdb=> DROP FAULT GROUP group2;
DROP FAULT GROUP

How to remove all fault groups

Use the ALTER DATABASE statement to drop all fault groups, along with any child fault groups, from the specified database cluster. Note that the statement's final keyword is singular: DROP ALL FAULT GROUP. The following command drops all fault groups from the vmartdb database:

vmartdb=> ALTER DATABASE vmartdb DROP ALL FAULT GROUP;
ALTER DATABASE

How to add nodes back to a fault group

To add a node back to a fault group, you must manually reassign it to a new or existing fault group using the CREATE FAULT GROUP and ALTER FAULT GROUP...ADD NODE statements. See the following topics in the SQL Reference Manual for details:

- DROP FAULT GROUP
- CREATE FAULT GROUP
- ALTER FAULT GROUP...ADD NODE

Monitoring Fault Groups

You can monitor fault groups by querying HP Vertica system tables or by logging in to the Management Console (MC) interface.

Monitoring fault groups through system tables

Use the following system tables to observe information about fault groups and cluster vulnerabilities, such as the nodes the cluster cannot lose without the database going down:

- V_CATALOG.FAULT_GROUPS: View the hierarchy of all fault groups in the cluster.
- V_CATALOG.CLUSTER_LAYOUT: Observe the actual arrangement of the nodes participating in the database cluster and the fault groups that affect them. Ephemeral nodes are not shown in the cluster layout ring because they hold no data.

Monitoring fault groups through Management Console

An MC administrator can monitor and highlight fault groups of interest by following these steps:

1. Click the running database you want to monitor and click Manage in the task bar.
2. Open the Fault Group View menu and select the fault groups you want to view.
3. Optionally, hide nodes that are not in the selected fault groups to focus on the fault groups of interest.

Nodes assigned to a fault group have a colored bubble attached to the upper left corner of the node icon. Each fault group has a unique color, until the number of fault groups exceeds the number of colors available, in which case MC recycles previously used colors.
Because HP Vertica supports complex, hierarchical fault groups of different shapes and sizes, MC displays multiple fault group participation as a stack of different-colored bubbles, where higher bubbles represent fault groups nearer the top of the hierarchy; that is, a higher bubble is closer to the parent fault group, not to a child or grandchild fault group. See High Availability With Fault Groups in the Concepts Guide for information about fault group hierarchy.

Example: Simple fault group hierarchy

The following image shows two tiers of nodes in three fault groups with no nodes hidden. Although fault groups are designed for large clusters, the cluster shown below is intentionally small to make it easier to view details.

While monitoring cluster details on the MC interface, if information in the node details box is hard to read, increase the zoom level.

Large Cluster

To support scaling of existing clusters into large clusters and to improve control message performance, HP Vertica delegates control message responsibilities to a subset of HP Vertica nodes, called control nodes. Control nodes communicate with each other. Other cluster nodes are assigned to a control node, which they use for control message communications.
Control nodes on large clusters

On clusters of 120 or more nodes, a large cluster layout is necessary and enabled by default. HP Vertica makes automatic control node assignments unless you use one of the following options:

- To install a new cluster before you create a database: run the HP Vertica installation script with the --large-cluster <integer> arguments. See the following topics for details:
  - Installing HP Vertica with the install_vertica Script in the Installation Guide
  - Installing a Large Cluster in this guide
- To expand an existing cluster for pre-existing databases, or to change control nodes on an existing cluster: use the cluster management functions described in Defining and Realigning Control Nodes on an Existing Cluster.
- To influence the placement of control nodes in your cluster's physical layout: define fault groups to configure HP Vertica for your cluster. See Fault Groups for details.

Control nodes on small clusters

If your cluster has fewer than 120 nodes, large cluster is neither necessary nor automatically applied. As a result, all nodes are control nodes. However, HP Vertica lets you define control nodes on a cluster of any size. Some environments, such as cloud deployments that might have higher network latency, could benefit from a smaller number of control nodes. For details, see Planning a Large Cluster Arrangement and Installing a Large Cluster.

Planning a Large Cluster

In a large cluster layout of 120 nodes or more, nodes form a correlated failure group, governed by their control node (the node that runs control messaging, spread). If a control node fails, all nodes in its host group also fail. This topic provides tips on how to plan a large cluster arrangement. See Installing a Large Cluster and Large Cluster Best Practices for more information.

Planning the number of control nodes

Configuring a large cluster requires careful and thorough network planning. You must have a solid understanding of your network topology before you configure the cluster.
To assess how many cluster nodes should be control nodes, start with the square root of the total number of nodes expected to be in the database cluster; this helps satisfy both data K-safety and rack fault tolerance for the cluster. Depending on the result, you might need to adjust the number of control nodes to account for your physical hardware/rack count. For example, if you have 121 nodes (for a result of 11), and your nodes will be distributed across 8 racks, you might want to increase the number of control nodes to 16 so you have two control nodes per rack.

Specifying the number of control nodes

HP Vertica provides different tools to help you define the number of control nodes, depending on your current configuration. Consider the following scenarios, in which cluster nodes are distributed among three racks in different configurations:

- You want three control nodes, and all other nodes are evenly distributed among the three racks: specify one control node per rack.
- You want five control nodes on three racks: specify two control nodes on each of two racks and one control node on the third rack.
- You want four control nodes, and one rack has twice as many nodes as the other racks: specify two control nodes on the larger rack and one control node on each of the other two racks.

Installing a Large Cluster

Whether you are forming a new large cluster (adding all nodes for the first time) or expanding an existing cluster to a large cluster, HP Vertica provides two methods that let you specify the number of control nodes (the nodes that run control messaging). See the following sections for details:

- If you want to install a new large cluster
- If you want to expand an existing cluster

If you want to install a new large cluster

To configure HP Vertica for a new, large cluster, pass the install_vertica script the --large-cluster <integer> argument.
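The square-root guideline under "Planning the number of control nodes" gives a starting value for this <integer>. As a minimal sketch, the helper below rounds the square root up and then pads it to a multiple of the rack count, matching the 121-node/8-rack example; the padding rule is an illustrative interpretation of the guidance, not an HP Vertica formula.

```python
import math

def suggested_control_nodes(total_nodes, racks):
    """Start from sqrt(total_nodes), then round up to a multiple of the
    rack count so each rack gets the same number of control nodes.
    Illustrative helper only; adjust for your own topology."""
    base = math.ceil(math.sqrt(total_nodes))
    # Pad so control nodes divide evenly across racks.
    return math.ceil(base / racks) * racks
```

For the example in the text, suggested_control_nodes(121, 8) yields 16: the square root is 11, padded up to two control nodes per rack.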
HP Vertica selects the first <integer> hosts from the comma-separated --hosts host_list as control nodes, and assigns every other host you specify in the --hosts argument to a control node on a round-robin basis.

Note: The number of hosts you include in the --hosts argument determines the large cluster layout, not the number of nodes you later include in the database. If you specify 120 or more hosts in the --hosts list, but you do not specifically enable large cluster by providing the --large-cluster argument, HP Vertica automatically enables large cluster and configures control nodes for you.

So that control nodes and the nodes assigned to them are configured for the highest possible fault tolerance, you must specify hosts in the --hosts host_list in a specific order. For example, if you have four sets of hosts on four racks, the first four entries in the --hosts host_list must each be one host from a different rack, so that you have one control node per rack. Continue the list in groups of four, one host from each rack in the same rack order as the first four hosts, until you have listed all targeted hosts. See "Sample rack-based cluster hosts topology" below for examples.

Tip: If you pass the --large-cluster argument a DEFAULT value instead of an <integer> value, HP Vertica calculates the number of control nodes based on the total number of nodes specified in the --hosts host_list argument. If you want a specific number of control nodes on the cluster, you must use the <integer> value.

For more information, see the following topics:

- Planning a Large Cluster Arrangement
- Installing HP Vertica with the install_vertica Script in the Installation Guide

Sample rack-based cluster hosts topology

This example shows a simple, multi-rack cluster layout, in which cluster nodes are evenly distributed across three racks. Each rack has one control node.
In the rack-based example:

- Rack-1, Rack-2, and Rack-3 are managed by a single network switch
- Host-1_1, Host-1_2, and Host-1_3 are control nodes
- All hosts on Rack-1 are assigned to control node Host-1_1
- All hosts on Rack-2 are assigned to control node Host-1_2
- All hosts on Rack-3 are assigned to control node Host-1_3

In this scenario, if control node Host-1_1 fails, the nodes in Rack-1 can communicate with control node Host-2_1, and the database stays up.

In the following install_vertica script fragment, note the order of the hosts in the --hosts list argument. The final argument specifically enables large cluster and sets the number of control nodes (3):

... install_vertica --hosts Host-1-1,Host-1-2,Host-1-3,Host-2-1,Host-2-2,Host-2-3,Host-3-1,Host-3-2,Host-3-3,Host-4-1,Host-4-2,Host-4-3,Host-5-1,Host-5-2,Host-5-3 --rpm <vertica-package-name> <other required options> --large-cluster 3
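The rack-interleaved ordering of the --hosts list can be generated mechanically. The sketch below is illustrative only (the host names follow the Host-<n>-<rack> pattern used in the example above); it interleaves hosts one rack at a time so the first N entries fall on N different racks:

```python
def interleave_hosts_by_rack(racks):
    """Order hosts round-robin across racks so the first len(racks)
    entries each come from a different rack. `racks` maps a rack name
    to its ordered host list. Illustrative helper, not a Vertica tool."""
    ordered = []
    depth = max(len(hosts) for hosts in racks.values())
    for i in range(depth):                 # i-th host from every rack
        for rack in racks:                 # racks in listed order
            hosts = racks[rack]
            if i < len(hosts):
                ordered.append(hosts[i])
    return ordered

racks = {
    "Rack-1": ["Host-1-1", "Host-2-1", "Host-3-1"],
    "Rack-2": ["Host-1-2", "Host-2-2", "Host-3-2"],
    "Rack-3": ["Host-1-3", "Host-2-3", "Host-3-3"],
}
host_list = ",".join(interleave_hosts_by_rack(racks))
# The first three entries are one host per rack, so --large-cluster 3
# makes them the control nodes.
```

The resulting comma-separated string begins Host-1-1,Host-1-2,Host-1-3, matching the ordering shown in the install_vertica fragment.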
After the installation process completes, use the Administration Tools to create a database. This operation generates an HP Vertica cluster environment with three control nodes, each associated with the hosts that reside on the same rack as that control node.

If you want to expand an existing cluster

When you add a node to an existing cluster, HP Vertica places the new node in an appropriate location within the cluster ring. HP Vertica then assigns the newly added node to a control node, based on the cluster's current allocations.

To give you more flexibility and control over which nodes run spread, you can use the SET_CONTROL_SET_SIZE(integer) function. This function works like the installation script's --large-cluster <integer> option. See Defining and Realigning Control Nodes on an Existing Cluster for details.

Important: The HP Vertica installation script cannot alter the database cluster.

Defining and Realigning Control Nodes on an Existing Cluster

This topic describes how to set up or change control node assignments on an existing cluster using a series of cluster management functions. It assumes you already know how many control nodes the cluster needs for failover safety. See Planning a Large Cluster Arrangement for more information.

Note: If you are adding nodes for the first time, run the HP Vertica installation script using the --large-cluster <integer> argument. See Installing HP Vertica with the install_vertica Script in the Installation Guide.

Setting up control nodes on an existing cluster makes the following changes to the cluster:

- Configures the number of nodes that run spread.
- Assigns each non-control cluster node to a control node.
- Saves the new layout to the spread configuration file.
- Redistributes data across the cluster to improve fault tolerance.

How to set up control nodes on an existing cluster

After you add, remove, or swap nodes on an existing cluster, perform the following steps. This procedure helps the cluster maintain adequate control messaging distribution for failover safety. For more details, see "Control node assignment/realignment" in Large Cluster Best Practices.
1. As the database administrator, log in to the Administration Tools and connect to the database.
2. Call the SET_CONTROL_SET_SIZE(integer) function with an integer argument that specifies the number of control nodes you want; for example, 4:

   => SELECT SET_CONTROL_SET_SIZE(4);

3. Call the REALIGN_CONTROL_NODES() function without arguments:

   => SELECT REALIGN_CONTROL_NODES();

4. Call the RELOAD_SPREAD(true) function to save changes to the spread configuration file:

   => SELECT RELOAD_SPREAD(true);

5. After the RELOAD_SPREAD() operation finishes, log back in to the Administration Tools and restart the database.
6. Call the REBALANCE_CLUSTER() function to distribute data across the cluster:

   => SELECT REBALANCE_CLUSTER();

Important: You must run REBALANCE_CLUSTER() for fault tolerance to be realized. See also Rebalancing Large Clusters.

For more details about the functions used in this procedure, see Cluster Management Functions in the SQL Reference Manual.

Rebalancing Large Clusters

A rebalance operation performs the following tasks:

- Distributes data based on user-defined fault groups, if specified, or based on large cluster automatic fault groups
- Redistributes the database projections' data across all nodes
- Refreshes projections
- Sets the Ancient History Mark
- Drops projections that are no longer needed
When to rebalance the cluster

Rebalancing is useful (or necessary) after you:

- Mark one or more nodes as ephemeral in preparation for removing them from the cluster
- Add one or more nodes to the cluster, so HP Vertica can populate the empty nodes with data
- Remove one or more nodes from the cluster, so HP Vertica can redistribute the data among the remaining nodes
- Change the scaling factor of an elastic cluster, which determines the number of storage containers used to store a projection across the database
- Set the control node size or realign control nodes on a large cluster layout
- Specify more than 120 nodes in your initial HP Vertica cluster configuration
- Add nodes to or remove nodes from a fault group

You must be a database administrator to rebalance data across the cluster.

How to rebalance the cluster

You rebalance the cluster using SQL functions, such as the REBALANCE_CLUSTER() function. Call REBALANCE_CLUSTER() after you have completed the last add/remove-node operation.

How long will rebalance take?

A rebalance operation can take some time, depending on the number of projections and the amount of data they contain. HP recommends that you allow the process to complete uninterrupted. If you must cancel the operation, call the CANCEL_REBALANCE_CLUSTER() function.

See Also

- Rebalancing Data Using SQL Functions
- Rebalancing Data Across Nodes
- Rebalancing Data Using the Administration Tools UI

Expanding the Database to a Large Cluster

If you have an existing database cluster that you want to expand to a large cluster (more than 120 nodes), follow these steps:
1. Log in to the Administration Tools as the database administrator and stop the database.
2. As root or a user with sudo privileges, open a BASH shell and run the install_vertica script with the --add-hosts argument, providing a comma-separated list of hosts you want to add to the existing HP Vertica cluster. See Installing HP Vertica with the install_vertica Script in the Installation Guide.
3. Exit the shell and re-establish a vsql connection as the database administrator.
4. Log in to the Administration Tools and start the database.
5. Use the Administration Tools Advanced Menu > Cluster Management > Add Hosts option to add the standby hosts you created in step 2.
6. Run SET_CONTROL_SET_SIZE(integer) to specify the number of control nodes you want on the cluster. See Defining and Realigning Control Nodes on an Existing Cluster.
7. Optionally, create fault groups to further define the layout of the control nodes within the physical cluster. See Fault Groups.

Monitoring Large Clusters

Monitor large cluster traits by querying the following system tables:

- V_CATALOG.LARGE_CLUSTER_CONFIGURATION_STATUS: Shows the current spread hosts and the control designations in the catalog so you can see whether they match.
- V_MONITOR.CRITICAL_HOSTS: Lists the hosts whose failure would cause the database to become unsafe and force a shutdown.

Tip: The CRITICAL_HOSTS view is especially useful for large cluster arrangements. For non-large clusters, query the CRITICAL_NODES table.

You might also want to query the following system tables:

- V_CATALOG.FAULT_GROUPS: Shows fault groups and their hierarchy in the cluster.
- V_CATALOG.CLUSTER_LAYOUT: Shows the relative position of the actual arrangement of the nodes participating in the database cluster and the fault groups that affect them.

Large Cluster Best Practices

Keep the following best practices in mind when you are planning and managing a large cluster implementation.
Planning the number of control nodes

To assess how many cluster nodes should be control nodes, start with the square root of the total number of nodes expected to be in the database cluster; this helps satisfy both data K-safety and rack fault tolerance for the cluster. Depending on the result, you might need to adjust the number of control nodes to account for your physical hardware/rack count. For example, if you have 121 nodes (for a result of 11), and your nodes will be distributed across 8 racks, you might want to increase the number of control nodes to 16 so you have two control nodes per rack. See Planning a Large Cluster Arrangement.

Control node assignment/realignment

After you specify the number of control nodes, you must update the control hosts' (spread) configuration files to reflect the catalog change. Certain cluster management functions might require that you run other functions, restart the database, or both.

If, for example, you drop a control node, cluster nodes that point to it are reassigned to another control node. If that node fails, all the nodes assigned to it also fail, so you need to use the Administration Tools to restart the database. In this scenario, you would call the REALIGN_CONTROL_NODES() and RELOAD_SPREAD(true) functions, which notify nodes of the changes and realign fault groups. Calling RELOAD_SPREAD(true) connects an existing cluster node to a newly assigned control node.

On the other hand, if you run REALIGN_CONTROL_NODES() multiple times in a row, the layout does not change beyond the initial setup, so you do not need to restart the database. But if you add or drop a node and then run REALIGN_CONTROL_NODES(), the function call could change many node assignments.

Here is what happens with control node assignments when you add or drop nodes, whether those nodes are control nodes or non-control nodes:

- If you add a cluster node: HP Vertica assigns a control node to the newly added node based on the current cluster configuration. If the new node joins a fault group, it is assigned to a control node from that fault group and requires a database restart to reconnect to that control node. See Fault Groups for more information.
- If you drop a non-control node: HP Vertica quietly drops the cluster node. This operation could change the cluster and spread layout, so you must call REBALANCE_CLUSTER() after you drop a node.
- If you drop a control node: All nodes assigned to the control node go down. In large cluster implementations, however, the database remains up because the down nodes are not buddies with other cluster nodes. Dropping a control node results in (n-1) control nodes. You must call REALIGN_CONTROL_NODES() to reset the cluster so it has n control nodes, which might or might not be the same number as before you dropped the control node. Remaining nodes are assigned new control nodes. In this operation, HP Vertica makes control node assignments based on the cluster layout. When it makes the new assignments, it respects user-defined fault groups, if any, which you can view by querying the V_CATALOG.CLUSTER_LAYOUT system table, a view that also lets you see the proposed new layout for nodes in the cluster.

If you want to influence the layout of control nodes in the cluster, define fault groups. For more information, see Defining and Realigning Control Nodes on an Existing Cluster and Rebalancing Large Clusters.

Allocate standby nodes

Have as many standby nodes available as you can, ideally on racks you are already using in the cluster. If a node suffers a non-transient failure, use the Administration Tools "Replace Host" utility to swap in a standby node.

Standby node availability is especially important for control nodes. If the node you are swapping out is a control node, all nodes assigned to that control node's host grouping must be taken offline while you swap in the standby node. For details on node replacement, see Replacing Nodes.

Plan for cluster growth

If you plan to expand an existing cluster to 120 or more nodes, you can configure the number of control nodes for the cluster after you add the new nodes. See Defining and Realigning Control Nodes.

Write custom fault groups

When you deploy a large cluster, HP Vertica automatically creates fault groups around control nodes, placing nodes that share a control node into the same fault group. Alternatively, you can specify which cluster nodes should reside in a particular correlated failure group and share a control node. See High Availability With Fault Groups in the Concepts Guide.

Use segmented projections

On large-cluster setups, minimize the use of unsegmented projections in favor of segmented projections. When you use segmented projections, HP Vertica creates buddy projections and distributes copies of segmented projections across database nodes. If a node fails, data remains available on the other cluster nodes.
Use the Database Designer

HP recommends that you use the Database Designer to create your physical schema. If you choose to design projections manually, segment large tables across all database nodes and replicate small-table projections (that is, create unsegmented projections) on all database nodes.
Elastic Cluster

You can scale your cluster up or down to meet the needs of your database. The most common case is to add nodes to your database cluster to accommodate more data and provide better query performance. However, you can scale down your cluster if you find that it is overprovisioned, or if you need to divert hardware for other uses.

You scale your cluster by adding or removing nodes. Nodes can be added or removed without shutting down or restarting the database. After adding a node, or before removing one, HP Vertica begins a rebalancing process that moves data around the cluster to populate the new nodes or to move data off nodes that are about to be removed from the database. During this process, data may also be exchanged between nodes that are not being added or removed, to maintain robust intelligent K-safety. If HP Vertica determines that the data cannot be rebalanced in a single iteration due to a lack of disk space, the rebalance is done in multiple iterations.

To help make data rebalancing due to cluster scaling more efficient, HP Vertica locally segments data storage on each node so it can be easily moved to other nodes in the cluster. When a new node is added to the cluster, existing nodes in the cluster give up some of their data segments to populate the new node, and exchange segments to keep to a minimum the number of nodes that any one node depends upon. This strategy minimizes the number of nodes that may become critical when a node fails (see Critical Nodes/K-safety). When a node is being removed from the cluster, all of its storage containers are moved to other nodes in the cluster (which also relocates data segments to minimize the number of nodes that may become critical when a node fails). This method of breaking data into portable segments is referred to as elastic cluster, because it makes enlarging or shrinking the cluster easier.
The alternative to elastic cluster is to resegment all of the data in the projection and redistribute it to all of the nodes in the database evenly any time a node is added or removed. This method requires more processing and more disk space, since it requires all of the data in all projections to essentially be dumped and reloaded. The Elastic Cluster Scaling Factor In new installs, each node has a "scaling factor" number of local segments. Rebalance efficiently redistributes data by relocating local segments provided that, after nodes are added or removed, there are sufficient local segments in the cluster to redistribute the data evenly (determined by MAXIMUM_SKEW_PERCENT). For example, if the scaling factor = 8, and there are initially 5 nodes, then there are a total of 40 local segments cluster wide. If two additional nodes are added to bring the total to 7 nodes, relocating local segments would place 5 such segments on 2 nodes and 6 such segments on 5 nodes, which is roughly a 16.7% skew. Rebalance chooses this course of action only if the resulting skew is less than the allowed threshold, as determined by MAXIMUM_ SKEW_PERCENT. Otherwise, segmentation space (and hence data, if uniformly distributed over this space) is evenly distributed among the 7 nodes and new local segment boundaries are drawn for each node, such that each node again has 8 local segments. Note: By default, the scaling factor only has an effect while HP Vertica rebalances the database. While rebalancing, each node breaks the projection segments it contains into storage containers, which it then moves to other nodes if necessary. After rebalancing, the Administrator's Guide Managing the Database HP Vertica Analytic Database (7.0.x) Page 620 of 997
data is recombined into ROS containers. It is possible to have HP Vertica always group data into storage containers. See Local Data Segmentation for more information.

Enabling and Disabling Elastic Cluster

You enable and disable elastic cluster using functions. See the entries for the ENABLE_ELASTIC_CLUSTER and DISABLE_ELASTIC_CLUSTER functions in the SQL Reference Manual.

Note: An elastic projection (a segmented projection created when Elastic Cluster is enabled) created with a modularhash segmentation expression uses hash instead.

Query the ELASTIC_CLUSTER system table to determine whether elastic cluster is enabled:

=> SELECT is_enabled FROM ELASTIC_CLUSTER;
 is_enabled
------------
 t
(1 row)

Scaling Factor Defaults

The default scaling factor is 4 for new installs of HP Vertica and for upgraded installs of HP Vertica that had local segments disabled. Versions of HP Vertica prior to 6.0 had local segments disabled by default. On databases upgraded to version 6.0, the scaling factor is not changed during the upgrade if local segments were enabled.

Note: Databases created with versions of HP Vertica earlier than version 5.0 have a scaling factor of 0, which disables elastic cluster. This ensures that HP Vertica handles projection segmentation the way it did prior to version 5.0. If you want your older database to have better scaling performance, you must manually set a scaling factor to enable the new storage segmenting behavior.

Viewing Scaling Factor Settings

To view the scaling factor, query the ELASTIC_CLUSTER table:

=> SELECT scaling_factor FROM ELASTIC_CLUSTER;
 scaling_factor
---------------
 4
(1 row)

=> SELECT SET_SCALING_FACTOR(6);
 SET_SCALING_FACTOR
--------------------
 SET
(1 row)

=> SELECT scaling_factor FROM ELASTIC_CLUSTER;
 scaling_factor
---------------
 6
(1 row)

Setting the Scaling Factor

The scaling factor determines the number of storage containers used to store a projection across the database. Use the SET_SCALING_FACTOR function to change your database's scaling factor. The scaling factor can be an integer between 1 and 32.

Note: Setting the scaling factor too high can cause nodes to create too many small container files, greatly reducing efficiency and potentially causing a "Too many ROS containers" error (also known as "ROS pushback"). The scaling factor should be set high enough that rebalance can transfer local segments to satisfy the skew threshold, but small enough that the number of storage containers does not trigger ROS pushback. The number of storage containers should be greater than or equal to the number of partitions multiplied by the number of local segments (# storage containers >= # partitions * # local segments).

=> SELECT SET_SCALING_FACTOR(12);
 SET_SCALING_FACTOR
--------------------
 SET
(1 row)

Local Data Segmentation

By default, the scaling factor only has an effect when HP Vertica rebalances the database. During rebalancing, nodes break the projection segments they contain into storage containers, which they can quickly move to other nodes. This process is more efficient than re-segmenting the entire projection (in particular, less free disk space is required), but it still has significant overhead, since storage containers have to be separated into local segments, some of which are then transferred to other nodes. This overhead is not a problem if you rarely add or remove nodes from your database. However, if your database is growing rapidly and is constantly busy, you may find the process of adding nodes becomes disruptive.
In this case, you can enable local segmentation, which tells HP Vertica to always segment its data based on the scaling factor, so the data is always broken into containers that are easily moved. Having the data segmented in this way dramatically speeds up the process of adding or removing nodes, since the data is always in a state that can be quickly relocated to another node. The rebalancing process that HP Vertica performs after adding or removing a node just has to decide which storage containers to relocate, instead of first having to break the data into storage containers.
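The skew-threshold decision described under "The Elastic Cluster Scaling Factor" above can be sketched as follows. This is an illustrative model only: the real logic is internal to HP Vertica, MAXIMUM_SKEW_PERCENT is the parameter named in this guide, and the skew definition used here ((max - min) / max) is an assumption inferred from the 16.7% example in the text.

```python
# Illustrative sketch of the rebalance choice: relocate whole local
# segments if the resulting imbalance stays within the skew threshold,
# otherwise fall back to redrawing segment boundaries evenly.
# The (max - min) / max skew definition is an assumption, not Vertica's
# documented internal formula.

def can_relocate(total_segments, nodes, max_skew_percent):
    base, extra = divmod(total_segments, nodes)
    if extra == 0:
        return True   # segments divide evenly: no skew at all
    # some nodes hold base+1 segments, others base
    skew = 1 / (base + 1)   # ((base + 1) - base) / (base + 1)
    return skew * 100 <= max_skew_percent

# 40 segments over 7 nodes: nodes hold 6 or 5 segments, ~16.7% skew
print(can_relocate(40, 7, 15))   # -> False: redraw boundaries instead
print(can_relocate(40, 7, 20))   # -> True: relocate whole segments
print(can_relocate(42, 7, 10))   # -> True: 42/7 is perfectly even
```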
Local data segmentation increases the number of storage containers stored on each node. This is not an issue unless a table contains many partitions; for example, a table partitioned by day that contains one or more years of data. If local data segmentation is enabled, each of these table partitions is broken into multiple local storage segments, which can result in a huge number of files and lead to ROS pushback. Consider your table partitions and the effect enabling local data segmentation may have before enabling the feature.

Enabling and Disabling Local Segmentation

To enable local segmentation, use the ENABLE_LOCAL_SEGMENTS function. To disable local segmentation, use the DISABLE_LOCAL_SEGMENTS function:

=> SELECT ENABLE_LOCAL_SEGMENTS();
 ENABLE_LOCAL_SEGMENTS
-----------------------
 ENABLED
(1 row)

=> SELECT is_local_segment_enabled FROM ELASTIC_CLUSTER;
 is_local_segment_enabled
--------------------------
 t
(1 row)

=> SELECT DISABLE_LOCAL_SEGMENTS();
 DISABLE_LOCAL_SEGMENTS
------------------------
 DISABLED
(1 row)

=> SELECT is_local_segment_enabled FROM ELASTIC_CLUSTER;
 is_local_segment_enabled
--------------------------
 f
(1 row)

Elastic Cluster Best Practices

The following are some best practices with regard to local segmentation and upgrading pre-5.0 databases.

Note: You should always perform a database backup before and after performing any of the operations discussed in this topic. Back up before changing any elastic cluster or local segmentation settings to guard against a hardware failure causing the rebalance process to leave the database in an unusable state. Perform a full backup of the database after the rebalance procedure to avoid having to rebalance the database again if you need to restore from a backup.
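The partition-count warning above is easy to quantify with a back-of-the-envelope estimate. The numbers below are hypothetical, and the model is deliberately simplified (real container counts also depend on load patterns and Tuple Mover activity), but it shows why day-partitioned tables and local segmentation interact badly.

```python
# Rough sketch (hypothetical numbers): per the rule of thumb above,
# each partition contributes roughly one storage container per local
# segment once local segmentation is enabled.

def containers_per_node(partitions, local_segments):
    # simplified: # containers ~= # partitions * # local segments
    return partitions * local_segments

two_years_daily = 730   # ~2 years of day partitions
print(containers_per_node(two_years_daily, 4))    # -> 2920
print(containers_per_node(two_years_daily, 12))   # -> 8760
```

At a scaling factor of 12, a two-year day-partitioned table already approaches file counts where ROS pushback becomes a realistic concern, which is why the guide advises reviewing partitioning before enabling the feature.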
When to Enable Local Data Segmentation

Local data segmentation can significantly speed up the process of resizing your cluster. You should enable local data segmentation if:

• your database does not contain tables with hundreds of partitions.
• the number of nodes in the database cluster is a power of two.
• you plan to expand or contract the size of your cluster.

Local segmentation can result in an excessive number of storage containers with tables that have hundreds of partitions, or in clusters with a non-power-of-two number of nodes. If your database has either of these characteristics, take care when enabling local segmentation.

Upgraded Database Consideration

Databases created using a version of HP Vertica earlier than version 5.0 do not have elastic cluster enabled by default. If you expect to expand or contract the database in the future, you may benefit from enabling elastic cluster by setting a scaling factor. There are two strategies you can follow:

• Enable elastic cluster now, and rebalance the database. This may take a significant amount of time to complete, and may consume up to 50% of the free disk space on the nodes in the database, since all of the segmented projections are rewritten. However, afterwards, adding and removing nodes will take less time.
• Wait until you need to resize the cluster, then enable elastic cluster just before adding or removing nodes. Enabling elastic cluster at that point does not make the first resize operation any faster, but subsequent resize operations will be faster.

Which method you choose depends on your specific circumstances. If you might need to resize your database on short notice (for example, to load a very large amount of data at once), consider scheduling the downtime needed to enable elastic cluster and rebalance the database sooner, so that the actual add or remove node process occurs faster.
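The checklist above can be expressed as a simple predicate. The criteria come from this guide; the function itself, its name, and the concrete threshold of 100 partitions (standing in for "hundreds of partitions") are illustrative assumptions, not a Vertica API.

```python
# Sketch of the guidance above as a checklist. The 100-partition cutoff
# is a hypothetical stand-in for "hundreds of partitions"; adjust to
# your own environment.

def should_enable_local_segments(partition_count, node_count,
                                 plan_to_resize):
    power_of_two = node_count > 0 and (node_count & (node_count - 1)) == 0
    return (plan_to_resize
            and partition_count < 100   # no tables with hundreds of partitions
            and power_of_two)           # node count is a power of two

print(should_enable_local_segments(30, 8, True))    # -> True
print(should_enable_local_segments(400, 8, True))   # -> False: too many partitions
print(should_enable_local_segments(30, 6, True))    # -> False: 6 is not a power of two
```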
If you choose to enable elastic cluster for your database, consider whether you want to enable local data segmentation at the same time. If you choose to enable local data segmentation at a later time, you will need to rebalance the database again, which is a lengthy process.

Monitoring Elastic Cluster Rebalancing

HP Vertica includes system tables that can be used to monitor the rebalance status of an elastic cluster and gain general insight into the status of elastic cluster on your nodes.
• The REBALANCE_TABLE_STATUS table provides general information about a rebalance. It shows, for each table, the amount of data that has been separated, the amount that is currently being separated, and the amount to be separated. It also shows the amount of data transferred, the amount that is currently being transferred, and the remaining amount to be transferred (or an estimate if storage is not separated).

Note: If multiple rebalance methods were used for a single table (for example, the table has unsegmented and segmented projections), the table may appear multiple times, once for each rebalance method.

• The REBALANCE_PROJECTION_STATUS table can be used to gain more insight into the details for a particular projection that is being rebalanced. It provides the same type of information as above, but in terms of a projection instead of a table.

In each table, the separated_percent and transferred_percent columns can be used to determine overall progress.

Historical Rebalance Information

Historical information about completed work is retained, so use the predicate "WHERE is_latest" to restrict the output to only the most recent or current rebalance activity. The historical data may include information about dropped projections or tables. If a table or projection has been dropped and information about the anchor table is not available, then NULL is displayed for the table_id and "<unknown>" is displayed for the table_name. Information on dropped tables is still useful, for example, in providing justification for the duration of a task.
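As a sketch of how the separated_percent and transferred_percent columns might be combined into a single progress figure, consider the following. The column names come from this guide; the sample rows are made up for illustration, and weighting the two phases equally is an assumption, not documented Vertica behavior.

```python
# Illustrative sketch: roll per-table rebalance metrics (columns named
# in this guide, sample values invented) into one overall percentage.

rows = [
    {"table_name": "fact_sales", "separated_percent": 100.0,
     "transferred_percent": 40.0},
    {"table_name": "dim_dates", "separated_percent": 60.0,
     "transferred_percent": 0.0},
]

def overall_progress(rows):
    # weight the separation and transfer phases equally (an assumption)
    per_table = [(r["separated_percent"] + r["transferred_percent"]) / 2
                 for r in rows]
    return sum(per_table) / len(per_table)

print(overall_progress(rows))   # -> 50.0
```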
Adding Nodes

There are many reasons for adding one or more nodes to an installation of HP Vertica:

• Increase system performance. Add nodes to handle a high query load or load latency, or to increase disk space without adding storage locations to existing nodes.

Note: Database response time depends on factors such as the type and size of the application query, database design, data size and data types stored, available computational power, and network bandwidth. Adding nodes to a database cluster does not necessarily improve the system response time for every query, especially if the response time is already short (e.g., less than 10 seconds) or the response time is not hardware bound.

• Make the database K-safe (K-safety=1) or increase K-safety to 2. See Failure Recovery for details.
• Swap a node for maintenance. Use a spare machine to temporarily take over the activities of an existing node that needs maintenance. Because the node that requires maintenance is known ahead of time, the cluster is not vulnerable to additional node failures while it is temporarily removed from service.
• Replace a node. Permanently add a node to replace obsolete or malfunctioning hardware.

Important: If you installed HP Vertica on a single node without specifying the IP address or hostname (or you used localhost), you cannot expand the cluster. You must reinstall HP Vertica and specify an IP address or hostname that is not localhost/127.0.0.1.

Adding nodes consists of the following general tasks:

1. Back up the database. HP strongly recommends that you back up the database before you perform this significant operation because it entails creating new projections, refreshing them, and then deleting the old projections. See Backing Up and Restoring the Database for more information.
The process of migrating the projection design to include the additional nodes could take a while; however, during this time, all user activity on the database can proceed normally, using the old projections.

2. Configure the hosts you want to add to the cluster. See Before You Install HP Vertica in the Installation Guide. You also need to edit the hosts configuration file on all of the existing nodes in the cluster to ensure they can resolve the new host.

3. Add one or more hosts to the cluster.
4. Add the hosts you added to the cluster (in step 3) to the database.

Note: When you add a "host" to the database, it becomes a "node." You can add nodes to your database using either the Administration Tools or the Management Console (see Monitoring HP Vertica Using Management Console).

After you add one or more nodes to the database, HP Vertica automatically distributes updated configuration files to the rest of the nodes in the cluster and starts the process of rebalancing data in the cluster. See Rebalancing Data Across Nodes for details.

Adding Hosts to a Cluster

After you have backed up the database and configured the hosts you want to add to the cluster, you can add hosts to the cluster using the update_vertica script. You can use MC to add standby nodes to a database, but you cannot add hosts to a cluster using MC.

Prerequisites and Restrictions

• If you installed HP Vertica on a single node without specifying the IP address or hostname (you used localhost), it is not possible to expand the cluster. You must reinstall HP Vertica and specify an IP address or hostname.
• If your database already has more than one node, you can add a node without stopping the server. However, if you are adding a node to a single-node installation, you must shut down both the database and spread. If you do not, the system returns an error like the following:

$ sudo /opt/vertica/sbin/update_vertica --add-hosts node05 --rpm vertica_7.0.x.x86_64.RHEL5.rpm
Vertica 7.0.x Installation Tool
Starting installation tasks...
Getting system information for cluster (this may take a while)....
Spread is running on ['node01']. HP Vertica and spread must be stopped before adding nodes to a 1 node cluster.
Use the admin tools to stop the database, if running, then use the following command to stop spread:
  /etc/init.d/spread stop (as root or with sudo)
Installation completed with errors.
Installation failed.
Procedure to Add Hosts

From one of the existing cluster hosts, run the update_vertica script with a minimum of the --add-hosts host(s) parameter (where host(s) is the hostname or IP address of the system(s) that you are adding to the cluster) and the --rpm or --deb parameter:
# /opt/vertica/sbin/update_vertica --add-hosts host(s) --rpm package

Note: See Installing with the Script for the full list of parameters. You must also provide the same options you used when originally installing the cluster.

The update_vertica script uses all the same options as install_vertica and:

• Installs the HP Vertica RPM on the new host.
• Performs post-installation checks, including RPM version and N-way network connectivity checks.
• Modifies spread to encompass the larger cluster.
• Configures the Administration Tools to work with the larger cluster.

Important Tips:

• A host can be specified by the hostname or IP address of the system you are adding to the cluster. However, internally HP Vertica stores all host addresses as IP addresses.
• Do not include spaces in the hostname/IP address list provided with --add-hosts if you specified more than one host.
• If a package is specified with --rpm/--deb, and that package is newer than the one currently installed on the existing cluster, then HP Vertica installs the new package on the existing cluster hosts before installing it on the newly added hosts.
• Use the same command-line parameters for the database administrator username, password, and directory path that you used when you installed the cluster originally. Alternatively, you can create a properties file to save the parameters during install and then reuse it on subsequent install and update operations. See Installing HP Vertica Silently.
• If you are installing using sudo, the database administrator user (dbadmin) must already exist on the hosts you are adding and must be configured with passwords and home directory paths identical to the existing hosts. HP Vertica sets up passwordless ssh from existing hosts to the new hosts, if needed.
• If you initially used the --point-to-point option to configure spread to use direct, point-to-point communication between nodes on the subnet, then use the --point-to-point option whenever you run install_vertica or update_vertica. Otherwise, your cluster's configuration is reverted to the default (broadcast), which may impact future databases.

Examples:

--add-hosts host01 --rpm
--add-hosts 192.168.233.101
--add-hosts host02,host03

Adding Nodes to a Database

Once you have added one or more hosts to the cluster, you can add them as nodes to the database. You can add nodes to a database using either of these methods:

• The Management Console interface
• The Administration Tools interface

To Add Nodes to a Database Using MC

Only nodes in STANDBY state are eligible for addition. STANDBY nodes are nodes included in the cluster but not yet assigned to the database. You add nodes to a database on MC's Manage page. Click the node you want to act upon, and then click Add node in the Node List. When you add a node, the node icon in the cluster view changes color from gray (empty) to green as the node comes online. Additionally, a task list displays detailed progress of the node addition process.

To Add Nodes to a Database Using the Administration Tools

1. Open the Administration Tools. (See Using the Administration Tools.)
2. On the Main Menu, select View Database Cluster State to verify that the database is running. If it is not, start it.
3. From the Main Menu, select Advanced Tools Menu and click OK.
4. In the Advanced Menu, select Cluster Management and click OK.
5. In the Cluster Management menu, select Add Host(s) and click OK.
6. Select the database to which you want to add one or more hosts, and then select OK. A list of unused hosts is displayed.
7. Select the hosts you want to add to the database and click OK.
8. When prompted, click Yes to confirm that you want to add the hosts.
9. When prompted, enter the password for the database, and then select OK.
10. When prompted that the hosts were successfully added, select OK.
11. HP Vertica now automatically starts the rebalancing process to populate the new node with data. When prompted, enter the path to a temporary directory that the Database Designer can use to rebalance the data in the database and select OK.
12. Either press Enter to accept the default K-safety value, or enter a new, higher value for the database and select OK.
13. Select whether HP Vertica should immediately start rebalancing the database, or whether it should create a script to rebalance the database later. You should select the option to automatically start rebalancing unless you want to delay rebalancing until a time when the database has a lower load. If you choose to automatically rebalance the database, the script is still created and saved so you can use it later.
14. Review the summary of the rebalancing process and select Proceed.
15. If you chose to automatically rebalance, the rebalance process runs. If you chose to create a script, the script is generated and saved. In either case, you are shown a success screen and prompted to select OK to end the Add Node process.
Removing Nodes

Although less common than adding a node, permanently removing a node is useful if the host system is obsolete or overprovisioned.

Note: You cannot remove nodes if doing so would leave the cluster without the minimum number of nodes required to maintain your database's current K-safety level (3 nodes for a database with a K-safety level of 1, and 5 nodes for a K-safety level of 2). To remove the node or nodes from the database anyway, you must first reduce the K-safety level of your database.

Removing one or more nodes consists of the following general steps:

1. Back up the database. HP recommends that you back up the database before performing this significant operation because it entails creating new projections, deleting old projections, and reloading data.
2. Lower the K-safety of your database if the cluster will not be large enough to support its current level of K-safety after you remove nodes.
3. Remove the hosts from the database.
4. Remove the nodes from the cluster if they are not used by any other databases.

Lowering the K-Safety Level to Allow for Node Removal

A database with a K-safety level of 1 requires at least 3 nodes to operate, and a database with a K-safety level of 2 requires at least 5 nodes to operate. To remove a node from a cluster that is at the minimum number of nodes for its database's K-safety level, you must first lower the K-safety level using the MARK_DESIGN_KSAFE function.

Note: HP does not recommend lowering the K-safety level of a database to 0, since doing so eliminates HP Vertica's fault tolerance features. You should only use this procedure to move from a K-safety level of 2 to 1.

To lower the K-safety level of the database:

1. Connect to the database, either through the Administration Tools or via vsql.
2.
Enter the command:

SELECT MARK_DESIGN_KSAFE(n);

where n is the new K-safety level for the database (0 if you are reducing the cluster to fewer than 3 nodes, 1 if you are reducing the cluster to 3 or 4 nodes).

Removing Nodes From a Database

You can remove nodes from a database using either of these methods:
• The Management Console interface
• The Administration Tools interface

Prerequisites

• The node must be empty; in other words, there should be no projections referring to the node. Ensure you have followed the steps listed in Removing Nodes to modify your database design.
• The database must be UP.
• You cannot drop nodes that are critical for K-safety. See Lowering the K-Safety Level to Allow for Node Removal.

Remove Unused Hosts From the Database Using MC

You remove nodes from a database cluster on MC's Manage page. Click the node you want to act upon, and then click Remove node in the Node List. Using MC, you can remove only nodes that are part of the database cluster and which show a state of DOWN (red). When you remove a node, its color changes from red to clear and MC updates its state to STANDBY. You can add STANDBY nodes back to the database later.

Remove Unused Hosts From the Database Using the Administration Tools

To remove unused hosts from the database using the Administration Tools:

1. Open the Administration Tools. See Using the Administration Tools for information about accessing the Administration Tools.
2. On the Main Menu, select View Database Cluster State to verify that the database is running. If the database isn't running, start it.
3. From the Main Menu, select Advanced Tools Menu, and then select OK.
4. In the Advanced menu, select Cluster Management, and then select OK.
5. In the Cluster Management menu, select Remove Host(s) from Database, and then select OK.
6. When warned that you must redesign your database and create projections that exclude the hosts you are going to drop, select Yes.
7. Select the database from which you want to remove the hosts, and then select OK.
A list of all the hosts that are currently being used is displayed.

8. Select the hosts you want to remove from the database, and then select OK.
9. When prompted, select OK to confirm that you want to remove the hosts. HP Vertica begins the process of rebalancing the database and removing the node or nodes.
10. When informed that the hosts were successfully removed, select OK.
11. If you removed a host from a Large Cluster configuration, open a vsql session and run the following command:

SELECT realign_control_nodes();

For more details, see REALIGN_CONTROL_NODES.

Removing Hosts From a Cluster

If a host that you removed from the database is not used by any other database, you can remove it from the cluster using the update_vertica script. You can leave the database running (UP) during this operation. You can remove hosts from a database on the MC interface, but you cannot remove those hosts from a cluster.

Prerequisites

The host must not be used by any database.

Procedure to Remove Hosts

From one of the hosts in the cluster, run update_vertica with the --remove-hosts switch and provide a comma-separated list of hosts to remove from an existing HP Vertica cluster. A host can be specified by the hostname or IP address of the system:

# /opt/vertica/sbin/update_vertica --remove-hosts host

For example:

# /opt/vertica/sbin/update_vertica --remove-hosts host01

Note: See Installing with the Script for the full list of parameters.

The update_vertica script uses all the same options as install_vertica and:

• Modifies spread to match the smaller cluster.
• Configures the Administration Tools to work with the smaller cluster.
Important Tips:

• Do not include spaces in the hostname list provided with --remove-hosts if you specified more than one host.
• If a new RPM is specified with --rpm, HP Vertica first installs it on the existing cluster hosts before proceeding.
• Use the same command-line parameters as those used when you installed the original cluster. Specifically, if you used non-default values for the database administrator username, password, or directory path, provide the same values when you remove hosts; otherwise, the procedure fails. Consider creating a properties file in which you save the parameters during the installation, which you can reuse on subsequent install and update operations. See Installing HP Vertica Silently.

Examples:

--remove-hosts host01
--remove-hosts 192.168.233.101
-R host01
Replacing Nodes

If you have a K-safe database, you can replace nodes, as necessary, without bringing the system down. For example, you might want to replace an existing node if you:

• Need to repair an existing host system that no longer functions and restore it to the cluster
• Want to exchange an existing host system for another, more powerful system

Note: HP Vertica does not support replacing a node on a K-safe=0 database. Use the procedures to add and remove nodes instead.

The process you use to replace a node depends on whether you are replacing the node with:

• A host that uses the same name and IP address
• A host that uses a different name and IP address

Prerequisites

• Configure the replacement hosts for HP Vertica. See Before You Install HP Vertica in the Installation Guide.
• Read the Important Tips sections under Adding Hosts to a Cluster and Removing Hosts From a Cluster.
• Ensure that the database administrator user exists on the new host and is configured identically to the existing hosts. HP Vertica sets up passwordless ssh as needed.
• Ensure that directories for the catalog path, data path, and any storage locations are added to the database when you create it and/or are mounted correctly on the new host, and have read and write access permissions for the database administrator user. Also ensure that there is sufficient disk space.
• Follow the best-practice procedure below for introducing the failed hardware back into the cluster, to avoid spurious full-node rebuilds.

Best Practice for Restoring Failed Hardware

Following this procedure prevents HP Vertica from misdiagnosing missing disks or bad mounts as data corruption, which would result in a time-consuming, full-node recovery. If a server fails due to hardware issues, for example a bad disk or a failed controller, upon repairing the hardware:
1. Reboot the machine into runlevel 1, which is a root and console-only mode. Runlevel 1 prevents network connectivity and keeps HP Vertica from attempting to reconnect to the cluster.
2. In runlevel 1, validate that the hardware has been repaired, the controllers are online, and any RAID recovery is able to proceed.

Note: You do not need to initialize RAID recovery in runlevel 1; simply validate that it can recover.

3. Once the hardware is confirmed consistent, reboot to runlevel 3 or higher. At this point, the network activates, and HP Vertica rejoins the cluster and automatically recovers any missing data. Note that, on a single-node database, if any files that were associated with a projection have been deleted or corrupted, HP Vertica deletes all files associated with that projection, which could result in data loss.

Replacing a Node Using the Same Name and IP Address

To replace a node with a host system that has the same IP address and host name as the original:

1. Back up the database. See Backing Up and Restoring the Database.
2. From a functioning node in the cluster, run the install_vertica script with the --hosts and --rpm parameters. Additionally, use the same additional install parameters that were used when the cluster was originally installed.

# /opt/vertica/sbin/install_vertica --hosts host --rpm rpm_package

where host is the hostname or IP address of the system you are restoring to the cluster, for example:

--hosts host01
--hosts 192.168.233.101

and --rpm specifies the name of the rpm package, for example:

--rpm vertica_7.0.x.x86_64.RHEL5.rpm

The installation script verifies the system configuration and that HP Vertica, spread, and the Administration Tools metadata are installed on the host.

3. On the new node, create catalog and data directories (unless they both reside in the same top-level directory, in which case you just need to create the one directory). These are the same top-level directories you specified when creating the database.
Note: You can find the directories used for catalog and data storage by querying the V_MONITOR.DISK_STORAGE system table. You need to create the directories up to the v_database_node00xx portion of the data and catalog path. For example, if the catalog storage location is /home/dbadmin/vmart/v_vmart_node0001_catalog/Catalog, you would need to create the /home/dbadmin/vmart directory to store the catalog.

4. Use the Administration Tools to restart the host you just replaced. The node automatically joins the database and recovers its data by querying the other nodes within the database. It then transitions to an UP state.

Note: Do not connect two hosts with the same name and IP address to the same network. If this occurs, traffic is unlikely to be routed properly.

Replacing a Failed Node Using a Node with a Different IP Address

Replacing a failed node with a host system that has a different IP address from the original consists of the following steps:

1. Back up the database. HP recommends that you back up the database before you perform this significant operation because it entails creating new projections, deleting old projections, and reloading data.
2. Add the new host to the cluster. See Adding Hosts to a Cluster.
3. If HP Vertica is still running on the node being replaced, use the Administration Tools to Stop Vertica on Host on the host being replaced.
4. Use the Administration Tools to replace the original host with the new host. If you are using more than one database, replace the original host in all the databases in which it is used. See Replacing Hosts.
5. Use the procedure in Distributing Configuration Files to the New Host to transfer metadata to the new host.
6. Remove the original host from the cluster.
7. Use the Administration Tools to restart HP Vertica on the host. On the Main Menu, select Restart Vertica on Host, and click OK. See Starting the Database for more information.
Once you have completed this process, the replacement node automatically recovers the data that was stored in the original node by querying other nodes within the database.
Replacing a Functioning Node Using a Different Name and IP Address

Replacing a node with a host system that has a different IP address and host name from the original consists of the following general steps:

1. Back up the database. HP recommends that you back up the database before you perform this significant operation because it entails creating new projections, deleting old projections, and reloading data.

2. Add the replacement hosts to the cluster. At this point, both the original host that you want to remove and the new replacement host are members of the cluster.

3. Use the Administration Tools to Stop Vertica on Host on the host being replaced.

4. Use the Administration Tools to replace the original host with the new host. If you are using more than one database, replace the original host in all the databases in which it is used. See Replacing Hosts.

5. Remove the host from the cluster.

6. Restart HP Vertica on the host.

Once you have completed this process, the replacement node automatically recovers the data that was stored in the original node by querying the other nodes within the database. It then transitions to an UP state.

Note: If you do not remove the original host from the cluster and you attempt to restart the database, the host is not invited to join the database because its node address does not match the new address stored in the database catalog. Therefore, it remains in the INITIALIZING state.

Using the Administration Tools to Replace Nodes

If you are replacing a node with a host that uses a different name and IP address, use the Administration Tools to replace the original host with the new host. Alternatively, you can use the Management Console to replace a node.

Replace the Original Host with a New Host Using the Administration Tools

To replace the original host with a new host using the Administration Tools:
1. Back up the database. See Backing Up and Restoring the Database.

2. From a node that is up, and is not going to be replaced, open the Administration Tools.

3. On the Main Menu, select View Database Cluster State to verify that the database is running. If it is not running, use the Start Database command on the Main Menu to restart it.

4. On the Main Menu, select Advanced Menu.

5. In the Advanced Menu, select Stop HP Vertica on Host.

6. Select the host you want to replace, and then click OK to stop the node.

7. When prompted if you want to stop the host, select Yes.

8. In the Advanced Menu, select Cluster Management, and then click OK.

9. In the Cluster Management menu, select Replace Host, and then click OK.

10. Select the database that contains the host you want to replace, and then click OK. A list of all the hosts that are currently being used displays.

11. Select the host you want to replace, and then click OK.

12. Select the host you want to use as the replacement, and then click OK.

13. When prompted, enter the password for the database, and then click OK.

14. When prompted, click Yes to confirm that you want to replace the host.

15. When prompted that the host was successfully replaced, click OK.

16. In the Main Menu, select View Database Cluster State to verify that all the hosts are running. You might need to start HP Vertica on the host you just replaced. Use Restart Vertica on Host. The node enters a RECOVERING state.

Caution: If you are using a K-safe database, keep in mind that the recovering node counts as one node down even though it might not yet contain a complete copy of the data. This means that if you have a database in which K-safety=1, the current fault tolerance for your database is at a critical level. If you lose one more node, the database shuts down. Be sure that you do not stop any other nodes.
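While the replaced node is recovering, you can watch its progress from any node that is up by querying the NODES system table, which reports the state of every node in the cluster (a minimal sketch):

```sql
=> SELECT node_name, node_state
   FROM v_catalog.nodes;
```

The replaced node shows a node_state of RECOVERING until recovery completes, after which it transitions to UP.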
Using the Management Console to Replace Nodes

On the MC Manage page, you can quickly replace a DOWN node in the database by selecting one of the STANDBY nodes in the cluster.
A DOWN node shows up as a red node in the cluster. Click the DOWN node, and the Replace node button in the Node List becomes activated, as long as there is at least one node in the cluster that is not participating in the database. The STANDBY node will be your replacement node for the node you want to retire; it appears gray (empty) until it has been added to the database, when it turns green.

Tip: You can resize the Node List by clicking its margins and dragging to the size you want.

When you highlight a node and click Replace, MC provides a list of possible STANDBY nodes to use as a replacement. After you select the replacement node, the process begins. A node replacement can be a long-running task.

MC transitions the DOWN node to a STANDBY state, while the node you selected as the replacement assumes the identity of the original node, using the same node name, and is started. Assuming a successful startup, the new node appears orange with a status of RECOVERING until the recovery procedure is complete. When the recovery process completes, the replacement node turns green and shows a state of UP.
Rebalancing Data Across Nodes

HP Vertica automatically rebalances your database when adding or removing nodes. You can also manually trigger a rebalance using the Administration Tools or SQL functions. Users can also rebalance data across nodes through the Management Console interface (see Rebalancing Data Using Management Console for details).

Whether you start the rebalance process manually or automatically, the process occurs in the following steps:

- For segmented projections, HP Vertica creates new (renamed) segmented projections that are identical in structure to the existing projections, but which have their data distributed across all nodes. The rebalance process then refreshes all new projections, sets the Ancient History Mark (AHM) to the greatest allowable epoch (now), and drops all of the old segmented projections. All new buddy projections have the same base name so they can be identified as a group.

  Note: HP Vertica does not maintain custom projection segmentations defined with a specific node list. Node rebalancing distributes data across all nodes, regardless of any custom definitions.

- For unsegmented projections, HP Vertica leaves existing projections unmodified, creates new projections on the new nodes, and refreshes them.

- After the data has been rebalanced, HP Vertica drops:
  - Duplicate buddy projections with the same offset
  - Duplicate replicated projections on the same node

K-safety and Rebalancing

Before data rebalancing completes, HP Vertica operates with the existing K-safety value. After rebalancing completes, HP Vertica operates with the K-safety value specified during the rebalance operation. You can maintain the existing K-safety or specify a new value (0 to 2) for the modified database cluster. HP Vertica does not support downgrading K-safety and returns a warning if you attempt to reduce it from its current value: Design k-safety cannot be less than system k-safety level.
For more information, see Lowering the K-Safety Level to Allow for Node Removal.

Rebalancing Failure and Projections

If a failure occurs while rebalancing the database, you can rebalance again. If the cause of the failure has been resolved, the rebalance operation continues from where it failed. However, a failed data rebalance can result in projections becoming out of date, so that they cannot be removed automatically. To locate any such projections, query the V_CATALOG.PROJECTIONS system table as follows:
=> SELECT projection_name, anchor_table_name, is_prejoin, is_up_to_date
   FROM projections
   WHERE is_up_to_date = false;

To remove out-of-date projections, use the DROP PROJECTION statement.

Permissions

Only the superuser has permissions to rebalance data.

Rebalancing Data Using the Administration Tools UI

To rebalance the data in your database:

1. Open the Administration Tools. (See Using the Administration Tools.)

2. On the Main Menu, select View Database Cluster State to verify that the database is running. If it is not, start it.

3. From the Main Menu, select Advanced Tools Menu and click OK.

4. In the Advanced Menu, select Cluster Management and click OK.

5. In the Cluster Management menu, select Re-balance Data and click OK.

6. Select the database you want to rebalance, and then select OK.

7. Enter the directory for the Database Designer outputs (for example, /tmp) and click OK.

8. Accept the proposed K-safety value or provide a new value. Valid values are 0 to 2.

9. Review the message and click Proceed to begin rebalancing data. The Database Designer modifies existing projections to rebalance data across all database nodes with the K-safety you provided. A script to rebalance data, which you can run manually at a later time, is also generated and resides in the path you specified; for example, /tmp/extend_catalog_rebalance.sql.

   Important: Rebalancing data can take some time, depending on the number of projections and the amount of data they contain. HP recommends that you allow the process to complete. If you must cancel the operation, use Ctrl+C. The terminal window notifies you when the rebalancing operation is complete.

10. Press Enter to return to the Administration Tools.
Rebalancing Data Using Management Console

HP Vertica automatically rebalances the database after you add or remove nodes. If, however, you notice data skew where one node shows more activity than another (for example, most queries processing data on a single node), you can manually rebalance the database using MC if that database is imported into the MC interface.

On the Manage page, click Rebalance in the toolbar to initiate the rebalance operation. During a rebalance, you cannot perform any other activities on the database cluster, such as starting, stopping, adding, or removing nodes.

Rebalancing Data Using SQL Functions

Three SQL functions let you manually control the data rebalancing process. You can use these functions to run a rebalance from a script scheduled to run at an off-peak time, rather than having to manually trigger a rebalance through the Administration Tools. These functions are:

- REBALANCE_CLUSTER()
- START_REBALANCE_CLUSTER()
- CANCEL_REBALANCE_CLUSTER()

For more information and examples of using these functions, see their entries in the SQL Reference Manual.

Redistributing Configuration Files to Nodes

The add and remove node processes automatically redistribute the HP Vertica configuration files. In rare cases, you may need to redistribute the configuration files manually to help resolve configuration issues.

To distribute configuration files to a host:

1. Log on to a host that contains these files and start the Administration Tools. See Using the Administration Tools for information about accessing the Administration Tools.

2. On the Main Menu in the Administration Tools, select Configuration Menu and click OK.

3. On the Configuration Menu, select Distribute Config Files and click OK.

4. Select Database Configuration.

5. Select the database in which you want to distribute the files and click OK.
The vertica.conf file is distributed to all the other hosts in the database. If it previously existed on a host, it is overwritten.

6. On the Configuration Menu, select Distribute Config Files and click OK.

7. Select SSL Keys. The certificates and keys for the host are distributed to all the other hosts in the database. If they previously existed on a host, they are overwritten.

8. On the Configuration Menu, select Distribute Config Files and click OK. Select AdminTools Meta-Data. The Administration Tools metadata is distributed to every host in the cluster.

9. Restart the database.

Stopping and Starting Nodes on MC

You can start and stop one or more database nodes through the Manage page by clicking a specific node to select it and then clicking the Start or Stop button in the Node List.

Note: The Stop and Start buttons in the toolbar start and stop the database, not individual nodes.

On the Databases and Clusters page, you must click a database first to select it. To stop or start a node on that database, click the View button. You are directed to the Overview page. Click Manage in the applet panel at the bottom of the page, and you are directed to the database node view.

The Start and Stop database buttons are always active, but the node Start and Stop buttons are active only when one or more nodes of the same status are selected; for example, all nodes are UP or DOWN.

After you click a Start or Stop button, Management Console updates the status and message icons for the nodes or databases you are starting or stopping.
Managing Disk Space

HP Vertica detects and reports low disk space conditions in the log file so that you can address the issue before serious problems occur. It also detects and reports low disk space conditions via SNMP traps, if enabled.

Critical disk space issues are reported sooner than other issues. For example, running out of catalog space is fatal; therefore, HP Vertica reports the condition earlier than less critical conditions. To avoid database corruption when the disk space falls below a certain threshold, HP Vertica begins to reject transactions that update the catalog or data.

Caution: A low disk space report indicates that one or more hosts are running low on disk space or have a failing disk. It is imperative to add more disk space (or replace a failing disk) as soon as possible.

When HP Vertica reports a low disk space condition, use the DISK_RESOURCE_REJECTIONS system table to determine the types of disk space requests that are being rejected and the hosts on which they are being rejected. These and other system tables are described in Using System Tables.
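When a low disk space condition is reported, a quick check of the rejections table from any node that is up shows what is being rejected and where. A minimal sketch (consult the system table documentation for your release for the full column list):

```sql
=> SELECT * FROM v_monitor.disk_resource_rejections;
```

Rows in this table identify the node and the type of disk resource request that was rejected, which helps you decide where to add disk space first.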