DB2 Data Movement Utilities: A Comparison
Speaker: Jeyabarathi(JB) Chakrapani
NASCO
Session Code: D09
Wed, May 06, 2015 (08:00 AM - 09:00 AM) : Hancock | Platform: DB2 LUW II
Agenda
• Learn the various tools that are available with DB2 for achieving
efficient data movement within the database environment.
• Offer a brief introduction to each utility, including the DB2
Admin Online Table Move procedure.
• Learn the various enhancements offered in each DB2 version
for each of these utilities.
• Understand how to use the different utilities with examples.
• Learn what it takes to maximize the performance of your choice
of data movement utility, along with useful tips and tricks.
Introduction to DB2 data movement utilities
Load Utility
Export Utility
Import Utility
Ingest Utility
DB2move tool
Restore utility
ADMIN_COPY_SCHEMA
ADMIN_MOVE_TABLE
Split Mirror
IBM replication tools
3
What are the available tools and options for data movement?
LOAD UTILITY
4
Load utility
Required input for Load:
• The path and the name of the input file, named pipe, or
device.
• The name or alias of the target table.
• The format of the input source: DEL, ASC, PC/IXF, or
CURSOR.
• Whether the input data is to be appended to the table or
is to replace the existing data in the table.
• A message file name, if the utility is invoked through the
application programming interface (API), db2Load.
5
Load
Load phases:
• Load
• Build
• Delete
• Index Copy
Load modes:
• Insert
• Replace
• Restart
• Terminate
6
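A minimal sketch of how the four load modes appear on the command line (table and file names are illustrative, and the exact clause order can vary by DB2 version):

```sql
-- INSERT: append the input data to the existing rows
LOAD FROM input.del OF DEL MESSAGES load.msg INSERT INTO myschema.table1;

-- REPLACE: delete the existing rows, then load the input data
LOAD FROM input.del OF DEL MESSAGES load.msg REPLACE INTO myschema.table1;

-- RESTART: resume a load that was interrupted
LOAD FROM input.del OF DEL MESSAGES load.msg RESTART INTO myschema.table1;

-- TERMINATE: roll back a failed load and clear the pending state
LOAD FROM input.del OF DEL MESSAGES load.msg TERMINATE INTO myschema.table1;
```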
Load options include:
• If the load utility is invoked from a remotely connected client,
the data file must be on the client. XML and LOB data are
always read from the server, even if you specify the CLIENT
option.
• The method to use for loading the data: column location,
column name, or relative column position.
• How often the utility is to establish consistency points.
• The names of the table columns into which the data is to be
inserted.
• Whether preexisting data in the table can be queried
while the load operation is in progress.
• Whether the load operation should wait for other utilities or
applications to finish using the table, or force the other
applications off before proceeding.
(Option categories: Client options, Method, Consistency points, Access level,
Paths, Table space, Statistics, Recovery, COPY NO/YES)
7
Load options include:
• An alternate system temporary table space in which to build the
index.
• The paths and the names of the input files in which LOBs are
stored.
• A message file name.
• Whether the utility should modify the amount of free space
available after a table is loaded.
• Whether statistics are to be gathered during the load process.
This option is only supported if the load operation is running in
REPLACE mode.
• Whether to keep a copy of the changes made. This is done to
enable rollforward recovery of the database.
• The fully qualified path to be used when creating temporary files
during a load operation. The name is specified by the TEMPFILES
PATH parameter of the LOAD command.
8
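The options above can be combined in a single invocation. A hedged sketch (the paths, counts, and clause order are illustrative and may need adjusting for your DB2 version):

```sql
LOAD FROM input.del OF DEL
  SAVECOUNT 100000           -- establish a consistency point every 100,000 rows
  MESSAGES load.msg          -- message file name
  TEMPFILES PATH /db/tmp     -- where the utility creates temporary files
  REPLACE INTO myschema.table1
  STATISTICS USE PROFILE     -- gather statistics; only supported in REPLACE mode
  COPY YES TO /db/loadcopy   -- keep a copy of the changes for rollforward recovery
  ALLOW READ ACCESS;         -- preexisting data stays queryable during the load
```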
Load Restrictions:
• Loading data into nicknames is not supported.
• Loading data into typed tables, or tables with
structured type columns, is not supported.
• Loading data into declared temporary tables and
created temporary tables is not supported.
• XML data can only be read from the server side;
if you want to have the XML files read from the
client, use the import utility.
• You cannot create or drop tables in a table space
that is in Backup Pending state.
• If an error occurs during a LOAD REPLACE
operation, the original data in the table is lost.
Retain a copy of the input data to allow the load
operation to be restarted.
• Triggers are not activated on newly loaded rows.
Business rules associated with triggers are not
enforced by the load utility.
• Loading encrypted data is not supported.
(Restriction categories: Nicknames, Structured data types, Temporary tables,
XML support, Backup Pending, Load Replace, Triggers, Data encryption,
Partitioned tables)
9
Load from Cursor…
Examples:
DECLARE mycurs CURSOR FOR SELECT * FROM abc.table1;
LOAD FROM mycurs OF cursor INSERT INTO abc.table2;
DECLARE C1 CURSOR FOR SELECT * FROM customers
WHERE XMLEXISTS('$DOC/customer[income_level=1]');
LOAD FROM C1 OF CURSOR INSERT INTO lvl1_customers;
The ANYORDER file type modifier is supported for
loading XML data into an XML column.
• Loads the results of a query
directly into the target table,
no intermediate export
necessary.
• XML data can be loaded
with the cursor option.
• Nicknames can be
referenced in the SQL query
of the cursor using the
DATABASE option.
• Load from a remote database
using the DATABASE option.
10
Examples:
• Loading from a federated database:
Federation should be enabled and the data source cataloged.
CREATE NICKNAME myschema1.table1 FOR source.abc.table1;
DECLARE mycurs CURSOR FOR SELECT c1,c2,c3 FROM myschema1.table1;
LOAD FROM mycurs OF cursor INSERT INTO abc.table2;
• Loading from a remote database:
The remote database should be cataloged.
DECLARE mycurs CURSOR DATABASE dbsource USER dsciaraf USING mypasswd
FOR SELECT * FROM abc.table1;
LOAD FROM mycurs OF cursor INSERT INTO abc.table2;
11
Checking for integrity violations…
Load puts the table in check pending status when:
• The table has check constraints or RI constraints.
• The table has identity columns and a V7 or earlier client was used to load data.
• The table has descendent immediate staging tables or MQTs referencing it.
• The table is a staging table or MQT.
12
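A table left in check pending status is brought back into use with SET INTEGRITY; a sketch (the exception table name is illustrative):

```sql
-- Validate constraints and clear the check pending state
SET INTEGRITY FOR myschema.table1 IMMEDIATE CHECKED;

-- Or move violating rows into an exception table instead of failing
SET INTEGRITY FOR myschema.table1 IMMEDIATE CHECKED
  FOR EXCEPTION IN myschema.table1 USE myschema.table1_exc;
```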
Load Performance….
CPU_PARALLELISM - specifies the number of threads used by the load utility to
parse, convert, and format data records
DISK_PARALLELISM - specifies the number of processes or threads used by the load
utility to write data records to disk
DATA_BUFFER - total amount of memory, in 4 KB units, allocated to the load utility
as a buffer
NONRECOVERABLE – Does not put the table in backup pending.
SAVE COUNT – Specifies consistency points.
STATISTICS USE PROFILE – Collection of statistics after load
FASTPARSE – Used when data is known to be valid.
NOROWWARNINGS - use this when multiple warnings are expected.
PAGEFREESPACE, INDEXFREESPACE, TOTALFREESPACE – Specify to reduce the need
for reorg
13
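Several of these knobs on one command; a sketch under illustrative values, not a tuned recommendation (clause order may vary by DB2 version):

```sql
LOAD FROM input.del OF DEL
  MODIFIED BY FASTPARSE NOROWWARNINGS  -- input is known valid; suppress row warnings
  MESSAGES load.msg
  REPLACE INTO myschema.table1
  STATISTICS USE PROFILE               -- collect statistics during the load
  NONRECOVERABLE                       -- do not put the table space in backup pending
  DATA BUFFER 8192                     -- buffer size in 4 KB units
  CPU_PARALLELISM 4                    -- threads for parse/convert/format
  DISK_PARALLELISM 4;                  -- threads for writing to disk
```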
EXPORT
14
EXPORT UTILITY
• Required input:
- Pathname for the output file.
- Format of the output file: IXF or DEL.
- Specification of the data to be
extracted, using a SELECT
statement.
• Additional options:
- Subset of columns to be extracted
using the METHOD option.
- XML TO, XMLFILE, XML SAVESCHEMA:
export and store XML data in
different ways.
- The SELECT statement used for
extracting data can be optimized the
same way any SQL query can be
optimized, to improve export
performance.
- The MESSAGES option allows messages
generated by the export utility to be
written to a file.
15
Data extraction using SQL query or XQuery statements
EXPORT UTILITY
16
Examples…
• EXPORT TO table1.ixf OF IXF MESSAGES msg.txt SELECT * FROM
myschema.table1
This simple export command exports all rows to the IXF file.
• EXPORT TO table1export.del OF DEL XML TO /db/xmlpath
XMLFILE xmldocs XMLSAVESCHEMA SELECT * FROM
myschema.table1
• EXPORT TO table1.del OF DEL LOBS TO /db/lob1, /db/lob2/ MODIFIED
BY lobsinfile SELECT * FROM myschema.table1
IMPORT
17
IMPORT
• Required input for Import:
- The path and the name of the
input file.
- The name or alias of the target
table or view.
- The format of the data in the
input file.
- The method by which the data is
to be imported.
- The traverse order, when
importing hierarchical data.
- The subtable list, when importing
typed tables.
• Additional options:
- MODIFIED BY clause.
- ALLOW WRITE ACCESS: Import
acquires a nonexclusive lock.
- ALLOW NO ACCESS: Import
acquires an exclusive lock, waiting
until other work on the table
completes before it can acquire the lock.
- COMMITCOUNT: Commits after
the specified number of rows.
- MESSAGES.
18
Data append/update using SQL query or XQuery statements
Import
• Import support
- Import supports IXF, ASC, and DEL
data formats.
- Used with file type modifiers to
customize the import operation.
- Used to move hierarchical data and
typed tables.
- Import logs all activity, updates
indexes, verifies constraints, and
fires triggers.
- Allows you to specify the names of
the columns within the table or
view into which the data is to be
inserted.
• Import modes
- INSERT: Adds data to the existing
table without changing existing data.
- INSERT_UPDATE: Updates rows with
matching primary key values;
otherwise inserts.
- REPLACE: Deletes existing data and
inserts new data.
- CREATE: Creates the target table and
its index definitions.
- REPLACE_CREATE: Deletes existing
data and inserts new data. If the
target table does not exist, it is
created.
19
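A sketch of the import modes in practice (file, table, and count values are illustrative):

```sql
-- Upsert with periodic commits to keep transaction log usage bounded
IMPORT FROM input.ixf OF IXF
  ALLOW WRITE ACCESS        -- other applications can update the table meanwhile
  COMMITCOUNT 5000          -- commit every 5,000 rows
  MESSAGES import.msg
  INSERT_UPDATE INTO myschema.table1;

-- Re-create a table and its indexes from an IXF file exported with SELECT *
IMPORT FROM input.ixf OF IXF CREATE INTO myschema.table1_copy;
```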
IMPORT
Restrictions
• If the table has a primary key that is
referenced by a foreign key, it can
only be appended to.
• You cannot perform an import replace
operation into an underlying table of a
materialized query table defined in
refresh immediate mode.
• You cannot import data into a system
table, a summary table, or a table with
a structured type column.
• You cannot import data into declared
temporary tables.
• Views cannot be created through the
import utility.
• Cannot import encrypted data.
• Referential constraints and foreign key
definitions are not preserved when
creating tables from PC/IXF files.
(Primary key definitions are preserved if
the data was previously exported by
using SELECT *.)
• Because the import utility generates its
own SQL statements, the maximum
statement size of 2 MB might, in some
cases, be exceeded.
• You cannot re-create a partitioned table
or a multidimensional clustered table
(MDC) by using the CREATE or
REPLACE_CREATE import parameters.
• Cannot re-create tables containing XML
columns.
• Does not honor the NOT LOGGED
INITIALLY clause.
20
IMPORT Restrictions …
Remote import is not
allowed if
• The application and
database code pages are
different.
• The file being imported is a
multiple-part PC/IXF file.
• The method used for
importing the data is either
column name or relative
column position.
• The target column list
provided is longer than 4 KB.
• The LOBS FROM clause or
the lobsinfile modifier is
specified.
• The NULL INDICATORS
clause is specified for ASC
files.
21
IMPORT Performance …
22
• If the workload is mostly insert, consider altering the table to
APPEND ON.
• To avoid a transaction-log-full condition, choose an
appropriate COMMITCOUNT value.
• Enable the DB2_PARALLEL_IO registry variable.
• Review the logbufsz db cfg value and increase it as necessary.
• Review the utility heap (util_heap_sz) db cfg value and increase it as needed.
• Review the num_ioservers and num_iocleaners parameters.
INGEST
23
INGEST
• INGEST characteristics
- Fast: multithreaded design processes data in parallel.
- Available: uses row-level locking, so tables remain
available to concurrent applications.
- Continuous: can continuously ingest data streams from
pipes or files.
- Robust: handles unexpected failures and can be restarted
from the last commit point.
- Flexible and functional: supports different input formats
and target table types, and has rich data manipulation
capabilities.
24
INGEST
Supported Table Types
• multidimensional clustering (MDC)
and insert time clustering (ITC)
tables
• range-partitioned tables
• range-clustered tables (RCT)
• materialized query tables (MQTs)
that are defined as MAINTAINED
BY USER, including summary
tables
• temporal tables
• updatable views (except typed
views)
Supported data formats
• Delimited text
• Positional text and binary
• Columns in various orders and
formats
25
Ingest
[Architecture diagram: multiple input files or pipes feed transporter threads;
transporters write to formatter queues; formatters hash each record by database
partition and write to flusher queues; flushers perform array inserts via SQL
into DB partitions 1 through n.]
Main components:
• Transporter
• Formatter
• Flusher
INGEST
• Transporter:
- Reads from the data source and
writes to the formatter queues.
For INSERT and MERGE
operations, there is one
transporter thread for each
input source. For UPDATE and
DELETE operations, there is
only one transporter thread.
• Formatter:
- Parses each record, converts the
data into the format that DB2
requires, and writes each
formatted record to one of
the flusher queues for that
record's partition.
- The num_formatters
configuration parameter
specifies the number of
formatter threads. The default
is (number of logical CPUs)/2.
27
INGEST
• Flusher:
- The flushers issue the SQL statements that perform the operations
on the DB2 tables. The number of flushers for each partition is
specified by the num_flushers_per_partition configuration
parameter. The default is max(1, ((number of logical
CPUs)/2)/(number of partitions)).
28
INGEST Examples
29
INGEST FROM FILE my_file.del FORMAT DELIMITED INSERT INTO
my_table;
Input records are sent over a named pipe
INGEST FROM PIPE my_pipe FORMAT DELIMITED INSERT INTO
my_table;
Input records delimited by CRLF; fields are delimited by vertical bar
INGEST FROM FILE my_file.del FORMAT DELIMITED '|' INSERT
INTO my_table;
INGEST Examples
30
INGEST FROM FILE input_file.txt
FORMAT DELIMITED
(
$key1 INTEGER EXTERNAL,
$data1 CHAR(8),
$data2 CHAR(32),
$data3 DECIMAL(5,2) EXTERNAL
)
MERGE INTO target_table
ON (key1 = $key1)
WHEN MATCHED THEN
UPDATE SET (data1, data2, data3) = ($data1, $data2,
$data3)
WHEN NOT MATCHED THEN
INSERT VALUES($key1, $data1, $data2, $data3);
INGEST – Examples…
31
Ingest configuration:
connect to mydb user <username> using <password>;
INGEST SET num_flushers_per_partition 1;
INGEST SET NUM_FORMATTERS 12;
INGEST SET shm_max_size 12 GB;
INGEST SET commit_count 20000;
ingest from file
/mydir/file1
FORMAT DELIMITED by ','
RESTART OFF
insert into myschema.tab1;
INGEST – Restart ..
32
Restart information is stored in a separate table
(SYSTOOLS.INGESTRESTART), which is created once.
To create the restart table on DB2 10.1:
CALL SYSPROC.SYSINSTALLOBJECTS('INGEST', 'C',
NULL, NULL);
The table contains counters that keep track of which records
have been ingested.
INGEST - Restart
33
• RESTART CONTINUE: restart a previously failed job
(and clean up the restart data).
• RESTART TERMINATE: clean up the restart data from
a failed job you don't plan to restart.
• RESTART OFF: suppress saving of restart information
(in which case the ingest job is not restartable).
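Restartable jobs pair RESTART NEW with a job ID, then use CONTINUE or TERMINATE under the same ID; a sketch (the job ID 'job001' is illustrative):

```sql
-- Start a restartable job under an explicit job ID
INGEST FROM FILE my_file.del FORMAT DELIMITED
  RESTART NEW 'job001' INSERT INTO my_table;

-- After a failure, resume from the last commit point
INGEST FROM FILE my_file.del FORMAT DELIMITED
  RESTART CONTINUE 'job001' INSERT INTO my_table;

-- Or discard the saved restart data for a job you won't resume
INGEST FROM FILE my_file.del FORMAT DELIMITED
  RESTART TERMINATE 'job001' INSERT INTO my_table;
```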
INGEST – Additional features
34
• Commit by time or number of rows: commit_count or
commit_period configuration parameter.
• Support for copying rejected records to a file or table:
DUMPFILE or EXCEPTION TABLE parameter.
• Support for restart and recovery: retry_count ingest
configuration parameter.
INGEST - Monitoring
35
• INGEST LIST and INGEST GET STATS commands.
• Read information that the utility maintains in shared
memory.
• Must be run in a separate window on the same machine
as the INGEST command.
• Can display detailed information.
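From a second session on the same machine, a sketch (the job ID 4 is illustrative; take it from the INGEST LIST output):

```sql
-- List all ingest jobs currently running on this machine
INGEST LIST;

-- Show statistics for one job, refreshed periodically
INGEST GET STATS FOR 4 EVERY 10 SECONDS;
```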
INGEST and LOAD
• INGEST
• When the table needs to
remain concurrent during load.
• You need only some fields from
the input file to be loaded.
• You need to specify an SQL
statement other than INSERT
• You need to be able to use an
SQL expression (to construct a
column value from field values)
• You need to recover and
continue on when the utility
gets a recoverable error
• LOAD
• Don’t need the table to remain
concurrent.
• XML & LOB data to be loaded.
• Load from cursor or load from a
device
• Input source file is in IXF format
• Load a GENERATED ALWAYS column
or SYSTEM_TIME column from the
input file
• Use SYSPROC.ADMIN_CMD
• Invoke the utility through an API
• Don't want the INSERTs to be logged
36
INGEST - Performance
• Field type and column type
• Define fields to be the same type as their corresponding column types.
• Materialized query tables (MQTs)
• If you are using the ingest utility against a base table of an MQT defined as
refresh immediate, performance can degrade significantly because of the time
required to update the MQT.
• Row size
• Increase the commit_count setting for tables with a smaller row size
and reduce it for tables with a larger row size.
• Other workloads
• If multiple workloads are running alongside the ingest, consider increasing
the locklist database configuration parameter and reducing the
commit_count ingest configuration parameter.
37
Comparison between Import, Load and Ingest
38
Table type: IMPORT / LOAD / INGEST
Created global temporary table: no / no / no
Declared global temporary table: no / no / no
Detached table that has a dependent table where SET INTEGRITY was not run
(detached table has SYSCAT.TABLES.TYPE = 'L'): no (SQL20285N, reason code 1) /
no (SQL20285N, reason code 1) / no
Multidimensional clustering (MDC) table: yes / yes / yes
Materialized query table (MQT) that is maintained by user: yes / yes / yes
Nickname: relational except ODBC / no (SQL2305N) / yes
Range-clustered table (RCT): yes / no / yes
Range-partitioned table: yes / yes / yes
Summary table: no / yes / yes
Typed table: yes / no (SQL3211N) / no
Typed view: yes / no (SQL2305N) / no
Untyped (regular) table: yes / yes / yes
Updatable view: yes / no (SQL2305N) / yes
Comparison to IMPORT and LOAD – Column types
39
Column data type: IMPORT / LOAD / INGEST
Numeric (SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE, DECFLOAT):
yes / yes / yes
Character (CHAR, VARCHAR, NCHAR, NVARCHAR, plus corresponding FOR BIT DATA
types): yes / yes / yes
Graphic (GRAPHIC, VARGRAPHIC): yes / yes / yes
Long types (LONG VARCHAR, LONG VARGRAPHIC): yes / yes / yes
Date/time (DATE, TIME, TIMESTAMP(p)): yes / yes / yes
DB2SECURITYLABEL: yes / yes / yes
LOBs from files (BLOB, CLOB, DBCLOB, NCLOB): yes / yes / no
Inline LOBs: yes / yes / no
XML from files: yes / yes / no
Inline XML: no / no / no
Distinct type (note 1): yes / yes / yes
Structured type: no / no / no
Reference type: yes / yes / yes
Comparison to IMPORT and LOAD
Input Types and Formats
40
Input type: IMPORT / LOAD / INGEST
Cursor: no / yes / no
Device: no / yes / no
File: yes / yes / yes
Pipe: no / yes / yes
Multiple input files, multiple pipes, etc.: no / yes / yes

Input format: IMPORT / LOAD / INGEST
ASC (including binary): yes, except binary / yes / yes
DEL: yes / yes / yes
IXF: yes / yes / no
WSF (worksheet format): yes, but discontinued in DB2 10.1 / no / no
Comparison to IMPORT and LOAD – Other features
41
Feature: IMPORT / LOAD / INGEST
Can other apps update the table while the utility is loading it?: yes / no / yes
Can use SQL expressions?: no / no / yes
Support for REPLACE: yes / yes / yes
Support for UPDATE, MERGE, and DELETE: update only / no / yes
Can update GENERATED ALWAYS and SYSTEM_TIME columns?: no / yes / no
Performance for a large number of input records: slow / best / comparable to a
load into a staging table followed by multiple concurrent inserts from the
staging table to the target table
API: yes / yes / no (planned for a fix pack)
SYSPROC.ADMIN_CMD support: no / yes / no
Inserts and updates are logged?: yes / no / yes (cannot be turned off; no
support for NOT LOGGED INITIALLY)
Error recovery: no / no / yes
Restart: no / yes / yes
ADMIN_MOVE_TABLE Procedure
42
Can be done online or offline.
A shadow copy of the source table is taken.
Source table changes are captured and applied through triggers.
The source table is taken offline briefly to rename the shadow
copy and its indexes to the source table name.
ADMIN_MOVE_TABLE Procedure
43
Call the stored procedure once, specifying at least the schema
name and the table name:
CALL SYSPROC.ADMIN_MOVE_TABLE('schema name',
'source table', '','','','','','','','','MOVE')
Or call the procedure multiple times, once for each operation of the
move:
CALL SYSPROC.ADMIN_MOVE_TABLE('schema name',
'source table', '','','','','','','','','operation name')
ADMIN_MOVE_TABLE Procedure
44
Moving range partitioned tables
CREATE TABLE "SCHEMA1"."T1" ("I1" INTEGER, "I2" INTEGER)
DISTRIBUTE BY HASH("I1") PARTITION BY RANGE("I1")
(PART "PART0" STARTING(0) ENDING(100) IN "TS1",
PART "PART1" STARTING(101) ENDING(MAXVALUE) IN "TS2");
Move the T1 table from schema SCHEMA1 to the TS3 table space, leaving
the first partition in TS1:
db2 "CALL SYSPROC.ADMIN_MOVE_TABLE
('SCHEMA1','T1','TS3','TS3','TS3','','',
'(I1) (STARTING 0 ENDING 100 IN TS1 INDEX IN TS1 LONG IN TS1,
STARTING 101 ENDING MAXVALUE IN TS3 INDEX IN TS3 LONG IN
TS3)', '','','MOVE')"
IBM Replication tools
45
• Q replication
Q Capture and Q Apply components.
Q Capture reads the DB2 recovery logs and translates committed data into
WebSphere MQ messages.
Q Apply reads the messages from the queue and translates them into SQL
statements that are applied to the target server.
• SQL replication
Capture and Apply components.
Capture reads DB2 log data and writes to change-data tables. Apply reads
the change-data tables and replicates the changes to the target tables.
DB2move utility and ADMIN_COPY_SCHEMA
46
• ADMIN_COPY_SCHEMA procedure to copy a single schema within the
same database.
Options: DDL, COPY, COPYNO.
• db2move utility with the COPY action and -co options to copy a single schema
or multiple schemas from a source database to a target database.
E.g.:
db2move <dbname> COPY -sn schema1 -co TARGET_DB target SCHEMA_MAP
"((schema1,schema2))" TABLESPACE_MAP "((TS1, TS2),(TS3, TS4),
SYS_ANY)" -u userid -p password
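A hedged sketch of the corresponding ADMIN_COPY_SCHEMA call; the schema, table space, and error-table names are illustrative, and the parameter list should be checked against your DB2 version:

```sql
-- Copy SCHEMA1 to SCHEMA2 within the same database, including data ('COPY')
CALL SYSPROC.ADMIN_COPY_SCHEMA('SCHEMA1', 'SCHEMA2', 'COPY',
     NULL,                    -- object owner: keep the source owner
     'TS1', 'TS2',            -- source/target table space mapping
     'ERRSCHEMA', 'ERRTAB');  -- table where copy errors are recorded
```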
DB2 Redirected Restore utility
47
Perform redirected restores to build partial or full database images.
db2 restore db test from <directory/tsm> taken at <timestamp>
redirect generate script redirect.sql
Transport a set of table spaces, storage groups and SQL schemas from
database backup image to a database using the TRANSPORT option (in DB2
Version 9.7 Fix Pack 2 and later fix packs).
db2 restore db <sourcedb> tablespace (mydata1)
schema(schema1,schema2)
from <Media_Target_clause> taken at <date-time>
transport into <targetdb> redirect
db2 list tablespaces
db2 set tablespace containers for <tablespace ID for mydata1> using
(path '/db2DB/data1')
Suspended I/O and online split mirror
48
For large databases, make copies from a mirrored image by using suspended
I/O and the split mirror function. This approach also:
• Eliminates backup operation overhead from the production machine.
• Represents a fast way to clone systems.
• Represents a fast implementation of idle standby failover.
Disk mirroring is the process of writing data to two separate hard disks at the
same time. One copy of the data is called a mirror of the other. Splitting a
mirror is the process of separating the two copies.
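The suspended I/O sequence, sketched with the standard DB2 commands (database names are illustrative; the actual mirror split happens at the storage level and is platform specific):

```shell
# On the production database: suspend writes while the mirror is split
db2 connect to proddb
db2 set write suspend for database
# ... split the mirror at the storage level ...
db2 set write resume for database

# On the clone machine: initialize the split image, e.g. as a snapshot
db2inidb clonedb as snapshot
```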
Summary
49
Load - This utility is best suited to situations where performance is your
primary concern.
Ingest - This utility strikes a good balance between performance and
availability; if availability is more important to you, choose the ingest
utility instead of the load utility.
Import - The import utility can be a good alternative to the load utility
in the following situations:
- the target table is a view.
- the target table has constraints and you don't want it to be put
in the Set Integrity Pending state.
- the target table has triggers and you want them fired.
References
50
IBM Redbooks on DB2 data movement.
IBM Knowledge Center for DB2 V9.7 and V10.1.
IBM developerWorks technical library.
IDUG technical archives.
JEYABARATHI CHAKRAPANI
NASCO
jbchakra@gmail.com
Session D09
Title: DB2 Data Movement Utilities : A comparison
Please fill out your session
evaluation before leaving!
More Related Content

PDF
PDF
Postgresql tutorial
PDF
One PDB to go, please!
PDF
Oracle GoldenGateでの資料採取(トラブル時に採取すべき資料)
PPT
Dwh lecture 08-denormalization tech
PDF
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
PDF
Practical Recipes for Daily DBA Activities using DB2 9 and 10 for z/OS
PDF
Get to know PostgreSQL!
Postgresql tutorial
One PDB to go, please!
Oracle GoldenGateでの資料採取(トラブル時に採取すべき資料)
Dwh lecture 08-denormalization tech
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Practical Recipes for Daily DBA Activities using DB2 9 and 10 for z/OS
Get to know PostgreSQL!

What's hot (20)

PPTX
Why oracle data guard new features in oracle 18c, 19c
DOCX
Db2 Important questions to read
PPTX
IBM DB2 for zOSのソースエンドポイントとしての利用
PDF
A deep dive about VIP,HAIP, and SCAN
PDF
IBM DB2 for z/OS Administration Basics
 
PPT
Oracle archi ppt
PDF
Solving PostgreSQL wicked problems
PDF
DB2 for z/OS Architecture in Nutshell
PDF
MySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZE
PDF
Oracle db performance tuning
PDF
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
PDF
MongoDB WiredTiger Internals: Journey To Transactions
DOC
DB2 utilities
PDF
PostgreSQLアーキテクチャ入門(PostgreSQL Conference 2012)
PDF
Advanced RAC troubleshooting: Network
PDF
More mastering the art of indexing
PPT
Your tuning arsenal: AWR, ADDM, ASH, Metrics and Advisors
PPT
Less14 br concepts
PDF
Oracle RAC 19c: Best Practices and Secret Internals
PPTX
Extreme Replication - Performance Tuning Oracle GoldenGate
Why oracle data guard new features in oracle 18c, 19c
Db2 Important questions to read
IBM DB2 for zOSのソースエンドポイントとしての利用
A deep dive about VIP,HAIP, and SCAN
IBM DB2 for z/OS Administration Basics
 
Oracle archi ppt
Solving PostgreSQL wicked problems
DB2 for z/OS Architecture in Nutshell
MySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZE
Oracle db performance tuning
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
MongoDB WiredTiger Internals: Journey To Transactions
DB2 utilities
PostgreSQLアーキテクチャ入門(PostgreSQL Conference 2012)
Advanced RAC troubleshooting: Network
More mastering the art of indexing
Your tuning arsenal: AWR, ADDM, ASH, Metrics and Advisors
Less14 br concepts
Oracle RAC 19c: Best Practices and Secret Internals
Extreme Replication - Performance Tuning Oracle GoldenGate
Ad

Similar to IDUG 2015 NA Data Movement Utilities final (20)

PDF
Ibm db2 10.5 for linux, unix, and windows data movement utilities guide and...
PPT
DB2UDB_the_Basics Day 5
PPTX
Oracle database 12.2 new features
PPTX
Database Migration using Oracle SQL Developer: DBA Stuff for the Non-DBA
PPT
5\9 SSIS 2008R2_Training - DataFlow Basics
PPT
Lauri Pietarinen - What's Wrong With My Test Data
PPTX
Optimizing your Database Import!
PDF
Session 1 - Databases-JUNE 2023.pdf
PPTX
moving data between the data bases in database
PPT
Oracle data pump
PDF
Intruduction to SQL.Structured Query Language(SQL}
PPTX
The Tools for Data Migration Between Oracle , MySQL and Flat Text File.
PDF
Oracle 11G Development Training noida Delhi NCR
PDF
Oracle11gdevtrainingindelhincr
PDF
Import data rahul vishwanath
DOCX
Migration from 8.1 to 11.3
PPT
Lec 1 = introduction to structured query language (sql)
PPT
15925 structured query
PPT
Introduction to Structured Query Language (SQL).ppt
PPT
Introduction to Structured Query Language (SQL) (1).ppt
Ibm db2 10.5 for linux, unix, and windows data movement utilities guide and...
DB2UDB_the_Basics Day 5
Oracle database 12.2 new features
Database Migration using Oracle SQL Developer: DBA Stuff for the Non-DBA
5\9 SSIS 2008R2_Training - DataFlow Basics
Lauri Pietarinen - What's Wrong With My Test Data
Optimizing your Database Import!
Session 1 - Databases-JUNE 2023.pdf
moving data between the data bases in database
Oracle data pump
Intruduction to SQL.Structured Query Language(SQL}
The Tools for Data Migration Between Oracle , MySQL and Flat Text File.
Oracle 11G Development Training noida Delhi NCR
Oracle11gdevtrainingindelhincr
Import data rahul vishwanath
Migration from 8.1 to 11.3
Lec 1 = introduction to structured query language (sql)
15925 structured query
Introduction to Structured Query Language (SQL).ppt
Introduction to Structured Query Language (SQL) (1).ppt
Ad

IDUG 2015 NA Data Movement Utilities final

  • 1. DB2 Data Movement Utilities: A Comparison Speaker: Jeyabarathi(JB) Chakrapani NASCO Session Code: D09 Wed, May 06, 2015 (08:00 AM - 09:00 AM) : Hancock | Platform: DB2 LUW II
  • 2. Agenda  Learn the various tools that are available with DB2 for achieving efficient data movement within the database environment.  Offer a brief introduction into each utility including the DB2 Admin Online Table Move procedure.  Learn the various enhancements offered in each DB2 version for each of these utilities  Understand how to use the different utilities with examples.  Learn what it takes to maximize the performance of your choice of data movement utility along with useful tricks and tips.
  • 3. Introduction to DB2 data movement utilities Load Utility Export Utility Import Utility Ingest Utility DB2move tool Restore utility ADMIN_COPY_SCHEMA ADMIN_MOVE_TABLE Split Mirror IBM replication tools 3 What are the available tools and options for data movement?
  • 5. Load utility Required input for Load:  The path and the name of the input file, named pipe, or device.  The name or alias of the target table.  The format of the input source. This format can be DEL, ASC, PC/IXF, or CURSOR.  Whether the input data is to be appended to the table, or is to replace the existing data in the table.  A message file name, if the utility is invoked through the application programming interface (API), db2Load. 5
  • 6. Load Load phases: • Load • Build • Delete • Index Copy Load modes: • Insert • Replace • Restart • Terminate 6
  • 7. Load options Include: • If the load utility is invoked from a remotely connected client, the data file must be on the client. XML and LOB data are always read from the server, even you specify the CLIENT option. • The method to use for loading the data: column location, column name, or relative column position. • How often the utility is to establish consistency points. • The names of the table columns into which the data is to be inserted. • Whether or not preexisting data in the table can be queried while the load operation is in progress. • Whether the load operation should wait for other utilities or applications to finish using the table or force the other applications off before proceeding. Client Options Method Consistency Points Access level Paths TableSpace Statistics Recovery COPY NO/YES 7
  • 8. Load options Include: • An alternate system temporary table space in which to build the index. • The paths and the names of the input files in which LOBs are stored. • A message file name. • Whether the utility should modify the amount of free space available after a table is loaded. • Whether statistics are to be gathered during the load process. This option is only supported if the load operation is running in REPLACE mode. • Whether to keep a copy of the changes made. This is done to enable rollforward recovery of the database. • The fully qualified path to be used when creating temporary files during a load operation. The name is specified by the TEMPFILES PATH parameter of the LOAD command. Client Options Method Consistency Points Access level Paths TableSpace Statistics Recovery COPY NO/YES 8
  • 9. Load Restrictions: • Loading data into nicknames is not supported. • Loading data into typed tables, or tables with structured type columns, is not supported. • Loading data into declared temporary tables and created temporary tables is not supported. • XML data can only be read from the server side; if you want to have the XML files read from the client, use the import utility. • You cannot create or drop tables in a table space that is in Backup Pending state. • If an error occurs during a LOAD REPLACE operation, the original data in the table is lost. Retain a copy of the input data to allow the load operation to be restarted. • Triggers are not activated on newly loaded rows. Business rules associated with triggers are not enforced by the load utility. • Loading encrypted data is not supported. Nick names Structured Data Types Temporary Tables XML support Backup Pending Load Replace Triggers Data encryptions Partitioned Tables 9
  • 10. Load from Cursor… Examples: DECLARE mycurs CURSOR FOR SELECT * FROM abc.table1; LOAD FROM mycurs OF cursor INSERT INTO abc.table2; DECLARE C1 CURSOR FOR SELECT * FROM customers WHERE XMLEXISTS(’$DOC/customer[income_level=1]’); LOAD FROM C1 OF CURSOR INSERT INTO lvl1_customers; The ANYORDER file type modifier is supported for loading XML data into an XML column. • Loads the results of a query directly into the target table, no intermediate export necessary. • XML data can be loaded with the cursor option. •Nicknames can be referenced in the SQL query of the cursor using the DATABASE option. •Load from remote database using the DATABASE option. 10
• 11. Examples:
  Loading from a federated database (federation must be enabled and the data source cataloged):
  CREATE NICKNAME myschema1.table1 FOR source.abc.table1;
  DECLARE mycurs CURSOR FOR SELECT c1,c2,c3 FROM myschema1.table1;
  LOAD FROM mycurs OF CURSOR INSERT INTO abc.table2;

  Loading from a remote database (the remote database must be cataloged):
  DECLARE mycurs CURSOR DATABASE dbsource USER dsciaraf USING mypasswd
    FOR SELECT * FROM abc.table1;
  LOAD FROM mycurs OF CURSOR INSERT INTO abc.table2;
• 12. Checking for integrity violations
  Load puts a table in check pending (Set Integrity Pending) state when:
  • The table has check constraints or referential integrity (RI) constraints.
  • The table has identity columns and a V7 or earlier client was used to load the data.
  • The table has descendent immediate staging tables or MQTs referencing it.
  • The table is a staging table or an MQT.
• 13. Load performance
  • CPU_PARALLELISM – the number of threads used by the load utility to parse, convert, and format data records.
  • DISK_PARALLELISM – the number of processes or threads used by the load utility to write data records to disk.
  • DATA BUFFER – the total amount of memory, in 4 KB units, allocated to the load utility as a buffer.
  • NONRECOVERABLE – does not put the table space in Backup Pending state.
  • SAVECOUNT – specifies consistency points.
  • STATISTICS USE PROFILE – collects statistics during the load.
  • FASTPARSE – use when the data is known to be valid.
  • NOROWWARNINGS – use when many row warnings are expected.
  • PAGEFREESPACE, INDEXFREESPACE, TOTALFREESPACE – specify these to reduce the need for reorgs.
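A performance-oriented invocation might combine these parameters as sketched below. All values and names are illustrative and should be tuned for your system:

```sql
-- Performance-oriented LOAD sketch; values are illustrative, not recommendations.
LOAD FROM /data/big_table.del OF DEL
  MODIFIED BY FASTPARSE NOROWWARNINGS   -- data known to be valid; suppress row warnings
  MESSAGES /tmp/load_msgs.txt
  INSERT INTO myschema.big_table
  NONRECOVERABLE                        -- avoid Backup Pending at the cost of recoverability
  DATA BUFFER 8192                      -- 8192 x 4 KB pages for the load buffer
  CPU_PARALLELISM 4                     -- parse/convert/format threads
  DISK_PARALLELISM 8                    -- disk-write threads
```

The trade-off with NONRECOVERABLE is that the load is not rollforward-recoverable, so it suits data that can be reloaded from source.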
• 15. EXPORT utility
  Data extraction using SQL query or XQuery statements.
  Required input:
  • Path name for the output file.
  • Format of the output file – IXF or DEL.
  • Specification of the data to be extracted, using a SELECT statement.
  Additional options:
  • A subset of columns to be extracted, using the METHOD option.
  • XML TO, XMLFILE, XML SAVESCHEMA – to export and store XML data in different ways.
  • The SELECT statement used for extracting data can be optimized the same way any SQL query can be, to improve export performance.
  • The MESSAGES option allows messages generated by the export utility to be written to a file.
• 16. EXPORT utility – examples
  A simple export that writes all rows to an IXF file:
  EXPORT TO table1.ixf OF IXF MESSAGES msg.txt SELECT * FROM myschema.table1;
  Exporting XML data:
  EXPORT TO table1export.del OF DEL XML TO /db/xmlpath XMLFILE xmldocs XMLSAVESCHEMA SELECT * FROM myschema.table1;
  Exporting LOB data:
  EXPORT TO table1.del OF DEL LOBS TO /db/lob1, /db/lob2/ MODIFIED BY lobsinfile SELECT * FROM myschema.table1;
• 18. IMPORT
  Data append/update using SQL query or XQuery statements.
  Required input for import:
  • The path and the name of the input file.
  • The name or alias of the target table or view.
  • The format of the data in the input file.
  • The method by which the data is to be imported.
  • The traverse order, when importing hierarchical data.
  • The subtable list, when importing typed tables.
  Additional options:
  • MODIFIED BY clause.
  • ALLOW WRITE ACCESS – import acquires a nonexclusive lock.
  • ALLOW NO ACCESS – import acquires an exclusive lock, waiting for other work to complete before it can acquire the lock.
  • COMMITCOUNT – commits after the specified number of rows.
  • MESSAGES – writes utility messages to a file.
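A hedged sketch of an import that keeps the table available to writers and commits periodically. The file path and table name are placeholders, and INSERT_UPDATE assumes the target has a primary key:

```sql
IMPORT FROM /data/table1.del OF DEL
  ALLOW WRITE ACCESS              -- nonexclusive lock; other apps can keep writing
  COMMITCOUNT 10000               -- commit every 10,000 rows to limit log usage
  MESSAGES /tmp/import_msgs.txt
  INSERT_UPDATE INTO myschema.table1
```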
• 19. Import
  Import support:
  • Supports IXF, ASC, and DEL data formats.
  • Used with file type modifiers to customize the import operation.
  • Can move hierarchical data and typed tables.
  • Logs all activity, updates indexes, verifies constraints, and fires triggers.
  • Allows you to specify the names of the columns within the table or view into which the data is to be inserted.
  Import modes:
  • INSERT – adds data to the existing table without changing existing data.
  • INSERT_UPDATE – updates rows with matching primary key values; otherwise inserts.
  • REPLACE – deletes existing data and inserts new data.
  • CREATE – creates the target table and its index definitions.
  • REPLACE_CREATE – deletes existing data and inserts new data; if the target table does not exist, it is created.
• 20. IMPORT restrictions
  • If a table has a primary key that is referenced by a foreign key, data can only be appended to it.
  • You cannot perform an import replace operation into an underlying table of a materialized query table defined in refresh immediate mode.
  • You cannot import data into a system table, a summary table, or a table with a structured type column.
  • You cannot import data into declared temporary tables.
  • Views cannot be created through the import utility.
  • Encrypted data cannot be imported.
  • Referential constraints and foreign key definitions are not preserved when creating tables from PC/IXF files. (Primary key definitions are preserved if the data was previously exported by using SELECT *.)
  • Because the import utility generates its own SQL statements, the maximum statement size of 2 MB might, in some cases, be exceeded.
  • You cannot re-create a partitioned table or a multidimensional clustering (MDC) table by using the CREATE or REPLACE_CREATE import parameters.
  • Tables containing XML cannot be re-created.
  • The NOT LOGGED INITIALLY clause is not honored.
• 21. IMPORT restrictions (continued)
  Remote import is not allowed if:
  • The application and database code pages are different.
  • The file being imported is a multiple-part PC/IXF file.
  • The method used for importing the data is either column name or relative column position.
  • The target column list provided is longer than 4 KB.
  • The LOBS FROM clause or the lobsinfile modifier is specified.
  • The NULL INDICATORS clause is specified for ASC files.
• 22. IMPORT performance
  • If the workload is mostly inserts, consider altering the table to APPEND ON.
  • To avoid a transaction-log-full condition, choose an appropriate COMMITCOUNT value.
  • Enable the DB2_PARALLEL_IO registry variable.
  • Review the LOGBUFSZ database configuration value and increase it as necessary.
  • Review the utility heap database configuration value and increase it as needed.
  • Review the NUM_IOSERVERS and NUM_IOCLEANERS parameters.
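The tuning points above can be sketched as follows. The database name, table name, and all values are illustrative placeholders, not recommendations:

```sql
-- Reduce free-space searching for insert-heavy imports
ALTER TABLE myschema.table1 APPEND ON;

-- Illustrative configuration updates; tune the values for your system
UPDATE DB CFG FOR mydb USING LOGBUFSZ 2048;
UPDATE DB CFG FOR mydb USING UTIL_HEAP_SZ 65536;
UPDATE DB CFG FOR mydb USING NUM_IOSERVERS AUTOMATIC NUM_IOCLEANERS AUTOMATIC;
```

Remember to turn APPEND OFF and reorganize later if clustering matters for the table's query workload.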
• 24. INGEST characteristics
  • Fast – multithreaded design processes data in parallel.
  • Available – uses row-level locking, so tables remain available for concurrent access.
  • Continuous – can continuously ingest data streams from pipes or files.
  • Robust – handles unexpected failures and can be restarted from the last commit point.
  • Flexible and functional – supports different input formats and target table types, and has rich data manipulation capabilities.
• 25. INGEST
  Supported table types:
  • Multidimensional clustering (MDC) and insert time clustering (ITC) tables
  • Range-partitioned tables
  • Range-clustered tables (RCT)
  • Materialized query tables (MQTs) that are defined as MAINTAINED BY USER, including summary tables
  • Temporal tables
  • Updatable views (except typed views)
  Supported data formats:
  • Delimited text
  • Positional text and binary
  • Columns in various orders and formats
• 26. Ingest architecture (diagram): data from multiple files or pipes is read by transporters, parsed by an array of formatters, hashed by database partition, and written by flushers performing array inserts into each database partition (DB partition 1 through n).
  Main components:
  • Transporter
  • Formatter
  • Flusher
• 27. INGEST
  Transporter:
  • Reads from the data source and writes to the formatter queues. For INSERT and MERGE operations, there is one transporter thread for each input source. For UPDATE and DELETE operations, there is only one transporter thread.
  Formatter:
  • Parses each record, converts the data into the format that DB2 requires, and writes each formatted record to one of the flusher queues for that record's partition.
  • The num_formatters configuration parameter specifies the number of formatter threads. The default is (number of logical CPUs)/2.
• 28. INGEST
  Flusher:
  • The flushers issue the SQL statements that perform the operations on the DB2 tables. The number of flushers for each partition is specified by the num_flushers_per_partition configuration parameter. The default is max(1, ((number of logical CPUs)/2)/(number of partitions)).
• 29. INGEST examples
  Insert from a delimited file:
  INGEST FROM FILE my_file.del FORMAT DELIMITED INSERT INTO my_table;
  Input records sent over a named pipe:
  INGEST FROM PIPE my_pipe FORMAT DELIMITED INSERT INTO my_table;
  Input records delimited by CRLF, fields delimited by a vertical bar:
  INGEST FROM FILE my_file.del FORMAT DELIMITED BY '|' INSERT INTO my_table;
• 30. INGEST examples
  Using MERGE to update matching rows and insert new ones:
  INGEST FROM FILE input_file.txt
    FORMAT DELIMITED
    (
      $key1 INTEGER EXTERNAL,
      $data1 CHAR(8),
      $data2 CHAR(32),
      $data3 DECIMAL(5,2) EXTERNAL
    )
    MERGE INTO target_table
      ON (key1 = $key1)
      WHEN MATCHED THEN
        UPDATE SET (data1, data2, data3) = ($data1, $data2, $data3)
      WHEN NOT MATCHED THEN
        INSERT VALUES($key1, $data1, $data2, $data3);
• 31. INGEST examples – configuration
  CONNECT TO mydb USER <username> USING <password>;
  INGEST SET num_flushers_per_partition 1;
  INGEST SET num_formatters 12;
  INGEST SET shm_max_size 12 GB;
  INGEST SET commit_count 20000;
  INGEST FROM FILE /mydir/file1 FORMAT DELIMITED BY ',' RESTART OFF INSERT INTO myschema.tab1;
• 32. INGEST restart
  Restart information is stored in a separate table (SYSTOOLS.INGESTRESTART), which is created once. To create the restart table on DB2 10.1:
  CALL SYSPROC.SYSINSTALLOBJECTS('INGEST', 'C', NULL, NULL);
  The table contains counters that keep track of which records have been ingested.
• 33. INGEST restart options
  • RESTART CONTINUE – restart a previously failed job (and clean up the restart data).
  • RESTART TERMINATE – clean up the restart data from a failed job you don't plan to restart.
  • RESTART OFF – suppress the saving of restart information (the ingest job is not restartable).
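A sketch of a restartable ingest job; the job id 'nightly_tab1' and all paths and table names are placeholders:

```sql
-- Start a restartable job with an explicit job id
INGEST FROM FILE /mydir/file1 FORMAT DELIMITED
  RESTART NEW 'nightly_tab1'
  INSERT INTO myschema.tab1;

-- If the job fails, rerun the same command with RESTART CONTINUE
-- to resume from the last commit point
INGEST FROM FILE /mydir/file1 FORMAT DELIMITED
  RESTART CONTINUE 'nightly_tab1'
  INSERT INTO myschema.tab1;
```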
• 34. INGEST – additional features
  • Commit by time or number of rows – commit_count or commit_period configuration parameter.
  • Copying rejected records to a file or table – DUMPFILE or EXCEPTION TABLE parameter.
  • Restart and recovery – retry_count ingest configuration parameter.
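For example, rejected records can be captured as sketched below. The file and table names are placeholders, and the exception table is assumed to mirror the target table's definition:

```sql
INGEST FROM FILE /mydir/file1 FORMAT DELIMITED
  DUMPFILE /mydir/rejected.del        -- records the formatters could not parse
  EXCEPTION TABLE myschema.tab1_excp  -- rows rejected by DB2 (e.g., constraint violations)
  INSERT INTO myschema.tab1;
```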
• 35. INGEST monitoring
  • INGEST LIST and INGEST GET STATS commands.
  • They read information that the utility maintains in shared memory.
  • They must be run in a separate window on the same machine as the INGEST command.
  • They can display detailed information.
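A quick monitoring sketch; the job id 4 is illustrative (use the id reported by INGEST LIST):

```sql
-- From a second session on the same machine as the running INGEST:
INGEST LIST;                              -- show running ingest jobs and their ids
INGEST GET STATS FOR 4;                   -- one-time statistics snapshot for job 4
INGEST GET STATS FOR 4 EVERY 10 SECONDS;  -- refresh the statistics every 10 seconds
```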
• 36. INGEST and LOAD
  Choose INGEST when:
  • The table needs to remain available for concurrent access during the load.
  • You need only some fields from the input file to be loaded.
  • You need to specify an SQL statement other than INSERT.
  • You need to use an SQL expression (to construct a column value from field values).
  • You need to recover and continue when the utility hits a recoverable error.
  Choose LOAD when:
  • You don't need the table to remain available for concurrent access.
  • XML or LOB data must be loaded.
  • You are loading from a cursor or a device.
  • The input source file is in IXF format.
  • You need to load a GENERATED ALWAYS or SYSTEM_TIME column from the input file.
  • You want to use SYSPROC.ADMIN_CMD.
  • You need to invoke the utility through an API.
  • You don't want the INSERTs to be logged.
• 37. INGEST performance
  • Field type and column type – define fields to be the same type as their corresponding column types.
  • Materialized query tables (MQTs) – if you run ingest against a base table of an MQT defined as REFRESH IMMEDIATE, performance can degrade significantly because of the time required to update the MQT.
  • Row size – increase the commit_count setting for tables with smaller rows and reduce it for tables with larger rows.
  • Other workloads – if other workloads run alongside the ingest, consider increasing the LOCKLIST database configuration parameter and reducing the commit_count ingest configuration parameter.
• 38. Comparison between IMPORT, LOAD, and INGEST – table types

  Table type                                                                       | IMPORT                        | LOAD                          | INGEST
  Created global temporary table                                                   | no                            | no                            | no
  Declared global temporary table                                                  | no                            | no                            | no
  Detached table with a dependent table where SET INTEGRITY has not been run (SYSCAT.TABLES.TYPE = 'L') | no (SQL20285N, reason code 1) | no (SQL20285N, reason code 1) | no
  Multidimensional clustering (MDC) table                                          | yes                           | yes                           | yes
  MQT maintained by user                                                           | yes                           | yes                           | yes
  Nickname                                                                         | yes (relational, except ODBC) | no (SQL2305N)                 | yes
  Range-clustered table (RCT)                                                      | yes                           | no                            | yes
  Range-partitioned table                                                          | yes                           | yes                           | yes
  Summary table                                                                    | no                            | yes                           | yes
  Typed table                                                                      | yes                           | no (SQL3211N)                 | no
  Typed view                                                                       | yes                           | no (SQL2305N)                 | no
  Untyped (regular) table                                                          | yes                           | yes                           | yes
  Updatable view                                                                   | yes                           | no (SQL2305N)                 | yes
• 39. Comparison to IMPORT and LOAD – column types

  Column data type                                                                 | IMPORT | LOAD | INGEST
  Numeric: SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE, DECFLOAT              | yes    | yes  | yes
  Character: CHAR, VARCHAR, NCHAR, NVARCHAR, plus corresponding FOR BIT DATA types | yes    | yes  | yes
  Graphic: GRAPHIC, VARGRAPHIC                                                     | yes    | yes  | yes
  Long types: LONG VARCHAR, LONG VARGRAPHIC                                        | yes    | yes  | yes
  Date/time: DATE, TIME, TIMESTAMP(p)                                              | yes    | yes  | yes
  DB2SECURITYLABEL                                                                 | yes    | yes  | yes
  LOBs from files: BLOB, CLOB, DBCLOB, NCLOB                                       | yes    | yes  | no
  Inline LOBs                                                                      | yes    | yes  | no
  XML from files                                                                   | yes    | yes  | no
  Inline XML                                                                       | no     | no   | no
  Distinct type                                                                    | yes    | yes  | yes
  Structured type                                                                  | no     | no   | no
  Reference type                                                                   | yes    | yes  | yes
• 40. Comparison to IMPORT and LOAD – input types and formats

  Input type                            | IMPORT             | LOAD | INGEST
  Cursor                                | no                 | yes  | no
  Device                                | no                 | yes  | no
  File                                  | yes                | yes  | yes
  Pipe                                  | no                 | yes  | yes
  Multiple input files, multiple pipes  | no                 | yes  | yes

  Input format                          | IMPORT                                 | LOAD | INGEST
  ASC (including binary)                | yes, except binary                     | yes  | yes
  DEL                                   | yes                                    | yes  | yes
  IXF                                   | yes                                    | yes  | no
  WSF (worksheet format)                | yes, but discontinued in DB2 10.1      | no   | no
• 41. Comparison to IMPORT and LOAD – other features

  Feature                                                  | IMPORT      | LOAD | INGEST
  Can other apps update the table while the utility runs?  | yes         | no   | yes
  Can use SQL expressions?                                 | no          | no   | yes
  Support for REPLACE                                      | yes         | yes  | yes
  Support for UPDATE, MERGE, and DELETE                    | UPDATE only | no   | yes
  Can update GENERATED ALWAYS and SYSTEM_TIME columns?     | no          | yes  | no
  Performance for a large number of input records          | slow        | best | comparable to load into a staging table followed by multiple concurrent inserts from staging to target
  API                                                      | yes         | yes  | no (planned for a fix pack)
  SYSPROC.ADMIN_CMD support                                | no          | yes  | no
  Inserts and updates are logged?                          | yes         | no   | yes (cannot be turned off; no support for NOT LOGGED INITIALLY)
  Error recovery                                           | no          | no   | yes
  Restart                                                  | no          | yes  | yes
• 42. ADMIN_MOVE_TABLE procedure
  • Can be run online or offline.
  • A shadow copy of the source table is made.
  • Source table changes are captured and applied through triggers.
  • The source table is taken offline briefly to rename the shadow copy and its indexes to the source table name.
• 43. ADMIN_MOVE_TABLE procedure
  Call the stored procedure once, specifying at least the schema name and the table name:
  CALL SYSPROC.ADMIN_MOVE_TABLE('schema name', 'source table', '','','','','','','','','MOVE')
  Or call the procedure multiple times, once for each operation of the move:
  CALL SYSPROC.ADMIN_MOVE_TABLE('schema name', 'source table', '','','','','','','','','operation name')
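A sketch of the multi-call form, one call per phase of the move. The schema and table names are placeholders, and the phase names below assume the standard operation sequence:

```sql
-- Phased table move; each call performs one operation of the move
CALL SYSPROC.ADMIN_MOVE_TABLE('MYSCHEMA','T1','','','','','','','','','INIT');
CALL SYSPROC.ADMIN_MOVE_TABLE('MYSCHEMA','T1','','','','','','','','','COPY');
CALL SYSPROC.ADMIN_MOVE_TABLE('MYSCHEMA','T1','','','','','','','','','REPLAY');
CALL SYSPROC.ADMIN_MOVE_TABLE('MYSCHEMA','T1','','','','','','','','','SWAP');  -- brief offline window
```

The multi-call form lets you schedule the REPLAY and SWAP phases, which touch the source table, for a low-activity window.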
• 44. ADMIN_MOVE_TABLE procedure
  Moving range-partitioned tables:
  CREATE TABLE "SCHEMA1"."T1" ("I1" INTEGER, "I2" INTEGER)
    DISTRIBUTE BY HASH("I1")
    PARTITION BY RANGE("I1")
    (PART "PART0" STARTING(0) ENDING(100) IN "TS1",
     PART "PART1" STARTING(101) ENDING(MAXVALUE) IN "TS2");
  Move the T1 table in schema SCHEMA1 to the TS3 table space, leaving the first partition in TS1:
  db2 "CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1','T1','TS3','TS3','TS3','','',
    '(I1) (STARTING 0 ENDING 100 IN TS1 INDEX IN TS1 LONG IN TS1,
    STARTING 101 ENDING MAXVALUE IN TS3 INDEX IN TS3 LONG IN TS3)',
    '','','MOVE')"
• 45. IBM replication tools
  Q replication – Q Capture and Q Apply components:
  • Q Capture reads the DB2 recovery logs and translates committed data into WebSphere MQ messages.
  • Q Apply reads the messages from the queue and translates them into SQL statements that are applied to the target server.
  SQL replication – Capture and Apply components:
  • Capture reads DB2 log data and writes it to change-data tables.
  • Apply reads the change-data tables and replicates the changes to the target tables.
• 46. db2move utility and ADMIN_COPY_SCHEMA
  • ADMIN_COPY_SCHEMA procedure copies a single schema within the same database. Copy modes: DDL, COPY, COPYNO.
  • The db2move utility with the COPY action copies a single schema or multiple schemas from a source database to a target database. Example:
  db2move <dbname> COPY -sn schema1 -co TARGET_DB <target-dbname> SCHEMA_MAP "((schema1,schema2))" TABLESPACE_MAP "((TS1,TS2),(TS3,TS4),SYS_ANY)" -u userid -p password
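ADMIN_COPY_SCHEMA takes the source and target schema names, a copy mode, and table-space and error-table details. A hedged sketch, with all names as placeholders:

```sql
-- Copy SCHEMA1 to SCHEMA2 within the same database, including data ('COPY' mode),
-- mapping table space TS1 to TS2; rows describing errors go to ERRSCH.COPYERR
CALL SYSPROC.ADMIN_COPY_SCHEMA('SCHEMA1', 'SCHEMA2', 'COPY', NULL,
                               'TS1', 'TS2', 'ERRSCH', 'COPYERR');
```

Use 'DDL' mode to copy only the object definitions, or 'COPYNO' to copy data without making the target table spaces recoverable via COPY images.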
• 47. DB2 redirected restore utility
  Perform redirected restores to build partial or full database images:
  db2 restore db test from <directory/tsm> taken at <timestamp> redirect generate script redirect.sql
  Transport a set of table spaces, storage groups, and SQL schemas from a database backup image into another database using the TRANSPORT option (DB2 Version 9.7 Fix Pack 2 and later fix packs):
  db2 restore db <sourcedb> tablespace (mydata1) schema (schema1,schema2) from <Media_Target_clause> taken at <date-time> transport into <targetdb> redirect
  db2 list tablespaces
  db2 set tablespace containers for <tablespace ID for mydata1> using (path '/db2DB/data1')
• 48. Suspended I/O and online split mirror
  For large databases, make copies from a mirrored image by using suspended I/O and the split mirror function. This approach also:
  • Eliminates backup operation overhead on the production machine.
  • Provides a fast way to clone systems.
  • Provides a fast implementation of idle standby failover.
  Disk mirroring is the process of writing data to two separate hard disks at the same time; one copy of the data is called a mirror of the other. Splitting a mirror is the process of separating the two copies.
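A hedged outline of the suspended-I/O sequence; the database name is a placeholder, and the storage-level split itself depends on your disk subsystem:

```sql
-- On the primary, while connected to the database:
db2 set write suspend for database
-- ...split the mirror at the storage level...
db2 set write resume for database

-- On the clone host, initialize the split image:
db2inidb mydb as snapshot    -- or: as standby / as mirror
```

The db2inidb mode determines the clone's use: snapshot for a usable copy, standby for a rollforward-ready standby, mirror to restore over the primary.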
• 49. Summary
  • Load – best suited to situations where performance is your primary concern.
  • Ingest – strikes a good balance between performance and availability; if availability matters more than raw speed, choose ingest over load.
  • Import – a good alternative to load in the following situations:
  - the target table is a view.
  - the target table has constraints and you don't want it put in Set Integrity Pending state.
  - the target table has triggers and you want them fired.
• 50. References
  • IBM Redbook on DB2 data movement.
  • IBM Knowledge Center for DB2 V9.7 and V10.1.
  • IBM developerWorks technical library.
  • IDUG technical archives.
  • 51. JEYABARATHI CHAKRAPANI NASCO jbchakra@gmail.com Session D09 Title: DB2 Data Movement Utilities : A comparison Please fill out your session evaluation before leaving!