C02- Perfect Trio: Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Mehmet Cuneyt Goksu, mehmet.goksu@ibm.com
IDAA & Db2 Tools Lab Advocate
IBM Germany R&D, Böblingen
Agenda
• Data archiving requirements and challenges
• Data archiving solutions for z/OS systems
• Temporal Tables & History Generation
• Transparent Archiving & History Generation
• Overview of IDAA Technology
• Combining solutions for different use cases
Why retain data for long periods of time?
• Sometimes due to legal requirements
• Sometimes in support of customer service ("We need to repair your 2005 vehicle")
• Sometimes for analytics purposes ("If we analyze more data, we'll get more valuable insight…")
Data retention's impact: application performance
• For DB2 tables with a non-continuously-ascending clustering key (new rows get inserted throughout the table), data retention can increase the CPU cost of data access
• The most recently inserted rows are often the most frequently accessed, but sets of such rows will be separated by ever-larger numbers of "old and cold" rows
• Result: more and more DB2 GETPAGEs are required to retrieve the same result sets, and more GETPAGEs means more CPU
• Even for a DB2 table with a continuously-ascending clustering key (so newer rows are concentrated at the "end" of the table), growth means larger indexes, and that means more CPU
• A larger index has more levels, leading to more GETPAGEs
• DB2 utilities that process indexes (such as REORG and RUNSTATS) may become more expensive to run
Data retention's impact: data storage costs
• Storing years of historical data on the high-end disk subsystems typically used with z Systems can cost a lot of $$$
• A cost-reducing alternative – storing historical data offline, on tape – has its own problems
• No dynamic query access – data requested for analysis might be restored to disk overnight and become available the next day
• Even then, it is likely that only a subset of the data on tape would be restored at any given time
• Is there a better way? Yes – several of them!
[Diagram: a typical tiered archiving landscape. The production database holds current data (1-2 years) and active historical data (3-4 years). Archive definitions drive archive and restore processes against an archive database of compressed archives – the online archive (5-6 years). Older data (7+ years) moves to an offline archive, either on an offline retention platform (CD, tape, optical) or on a non-DBMS retention platform (ATA file server, EMC Centera, IBM RS550, HDS), again as compressed archives.]
DB2 Temporal Tables – Time Travel Query
• One of the major improvements since DB2 10
• The ability for the database to reduce the complexity and amount of coding needed to implement "versioned" data – data that has different values at different points in time
• Data that you need to keep a record of for any given point in time
• Data that you may need to look at for past, current or future situations
• The ability to support history or auditing queries
• Business Time & System Time
Temporal Concepts
• Business Time (Effective Dates, Valid Time, From/To-dates)
– Every row has a pair of TIMESTAMP or DATE columns set by Application
• Begin time : when the business deems the row valid
• End Time : when the business deems row validity ends
– Constraint created to ensure Begin time < End time
– Query at current, any prior, or future point/period in business time
• System Time (Assertion Dates, Knowledge Dates, Transaction Time, Audit Time, In/Out-dates)
– Every row has a pair of TIMESTAMP columns set by DBMS
• Begin time : when the row was inserted in the DBMS
• End Time : when the row was modified/deleted
– Every base row has a Transaction Start ID timestamp
– Query at current or any prior point/period in system time
• Start times are inclusive; end times are exclusive
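To make the business-time half concrete, here is a minimal, hedged DDL and query sketch (the table and column names POLICY, BUS_START and BUS_END are hypothetical, not from the slides):

-- Application-period temporal table (ATT): the application maintains the period columns
CREATE TABLE POLICY
  (POLICY_ID CHAR(10) NOT NULL,
   COVERAGE  INTEGER,
   BUS_START DATE NOT NULL,
   BUS_END   DATE NOT NULL,
   PERIOD BUSINESS_TIME (BUS_START, BUS_END),               -- implicit check: BUS_START < BUS_END
   PRIMARY KEY (POLICY_ID, BUSINESS_TIME WITHOUT OVERLAPS));

-- Query the coverage that was (or will be) valid on a given date
SELECT COVERAGE
  FROM POLICY
  FOR BUSINESS_TIME AS OF DATE('2020-06-01')
  WHERE POLICY_ID = 'A123';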
DB2 Temporal Tables - History Generation
• Concept of period (SYSTEM_TIME and BUSINESS_TIME periods)
• A period is represented by a pair of datetime columns in DB2 relations, one column stores start
time, the other one stores end time
• SYSTEM_TIME period captures DB2’s creation and deletion of records. DB2 SYSTEM_TIME
versioning automatically keeps historical versions of records
• BUSINESS_TIME period allows users to create their own valid period for a given record. Users
maintain the valid times for a record.
• Temporal tables: System-period Temporal Table (STT), Application-period Temporal Table (ATT)
• Business value
• It helps meet compliance requirements
• It typically performs better than application-coded versioning
• It is easier to manage than home-grown solutions
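A minimal DDL sketch of the STT pattern just described (the table name ACCOUNT and the column names/types are assumptions for illustration): create the base table with a SYSTEM_TIME period, create a logically identical history table, then link them with ADD VERSIONING.

CREATE TABLE ACCOUNT
  (ACCOUNT_ID  INTEGER NOT NULL,
   BALANCE     DECIMAL(15,2),
   SYS_START   TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN,   -- set by DB2
   SYS_END     TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END,     -- set by DB2
   TRANS_START TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID,
   PERIOD SYSTEM_TIME (SYS_START, SYS_END));

CREATE TABLE ACCOUNT_HIST LIKE ACCOUNT;

ALTER TABLE ACCOUNT
  ADD VERSIONING USE HISTORY TABLE ACCOUNT_HIST;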
Row Maintenance with System Time – History Generation
* T1: INSERT Row A
* T2: UPDATE Row A
* T3: UPDATE Row A
* T4: DELETE Row A
* T5: INSERT Row A
Base Table / History Table states over time (HV = high values):
– After T1 (INSERT): Base = Row A1: T1-HV; History = (empty)
– After T2 (UPDATE): Base = Row A2: T2-HV; History = Row A1: T1-T2
– After T3 (UPDATE): Base = Row A3: T3-HV; History = Row A1: T1-T2, Row A2: T2-T3
– After T4 (DELETE): Base = (empty); History = Row A1: T1-T2, Row A2: T2-T3, Row A3: T3-T4
– After T5 (INSERT): Base = Row A4: T5-HV; History = Row A1: T1-T2, Row A2: T2-T3, Row A3: T3-T4
* Notes:
– INSERT has no History Table impact
– The first UPDATE begins a lineage for Row A
• History Table ST End = Base Table ST Begin (no gap)
• The Base Table ST End is always high values (HV)
– The second UPDATE deepens the lineage
• No gaps exist across all generations of Row A
– The DELETE adds to the lineage in the History Table
• There is no current row (Base Table) after the DELETE
– The second INSERT begins a new row lineage
• There is a gap between the History Table rows and the Base Table row
– If all of the above statements happened in the same UOW, there would be no History Table rows
DB2 Temporal Tables - History Generation
[Diagram: the current table holds the latest version of each row, while history generation moves prior versions (e.g., Jul 2008, Aug 2008, Sep 2008 audit data) into the history table. SQL using current data reads the current table; SQL using AS OF gets transparent/automatic access to the history table to satisfy the query. The history table contains a version for every update of a single row.]
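For example (a sketch reusing the hypothetical ACCOUNT table from the earlier DDL sketch), the same SELECT can read current data or be pointed at a prior point in time; DB2 decides transparently whether the history table is needed:

-- current data: reads only the base table
SELECT ACCOUNT_ID, BALANCE
  FROM ACCOUNT;

-- "as of" August 15, 2008: DB2 transparently includes the history table as needed
SELECT ACCOUNT_ID, BALANCE
  FROM ACCOUNT
  FOR SYSTEM_TIME AS OF TIMESTAMP('2008-08-15-00.00.00');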
Temporal auditing
• Track which SQL operation caused a modification
− Also: who modified the data
− Usage is not restricted to DB2 temporal
• GENERATED ALWAYS AS ... can also be defined for non-temporal tables
• Example table BANK_ACC_STT with columns ACCOUNT_ID, BALANCE, USER, OP_CODE, SYS_START, SYS_END:
− USER is GENERATED ALWAYS AS (SESSION_USER); other special registers such as CURRENT CLIENT_USERID, CURRENT SQLID or CURRENT CLIENT_ACCTNG can be used as well
− OP_CODE is CHAR(1) GENERATED ALWAYS AS (DATA CHANGE OPERATION)
Temporal auditing - example
• User JOE inserts an entry for ACCOUNT_ID 56789

BANK_ACC_STT
ACCOUNT_ID  BALANCE  USER   OP_CODE  SYS_START   SYS_END
56789       1234.56  JOE    I        2017-01-19  9999-12-30

• User DON updates this record

BANK_ACC_STT
ACCOUNT_ID  BALANCE  USER   OP_CODE  SYS_START   SYS_END
56789       88.77    DON    U        2017-01-21  9999-12-30

BANK_ACC_HIST
ACCOUNT_ID  BALANCE  USER   OP_CODE  SYS_START   SYS_END
56789       1234.56  JOE    I        2017-01-19  2017-01-21

• User LAURA deletes this record

BANK_ACC_STT
ACCOUNT_ID  BALANCE  USER   OP_CODE  SYS_START   SYS_END
(no current row)

BANK_ACC_HIST
ACCOUNT_ID  BALANCE  USER   OP_CODE  SYS_START   SYS_END
56789       1234.56  JOE    I        2017-01-19  2017-01-21
56789       88.77    DON    U        2017-01-21  2017-02-15
56789       88.77    LAURA  D        2017-02-15  2017-02-15 *

* The extra 'D' row requires ON DELETE ADD EXTRA ROW in the temporal DDL
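A hedged sketch of the DDL pieces behind this example (the column name AUDIT_USER and the data types are assumptions; the slide labels the first column simply USER): the auditing columns are part of the BANK_ACC_STT definition, and ON DELETE ADD EXTRA ROW is what makes DB2 write the extra LAURA/'D' row into the history table.

-- inside CREATE TABLE BANK_ACC_STT (...):
   AUDIT_USER VARCHAR(128) GENERATED ALWAYS AS (SESSION_USER),           -- who changed the row
   OP_CODE    CHAR(1)      GENERATED ALWAYS AS (DATA CHANGE OPERATION),  -- 'I', 'U' or 'D'

-- link base and history table; record an extra row on DELETE
ALTER TABLE BANK_ACC_STT
  ADD VERSIONING USE HISTORY TABLE BANK_ACC_HIST
  ON DELETE ADD EXTRA ROW;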
System Time Temporal Query Routing with DB2 12 and IDAA
• Both active and history tables with TIMESTAMP(12) columns can be loaded to the Accelerator
• A special query rewrite is applied for the following 3 temporal SQL clauses:
• FOR SYSTEM_TIME AS OF expr
• FOR SYSTEM_TIME FROM expr1 TO expr2
• FOR SYSTEM_TIME BETWEEN expr1 AND expr2
• Queries on system temporal tables are routed to the Accelerator when ZPARM QUERY_ACCEL_OPTIONS is set to 5
5: allows accelerated queries to run against STT and bi-temporal tables
6: queries that reference timestamp columns with precision 12 are also offloaded
• All existing offloading criteria have to be met
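A sketch of what such an offload-eligible temporal query might look like (table name taken from the auditing example above; actual routing also depends on the acceleration settings and the usual offload criteria):

SET CURRENT QUERY ACCELERATION = ENABLE;

SELECT ACCOUNT_ID, BALANCE
  FROM BANK_ACC_STT
  FOR SYSTEM_TIME BETWEEN TIMESTAMP('2017-01-01-00.00.00')
                      AND TIMESTAMP('2017-12-31-00.00.00');

-- With QUERY_ACCEL_OPTIONS 5 (and 6 for TIMESTAMP(12) columns) in effect, DB2 rewrites
-- this against the accelerated base and history tables and routes it to the Accelerator.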
Can system-time temporal be a form of archiving?
• Yes – it is a "historical" data retention option
• With system-time temporal, you are retaining data that was once, but is no longer, in effect
• The needs of the business determine which data retention approach is appropriate for a given situation
• When data previously inserted in a table is changed (updated or deleted), is there a need to retain a "before" image of the changed row, along with the "from" and "to" times of the row's "in effect" period?
• That's what system-time temporal is for – it lets you see data that WAS current at some prior point in time
Why DB2 Archive Transparency
• Querying and managing tables that contain a large amount of data is a common problem
• Maintaining the performance of a large table is another pain point ("to index or not?", poor application performance)
• One known solution is to archive the inactive/cold data to a different environment
• Challenges lie in ease of use and performance
• How to access both current and archived data within a single query
• How to make data archiving and access "transparent" with minimal application changes
DB2-managed data archiving – how it's done
1. DBA creates a table (e.g., T1_AR) to be used as the archive for table T1
2. DBA tells DB2 to enable archiving for T1, using archive table T1_AR:
ALTER TABLE T1 ENABLE ARCHIVE USE T1_AR;
3. Program deletes to-be-archived rows from T1
• If the program sets the DB2 global variable SYSIBMADM.MOVE_TO_ARCHIVE to 'Y', all it has to do is delete from T1 – DB2 will move the deleted rows to T1_AR
• The value of a global variable affects only the DB2 thread for which it was set
4. Bind packages appropriately (the bind option affects static and dynamic SQL)
• If a program will ALWAYS access ONLY the base table, it should be bound with ARCHIVESENSITIVE(NO)
• If a program will SOMETIMES or ALWAYS access rows in the base table and the associated archive table, it should be bound with ARCHIVESENSITIVE(YES)
• If a program sets the DB2 global variable SYSIBMADM.GET_ARCHIVE to 'Y' and issues a SELECT against the base table, DB2 will automatically drive that SELECT against the associated archive table, too, and will merge the results with UNION ALL
• So, with DB2-managed archiving, a program can retrieve data from an archive table without having to reference the archive table
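Pulling steps 1-4 together, a minimal sketch (T1 is the slide's example table; T1_AR, the ORDER_DATE predicate and the LIKE shortcut are illustrative assumptions):

-- 1-2. DBA: create the archive table and enable archiving
CREATE TABLE T1_AR LIKE T1;
ALTER TABLE T1 ENABLE ARCHIVE USE T1_AR;

-- 3. Archiving program: deleted rows are moved to T1_AR instead of being discarded
SET SYSIBMADM.MOVE_TO_ARCHIVE = 'Y';
DELETE FROM T1 WHERE ORDER_DATE < CURRENT DATE - 1 YEAR;

-- 4. Reporting program (bound with ARCHIVESENSITIVE(YES)):
SET SYSIBMADM.GET_ARCHIVE = 'Y';
SELECT ... FROM T1 WHERE ...;   -- DB2 adds a UNION ALL against T1_AR automatically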
DB2-managed data archiving
• NOT the same thing as system-time temporal data
• When versioning (system time) is activated for a table, the "before" images of rows made "non-current" by update or delete are inserted into an associated history table
• With DB2-managed archiving, rows in an archive table are current in terms of validity – they are just older than rows in the associated base table (if row age is the archive criterion)
• When most access is to rows recently inserted into a table, moving older rows to an archive table can improve performance for newer-row retrieval
• Particularly useful when data is clustered by a non-continuously-ascending key
• DB2 users have been doing this for years – DB2 just makes it easier
[Diagram: before DB2-managed data archiving, newer, more "popular" rows and older, less frequently retrieved rows share the base table; after, the older rows live in the archive table and the base table holds only the newer rows]
DB2 Archive Transparency - History Generation
[Diagram: the archive-enabled table holds current data; rows are moved into the archive table at DELETE (with MOVE_TO_ARCHIVE = 'Y' | 'E') or via REORG DISCARD, and the archive accumulates older periods (e.g., Jul 2008, Aug 2008, Sep 2008). SQL against current data reads only the archive-enabled table; with GET_ARCHIVE = 'Y', SQL is transparently/automatically extended to the archive table to satisfy the query.]
DB2 Transparent archiving – what's new
• Transparent archiving introduced with DB2 11
− Enables archiving of deleted rows in separate tables
− Similar to temporal / SYSTEM TIME
• New with DB2 12: a new ZPARM to specify the default value for the MOVE_TO_ARCHIVE global variable
− retrofitted to DB2 11 with APAR PI56767
• New with DB2 12: a row change timestamp column can be part of the partitioning key
− can facilitate archiving of the archive table to the DB2 Analytics Accelerator (on a partition basis)
− retrofitted to DB2 11 with APAR PI63830
• AND: optimizer improvements in DB2 12 (e.g. UNION ALL) with positive impact on transparent archiving and temporal tables
DB2: temporal (system time) versus archive
• System-time temporal support and DB2-managed archiving cannot be activated for the same table – use one or the other
• Key differences:
• System-time temporal
• Implemented with a base table and an associated history table
• Rows in the history table are NOT current – they are the "before" images of rows that were made non-current by DELETE or UPDATE operations targeting the base table
• DB2-managed archiving
• Implemented with a base table and an associated archive table
• Rows in the archive table ARE current – they are just older than the rows in the base table (assuming that age is the archive criterion)
Query execution process flow
[Diagram: an application connects through the application interface (DRDA requestor) to DB2 for z/OS; a heartbeat supplies availability and performance indicators from the Db2 Analytics Accelerator. The DB2 optimizer decides, query by query, whether to use the local query execution run-time (for queries that cannot be or should not be routed to the Accelerator) or to execute the query with the Accelerator.]
Introducing the Accelerator-only table type in DB2 for z/OS
Creation (DDL) and access remain through DB2 for z/OS in all cases:
• Non-accelerator DB2 table – data in DB2 only
• Accelerator-shadow table – data in DB2 and the Accelerator
• Accelerator-archived table / partition – empty read-only partition in DB2; partition data is in the Accelerator only
• Accelerator-only table (AOT) – "proxy table" in DB2; data is in the Accelerator only
Combining two solutions - DB2-managed archiving and IDAA
1. A base table and its associated archive table can be selected for acceleration (so both tables will exist on both the front-end DB2 for z/OS system and the back-end Analytics Accelerator)
2. The archive table can be partitioned, regardless of whether or not the base table is partitioned (base and associated archive table only have to be logically – not physically – identical)
3. If the archive table is partitioned on a date basis (which could require adding a timestamp column to the base and archive tables), and if older rows are not updated, the High-Performance Storage Saver can be utilized
• In that case, the large majority of the archive table's data would physically exist only on the Analytics Accelerator
• The timestamp column, if added to the base and archive tables to facilitate date-based partitioning of the archive table, can be defined as GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP
• DB2 will generate a value for this column when a row is moved from the base table to the archive table
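A sketch of that column definition (the column name ROW_CHG_TS is hypothetical; in practice the column would be added to both the base and the archive table, observing any ALTER restrictions that apply to archive-enabled tables):

ALTER TABLE T1
  ADD COLUMN ROW_CHG_TS TIMESTAMP NOT NULL
      GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP;

ALTER TABLE T1_AR
  ADD COLUMN ROW_CHG_TS TIMESTAMP NOT NULL
      GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP;

-- T1_AR can then be range-partitioned on ROW_CHG_TS; DB2 generates a value for the column
-- when a row is moved from T1 to T1_AR, so older partitions can be archived to the
-- Accelerator with the High-Performance Storage Saver.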
The archiving combination, in a picture
[Diagram: on the front-end DB2 system, base table T1 holds the most recent 3 months of data and archive table T1_AR is partitioned by week (Week n, n-1, …, n-5). On the DB2 Analytics Accelerator, "accelerated" table T1 mirrors the base table and "accelerated" table T1_AR holds all weekly partitions. "Trickle-feed" replication keeps the "accelerated" tables within 1-2 minutes of currency. Older archive partitions exist only logically on the front-end DB2 – their data physically resides only on the Accelerator. In this example, the base table holds 3 months of data and the archive table is partitioned by week.]
Combining History in DB2 and on the Accelerator
• Both the active | archive-enabled table and its history | archive table need to be accelerated in order to route SQL to IDAA
[Diagram: in DB2, active tables pair with history tables and archive-enabled tables pair with archive tables; the same pairs are loaded into the Accelerator, so queries (SQL1, SQL2) that touch current data, history/archive data, or both can be routed there.]
Combining solutions for ETL Modernization
• ETL processing pattern
• Move data from the original data source(s) through tools or custom transformation programs to the target DW/DM
• Typically, data is stored several times in intermediate staging areas
• Myth: the main purposes of ETL are
• To make data consumable for end users
• To optimize for performance (star schema)
• Merging and cleansing (making data consistent)
• Reality: the majority of ETL processing is generating history data…
ETL with Accelerator-Only Tables
Accelerator-only tables store temporary results during the reporting process; the CREATE, SELECT and DROP statements are all routed to accelerator ACC1:

CREATE TABLE T1 (...) IN ACCELERATOR ACC1;
INSERT INTO T1
  SELECT ... FROM CUST_TABLE_1 JOIN TRANS_TABLE_1 ...;

CREATE TABLE T2 (...) IN ACCELERATOR ACC1;
INSERT INTO T2
  SELECT ... FROM CUST_TABLE_2 JOIN TRANS_TABLE_2 ...;

SELECT ... FROM T1 JOIN T2 ...;

DROP TABLE T1;
DROP TABLE T2;

[Diagram: the data for analytical processing – credit card transaction history and a customer summary mart – resides on ACC1 in sets of tables CUST_TABLE_x and TRANS_TABLE_x; a reporting application runs a multi-step report against the accelerator-only tables and delivers reports and dashboards]