M|18 Querying Data at a Previous Point in Time

Querying Data at a Previous Point in Time
Alexander Krizhanovsky
Tempesta Technologies, Inc.
ak@tempesta-tech.com

Who am I?
CEO & CTO at Tempesta Technologies
Develop Tempesta FW, an open source hybrid of an HTTP accelerator and a firewall
● Web accelerator, load balancer, DDoS mitigation & Web security
● 3x faster than Nginx, 40% faster than a DPDK-based Web server
● Linux kernel HTTPS/TCP/IP stack
  https://netdevconf.org/2.1/session.html?krizhanovsky
Custom software development:
● High performance network traffic processing,
  e.g. WAF mentioned in Gartner magic quadrant
● Databases

MariaDB System Versioning
Commissioned by MariaDB Corporation

SQL System Versioning
SQL:2011
The database can store all versions of stored records
Applications:
● Point-in-time recovery
● Forensic discovery & legal requirements to store data for N years
● Data analysis (retrospective, trends etc.)
MariaDB starting with 10.3.4
● https://mariadb.com/kb/en/library/system-versioned-tables/

Keeping the history
t                          t                        t
+------+ update t set x=2; +------+ delete from t;  +------+
| x    |                   | x    |                 | x    |
+------+                   +------+                 +------+
| 1    |                   | 2    |                 | 2    |
+------+                   | 1    |                 | 1    |
                           +------+                 +------+
> select * from t;
Empty set (0.00 sec)

Getting the history
t              t              t                     t
+------+ trx_0 +------+ trx_1 +------+ ... trx_1000 +------+
| x    |       | x    |       | x    |              | x    |
+------+       +------+       +------+              +------+
| 1    |       | 2    |       | 3    |              | 1000 |
+------+       | 1    |       | 1    |              | 1    |
               |      |       | 2    |              | 2    |
               +------+       +------+              | 3    |
                                                    +------+
               TS0            TS1            ...
> select * from t
  for system_time between
  timestamp TS0 and
  timestamp TS1;
+------+
| 2    |  AS OF TS0
| 3    |
+------+

System Versioning vs Flashback
Flashback (since 10.2.4)
mysqlbinlog --flashback > dump.sql && mysql < dump.sql
● Pure binary log based point-in-time recovery mechanism
● Typically to recover recent changes (low performance)
● Multi-engine
● No DDL
System Versioning
● Efficient queries & MVCC-like data analysis
● InnoDB & MyISAM fully supported; RocksDB, Aria must be tested
● Designed to survive DDL (in progress)

Use cases
Temporal data processing
● How has a sales opportunity fluctuated over time?
● Mine changes in client activity during a particular period of time
● Analyze trends in your staff changes
Forensic analysis & legal requirements to store data for N years
● An audit requires a financial institution to report on changes made to a
  client's records during the past five years
● A client inquiry reveals a data entry error involving the three-month
  introductory interest rate on a credit card. The bank needs to
  retroactively correct the error

Sense of System Versioning:
CREATE TABLE (SQL:2011)
> create table t(x int,
row_start timestamp(6) generated always as row start invisible,
row_end timestamp(6) generated always as row end invisible,
period for system_time(row_start, row_end)
) with system versioning;

Sense of System Versioning:
CREATE TABLE
> create table t(x int) with system versioning;

Sense of System Versioning
> insert into t values (1);
> set @ts = now(6);
> insert into t values (2);
> select * from t for system_time as of timestamp @ts;
+------+
| x |
+------+
| 1 |
+------+

Sense of BETWEEN
> insert into t values(1);
> set @t0 = now(6);
> update t set x = 2;
> set @t1 = now(6);
> delete from t;
> select *,row_start,row_end from t
for system_time between timestamp @t0 and timestamp @t1;
+------+----------------------------+----------------------------+
| x | row_start | row_end |
+------+----------------------------+----------------------------+
| 2 | 2018-02-23 18:11:44.017902 | 2018-02-23 18:11:53.634389 |
| 1 | 2018-02-23 18:06:57.559257 | 2018-02-23 18:11:44.017902 |
+------+----------------------------+----------------------------+

> select sleep(10);
> delete from t;
> insert into t
select * from t for system_time as of (now(6) - interval 10 second);
> select * from t;
+------+
| x |
+------+
| 1 |
+------+

SQL workaround:
a Point in Time Architecture
https://www.simple-talk.com/sql/database-administration/database-design-a-point-in-time-architecture/
INSERT: introduces column DateCreated
DELETE: no actual deletes, introduces column DateEnd
UPDATE: trigger
● UPDATE DateEnd for old record
● INSERT a new record
SELECT: additional WHERE clause by <DateCreated, DateEnd>
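
The workaround above can be sketched in a few lines. This is only an illustration, not the code from the linked article: it uses Python's built-in sqlite3 module, a hypothetical `account` table, integer logical timestamps instead of real dates, and plain helper functions in place of the UPDATE trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE account (
    id          INTEGER NOT NULL,
    balance     INTEGER NOT NULL,
    DateCreated INTEGER NOT NULL,  -- logical time this version became valid
    DateEnd     INTEGER            -- NULL while this version is current
)""")

def pit_insert(ts, id_, balance):
    # INSERT: every new version records when it was created.
    conn.execute("INSERT INTO account VALUES (?, ?, ?, NULL)", (id_, balance, ts))

def pit_delete(ts, id_):
    # DELETE: no physical delete, only close the current version.
    conn.execute(
        "UPDATE account SET DateEnd = ? WHERE id = ? AND DateEnd IS NULL",
        (ts, id_))

def pit_update(ts, id_, balance):
    # UPDATE: what the trigger would do, close the old version and add a new one.
    pit_delete(ts, id_)
    pit_insert(ts, id_, balance)

def pit_select(ts):
    # SELECT: the additional WHERE clause on <DateCreated, DateEnd>.
    return conn.execute(
        "SELECT id, balance FROM account "
        "WHERE DateCreated <= ? AND (DateEnd IS NULL OR DateEnd > ?) "
        "ORDER BY id", (ts, ts)).fetchall()

pit_insert(1, 1, 100)   # account 1 created at time 1
pit_update(5, 1, 250)   # balance changed at time 5
pit_delete(9, 1)        # account "deleted" at time 9
```

Every statement the application issues has to go through helpers like these, which is the application-layer awareness the approach is criticized for.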

Point in Time Architecture
Issues
● Application layer awareness
● Timestamps only
● Low performance
● Too complex
● Doesn’t survive DDLs

Solutions on the market
Mostly for point-in-time recovery
Don’t survive DDL
Oracle Flashback & IBM DB2
● History tables are generated from undo log => limited time to live
● Long history leads to performance issues
MS SQL Server
● Separate history tables

MariaDB System Versioning
Intended to survive DDL (for 2.0)
As engine independent as possible
● SQL layer: DML & Queries
● InnoDB: transactional history (MVCC-like) only
No changes are required from an application
Standard dialect (where the standard defines it)
Too much data (use partitioning to put history on separate disks)

row_end in primary key
Historical records can now have the same PK values
+---+-----------+----------------------+
| x | row_start | row_end |
+---+-----------+----------------------+
| 1 | 5434 | 5437 | ← dead (history)
| 1 | 5437 | 18446744073709551615 |
+---+-----------+----------------------+
DELETE and UPDATE now always update the PK
PK constraints are always satisfied:
+---+-----------+----------------------+
| x | row_start | row_end |
+---+-----------+----------------------+
| 1 | 5434 | 18446744073709551615 | Wrong and impossible!
| 1 | 5437 | 18446744073709551615 |
+---+-----------+----------------------+
Why timestamps aren’t enough?
Forensic discovery and debugging may need a reliable answer:
which transactions were visible to transaction X?
● We do have begin and commit timestamps, however...
Limited accuracy for many short concurrent transactions
● The OS doesn’t guarantee strictly monotonically increasing time
● Different CPUs may have different time
● MVCC operates with transaction IDs

Transactional System Versioning
(InnoDB only)
> create table t_trx(x int,
t0 bigint unsigned generated always as row start,
tx bigint unsigned generated always as row end,
period for system_time(t0, tx)
) with system versioning;
> insert into t_trx values(1);
> insert into t_trx values(2);
> select *,t0,tx from t_trx;
+------+------+----------------------+
| x | t0 | tx |
+------+------+----------------------+
| 1 | 4046 | 18446744073709551615 |
| 2 | 4049 | 18446744073709551615 |
+------+------+----------------------+

mysql.transaction_registry
Maps trx_id to timestamp (for transaction history only)
Updated on engine-independent layer through handler interface
Very large
Columns
● transaction_id - transaction ID
● commit_id - transaction commit ID (trx_id)
● begin_timestamp - timestamp of the beginning of the transaction
● commit_timestamp - timestamp of the commit of the transaction
● isolation_level - RR/S, RC/RU

Begin & commit transaction IDs
> select *,row_start,row_end from t for system_time all;
+---+-----------+----------------------+
| x | row_start | row_end |
+---+-----------+----------------------+
| 1 | 5583 | 18446744073709551615 |
+---+-----------+----------------------+
> select * from mysql.transaction_registry
where commit_timestamp > now(6) - interval 15 minute \G
*************************** 1. row ***************************
transaction_id: 5583
commit_id: 5584
begin_timestamp: 2018-02-25 06:37:42.190825
commit_timestamp: 2018-02-25 06:37:42.191870
isolation_level: REPEATABLE-READ

Transaction history view
Uses trx_id only to provide MVCC-consistent AS OF view
Only works with InnoDB tables with transactional history
create function TRX_SEES(TRX_ID1 bigint unsigned, TRX_ID0 bigint unsigned)
returns bool
begin
  declare COMMIT_ID1 bigint unsigned default VTQ_COMMIT_ID(TRX_ID1);
  declare COMMIT_ID0 bigint unsigned default VTQ_COMMIT_ID(TRX_ID0);
  declare ISO_LEVEL1 enum('RR', 'RC') default VTQ_ISO_LEVEL(TRX_ID1);
  -- TRX_ID1 started after TRX_ID0 had committed
  if TRX_ID1 > COMMIT_ID0 then
    return true;
  end if;
  -- under RC, TRX_ID1 also sees TRX_ID0 if it committed before TRX_ID1 did
  if COMMIT_ID1 > COMMIT_ID0 and ISO_LEVEL1 = 'RC' then
    return true;
  end if;
  return false;
end
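
To experiment with the visibility rule outside the server, the same logic can be modeled in Python. This is a sketch only: the `REGISTRY` dictionary is a hypothetical stand-in for mysql.transaction_registry, and its transaction and commit IDs are made up for illustration.

```python
from typing import NamedTuple

class TrxInfo(NamedTuple):
    commit_id: int
    iso_level: str  # "RR" or "RC"

# Hypothetical registry contents: trx_id -> (commit_id, isolation level).
REGISTRY = {
    100: TrxInfo(commit_id=105, iso_level="RR"),
    101: TrxInfo(commit_id=110, iso_level="RR"),
    102: TrxInfo(commit_id=112, iso_level="RC"),
    108: TrxInfo(commit_id=109, iso_level="RR"),
}

def trx_sees(trx_id1, trx_id0, registry=REGISTRY):
    """Return True if transaction trx_id1 sees the changes of trx_id0."""
    commit_id1 = registry[trx_id1].commit_id
    commit_id0 = registry[trx_id0].commit_id
    iso_level1 = registry[trx_id1].iso_level
    # trx_id1 started after trx_id0 had committed.
    if trx_id1 > commit_id0:
        return True
    # Under READ COMMITTED, a commit that lands before trx_id1 commits is visible.
    if commit_id1 > commit_id0 and iso_level1 == "RC":
        return True
    return False
```

With this data, transaction 108 sees 100 (it started after commit 105), 101 does not (REPEATABLE READ, started before the commit), and 102 does (READ COMMITTED, committed later).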

SELECT
Handled in JOIN::prepare, i.e. system versioning queries are optimized
Adds WHERE clause for time-related information
● row_end = Inf for current data
transaction_registry is used to convert timestamps to trx_id

SELECT: track the rows
> select x, sys_trx_start as start, commit_id as commit,
sys_trx_end as end, begin_timestamp, commit_timestamp
from t for system_time all
join mysql.transaction_registry as vtq
on vtq.transaction_id = t.sys_trx_start
where x < 10;
+---+-------+--------+----------------------+----------------------------+----------------------------+
| x | start | commit | end | begin_timestamp | commit_timestamp |
+---+-------+--------+----------------------+----------------------------+----------------------------+
| 3 | 3033 | 3034 | 18446744073709551615 | 2017-04-12 01:05:55.861774 | 2017-04-12 01:05:55.864698 |
| 2 | 3026 | 3027 | 3033 | 2017-04-12 01:00:32.275002 | 2017-04-12 01:00:32.278337 |
| 1 | 3024 | 3025 | 3026 | 2017-04-12 01:00:23.585170 | 2017-04-12 01:00:23.596620 |
+---+-------+--------+----------------------+----------------------------+----------------------------+

Transactional System Versioning:
SELECT (syntax sugar)
-- standard syntax
> select *,t0,tx from t_trx for system_time as of transaction 4046;
+------+------+----------------------+
| x | t0 | tx |
+------+------+----------------------+
| 1 | 4046 | 18446744073709551615 |
+------+------+----------------------+
-- ...the same (where t0 > 4045 and t0 < 4048 also works)
> select *,t0,tx from t_trx where t0 = 4046;
+------+------+----------------------+
| x | t0 | tx |
+------+------+----------------------+
| 1 | 4046 | 18446744073709551615 |
+------+------+----------------------+

Select all historical records
> select x as dead_rows from t
for system_time all where row_end < now(6);
+-----------+
| dead_rows |
+-----------+
| 1 |
+-----------+

Range queries
> select *,row_start,row_end from t for system_time
between timestamp (now(6) - interval 1 month) and now(6);
+------+------+-----------+---------+
| x | y | row_start | row_end |
+------+------+-----------+---------+
| 7 | NULL | 2922 | 2938 |
+------+------+-----------+---------+

Range queries
> select *,row_start,row_end from t for system_time
from transaction 2974 to transaction 2986;
+------+------+-----------+---------+
| x | y | row_start | row_end |
+------+------+-----------+---------+
| 44 | NULL | 2965 | 2986 |
+------+------+-----------+---------+

FROM...TO vs BETWEEN
> select *,row_start,row_end from t for system_time
between transaction 0 and transaction 3033;
+---+-----------+----------------------+
| x | row_start | row_end |
+---+-----------+----------------------+
| 1 | 3024 | 3026 |
| 2 | 3026 | 3033 |
| 3 | 3033 | 18446744073709551615 |
+---+-----------+----------------------+
> select *,row_start,row_end from t for system_time
from transaction 0 to transaction 3033;
+---+-----------+---------+
| x | row_start | row_end |
+---+-----------+---------+
| 1 | 3024 | 3026 |
| 2 | 3026 | 3033 |
+---+-----------+---------+
Required by the standard
Might be useful to know
Changes during a period
State before a disaster
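
The difference between the two range forms comes down to whether the upper bound is inclusive. A minimal Python sketch, assuming the SQL:2011 predicates (BETWEEN: row_start <= hi, FROM...TO: row_start < hi, both with row_end > lo) and the three row versions from the slide:

```python
INF = 18446744073709551615  # row_end of a current row in transactional history

# (x, row_start, row_end) as shown on the slide
ROWS = [
    (1, 3024, 3026),
    (2, 3026, 3033),
    (3, 3033, INF),
]

def between(rows, lo, hi):
    # BETWEEN lo AND hi: the upper bound is inclusive.
    return [r for r in rows if r[1] <= hi and r[2] > lo]

def from_to(rows, lo, hi):
    # FROM lo TO hi: the upper bound is exclusive.
    return [r for r in rows if r[1] < hi and r[2] > lo]
```

Here `between(ROWS, 0, 3033)` keeps the version created by transaction 3033, while `from_to(ROWS, 0, 3033)` drops it.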

Range queries: MyISAM
> select *,row_start,row_end from my_t
for system_time between timestamp 0 and timestamp now(6);
+---+----------------------------+----------------------------+
| x | row_start | row_end |
+---+----------------------------+----------------------------+
| 1 | 2017-04-12 00:10:47.099814 | 2038-01-19 06:14:07.000000 |
+---+----------------------------+----------------------------+
> select *,row_start,row_end from my_t
for system_time from transaction 0 to transaction 10000;
ERROR 4109 (HY000): Transaction system versioning for `my_t` is not supported

INSERT
New record
● row_start = current timestamp
● row_end = 2038-01-19 06:14:07.999999
New record (transactional history):
● row_start = trx_id
● row_end = Inf

DELETE
UPDATE
Moves the record to history:
● row_end = current timestamp | trx_id
  (as of the beginning of the transaction)
Cannot be used for historical data

UPDATE
UPDATE + INSERT
New history record:
● Copy the record to history
● row_end = current timestamp | trx_id
  (as of the beginning of the transaction)
New record:
● row_start = current timestamp | trx_id
● row_end = Inf | 2038-01-19 06:14:07.999999
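
The INSERT, DELETE and UPDATE rules can be put together in a toy in-memory model. This is a sketch, not how the server stores rows: `VersionedTable` and its logical clock are hypothetical names, and a plain counter replaces the timestamp or trx_id.

```python
import itertools

INF = float("inf")  # stand-in for the maximum row_end of a current row

class VersionedTable:
    def __init__(self):
        self.rows = []                    # dicts with x, row_start, row_end
        self._clock = itertools.count(1)  # hypothetical logical clock

    def insert(self, x):
        ts = next(self._clock)
        self.rows.append({"x": x, "row_start": ts, "row_end": INF})
        return ts

    def delete(self, x):
        # DELETE is an update of row_end: the row moves to history.
        ts = next(self._clock)
        for row in self.rows:
            if row["x"] == x and row["row_end"] == INF:
                row["row_end"] = ts
        return ts

    def update(self, old_x, new_x):
        # UPDATE closes the old version and inserts a new current one.
        ts = next(self._clock)
        current = [r for r in self.rows
                   if r["x"] == old_x and r["row_end"] == INF]
        for row in current:
            row["row_end"] = ts
            self.rows.append({"x": new_x, "row_start": ts, "row_end": INF})
        return ts

    def current(self):
        return [r["x"] for r in self.rows if r["row_end"] == INF]

    def as_of(self, ts):
        return [r["x"] for r in self.rows
                if r["row_start"] <= ts < r["row_end"]]

t = VersionedTable()
ts_insert = t.insert(1)     # t contains 1
ts_update = t.update(1, 2)  # 1 becomes history, 2 is current
ts_delete = t.delete(2)     # 2 becomes history, nothing is current
```

After the three statements a plain select returns nothing, yet both versions are still stored and reachable through `as_of`.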

History partitioning
> create table t (x int) with system versioning
partition by system_time interval 1 month
subpartition by key(x) subpartitions 4 (
partition p0 history,
partition p1 history,
partition pnow current);
Partition history by time interval or by a limit on the number of records (e.g. limit 1000)
Partition pruning for history range
Another way to get all history records:
> select *,row_start,row_end from t partition(p0,p1);

History purging
> delete history from t before system_time '2018-02-23 21:36';
> delete history from t;
> alter table t drop partition p0;
> alter table t drop partition p1;
ERROR 4126 (HY000): Wrong partitions for `t`: must have at least one HISTORY and exactly one last CURRENT

ALTER System Versioning
> create table t (x int);
> alter table t add system versioning;
> update t set x=2;
> alter table t drop system versioning;
-- historical data was dropped
> select * from t;
+------+
| x |
+------+
| 2 |
+------+

Per-column history
> create table t (x int) with system versioning;
> insert into t(x) values(1); update t set x=2;
> set @@system_versioning_alter_history='keep';
> alter table t add y int without system versioning;
> insert into t(x,y) values(3,3);
> update t set x=4;
> update t set y=5;
> select *,row_end from t for system_time all;
+------+------+----------------------------+
| x | y | row_end |
+------+------+----------------------------+
| 1 | NULL | 2018-02-24 16:20:30.323272 |
| 2 | NULL | 2018-02-24 16:22:08.685693 |
| 3 | 3 | 2018-02-24 16:22:08.685693 |
| 4 | 5 | 2038-01-19 06:14:07.999999 |
| 4 | 5 | 2038-01-19 06:14:07.999999 |
+------+------+----------------------------+

Foreign keys
> create table p (x int unique key);
> create table c (px int, foreign key(px) references p(x))
with system versioning;
> insert into p values(1);
> insert into c values(1);
> delete from c;
> delete from p;
> select * from c for system_time all;
+----+
| px |
+----+
| 1 |
+----+

Backups
Fully compatible with MariaDB Backup
Dump & restore lose the history

Further extensions
DDL survival (in progress)
https://github.com/tempesta-tech/mariadb/milestone/15
Audit plugin:
https://github.com/tempesta-tech/mariadb/issues/138
Other storage engines – need to test
Application-time period tables (?)

DDL survival
TBD: https://github.com/tempesta-tech/mariadb/wiki/DDL-Survival
In progress: persistent history (tables renaming)
Versioned Tracking Metadata (VTMD) table:
● trx_id_start - transaction which generated a table
● trx_id_end - transaction which generated a new version
● original_name - original name of the table before the transaction trx_id_start
● new_name - new name of the table
● col_renames - blob with new to old column name mappings
Multi-schema SELECT

Application-time period tables
(we’re open for requests)
> create table emp(id int, d_start date, d_end date, dept varchar(30),
e_period for period(d_start, d_end));
> insert into emp values (1, '2016-01-01', '2038-01-19', 'sales');
> update emp
for portion of e_period from date '2017-03-15' to date '2017-07-15'
set dept = 'engineering' where id = 1;
+----+-------------+------------+--------------+
| id | d_start | d_end | dept |
+----+-------------+------------+--------------+
| 1 | 2016-01-01 | 2017-03-15 | sales |
| 1 | 2017-03-15 | 2017-07-15 | engineering |
| 1 | 2017-07-15 | 2038-01-19 | sales |
+----+-------------+------------+--------------+

Questions?
Thanks to:
● MariaDB (request, discussions, review)
● Alexey Midenkov
● Eugene Kosov
E-mail: ak@tempesta-tech.com
Tempesta FW – the fastest and secure HTTP accelerator:
https://github.com/tempesta-tech/tempesta

Replication
Timestamp-based
● SBR, RBR, Galera: same as for ordinary tables
Transaction-based (InnoDB)
● SBR only
● RBR for system versioned tables is automatically switched to SBR
  (like mixed replication)

Cascade foreign keys
(https://jira.mariadb.org/browse/MDEV-15364)
> create table p (x int primary key);
> create table c (px int, foreign key (px) references p(x)
on delete cascade on update cascade)
with system versioning;
> insert into p values (1);
> insert into c values (1);
> update p set x = 2;
> select *,row_start,row_end from c for system_time all;
+------+----------------------------+----------------------------+
| px | row_start | row_end |
+------+----------------------------+----------------------------+
| 2 | 2018-02-25 01:31:59.070080 | 2038-01-19 06:14:07.999999 |
+------+----------------------------+----------------------------+
