Querying Data
at a Previous Point in Time
Alexander Krizhanovsky
Tempesta Technologies, Inc.
ak@tempesta-tech.com
Who am I?
CEO & CTO at Tempesta Technologies
Develop Tempesta FW –
an open source hybrid of an HTTP accelerator and a firewall
●
Web accelerator, load balancer, DDoS mitigation & Web security
●
3x faster than Nginx, 40% faster than a DPDK-based Web server
●
Linux kernel HTTPS/TCP/IP stack
https://netdevconf.org/2.1/session.html?krizhanovsky
Custom software development:
●
high performance network traffic processing
e.g. WAF mentioned in Gartner magic quadrant
●
Databases
MariaDB System Versioning
Commissioned by MariaDB Corporation
SQL System Versioning
SQL:2011
The database can store all versions of stored records
Applications:
●
Point-in-time recovery
●
Forensic discovery & legal requirements to store data for N years
●
Data analysis (retrospective, trends etc.)
MariaDB starting with 10.3.4
●
https://mariadb.com/kb/en/library/system-versioned-tables/
Keeping the history
t                              t                           t
+------+  update t set x=2;    +------+  delete from t;    +------+
| x    |                       | x    |                    | x    |
+------+                       +------+                    +------+
| 1    |                       | 2    |                    | 2    |
+------+                       | 1    |                    | 1    |
                               +------+                    +------+
> select * from t;
Empty set (0.00 sec)
Getting the history
t t t t
+------+ trx_0 +------+ trx_1 +------+ ... trx_1000 +------+
| x | | x | | x | | x |
+------+ +------+ +------+ +------+
| 1 | | 2 | | 3 | | 1000 |
+------+ | 1 | | 1 | | 1 |
| | | 2 | | 2 |
+------+ +------+ | 3 |
TS0 TS1 ...
> select * from t +------+ +------+
for system_time between | 2 | AS OF TS0
timestamp TS0 and | 3 |
timestamp TS1; +------+
System Versioning vs Flashback
Flashback (since 10.2.4)
mysqlbinlog --flashback > dump.sql && mysql < dump.sql
●
Pure binary log based point-in-time recovery mechanism
●
Typically to recover recent changes (low performance)
●
Multi-engine
●
No DDL
System Versioning
●
Efficient queries & MVCC-like data analysis
●
InnoDB & MyISAM fully supported; RocksDB, Aria must be tested
●
Designed to survive DDL (in progress)
Use cases
Temporal data processing
●
How has a sales opportunity fluctuated over time?
●
Mine changes in client activity during a particular period of time
●
Analyze trends in your staff changes
Forensic analysis & legal requirements to store data for N years.
●
Audit requires a financial institution to report on changes made to a
client's records during the past five years
Point-in-time recovery
●
A client inquiry reveals a data entry error involving the three-month
introductory interest rate on a credit card. The bank needs to
retroactively correct the error
Sense of System Versioning:
CREATE TABLE (SQL:2011)
> create table t(x int,
row_start timestamp(6) generated always as row start invisible,
row_end timestamp(6) generated always as row end invisible,
period for system_time(row_start, row_end)
) with system versioning;
Sense of System Versioning:
CREATE TABLE
> create table t(x int) with system versioning;
Sense of System Versioning
> create table t(x int) with system versioning;
> insert into t values (1);
> set @ts = now(6);
> insert into t values (2);
> select * from t for system_time as of timestamp @ts;
+------+
| x |
+------+
| 1 |
+------+
Sense of BETWEEN
> create table t(x int) with system versioning;
> insert into t values(1);
> set @t0 = now(6);
> update t set x = 2;
> set @t1 = now(6);
> delete from t;
> select *,row_start,row_end from t
for system_time between timestamp @t0 and timestamp @t1;
+------+----------------------------+----------------------------+
| x | row_start | row_end |
+------+----------------------------+----------------------------+
| 2 | 2018-02-23 18:11:44.017902 | 2018-02-23 18:11:53.634389 |
| 1 | 2018-02-23 18:06:57.559257 | 2018-02-23 18:11:44.017902 |
+------+----------------------------+----------------------------+
Point-in-time recovery
> create table t(x int) with system versioning;
> insert into t values(1);
> select sleep(10);
> delete from t;
> insert into t
select * from t for system_time as of (now(6) - interval 10 second);
> select * from t;
+------+
| x |
+------+
| 1 |
+------+
SQL workaround:
a Point in Time Architecture
https://www.simple-talk.com/sql/database-administration/database-design-a-point-in-time-architecture/
INSERT: introduces column DateCreated
DELETE: no actual deletes, introduces column DateEnd
UPDATE: trigger
●
UPDATE DateEnd for old record
●
INSERT a new record
SELECT: additional WHERE clause by <DateCreated, DateEnd>
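The workaround above can be sketched in a few lines. This is only an illustration, assuming SQLite through Python's `sqlite3` module instead of MariaDB, integer timestamps passed in by the caller, and application functions in place of the trigger. The helper names (`pita_insert`, `pita_update`, `pita_delete`, `pita_select`) and the `FOREVER` marker are hypothetical.

```python
import sqlite3

FOREVER = 2**63 - 1  # assumed marker for a record that has not ended yet


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (x INTEGER, DateCreated INTEGER, DateEnd INTEGER)")
    return db


def pita_insert(db, x, now):
    # INSERT: the record starts at `now` and is open-ended.
    db.execute("INSERT INTO t VALUES (?, ?, ?)", (x, now, FOREVER))


def pita_delete(db, x, now):
    # DELETE: no physical delete, only DateEnd of the live record is set.
    db.execute("UPDATE t SET DateEnd = ? WHERE x = ? AND DateEnd = ?",
               (now, x, FOREVER))


def pita_update(db, old_x, new_x, now):
    # UPDATE: what the trigger would do, end the old record and insert a new one.
    pita_delete(db, old_x, now)
    pita_insert(db, new_x, now)


def pita_select(db, as_of):
    # SELECT: the additional WHERE clause on <DateCreated, DateEnd>.
    cur = db.execute(
        "SELECT x FROM t WHERE DateCreated <= ? AND DateEnd > ? ORDER BY x",
        (as_of, as_of))
    return [row[0] for row in cur]
```

Every query in the application has to go through helpers like these, which is the application layer awareness the next slide lists as an issue.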
Point in Time Architecture
Issues
●
Application layer awareness
●
Timestamps only
●
Low performance
●
Too complex
●
Doesn’t survive DDLs
Solutions on the market
Mostly for point-in-time recovery
Doesn’t survive DDL
Oracle Flashback & IBM DB2
●
History tables are generated from undo log => limited time to live
●
Long history leads to performance issues
MS SQL Server
●
separate history tables
MariaDB System Versioning
Intended to survive DDL (for 2.0)
As engine independent as possible
●
SQL layer: DML & Queries
●
InnoDB: transactional history (MVCC-like) only
No changes are required from an application
Standard dialect (what is defined)
Too much data (use partitioning to put history on separate disks)
System versioned tables
New invisible columns
●
row_start - timestamp or transaction ID which created the row
●
row_end - timestamp or transaction ID when the row died
> create table t(x int primary key,
row_start timestamp(6) generated always as row start invisible,
row_end timestamp(6) generated always as row end invisible,
period for system_time(row_start, row_end)) with system versioning;
> desc t;
+-----------+--------------+------+-----+---------+-----------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-----------+
| x | int(11) | NO | PRI | NULL | |
| row_start | timestamp(6) | NO | | NULL | INVISIBLE |
| row_end | timestamp(6) | NO | PRI | NULL | INVISIBLE |
row_end in primary key
Historical records now can have the same PK values
+---+-----------+----------------------+
| x | row_start | row_end |
+---+-----------+----------------------+
| 1 | 5434 | 5437 | ← dead (history)
| 1 | 5437 | 18446744073709551615 |
+---+-----------+----------------------+
DELETE and UPDATE now always update the PK
PK constraints are always satisfied:
+---+-----------+----------------------+
| x | row_start | row_end |
+---+-----------+----------------------+
| 1 | 5434 | 18446744073709551615 | Wrong and impossible!
| 1 | 5437 | 18446744073709551615 |
Why aren’t timestamps enough?
Forensic discovery and debugging may need a reliable answer to the question:
which transactions were visible to transaction X?
●
There are begin and commit timestamps, however...
Limited accuracy for many short concurrent transactions
●
OS doesn’t guarantee strictly monotonically increasing time
●
Different CPUs may have different time
●
MVCC operates with transaction IDs
Transactional System Versioning
(InnoDB only)
> create table t_trx(x int,
t0 bigint unsigned generated always as row start,
tx bigint unsigned generated always as row end,
period for system_time(t0, tx)
) with system versioning;
> insert into t_trx values(1);
> insert into t_trx values(2);
> select *,t0,tx from t_trx;
+------+------+----------------------+
| x | t0 | tx |
+------+------+----------------------+
| 1 | 4046 | 18446744073709551615 |
| 2 | 4049 | 18446744073709551615 |
+------+------+----------------------+
mysql.transaction_registry
Maps trx_id to timestamp (for transaction history only)
Updated on engine-independent layer through handler interface
Very large
Columns
●
transaction_id - transaction ID
●
commit_id – transaction commit ID (trx_id)
●
begin_timestamp – timestamp for the beginning of the transaction
●
commit_timestamp – timestamp for commit of the transaction
●
isolation_level – RR/S, RC/RU
Begin & commit transaction IDs
> select *,row_start,row_end from t for system_time all;
+---+-----------+----------------------+
| x | row_start | row_end |
+---+-----------+----------------------+
| 1 | 5583 | 18446744073709551615 |
+---+-----------+----------------------+
> select * from mysql.transaction_registry
where commit_timestamp > now(6) - interval 15 minute \G
*************************** 1. row ***************************
transaction_id: 5583
commit_id: 5584
begin_timestamp: 2018-02-25 06:37:42.190825
commit_timestamp: 2018-02-25 06:37:42.191870
isolation_level: REPEATABLE-READ
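The timestamp to transaction mapping can be pictured as a sorted lookup. The sketch below is a guess at the idea, not the server's algorithm: it assumes the registry is sorted by commit timestamp and that a timestamp resolves to the last transaction committed at or before it. The transaction and commit IDs are taken from a later slide in this deck; the plain-number timestamps and the `trx_as_of` name are made up for illustration.

```python
from bisect import bisect_right

# (transaction_id, commit_id, commit_timestamp); timestamps are illustrative numbers
REGISTRY = [
    (3024, 3025, 100.0),
    (3026, 3027, 200.0),
    (3033, 3034, 300.0),
]


def trx_as_of(registry, ts):
    """Return the id of the last transaction committed at or before ts, else None."""
    commit_times = [row[2] for row in registry]  # assumed sorted by commit time
    i = bisect_right(commit_times, ts)
    return registry[i - 1][0] if i else None
```

For example, `trx_as_of(REGISTRY, 250.0)` resolves to transaction 3026, and a timestamp earlier than every commit resolves to `None`.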
Transaction history view
Uses trx_id only to provide MVCC-consistent AS OF view
Only works with InnoDB tables with transactional history
create function TRX_SEES(TRX_ID1 bigint unsigned, TRX_ID0 bigint unsigned)
returns bool
begin
  declare COMMIT_ID1 bigint unsigned default VTQ_COMMIT_ID(TRX_ID1);
  declare COMMIT_ID0 bigint unsigned default VTQ_COMMIT_ID(TRX_ID0);
  declare ISO_LEVEL1 enum('RR', 'RC') default VTQ_ISO_LEVEL(TRX_ID1);
  if TRX_ID1 > COMMIT_ID0 then
    return true;
  end if;
  if COMMIT_ID1 > COMMIT_ID0 and ISO_LEVEL1 = 'RC' then
    return true;
  end if;
  return false;
end
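The same visibility rule can be tried outside the server. The following is a direct Python port of the function above, offered as a sketch: the `VTQ_COMMIT_ID` and `VTQ_ISO_LEVEL` lookups are replaced by a plain dictionary, and the `Trx` type and the registry contents in the usage note are hypothetical.

```python
from typing import NamedTuple


class Trx(NamedTuple):
    commit_id: int
    iso_level: str  # "RR" or "RC"


def trx_sees(registry, trx_id1, trx_id0):
    """True if transaction trx_id1 sees the changes made by transaction trx_id0."""
    t1, t0 = registry[trx_id1], registry[trx_id0]
    if trx_id1 > t0.commit_id:
        return True  # trx1 began after trx0 had committed
    if t1.commit_id > t0.commit_id and t1.iso_level == "RC":
        return True  # trx0 committed while an RC trx1 was still running
    return False
```

With a registry such as `{10: Trx(13, "RR"), 11: Trx(15, "RR"), 12: Trx(16, "RC"), 20: Trx(21, "RR")}`, transaction 20 sees 10 because it started after commit 13. Transaction 11 overlaps with 10 under RR and does not see it, while transaction 12 overlaps under RC and does.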
SELECT
Handled in JOIN::prepare, so system versioning queries are optimized
Adds WHERE clause for time-related information
●
row_end = Inf for current data
transaction_registry is used to convert timestamps to trx_id
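The effect of the added WHERE clause can be modelled as a predicate over (row_start, row_end). This is a simplified sketch: it compares raw values directly, whereas the server resolves transactional history through the registry, and the half-open interval used for AS OF is an assumption of this sketch. The rows are copied from tables shown elsewhere in this deck; `current` and `as_of` are hypothetical names.

```python
INF = 18446744073709551615  # 2**64 - 1, the row_end value of a live row

ROWS = [  # (x, row_start, row_end)
    (1, 3024, 3026),
    (2, 3026, 3033),
    (3, 3033, INF),
]


def current(rows):
    # Plain SELECT: only rows with row_end = Inf.
    return [x for x, start, end in rows if end == INF]


def as_of(rows, point):
    # AS OF: rows whose lifetime [row_start, row_end) contains the point.
    return [x for x, start, end in rows if start <= point < end]
```

Here `current(ROWS)` keeps only the live row, and `as_of(ROWS, 3026)` picks the version that transaction 3026 created.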
SELECT: track the rows
> select x, sys_trx_start as start, commit_id as commit,
sys_trx_end as end, begin_timestamp, commit_timestamp
from t for system_time all
join mysql.transaction_registry as vtq
on vtq.transaction_id = t.sys_trx_start
where x < 10;
+---+-------+--------+----------------------+----------------------------+----------------------------+
| x | start | commit | end | begin_timestamp | commit_timestamp |
+---+-------+--------+----------------------+----------------------------+----------------------------+
| 3 | 3033 | 3034 | 18446744073709551615 | 2017-04-12 01:05:55.861774 | 2017-04-12 01:05:55.864698 |
| 2 | 3026 | 3027 | 3033 | 2017-04-12 01:00:32.275002 | 2017-04-12 01:00:32.278337 |
| 1 | 3024 | 3025 | 3026 | 2017-04-12 01:00:23.585170 | 2017-04-12 01:00:23.596620 |
+---+-------+--------+----------------------+----------------------------+----------------------------+
Transactional System Versioning:
SELECT (syntax sugar)
-- standard syntax
> select *,t0,tx from t_trx for system_time as of transaction 4046;
+------+------+----------------------+
| x | t0 | tx |
+------+------+----------------------+
| 1 | 4046 | 18446744073709551615 |
+------+------+----------------------+
-- ...the same (where t0 > 4045 and t0 < 4048 also works)
> select *,t0,tx from t_trx where t0 = 4046;
+------+------+----------------------+
| x | t0 | tx |
+------+------+----------------------+
| 1 | 4046 | 18446744073709551615 |
+------+------+----------------------+
Select all historical records
> select x as dead_rows from t
for system_time all where row_end < now(6);
+-----------+
| dead_rows |
+-----------+
| 1 |
+-----------+
Range queries
> select *,row_start,row_end from t for system_time
between timestamp (now(6) - interval 1 month) and now(6);
+------+------+-----------+---------+
| x | y | row_start | row_end |
+------+------+-----------+---------+
| 7 | NULL | 2922 | 2938 |
+------+------+-----------+---------+
> select *,row_start,row_end from t for system_time
from transaction 2974 to transaction 2986;
+------+------+-----------+---------+
| x | y | row_start | row_end |
+------+------+-----------+---------+
| 44 | NULL | 2965 | 2986 |
+------+------+-----------+---------+
FROM...TO vs BETWEEN
> select *,row_start,row_end from t for system_time
between transaction 0 and transaction 3033;
+---+-----------+----------------------+
| x | row_start | row_end |
+---+-----------+----------------------+
| 1 | 3024 | 3026 |
| 2 | 3026 | 3033 |
| 3 | 3033 | 18446744073709551615 |
+---+-----------+----------------------+
> select *,row_start,row_end from t for system_time
from transaction 0 to transaction 3033;
+---+-----------+---------+
| x | row_start | row_end |
+---+-----------+---------+
| 1 | 3024 | 3026 |
| 2 | 3026 | 3033 |
+---+-----------+---------+
Required by the standard
Might be useful to know
Changes during a period
state before a disaster
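The difference between the two outputs above comes down to whether the upper bound is inclusive. A minimal sketch, assuming the usual SQL:2011 reading of the two range forms and comparing the transaction IDs directly (a simplification); the function names are hypothetical and the rows are the ones printed on this slide.

```python
INF = 2**64 - 1  # row_end of a live row

ROWS = [(1, 3024, 3026), (2, 3026, 3033), (3, 3033, INF)]  # (x, row_start, row_end)


def between(rows, lo, hi):
    # BETWEEN lo AND hi: the upper bound is inclusive.
    return [x for x, start, end in rows if start <= hi and end > lo]


def from_to(rows, lo, hi):
    # FROM lo TO hi: the upper bound is exclusive.
    return [x for x, start, end in rows if start < hi and end > lo]
```

For the range 0 to 3033, `between` returns rows 1, 2 and 3, while `from_to` drops row 3, which only starts at 3033.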
Range queries: MyISAM
> select *,row_start,row_end from my_t
for system_time between timestamp 0 and timestamp now(6);
+---+----------------------------+----------------------------+
| x | row_start | row_end |
+---+----------------------------+----------------------------+
| 1 | 2017-04-12 00:10:47.099814 | 2038-01-19 06:14:07.000000 |
+---+----------------------------+----------------------------+
> select *,row_start,row_end from my_t
for system_time from transaction 0 to transaction 10000;
ERROR 4109 (HY000): Transaction system versioning for `my_t` is not
supported
INSERT
New record
●
row_start = current timestamp
●
row_end = 2038-01-19 06:14:07.999999
New record (transactional history):
●
row_start = trx_id
●
row_end = Inf
DELETE
UPDATE
Moves the record to history:
●
row_end = current timestamp | trx_id
(as of begin of the transaction)
Cannot be applied to historical data
UPDATE
UPDATE + INSERT
New history record:
●
Copy the record to history
●
row_end = current timestamp | trx_id
(as of begin of the transaction)
New record:
●
row_start = current timestamp | trx_id
●
row_end = Inf | 2038-01-19 06:14:07.999999
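The INSERT, DELETE and UPDATE rules from these slides can be put together in a small in-memory model. It is a sketch of the bookkeeping only, not of the storage engine: timestamps are passed in by the caller, `MAX_TS` stands in for both Inf and 2038-01-19 06:14:07.999999, and the `VersionedTable` class is hypothetical.

```python
MAX_TS = float("inf")  # stand-in for Inf / 2038-01-19 06:14:07.999999


class VersionedTable:
    def __init__(self):
        self.rows = []  # each row: {"x": ..., "row_start": ..., "row_end": ...}

    def insert(self, x, now):
        self.rows.append({"x": x, "row_start": now, "row_end": MAX_TS})

    def delete(self, x, now):
        # DELETE is an UPDATE of row_end: the live row moves to history.
        for row in self.rows:
            if row["x"] == x and row["row_end"] == MAX_TS:
                row["row_end"] = now

    def update(self, old_x, new_x, now):
        # UPDATE = close the old version, then INSERT the new one.
        live = [r for r in self.rows
                if r["x"] == old_x and r["row_end"] == MAX_TS]
        for row in live:
            row["row_end"] = now
            self.insert(new_x, now)

    def current(self):
        return [r["x"] for r in self.rows if r["row_end"] == MAX_TS]

    def history(self):
        return [(r["x"], r["row_start"], r["row_end"])
                for r in self.rows if r["row_end"] != MAX_TS]
```

Replaying the insert, update, delete sequence from the BETWEEN example earlier in the deck leaves `current()` empty and two rows in `history()`, one per dead version.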
History partitioning
> create table t (x int) with system versioning
partition by system_time interval 1 month
subpartition by key(x) subpartitions 4 (
partition p0 history,
partition p1 history,
partition pnow current);
Partition by time interval or by a limit on the number of records (e.g. limit 1000)
Partition pruning for history range
Another way to get all history records:
> select *,row_start,row_end from t partition(p0,p1);
History purging
> delete history from t before system_time '2018-02-23 21:36';
> delete history from t;
> alter table t drop partition p0;
> alter table t drop partition p1;
ERROR 4126 (HY000): Wrong partitions for `t`: must have at least one
HISTORY and exactly one last CURRENT
ALTER System Versioning
> create table t (x int);
> insert into t values(1);
> alter table t add system versioning;
> update t set x=2;
> alter table t drop system versioning;
-- historical data was dropped
> select * from t;
+------+
| x |
+------+
| 2 |
+------+
Per-column history
> create table t (x int) with system versioning;
> insert into t(x) values(1); update t set x=2;
> set @@system_versioning_alter_history='keep';
> alter table t add y int without system versioning;
> insert into t(x,y) values(3,3);
> update t set x=4;
> update t set y=5;
> select *,row_end from t for system_time all;
+------+------+----------------------------+
| x | y | row_end |
+------+------+----------------------------+
| 1 | NULL | 2018-02-24 16:20:30.323272 |
| 2 | NULL | 2018-02-24 16:22:08.685693 |
| 3 | 3 | 2018-02-24 16:22:08.685693 |
| 4 | 5 | 2038-01-19 06:14:07.999999 |
| 4 | 5 | 2038-01-19 06:14:07.999999 |
+------+------+----------------------------+
Foreign keys
> create table p (x int unique key);
> create table c (px int, foreign key(px) references p(x))
with system versioning;
> insert into p values(1);
> insert into c values(1);
> delete from c;
> delete from p;
> select * from c for system_time all;
+----+
| px |
+----+
| 1 |
+----+
Backups
Fully compatible with MariaDB Backup
Dump & restore lose the history
Further extensions
DDL survival (in progress)
https://github.com/tempesta-tech/mariadb/milestone/15
Audit plugin:
https://github.com/tempesta-tech/mariadb/issues/138
Other storage engines – need to test
https://github.com/tempesta-tech/mariadb/issues/323
https://github.com/tempesta-tech/mariadb/issues/345
Application-time period tables (?)
DDL survival
TBD: https://github.com/tempesta-tech/mariadb/wiki/DDL-Survival
In progress: persistent history (tables renaming)
Versioned Tracking Metadata (VTMD) table:
●
trx_id_start - transaction which generated a table
●
trx_id_end - transaction, which generated a new version
●
original_name - original name of the table before the transaction trx_id_start
●
new_name - new name of the table
●
col_renames - blob with new to old column name mappings
Multi-schema SELECT
Application-time period tables
(we’re open for requests)
> create table emp(id int, d_start date, d_end date, dept varchar(30),
period for e_period(d_start, d_end));
> insert into emp values (1, '2016-01-01', '2038-01-19', 'sales');
> update emp
for portion of e_period from date '2017-03-15' to date '2017-07-15'
set dept = 'engineering' where id = 1;
+----+-------------+------------+--------------+
| id | d_start | d_end | dept |
+----+-------------+------------+--------------+
| 1 | 2016-01-01 | 2017-03-15 | sales |
| 1 | 2017-03-15 | 2017-07-15 | engineering |
| 1 | 2017-07-15 | 2038-01-19 | sales |
+----+-------------+------------+--------------+
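The row splitting performed by FOR PORTION OF can be sketched as follows. This is an illustration of the semantics shown in the table above, assuming half-open periods [d_start, d_end); the `update_for_portion` helper and the dictionary row layout are hypothetical.

```python
from datetime import date


def update_for_portion(row, p_from, p_to, **changes):
    """Split one period row around [p_from, p_to) and apply changes inside it."""
    start, end = row["d_start"], row["d_end"]
    lo, hi = max(start, p_from), min(end, p_to)
    if lo >= hi:  # the portion does not overlap the row: nothing to change
        return [row]
    out = []
    if start < lo:  # the part before the portion keeps the old values
        out.append({**row, "d_end": lo})
    out.append({**row, **changes, "d_start": lo, "d_end": hi})
    if hi < end:  # the part after the portion keeps the old values
        out.append({**row, "d_start": hi})
    return out
```

Applied to the `emp` row above with the portion 2017-03-15 to 2017-07-15 and `dept="engineering"`, the helper returns the three rows shown in the table.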
Questions?
Thanks to:
●
MariaDB (request, discussions, review)
●
Alexey Midenkov
●
Eugene Kosov
E-mail: ak@tempesta-tech.com
Tempesta FW – the fastest and secure HTTP accelerator:
https://github.com/tempesta-tech/tempesta
Replication
Timestamp-based
●
SBR, RBR, Galera – work as for ordinary tables
Transaction-based (InnoDB)
●
SBR only
●
RBR for system versioned tables is automatically switched to SBR
(like mixed replication)
Cascade foreign keys
(https://jira.mariadb.org/browse/MDEV-15364)
> create table p (x int primary key);
> create table c (px int, foreign key (px) references p(x)
on delete cascade on update cascade)
with system versioning;
> insert into p values (1);
> insert into c values (1);
> update p set x = 2;
> select *,row_start,row_end from c for system_time all;
+------+----------------------------+----------------------------+
| px | row_start | row_end |
+------+----------------------------+----------------------------+
| 2 | 2018-02-25 01:31:59.070080 | 2038-01-19 06:14:07.999999 |
+------+----------------------------+----------------------------+

M|18 Querying Data at a Previous Point in Time

  • 1. Querying Data at a Previous Point in Time Alexander Krizhanovsky Tempesta Technologies, Inc. ak@tempesta-tech.com
  • 2. Who am I? CEO & CTO at Tempesta Technologies Develop Tempesta FW – an open source hybrid of an HTTP accelerator and a firewall ● Web accelerator, load balancer, DDoS mitigation & Web security ● x3 faster than Nginx, 40% faster than a DPDK-based Web server ● Linux kernel HTTPS/TCP/IP stack https://netdevconf.org/2.1/session.html?krizhanovsky Custom software development: ● high performance network traffic processing e.g. WAF mentioned in Gartner magic quadrant ● Databases
  • 3. MariaDB System Versioning Commissioned by MariaDB Corporation
  • 4. SQL System Versioning SQL:2011 The database can store all versions of stored records Applications: ● Point-in-time recovery ● Forensic discovery & legal requirements to store data for N years ● Data analysis (retrospective, trends etc.) MariaDB starting with 10.3.4 ● https://mariadb.com/kb/en/library/system-versioned-tables/
  • 5. Keeping the history t t t +------+ update t set x=2; +------+ delete from t; +------+ | x | | x | | x | +------+ +------+ +------+ | 1 | | 2 | | 2 | +------+ | 1 | | 1 | +------+ +------+
  • 6. Keeping the history t t t +------+ update t set x=2; +------+ delete from t; +------+ | x | | x | | x | +------+ +------+ +------+ | 1 | | 2 | | 2 | +------+ | 1 | | 1 | +------+ +------+ > select * from t; Empty set (0.00 sec)
  • 7. Getting the history t t t t +------+ trx_0 +------+ trx_1 +------+ ... trx_1000 +------+ | x | | x | | x | | x | +------+ +------+ +------+ +------+ | 1 | | 2 | | 3 | | 1000 | +------+ | 1 | | 1 | | 1 | | | | 2 | | 2 | +------+ +------+ | 3 | TS0 TS1 ... > select * from t +------+ +------+ for system_time between | 2 | AS OF TS0 timestamp TS0 and | 3 | timestamp TS1; +------+
  • 8. System Versioning vs Flashback Flashback (since 10.2.4) mysqlbinlog --fashback > dump.sql & mysql < d.sql ● Pure binary log based point-in-time recovery mechanism ● Typically to recover recent changes (low performance) ● Multi-engine ● No DDL System Versioning ● Efficient queries & MVCC-like data analysis ● InnoDB & MyISAM fully supported; RocksDB, Aria must be tested ● Designed to survive DDL (in progress)
  • 9. Use cases Temporal data processing ● How a Sales Opportunity has fluctuated over time? ● Mine clients activity changes during a particular period of time ● Analyze trends in your staff changes Forensic analysis & legal requirements to store data for N years. ● Audit requires a financial institution to report on changes made to a client's records during the past five years Point-in-time recovery ● A client inquiry reveals a data entry error involving the three-month introductory interest rate on a credit card. The bank needs to retroactively correct the error
  • 10. Sense of System Versioning: CREATE TABLE (SQL:2011) > create table t(x int, row_start timestamp(6) generated always as row start invisible, row_end timestamp(6) generated always as row end invisible, period for system_time(row_start, row_end) ) with system versioning;
  • 11. Sense of System Versioning: CREATE TABLE > create table t(x int) with system versioning;
  • 12. Sense of System Versioning > create table t(x int) with system versioning; > insert into t values (1); > set @ts = now(6); > insert into t values (2); > select * from t for system_time as of timestamp @ts; +------+ | x | +------+ | 1 | +------+
  • 13. Sense of BETWEEN > create table t(x int) with system versioning; > insert into t values(1); > set @t0 = now(6); > update t set x = 2; > set @t1 = now(6); > delete from t; > select *,row_start,row_end from t for system_time between timestamp @ts0 and timestamp @ts1; +------+----------------------------+----------------------------+ | x | row_start | row_end | +------+----------------------------+----------------------------+ | 2 | 2018-02-23 18:11:44.017902 | 2018-02-23 18:11:53.634389 | | 1 | 2018-02-23 18:06:57.559257 | 2018-02-23 18:11:44.017902 | +------+----------------------------+----------------------------+
  • 14. Point-in-time recovery > create table t(x int) with system versioning; > insert into t values(1); > select sleep(10); > delete from t; > insert into t select * from t for system_time as of (now(6) - interval 10 second); > select * from t; +------+ | x | +------+ | 1 | +------+
  • 15. SQL workaround: a Point in Time Architecture https://www.simple-talk.com/sql/database-administration/database-design- a-point-in-time-architecture/ INSERT: introduces column DateCreated DELETE: no actual deletes, introduces column DateEnd UPDATE: trigger ● UPDATE DateEnd for old record ● INSERT a new record SELECT: additional WHERE clause by <DateCreated, DateEnd>
  • 16. Point in Time Architecture Issues ● Application layer awareness ● Timestamps only ● Low performance ● Too complex ● Doesn’t survive DDLs
  • 17. Solutions on the market Mostly for point-in-time recovery Doesn’t survive DDL Oracle Flashback & IBM DB2 ● History tables are generated from undo log => limited time to live ● Long history leads to performance issues MS SQL Server ● separate history tables
  • 18. MariaDB System Versioning Intended to survive DDL (for 2.0) As engine independent as possibly ● SQL layer: DML & Queries ● InnoDB: transactional history (MVCC-like) only No changes are required from an application Standard dialect (what is defined) Too many data (use partitioning for separate disks)
  • 19. System versioned tables New invisible columns ● row_start - transaction ID which created the row ● row_end - transaction ID when the row died > create table t(x int primary key, row_start timestamp(6) generated always as row start invisible, row_end timestamp(6) generated always as row end invisible, period for system_time(row_start, row_end)) with system versioning; > desc t; +-----------+--------------+------+-----+---------+-----------+ | Field | Type | Null | Key | Default | Extra | +-----------+--------------+------+-----+---------+-----------+ | x | int(11) | NO | PRI | NULL | | | row_start | timestamp(6) | NO | | NULL | INVISIBLE | | row_end | timestamp(6) | NO | PRI | NULL | INVISIBLE |
  • 20. row_end in primary key
    Historical records can now have the same PK values
    +---+-----------+----------------------+
    | x | row_start | row_end              |
    +---+-----------+----------------------+
    | 1 |      5434 |                 5437 | ← dead (history)
    | 1 |      5437 | 18446744073709551615 |
    +---+-----------+----------------------+
    DELETE and UPDATE now always update the PK
    PK constraints are always satisfied:
    +---+-----------+----------------------+
    | x | row_start | row_end              |
    +---+-----------+----------------------+
    | 1 |      5434 | 18446744073709551615 | ← wrong and impossible!
    | 1 |      5437 | 18446744073709551615 |
    +---+-----------+----------------------+
  • 21. Why aren’t timestamps enough?
    Forensic discovery and debugging may need a reliable answer to the question: which transactions were visible to transaction X?
    ● However, we have the begin timestamp, the commit timestamp...
    Limited accuracy for many short concurrent transactions
    ● OS doesn’t guarantee strictly monotonically increasing time
    ● Different CPUs may have different time
    ● MVCC operates with transaction IDs
  • 22. Transactional System Versioning (InnoDB only)
    > create table t_trx(x int,
        t0 bigint unsigned generated always as row start,
        tx bigint unsigned generated always as row end,
        period for system_time(t0, tx)
      ) with system versioning;
    > insert into t_trx values(1);
    > insert into t_trx values(2);
    > select *,t0,tx from t_trx;
    +------+------+----------------------+
    | x    | t0   | tx                   |
    +------+------+----------------------+
    |    1 | 4046 | 18446744073709551615 |
    |    2 | 4049 | 18446744073709551615 |
    +------+------+----------------------+
  • 23. mysql.transaction_registry
    Maps trx_id to timestamp (for transaction history only)
    Updated on the engine-independent layer through the handler interface
    Very large
    Columns
    ● transaction_id – transaction ID
    ● commit_id – transaction commit ID (trx_id)
    ● begin_timestamp – timestamp of the beginning of the transaction
    ● commit_timestamp – timestamp of the commit of the transaction
    ● isolation_level – RR/S, RC/RU
  • 24. Begin & commit transaction IDs
    > select *,row_start,row_end from t for system_time all;
    +---+-----------+----------------------+
    | x | row_start | row_end              |
    +---+-----------+----------------------+
    | 1 |      5583 | 18446744073709551615 |
    +---+-----------+----------------------+
    > select * from mysql.transaction_registry
        where commit_timestamp > now(6) - interval 15 minute \G
    *************************** 1. row ***************************
      transaction_id: 5583
           commit_id: 5584
     begin_timestamp: 2018-02-25 06:37:42.190825
    commit_timestamp: 2018-02-25 06:37:42.191870
     isolation_level: REPEATABLE-READ
  • 25. Transaction history view
    Uses trx_id only to provide MVCC-consistent AS OF view
    Only works with InnoDB tables with transactional history
    create function TRX_SEES(TRX_ID1 bigint unsigned, TRX_ID0 bigint unsigned)
    returns bool
    begin
      declare COMMIT_ID1 bigint unsigned default VTQ_COMMIT_ID(TRX_ID1);
      declare COMMIT_ID0 bigint unsigned default VTQ_COMMIT_ID(TRX_ID0);
      declare ISO_LEVEL1 enum('RR', 'RC') default VTQ_ISO_LEVEL(TRX_ID1);
      if TRX_ID1 > COMMIT_ID0 then
        return true;
      end if;
      if COMMIT_ID1 > COMMIT_ID0 and ISO_LEVEL1 = 'RC' then
        return true;
      end if;
      return false;
    end
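The visibility rule in TRX_SEES can be tried out with a small Python model. This is only a sketch of the logic shown on the slide: the `registry` dictionary stands in for mysql.transaction_registry, and `TrxInfo` and `trx_sees` are names invented for the example, not MariaDB APIs.

```python
from typing import Dict, NamedTuple


class TrxInfo(NamedTuple):
    commit_id: int   # plays the role of VTQ_COMMIT_ID(trx_id)
    iso_level: str   # plays the role of VTQ_ISO_LEVEL(trx_id): 'RR' or 'RC'


def trx_sees(registry: Dict[int, TrxInfo], trx_id1: int, trx_id0: int) -> bool:
    """Return True if transaction trx_id1 sees the changes of trx_id0."""
    commit_id1 = registry[trx_id1].commit_id
    commit_id0 = registry[trx_id0].commit_id
    # trx_id1 started after trx_id0 had committed: always visible.
    if trx_id1 > commit_id0:
        return True
    # trx_id1 was still running when trx_id0 committed: visible only
    # under READ COMMITTED, where each statement gets a fresh snapshot.
    if commit_id1 > commit_id0 and registry[trx_id1].iso_level == "RC":
        return True
    return False
```

For example, with begin/commit IDs 3024/3025 and 3026/3027, transaction 3026 sees 3024, but 3024 does not see 3026.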
  • 26. SELECT
    Implemented in JOIN::prepare, i.e. system versioning queries are optimized
    Adds WHERE clause for time-related information
    ● row_end = Inf for current data
    transaction_registry is used to convert timestamps to trx_id
  • 27. SELECT: track the rows
    > select x, sys_trx_start as start, commit_id as commit, sys_trx_end as end,
        begin_timestamp, commit_timestamp
      from t for system_time all
      join mysql.transaction_registry as vtq on vtq.transaction_id = t.sys_trx_start
      where x < 10;
    +---+-------+--------+----------------------+----------------------------+----------------------------+
    | x | start | commit | end                  | begin_timestamp            | commit_timestamp           |
    +---+-------+--------+----------------------+----------------------------+----------------------------+
    | 3 |  3033 |   3034 | 18446744073709551615 | 2017-04-12 01:05:55.861774 | 2017-04-12 01:05:55.864698 |
    | 2 |  3026 |   3027 |                 3033 | 2017-04-12 01:00:32.275002 | 2017-04-12 01:00:32.278337 |
    | 1 |  3024 |   3025 |                 3026 | 2017-04-12 01:00:23.585170 | 2017-04-12 01:00:23.596620 |
    +---+-------+--------+----------------------+----------------------------+----------------------------+
  • 28. Transactional System Versioning: SELECT (syntax sugar)
    -- standard syntax
    > select *,t0,tx from t_trx for system_time as of transaction 4046;
    +------+------+----------------------+
    | x    | t0   | tx                   |
    +------+------+----------------------+
    |    1 | 4046 | 18446744073709551615 |
    +------+------+----------------------+
    -- ...the same (where t0 > 4045 and t0 < 4048 also works)
    > select *,t0,tx from t_trx where t0 = 4046;
    +------+------+----------------------+
    | x    | t0   | tx                   |
    +------+------+----------------------+
    |    1 | 4046 | 18446744073709551615 |
    +------+------+----------------------+
  • 29. Select all historical records
    > select x as dead_rows from t for system_time all where row_end < now(6);
    +-----------+
    | dead_rows |
    +-----------+
    |         1 |
    +-----------+
  • 31. Range queries
    > select *,row_start,row_end from t
        for system_time between timestamp (now(6) - interval 1 month) and now(6);
    +------+------+-----------+---------+
    | x    | y    | row_start | row_end |
    +------+------+-----------+---------+
    |    7 | NULL |      2922 |    2938 |
    +------+------+-----------+---------+
    > select *,row_start,row_end from t
        for system_time from transaction 2974 to transaction 2986;
    +------+------+-----------+---------+
    | x    | y    | row_start | row_end |
    +------+------+-----------+---------+
    |   44 | NULL |      2965 |    2986 |
    +------+------+-----------+---------+
  • 32. FROM...TO vs BETWEEN
    > select *,row_start,row_end from t
        for system_time between transaction 0 and transaction 3033;
    +---+-----------+----------------------+
    | x | row_start | row_end              |
    +---+-----------+----------------------+
    | 1 |      3024 |                 3026 |
    | 2 |      3026 |                 3033 |
    | 3 |      3033 | 18446744073709551615 |
    +---+-----------+----------------------+
    > select *,row_start,row_end from t
        for system_time from transaction 0 to transaction 3033;
    +---+-----------+---------+
    | x | row_start | row_end |
    +---+-----------+---------+
    | 1 |      3024 |    3026 |
    | 2 |      3026 |    3033 |
    +---+-----------+---------+
    Required by the standard
    Might be useful to know
    ● Changes during a period
    ● State before a disaster
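The difference between the two results comes down to whether the upper bound is inclusive. The Python sketch below reproduces both outputs from the three rows on the slide. The predicates are an assumption based on the SQL:2011 definitions (BETWEEN includes the end point, FROM...TO excludes it), and the function names are made up for the example.

```python
INF = 2**64 - 1  # 18446744073709551615, the "still alive" row_end marker

# (x, row_start, row_end) exactly as printed on the slide
ROWS = [(1, 3024, 3026), (2, 3026, 3033), (3, 3033, INF)]


def as_of(rows, point):
    """Rows alive at a single point: row_start <= point < row_end."""
    return [r for r in rows if r[1] <= point < r[2]]


def between(rows, start, end):
    """BETWEEN start AND end: the end point is included."""
    return [r for r in rows if r[1] <= end and r[2] > start]


def from_to(rows, start, end):
    """FROM start TO end: the end point is excluded."""
    return [r for r in rows if r[1] < end and r[2] > start]


if __name__ == "__main__":
    print([r[0] for r in between(ROWS, 0, 3033)])   # rows visible up to and including 3033
    print([r[0] for r in from_to(ROWS, 0, 3033)])   # rows visible strictly before 3033
```

Row 3 starts exactly at transaction 3033, so it is the only row on which the two forms disagree.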
  • 33. Range queries: MyISAM
    > select *,row_start,row_end from my_t
        for system_time between timestamp 0 and timestamp now(6);
    +---+----------------------------+----------------------------+
    | x | row_start                  | row_end                    |
    +---+----------------------------+----------------------------+
    | 1 | 2017-04-12 00:10:47.099814 | 2038-01-19 06:14:07.000000 |
    +---+----------------------------+----------------------------+
    > select *,row_start,row_end from my_t
        for system_time from transaction 0 to transaction 10000;
    ERROR 4109 (HY000): Transaction system versioning for `my_t` is not supported
  • 34. INSERT
    New record:
    ● row_start = current timestamp
    ● row_end = 2038-01-19 06:14:07.999999
    New record (transactional history):
    ● row_start = trx_id
    ● row_end = Inf
  • 35. DELETE
    Performed as UPDATE
    Moves the record to history:
    ● row_end = current timestamp | trx_id (as of the beginning of the transaction)
    Cannot be used for historical data
  • 36. UPDATE
    Performed as UPDATE + INSERT
    New history record:
    ● Copy the record to history
    ● row_end = current timestamp | trx_id (as of the beginning of the transaction)
    New record:
    ● row_start = current timestamp | trx_id
    ● row_end = 2038-01-19 06:14:07.999999 | Inf
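The INSERT, DELETE and UPDATE rules from the last three slides can be put together in a toy in-memory model. This is a rough sketch of the bookkeeping only, using transaction IDs as on the transactional-history variant. `VersionedTable` and its methods are hypothetical names, and nothing here reflects how the server actually stores rows.

```python
INF = 2**64 - 1  # row_end value of a live row in transactional history


class VersionedTable:
    """Single-column table that keeps every version of every row."""

    def __init__(self):
        self.rows = []  # dicts with keys x, row_start, row_end

    def insert(self, trx_id, x):
        # INSERT: row_start = trx_id, row_end = Inf
        self.rows.append({"x": x, "row_start": trx_id, "row_end": INF})

    def delete(self, trx_id, pred=lambda row: True):
        # DELETE: only live rows are touched; they are moved to history
        # by setting row_end. Historical rows are never matched.
        live = [r for r in self.rows if r["row_end"] == INF and pred(r)]
        for row in live:
            row["row_end"] = trx_id
        return len(live)

    def update(self, trx_id, new_x, pred=lambda row: True):
        # UPDATE: close the old version, then add the new live version.
        live = [r for r in self.rows if r["row_end"] == INF and pred(r)]
        for row in live:
            row["row_end"] = trx_id
            self.rows.append({"x": new_x, "row_start": trx_id, "row_end": INF})
        return len(live)

    def current(self):
        return [r["x"] for r in self.rows if r["row_end"] == INF]

    def as_of(self, trx_id):
        return [r["x"] for r in self.rows
                if r["row_start"] <= trx_id < r["row_end"]]
```

Inserting 1, updating it to 2 and then deleting everything leaves `current()` empty while both versions stay reachable through `as_of()`, which mirrors the "Keeping the history" slides at the start of the deck.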
  • 37. History partitioning
    > create table t (x int) with system versioning
        partition by system_time interval 1 month
        subpartition by key(x) subpartitions 4 (
          partition p0 history,
          partition p1 history,
          partition pnow current);
    Partition by time interval or by a limit on the number of records (e.g. limit 1000)
    Partition pruning for history range
    Another way to get all history records:
    > select *,row_start,row_end from t partition(p0,p1);
  • 38. History purging
    > delete history from t before system_time '2018-02-23 21:36';
    > delete history from t;
    > alter table t drop partition p0;
    > alter table t drop partition p1;
    ERROR 4126 (HY000): Wrong partitions for `t`: must have at least one HISTORY and exactly one last CURRENT
  • 39. ALTER System Versioning
    > create table t (x int);
    > insert into t values(1);
    > alter table t add system versioning;
    > update t set x=2;
    > alter table t drop system versioning;
    -- historical data was dropped
    > select * from t;
    +------+
    | x    |
    +------+
    |    2 |
    +------+
  • 40. Per-column history
    > create table t (x int) with system versioning;
    > insert into t(x) values(1);
    > update t set x=2;
    > set @@system_versioning_alter_history='keep';
    > alter table t add y int without system versioning;
    > insert into t(x,y) values(3,3);
    > update t set x=4;
    > update t set y=5;
    > select *,row_end from t for system_time all;
    +------+------+----------------------------+
    | x    | y    | row_end                    |
    +------+------+----------------------------+
    |    1 | NULL | 2018-02-24 16:20:30.323272 |
    |    2 | NULL | 2018-02-24 16:22:08.685693 |
    |    3 |    3 | 2018-02-24 16:22:08.685693 |
    |    4 |    5 | 2038-01-19 06:14:07.999999 |
    |    4 |    5 | 2038-01-19 06:14:07.999999 |
    +------+------+----------------------------+
  • 41. Foreign keys
    > create table p (x int unique key);
    > create table c (px int, foreign key(px) references p(x)) with system versioning;
    > insert into p values(1);
    > insert into c values(1);
    > delete from c;
    > delete from p;
    > select * from c for system_time all;
    +----+
    | px |
    +----+
    |  1 |
    +----+
  • 42. Backups Fully compatible with MariaDB Backup Dump & restore lose the history
  • 43. Further extensions
    DDL survival (in progress)
    https://github.com/tempesta-tech/mariadb/milestone/15
    Audit plugin:
    https://github.com/tempesta-tech/mariadb/issues/138
    Other storage engines – need to test
    https://github.com/tempesta-tech/mariadb/issues/323
    https://github.com/tempesta-tech/mariadb/issues/345
    Application-time period tables (?)
  • 44. DDL survival
    TBD: https://github.com/tempesta-tech/mariadb/wiki/DDL-Survival
    In progress: persistent history (table renaming)
    Versioned Tracking Metadata (VTMD) table:
    ● trx_id_start - transaction which generated a table
    ● trx_id_end - transaction which generated a new version
    ● original_name - original name of the table before the transaction trx_id_start
    ● new_name - new name of the table
    ● col_renames - blob with new-to-old column name mappings
    Multi-schema SELECT
  • 45. Application-time period tables (we’re open for requests)
    > create table emp(id int, d_start date, d_end date, dept varchar(30),
        period for e_period(d_start, d_end));
    > insert into emp values (1, '2016-01-01', '2038-01-19', 'sales');
    > update emp for portion of e_period
        from date '2017-03-15' to date '2017-07-15'
        set dept = 'engineering' where id = 1;
    > select * from emp;
    +----+-------------+------------+--------------+
    | id | d_start     | d_end      | dept         |
    +----+-------------+------------+--------------+
    |  1 | 2016-01-01  | 2017-03-15 | sales        |
    |  1 | 2017-03-15  | 2017-07-15 | engineering  |
    |  1 | 2017-07-15  | 2038-01-19 | sales        |
    +----+-------------+------------+--------------+
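The row splitting done by UPDATE ... FOR PORTION OF can be illustrated in a few lines of Python. This is a sketch of the semantics suggested by the result table on the slide, not of any MariaDB code: `update_for_portion` is a hypothetical helper, rows are plain dictionaries, and periods are treated as half-open intervals [d_start, d_end).

```python
from datetime import date


def update_for_portion(rows, p_start, p_end, changes, where=lambda row: True):
    """Apply `changes` only to the part of each matching row's period
    that overlaps [p_start, p_end), splitting the row where needed."""
    result = []
    for row in rows:
        start, end = row["d_start"], row["d_end"]
        if not where(row) or end <= p_start or start >= p_end:
            result.append(dict(row))  # untouched: no match or no overlap
            continue
        lo, hi = max(start, p_start), min(end, p_end)
        if start < lo:
            # leading part keeps the old values
            result.append({**row, "d_end": lo})
        # overlapping part receives the new values
        result.append({**row, **changes, "d_start": lo, "d_end": hi})
        if hi < end:
            # trailing part keeps the old values
            result.append({**row, "d_start": hi})
    return result


if __name__ == "__main__":
    emp = [{"id": 1, "d_start": date(2016, 1, 1),
            "d_end": date(2038, 1, 19), "dept": "sales"}]
    for r in update_for_portion(emp, date(2017, 3, 15), date(2017, 7, 15),
                                {"dept": "engineering"},
                                where=lambda r: r["id"] == 1):
        print(r["id"], r["d_start"], r["d_end"], r["dept"])
```

Run on the single `emp` row from the slide, the helper yields the same three periods as the result table: sales, engineering, sales.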
  • 46. Questions?
    Thanks to:
    ● MariaDB (request, discussions, review)
    ● Alexey Midenkov
    ● Eugene Kosov
    E-mail: ak@tempesta-tech.com
    Tempesta FW – the fastest and secure HTTP accelerator:
    https://github.com/tempesta-tech/tempesta
  • 47. Replication
    Timestamp-based
    ● SBR, RBR, Galera – same as for ordinary tables
    Transaction-based (InnoDB)
    ● SBR only
    ● RBR for system versioned tables is automatically switched to SBR (like mixed replication)
  • 48. Cascade foreign keys (https://jira.mariadb.org/browse/MDEV-15364)
    > create table p (x int primary key);
    > create table c (px int,
        foreign key (px) references p(x) on delete cascade on update cascade)
      with system versioning;
    > insert into p values (1);
    > insert into c values (1);
    > update p set x = 2;
    > select *,row_start,row_end from c for system_time all;
    +------+----------------------------+----------------------------+
    | px   | row_start                  | row_end                    |
    +------+----------------------------+----------------------------+
    |    2 | 2018-02-25 01:31:59.070080 | 2038-01-19 06:14:07.999999 |
    +------+----------------------------+----------------------------+