How to migrate to sharding with Spider
Kentoku SHIBA

Agenda
1. What is SPIDER?
2. Why SPIDER? What SPIDER can do for you?
3. How to migrate to sharding using Replication
4. How to migrate to sharding using Trigger
5. How to migrate to sharding using Spider function
6. How to migrate to sharding using Vertical Partitioning Storage Engine

What is the Spider Storage Engine?
Spider is a sharding and proxying solution.
The Spider storage engine is a plugin for MariaDB/MySQL.
Spider tables can federate tables from other MariaDB, MySQL, or OracleDB servers
as if they were on the local server.
Spider can also shard a database by using the table partitioning feature.
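As a minimal sketch of the federation case: assume a remote server reachable under the hypothetical host name "DB2" that already holds a table tbl_a, and the placeholder credentials "user" / "pass". None of these names come from a real setup. A Spider table on the local server might then be declared as follows. Parameters that are not listed fall back to Spider's defaults.

```sql
-- Hypothetical example: "DB2", "user", and "pass" are placeholders.
-- The column list should match the definition of the remote table.
CREATE TABLE tbl_a (
  col_a INT,
  col_b INT,
  PRIMARY KEY (col_a)
) ENGINE = Spider
CONNECTION '
  host "DB2",
  table "tbl_a",
  user "user",
  password "pass"
';

-- Queries against the local Spider table are forwarded to DB2.
SELECT col_a, col_b FROM tbl_a WHERE col_a = 1;
```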

All databases can be used as ONE database through Spider.
[Diagram: applications (AP) send a request (1) to Spider nodes (MariaDB/MySQL). Spider executes the SQL (2), sends distributed SQL (3) to the data nodes MariaDB (tbl_a), MySQL (tbl_b), and OracleDB (tbl_c), and returns the response (4).]

Spider has been bundled with MariaDB since 10.0,
and all Spider patches for MariaDB are applied in 10.3.

Why Spider? What Spider can do for you?
For federation
You can attach tables from other servers or from the local server by using Spider.
For sharding
You can divide huge tables and heavy traffic across multiple servers by using Spider.

Cross shard join
You can join all tables by using Spider,
even if tables are on different servers.
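A cross-shard join looks like an ordinary join from the application's point of view. The sketch below assumes two Spider tables, tbl_a and tbl_b, whose data lives on different servers. The table names and the column layout (col_a, col_b) are illustrative assumptions borrowed from the later examples in these slides.

```sql
-- Assumption: tbl_a and tbl_b are Spider tables backed by different servers.
-- The application sends one statement to the Spider node.
SELECT a.col_a, a.col_b, b.col_b AS b_col_b
FROM tbl_a AS a
JOIN tbl_b AS b ON b.col_a = a.col_a
WHERE a.col_a BETWEEN 1 AND 100;
```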

Join operation with simple sharding solution (without Spider)
Join operation requires that all joined tables are on the same server.
[Diagram: applications (AP) send a request (1) to the simple sharding solution, which executes the SQL with JOIN (2) on one server and returns the response (3). The tables tbl_a1, tbl_a2, tbl_b1, and tbl_b2 are spread over DB1 and DB2.]

Join operation with Spider
You can JOIN all tables, even if the tables are on different servers.
[Diagram: applications (AP) send a request (1) to Spider (MariaDB/MySQL), which returns the response (3). The tables tbl_a1, tbl_a2, tbl_b1, and tbl_b2 are spread over DB1 and DB2.]

Join push down
When possible, Spider executes the JOIN
operation directly on the data node.

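JOIN push down
If all tables are on the same data node, Spider executes the JOIN
operation on the data node directly.
[Diagram: applications (AP) send a request (1) to Spider (MariaDB/MySQL), which returns the response (3). The tables tbl_a, tbl_b, tbl_c, and tbl_d are spread over DB1 and DB2.]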

JOIN push down
In a simple JOIN pushdown test, a simple join operation was two times faster.
In addition, when the pushed-down query includes aggregate functions,
the aggregation is also executed on the data node,
so the amount of data transferred is greatly reduced
and the query becomes much faster.
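The kind of query this describes could look like the sketch below. It assumes Spider tables tbl_a and tbl_b whose underlying tables sit on the same data node; the names are illustrative. Whether Spider actually pushes the join and the aggregation down depends on the server version, configuration, and table placement, so treat this as an example of the query shape only.

```sql
-- Assumption: both underlying tables are on the same data node.
-- If the join and GROUP BY run there, only the aggregated rows
-- travel back to the Spider node.
SELECT a.col_b, COUNT(*) AS cnt, SUM(b.col_b) AS total
FROM tbl_a AS a
JOIN tbl_b AS b ON b.col_a = a.col_a
GROUP BY a.col_b;
```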

How to migrate to sharding using Replication

Initial Structure
There is one MariaDB server without Spider.
[Diagram: DB1 holds tbl_a.]
Create table tbl_a (
  col_a int,
  col_b int,
  primary key(col_a)
) engine = InnoDB;

Step 1 (for migrating)
Create table on DB3 and DB4.
Then create Spider table on DB2.
[Diagram: DB1 holds tbl_a. DB2 holds the Spider table tbl_a, which maps col_a%2=0 to tbl_a on DB3 and col_a%2=1 to tbl_a on DB4.]
Create table tbl_a (
  col_a int,
  col_b int,
  primary key(col_a)
) engine = Spider
Connection '
  table "tbl_a",
  user "user",
  password "pass"
'
partition by list(
  mod(col_a, 2)) (
  partition pt1 values in(0)
    comment 'host "DB3"',
  partition pt2 values in(1)
    comment 'host "DB4"'
);
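For the first half of this step, the data-node tables on DB3 and DB4 can presumably reuse the definition of the original table. The sketch below assumes exactly that. It also assumes that an account matching the "user" / "pass" placeholders in the Spider table's connection string exists on both servers, which is not shown here.

```sql
-- Run on DB3 and again on DB4 (assumption: same definition as on DB1).
CREATE TABLE tbl_a (
  col_a INT,
  col_b INT,
  PRIMARY KEY (col_a)
) ENGINE = InnoDB;
```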

Step 2
Copy table data from DB1 to DB2.
(Use mysqldump with the "--master-data=1" or "--master-data=2" option.)
[Diagram: DB1 holds tbl_a. DB2 holds the Spider table tbl_a, which maps col_a%2=0 to tbl_a on DB3 and col_a%2=1 to tbl_a on DB4.]

Step 3
Start replication from DB1 to DB2.
Wait until the replication delay is resolved.
[Diagram: DB1 (tbl_a) replicates to DB2 (Spider table tbl_a), which maps col_a%2=0 to tbl_a on DB3 and col_a%2=1 to tbl_a on DB4.]
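Starting replication on DB2 might look like the sketch below. The replication account (repl_user / repl_pass) and the binary log coordinates are hypothetical placeholders. The real coordinates come from the dump taken with --master-data in the previous step.

```sql
-- Run on DB2. All values are placeholders for illustration.
CHANGE MASTER TO
  MASTER_HOST = 'DB1',
  MASTER_USER = 'repl_user',
  MASTER_PASSWORD = 'repl_pass',
  MASTER_LOG_FILE = 'mysql-bin.000001',  -- from the dump
  MASTER_LOG_POS = 4;                    -- from the dump
START SLAVE;

-- Check Seconds_Behind_Master to see the remaining delay.
SHOW SLAVE STATUS;
```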

Step 4
Stop client access to DB1.
Wait until the replication delay is resolved.
Switch client access from DB1 to DB2.
[Diagram: DB1 (tbl_a) replicates to DB2 (Spider table tbl_a), which maps col_a%2=0 to tbl_a on DB3 and col_a%2=1 to tbl_a on DB4.]
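One possible way to carry out the "stop and wait" part is sketched below. Using read_only to block writes is an assumption made for this example, not something the slides prescribe. Note that read_only does not block accounts with the SUPER privilege, so application-level measures may still be needed.

```sql
-- On DB1: block writes from ordinary accounts, then note the binlog position.
SET GLOBAL read_only = ON;
SHOW MASTER STATUS;   -- note File and Position

-- On DB2: wait until Relay_Master_Log_File / Exec_Master_Log_Pos
-- reach the File / Position noted on DB1.
SHOW SLAVE STATUS;
```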

Finish
Stop replication on DB2.
Remove DB1.
[Diagram: DB2 holds the Spider table tbl_a, which maps col_a%2=0 to tbl_a on DB3 and col_a%2=1 to tbl_a on DB4.]
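Stopping replication on DB2 can be sketched with the standard replication statements below. Clearing the replication configuration with RESET SLAVE ALL is an optional extra assumed here so that DB2 no longer references DB1.

```sql
-- Run on DB2 after clients have been switched over.
STOP SLAVE;
RESET SLAVE ALL;  -- optional: forget the connection settings for DB1
```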

Pros and Cons of Replication way
Pros
1. No need to manage lock size for copying.
2. Supports tables without a primary key.
Cons
1. Need to stop writing.

How to migrate to sharding using Trigger

Initial Structure
[Diagram: DB1 holds tbl_a.]
Create table tbl_a (
  col_a int,
  col_b int,
  primary key(col_a)
) engine = InnoDB;

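Step 1
Create table on DB2 and DB3.
Then create Spider table on DB1.
[Diagram: DB1 holds tbl_a and the Spider table tbl_a2, which splits rows by col_a%2 across tbl_a on DB2 and tbl_a on DB3.]
Create table tbl_a2 (
  col_a int,
  col_b int,
  primary key(col_a)
) engine = Spider
Connection '
  table "tbl_a",
  user "user",
  password "pass"
'
partition by list(
  mod(col_a, 2)) (
  partition pt1 values in(0)
    comment 'host "DB2"',
  partition pt2 values in(1)
    comment 'host "DB3"'
);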

Step 2
Create triggers on DB1.
(For copying insert, update, and delete. If you use "truncate" on tbl_a, you should use another method.)
[Diagram: DB1 holds tbl_a and the Spider table tbl_a2, which splits rows by col_a%2 across tbl_a on DB2 and tbl_a on DB3.]
delimiter |
create trigger tbl_a_i after insert
on tbl_a for each row
  insert into tbl_a2 (col_a, col_b) values
  (new.col_a, new.col_b);
|
create trigger tbl_a_u after update
on tbl_a for each row
  update tbl_a2 set col_a = new.col_a,
  col_b = new.col_b
  where col_a = old.col_a;
|
create trigger tbl_a_d after delete
on tbl_a for each row
  delete from tbl_a2 where col_a = old.col_a;
|
delimiter ;

Step 3
[Diagram: DB1 holds tbl_a and the Spider table tbl_a2, which splits rows by col_a%2 across tbl_a on DB2 and tbl_a on DB3.]
Insert select from tbl_a to tbl_a2.
(Please take care of locking time for tbl_a and tbl_a2.)
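One way to keep the locking time short is to copy in primary-key ranges instead of in a single statement. The sketch below is an assumption about how that could be done, not the author's procedure. The range size of 10000 is arbitrary, and IGNORE is used because rows inserted after the triggers were created may already exist in tbl_a2. Whether that is safe depends on your isolation level and workload.

```sql
-- Copy one primary-key range per statement (range size is hypothetical).
INSERT IGNORE INTO tbl_a2 (col_a, col_b)
SELECT col_a, col_b FROM tbl_a
WHERE col_a >= 1 AND col_a < 10001;

INSERT IGNORE INTO tbl_a2 (col_a, col_b)
SELECT col_a, col_b FROM tbl_a
WHERE col_a >= 10001 AND col_a < 20001;

-- Repeat with the next range until the maximum col_a is covered.
```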

Step 4
Rename table from tbl_a2 to tbl_a.
[Diagram: DB1 holds tbl_a3 (the original table) and the Spider table tbl_a, which splits rows by col_a%2 across tbl_a on DB2 and tbl_a on DB3.]
Rename table tbl_a to tbl_a3,
  tbl_a2 to tbl_a;

Finish
Drop table tbl_a3.
[Diagram: DB1 holds the Spider table tbl_a, which splits rows by col_a%2 across tbl_a on DB2 and tbl_a on DB3.]

Pros and Cons of Trigger way
Pros
1. No need to stop services.
2. Easy to copy. (Simple command)
Cons
1. Cannot support truncate.
2. Need to manage lock size when copying.
3. Cannot support tables without a primary key.

How to migrate to sharding using Spider function

Step 2
Create tables on DB1.
[Diagram: DB1 holds tbl_a, the Spider table tbl_a2 (which splits rows by col_a%2 across tbl_a on DB2 and tbl_a on DB3), and the new tables tbl_a3 and tbl_a4.]
Create table tbl_a3 (
  col_a int,
  col_b int,
  primary key(col_a)
) engine = InnoDB;
Create table tbl_a4 (
  col_a int,
  col_b int,
  primary key(col_a)
) engine = Spider
Connection '
  host "localhost",
  table "tbl_a3 tbl_a2",
  lst "0 2",
  user "user",
  password "pass"
';

Step 3
Rename table on DB1.
[Diagram: DB1 holds tbl_a3 (the original table), tbl_a2 (Spider table over DB2 and DB3), tbl_a (the Spider table created as tbl_a4), and tbl_a5.]
Rename table tbl_a3 to tbl_a5,
  tbl_a to tbl_a3, tbl_a4 to tbl_a;

Step 4
Copy data on DB1.
[Diagram: DB1 holds tbl_a3 (the original table), tbl_a2 (Spider table over DB2 and DB3), tbl_a (the Spider table created as tbl_a4), and tbl_a5.]
Select
  spider_copy_tables('tbl_a', '', '');
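After the copy finishes, a simple sanity check could compare row counts between the source table and the sharded Spider table. This check is an addition for illustration and not part of the original procedure. With concurrent writes the two counts may differ briefly.

```sql
-- Run on DB1 after the copy in this step.
SELECT COUNT(*) FROM tbl_a3;  -- original InnoDB data
SELECT COUNT(*) FROM tbl_a2;  -- Spider table sharded over DB2 and DB3
```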

Step 5
Rename table on DB1.
[Diagram: DB1 holds tbl_a3, tbl_a (the Spider table over DB2 and DB3, formerly tbl_a2), tbl_a7, and tbl_a2 (formerly tbl_a5).]
Rename table tbl_a2 to tbl_a6,
  tbl_a5 to tbl_a2, tbl_a to tbl_a7,
  tbl_a6 to tbl_a;

Finish
Drop table tbl_a2, tbl_a3 and tbl_a7.
[Diagram: DB1 holds the Spider table tbl_a, which splits rows by col_a%2 across tbl_a on DB2 and tbl_a on DB3.]

Pros and Cons of Spider function way
Pros
2. Easy to copy. (Simple command. Lock size is managed by Spider)
Cons

How to migrate to sharding using Vertical Partitioning Storage Engine

Initial Structure
[Diagram: DB1 holds tbl_a.]
Create table tbl_a (
  col_a int,
  col_b int,
  primary key(col_a),
  key idx2(col_b)
) engine = InnoDB;

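Step 1
Create table tbl_pk on DB2 and DB3.
Then create tables on DB1.
[Diagram: DB1 holds tbl_a and the Spider table tbl_pk, which splits rows by col_a%2 across tbl_pk on DB2 and tbl_pk on DB3.]
Create table tbl_pk (
  col_a int,
  primary key(col_a)
) engine = Spider
Connection '
  table "tbl_pk",
  user "user",
  password "pass"
'
partition by list(
  mod(col_a, 2)) (
  partition pt1 values in(0)
    comment 'host "DB2"',
  partition pt2 values in(1)
    comment 'host "DB3"'
);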

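Step 2
Create table tbl_a2 on DB4 and DB5.
Then create tables on DB1.
[Diagram: DB1 holds tbl_a and the Spider tables tbl_pk and tbl_a2. tbl_pk splits rows by col_a%2 across DB2 and DB3; tbl_a2 splits rows by col_b%2 across DB4 and DB5.]
Create table tbl_a2 (
  col_a int,
  col_b int,
  key idx1(col_a),
  key idx2(col_b)
) engine = Spider
Connection '
  table "tbl_a2",
  user "user",
  password "pass"
'
partition by list(
  mod(col_b, 2)) (
  partition pt1 values in(0)
    comment 'host "DB4"',
  partition pt2 values in(1)
    comment 'host "DB5"'
);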

Step 3
Create tables on DB1.
[Diagram: DB1 holds tbl_a, the Spider tables tbl_pk (over DB2 and DB3) and tbl_a2 (over DB4 and DB5), the new VP table tbl_a3, and the new InnoDB table tbl_a4.]
Create table tbl_a4 (
  col_a int,
  col_b int,
  primary key(col_a)
) engine = InnoDB;
Create table tbl_a3 (
  col_a int,
  col_b int,
  primary key(col_a),
  key idx2(col_b)
) engine = VP
Comment '
  ctm "1",
  ist "1",
  pcm "1",
  tnl "tbl_a4 tbl_pk tbl_a2"
';

Step 4
Rename tables on DB1.
[Diagram: DB1 holds tbl_a (the VP table created as tbl_a3), tbl_a4 (the original table), tbl_a5, and the Spider tables tbl_pk (over DB2 and DB3) and tbl_a2 (over DB4 and DB5).]
Rename table tbl_a4 to tbl_a5,
  tbl_a to tbl_a4, tbl_a3 to tbl_a;

Step 5
Copy data from tbl_a4 to tbl_pk and tbl_a2 on DB1.
[Diagram: DB1 holds tbl_a (VP table), tbl_a4 (the original table), tbl_a5, and the Spider tables tbl_pk (over DB2 and DB3) and tbl_a2 (over DB4 and DB5).]
Select vp_copy_tables('tbl_a',
  'tbl_a4', 'tbl_pk tbl_a2');
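As with any copy step, comparing row counts afterwards is a cheap sanity check. The queries below are an illustrative addition and not part of the original procedure. With concurrent writes the counts may differ briefly.

```sql
-- Run on DB1 after the copy in this step.
SELECT COUNT(*) FROM tbl_a4;  -- original InnoDB data
SELECT COUNT(*) FROM tbl_pk;  -- Spider table sharded by col_a
SELECT COUNT(*) FROM tbl_a2;  -- Spider table sharded by col_b
```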

Step 6
Alter table tbl_a on DB1.
[Diagram: DB1 holds tbl_a (VP table), tbl_a4 (the original table), tbl_a5, and the Spider tables tbl_pk (over DB2 and DB3) and tbl_a2 (over DB4 and DB5).]
Alter table tbl_a
comment '
  pcm "1",
  tnl "tbl_pk tbl_a2"
';

Finish
Drop tables on DB1.
[Diagram: DB1 holds tbl_a (VP table) and the Spider tables tbl_pk (over DB2 and DB3) and tbl_a2 (over DB4 and DB5).]
Drop table tbl_a4, tbl_a5;

Pros and Cons of VP way
Pros
2. Supports splitting by non-unique columns.
3. Easy to copy. (Simple command. Lock size is managed by VP)
Cons
1. VP storage engine is required.

Thank you for taking your time!!
