©Continuent 2013
Tungsten University: Load a Vertica Data Warehouse with MySQL Data
Robert Hodges
CEO, Continuent

Introducing Continuent
• The leading provider of clustering and replication for open source DBMS
• Our Product: Continuent Tungsten
  • Clustering - Commercial-grade HA, performance scaling and data management for MySQL
  • Replication - Flexible, high-performance data movement

OLTP and Data Warehouse Fundamentals

The Contenders
• MySQL: popular open source RDBMS for transaction processing
• Vertica: popular closed source RDBMS for analytics

Storage Layout in MySQL
Sales Table (row format)
  id  cust_id  prod_id  ...
  1   335301   532      ...
  2   2378     6235     ...
  3   ...      ...      ...
Product Table
  id   sku     type
  532  C00135  consumer
  533  S09957  specialty
  ...  ...     ...
Prod_ID Index (on the Sales table)
  prod_id  id
  532      1
  6235     2
  ...      ...
• Row format makes table scans very slow
• Indexes slow OLTP (every write must also update them)
• Low/no data compression
• Limited index types
• Limited join types

Storage Layout in Vertica
Sales Table (each column stored separately)
  id:        1, 2, 3, ...
  cust_id:   335301, 2378, ...
  prod_id:   532, 6235, ...
  quantity:  1, 3, ...
Product Table
  id:    532, 533, ...
  sku:   C00135, S09957, ...
  type:  consumer, specialty, ...
• Fast scans on columns
• Every column is an index
• Good compression
• Fast joins with parallel query
• Updates to single rows are hideously slow
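A toy sketch of why column storage speeds up scans: the hypothetical helper `to_columns` splits a row-format CSV into one file per column, after which summing `quantity` reads only that column's file instead of every field of every row. This only illustrates the idea; it is not how Vertica actually stores data.

```bash
#!/usr/bin/env bash
# Toy illustration of row vs. column layout (hypothetical helpers, not Vertica internals).
set -euo pipefail

# Split a row-format CSV (header line + rows) into one "<column>.col" file per column.
to_columns() {
  local rows_csv=$1 out_dir=$2
  mkdir -p "$out_dir"
  awk -F, -v dir="$out_dir" '
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }   # remember column names
    { for (i = 1; i <= NF; i++) print $i > (dir "/" name[i] ".col") }
  ' "$rows_csv"
}

# Aggregate one column by reading only that column's file.
sum_column() {
  local col_dir=$1 column=$2
  awk '{ s += $1 } END { print s }' "$col_dir/$column.col"
}
```

Usage: `to_columns sales.csv cols && sum_column cols quantity`. A row store would have to read `cust_id` and `prod_id` too, which is the slow table scan the MySQL slide describes.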

Traditional ETL Problems
MySQL Sales Table → Extract → Transfer → Load → Vertica Sales Table
• Date columns to find changed rows = intrusive
• Batch-oriented = not timely
• Scan for changes = performance hit

Questions for Real-Time Loading
• Do I need to transform data and, if so, how?
• Do I need to clean up bad information?
• Do I need to process UPDATE/DELETE too?
• Do I need to load from multiple sources?
• How timely do loads need to be?
• What if something fails?

Tungsten Replicator Basics

Real-Time Data Replication
MySQL Sales Table → DBMS logs → data replication → Vertica Sales Table
• Fast propagation = timely
• No SQL changes = transparent
• Automatic change capture = low impact
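
Tungsten Master/Slave in Action
• Master replicator reads the DBMS logs and stores transactions + metadata in its THL (Transaction History Log)
• Slave replicator downloads transactions via the network into its own THL (transactions + metadata)
• Slave replicator applies the transactions to the slave DBMS using JDBC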
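Each THL transaction carries a sequence number (seqno), so a slave that stops can resume by asking only for events after the last one it applied. The sketch below models this with a hypothetical `seqno|event` text file and a hypothetical `fetch_after` helper. The real THL is a binary log with its own format.

```bash
#!/usr/bin/env bash
# Toy model of resuming from a transaction history log.
# Assumed (hypothetical) record format: "<seqno>|<event>" per line, ascending seqno.
set -euo pipefail

# Print the records a slave still needs, given its last applied seqno.
fetch_after() {
  local thl_file=$1 last_applied=$2
  awk -F'|' -v last="$last_applied" '$1 + 0 > last + 0' "$thl_file"
}
```

Usage: with records 1 to 3 in `thl.log`, `fetch_after thl.log 1` returns records 2 and 3, so nothing already applied is applied twice.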

Pipelines with Parallel Apply
A replicator pipeline is a series of stages; each stage runs Extract → Filter → Apply.
• Stage 1: remote master → Transaction History Log
• Stage 2: Transaction History Log → parallel queue (assign shard ID)
• Stage 3: parallel queue → slave DBMS, using several Extract → Filter → Apply tasks in parallel
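The parallel queue keeps transactions with the same shard ID in order while spreading different shards across apply tasks. Below is a minimal sketch. It assumes the shard ID is the database name and uses `cksum` as an illustrative hash. Tungsten's own partitioning function may differ, and `shard_channel` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical shard-to-channel mapping for parallel apply.
# Assumptions: shard ID = database name; cksum is an illustrative hash only.
set -euo pipefail

shard_channel() {
  local shard_id=$1 channels=$2 hash
  # cksum prints "<crc> <byte count>"; keep only the CRC.
  hash=$(printf '%s' "$shard_id" | cksum | cut -d' ' -f1)
  echo $(( hash % channels ))
}
```

Because `shard_channel db01 4` always returns the same channel, db01's transactions apply in commit order. db02 and db03 may land on other channels and apply concurrently.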

Real-Time Batch Loading
• MySQL master (binlog_format=row): single transactions from OLTP operations go to the MySQL binlog
• Tungsten master replicator, service my2vr: MySQLExtractor reads the binlog, then special filters run:
  * pkey - Fill in pkey info
  * colnames - Fill in names
  * replicate - Ignore tables
• Tungsten slave replicator, service my2vr: writes CSV files and loads them as large transaction batches to leverage load parallelization
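The Vertica slave install command sets `replicator.stage.q-to-dbms.blockCommitRowCount=25000`, which governs how many rows the slave groups into one commit. Below is a simplified model of that grouping. It assumes whole transactions are never split and that a batch closes once it reaches the row limit. `block_batches` is hypothetical and is not the replicator's code.

```bash
#!/usr/bin/env bash
# Simplified block-commit model (hypothetical): read one transaction's row count
# per line on stdin; print the row count of each committed batch.
# Assumption: transactions are never split; a batch closes at >= LIMIT rows.
set -euo pipefail

block_batches() {
  awk -v limit="$1" '
    { rows += $1; if (rows >= limit) { print rows; rows = 0 } }
    END { if (rows > 0) print rows }            # flush the final partial batch
  '
}
```

For example, three 10,000-row transactions followed by a 5-row one produce a 30,000-row batch and a final 5-row batch. Vertica sees two commits instead of four.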

Batch Loading--The Gory Details
• Replicator service my2vr turns transactions from the master into CSV files
• A merge script then loads each CSV file, either:
  * COPY to staging tables, then SELECT to base tables, or
  * COPY directly to base tables
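Merging a staged batch into a base table comes down to two set operations. First, remove every key marked D. Then, add every row marked I. This also covers an update, assuming it arrives as a delete followed by an insert. The toy model below works on CSV files. It assumes a hypothetical staged layout: opcode, seqno and row_id, then the base columns with the primary key first, all unquoted and free of commas.

```bash
#!/usr/bin/env bash
# Toy staging merge (hypothetical layout): staged rows are
# opcode,seqno,row_id,<base columns...> with the primary key as the first base column.
# Assumption: an UPDATE is staged as a D row plus an I row.
set -euo pipefail

merge_batch() {
  local base=$1 staged=$2
  awk -F, -v OFS=, '
    FILENAME == ARGV[1] {                       # first file: the staged batch
      if ($1 == "D") deleted[$4] = 1
      else if ($1 == "I") { $1 = $2 = $3 = ""; ins[++n] = substr($0, 4) }
      next
    }
    !($1 in deleted)                            # second file: keep surviving base rows
    END { for (i = 1; i <= n; i++) print ins[i] }
  ' "$staged" "$base"
}
```

Usage: `merge_batch base.csv stage.csv > new_base.csv`. A real merge script does the same work with SQL against staging and base tables.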

Setting Up MySQL to Vertica Replication
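
DEMO
MySQL to Vertica replication with some bells and a whistle
• sysbench drives load on three MySQL databases: db01, db02 and db03
• db01 replicates to Vertica as db01
• db02 replicates to Vertica, renamed to renamed02
• db03 is not replicated (X)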

Get the Code
wget --no-check-certificate https://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/tungsten-replicator-2.1.0-285.tar.gz
tar -xf tungsten-replicator-2.1.0-285.tar.gz
cd tungsten-replicator-2.1.0-285

Installing MySQL Master
tools/tungsten-installer --master-slave -a \
  --service-name=mysql2vertica \
  --master-host=mysql1 \
  --cluster-hosts=mysql1 \
  --datasource-user=tungsten \
  --datasource-password=secret \
  --home-directory=/opt/continuent \
  --buffer-size=100 \
  --java-file-encoding=UTF8 \
  --java-user-timezone=GMT \
  --mysql-use-bytes-for-string=false \
  --svc-extractor-filters=replicate,colnames,pkey \
  --property=replicator.filter.pkey.addPkeyToInserts=true \
  --property=replicator.filter.pkey.addColumnsToDeletes=true \
  "--property=replicator.filter.replicate.do=db01.*,db02.*" \
  --start-and-report

Installing Vertica Slave
tools/tungsten-installer --master-slave -a \
  --service-name=mysql2vertica \
  --home-directory=/opt/continuent \
  --cluster-hosts=vertica1 \
  --master-host=mysql1 \
  --datasource-type=vertica \
  --datasource-user=dbadmin \
  --datasource-password=secret \
  --datasource-port=5433 \
  --batch-enabled=true \
  --batch-load-template=vertica6 \
  --vertica-dbname=bigdata \
  --java-user-timezone=GMT \
  --java-file-encoding=UTF8 \
  --svc-applier-filters=dbtransform \
  --property=replicator.filter.dbtransform.from_regex1=db02 \
  --property=replicator.filter.dbtransform.to_regex1=renamed02 \
  --property=replicator.stage.q-to-dbms.blockCommitRowCount=25000 \
  --start-and-report

Generate Schema Using ddlscan
MySQL tables → ddlscan → Vertica table definitions
• Data types?
• Column lengths?
• Naming conventions?
• Staging tables?
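The data type and column length questions come down to a per-column mapping. Below is a hypothetical mapping in the spirit of a ddlscan template; the real `ddl-mysql-vertica.vm` template may choose differently. It assumes that MySQL CHAR/VARCHAR lengths count characters while Vertica's count bytes, so lengths are multiplied by an assumed `BYTES_PER_CHAR` (3 for MySQL's utf8) and capped at Vertica's 65000-byte limit.

```bash
#!/usr/bin/env bash
# Hypothetical MySQL -> Vertica column type mapping (not the ddlscan template itself).
# Assumption: MySQL char lengths are characters, Vertica's are bytes.
set -euo pipefail

BYTES_PER_CHAR=${BYTES_PER_CHAR:-3}

mysql_to_vertica_type() {
  local t re='^(var)?char\(([0-9]+)\)$' bytes
  t=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  if [[ $t =~ $re ]]; then
    bytes=$(( 10#${BASH_REMATCH[2]} * BYTES_PER_CHAR ))   # characters -> bytes
    if (( bytes > 65000 )); then bytes=65000; fi            # Vertica maximum
    if [[ -n ${BASH_REMATCH[1]} ]]; then echo "VARCHAR($bytes)"; else echo "CHAR($bytes)"; fi
    return
  fi
  case $t in
    tinyint*|smallint*|mediumint*|int*|bigint*) echo INT ;;  # Vertica INT is 64-bit
    decimal\(*) echo "NUMERIC${t#decimal}" ;;
    float*|double*) echo FLOAT ;;
    datetime|timestamp) echo TIMESTAMP ;;
    date) echo DATE ;;
    tinytext|text|mediumtext|longtext) echo "VARCHAR(65000)" ;;
    *) echo "UNMAPPED($1)" ;;                                 # needs a human decision
  esac
}
```

Printing `UNMAPPED(...)` instead of guessing makes unusual columns stand out when you review the generated DDL.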

Tungsten ddlscan Utility
cd /opt/continuent/tungsten/tungsten-replicator/bin
# Base table generation (">" starts a fresh ddl.sql).
./ddlscan -template ddl-mysql-vertica.vm \
  -db db01 -user tungsten -pass secret > ddl.sql
# Staging table generation (">>" appends to the same file).
./ddlscan -template ddl-mysql-vertica-staging.vm \
  -db db01 -user tungsten -pass secret >> ddl.sql
# Load into Vertica
vsql -Udbadmin -wsecret < ddl.sql

Checking Status
# Checking status on master
trepctl -host logos1 heartbeat
trepctl -host logos1 status
# Checking status on slave
trepctl -host vertica1 status
# Checking detailed performance of apply task.
trepctl -host vertica1 status -name tasks
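A small parser makes the status output scriptable, for example to alert on apply latency. This assumes `trepctl status` prints one `name : value` pair per line, with fields such as `state` and `appliedLatency`. The sample in the check is abbreviated, and `status_field` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Extract one field from "name : value" status lines on stdin (hypothetical helper).
# Assumption: trepctl status output uses this one-pair-per-line layout.
set -euo pipefail

status_field() {
  awk -v key="$1" -F':' '
    { name = $1; gsub(/[[:space:]]+/, "", name) }             # trim the padded name
    name == key { sub(/^[^:]*:[[:space:]]*/, ""); print }      # value may contain ":"
  '
}
```

Usage: `trepctl -host vertica1 status | status_field appliedLatency`. You can compare the result against a threshold after running `trepctl -host logos1 heartbeat` on the master.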

Application Tips and Tricks

Application Design Practices
• Primary keys on all tables
  • (Tungsten requires single column keys)
• Clean schema design *really* helps
• UTF-8 character set, or at least be consistent
• Use GMT timezone, or be very consistent about dates
• Use row replication on MySQL master (binlog_format=row)
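Row replication can be checked before installing. The sketch below assumes its input is the tab-separated output of `mysql -N -B -e "SHOW VARIABLES LIKE 'binlog_format'"`. `require_row_binlog` is a hypothetical preflight helper.

```bash
#!/usr/bin/env bash
# Preflight check (hypothetical helper): fail unless binlog_format is ROW.
# Assumed input: "binlog_format<TAB>VALUE" as printed by mysql -N -B.
set -euo pipefail

require_row_binlog() {
  local name value
  IFS=$'\t' read -r name value || return 1
  if [[ $name == binlog_format && $value == ROW ]]; then
    echo "ok: binlog_format=ROW"
  else
    echo "error: binlog_format=${value:-unknown}, expected ROW" >&2
    return 1
  fi
}
```

Usage: `mysql -N -B -e "SHOW VARIABLES LIKE 'binlog_format'" | require_row_binlog`.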

Transforming Data -- Replicator Filters
• Tables to ignore/include?
• Schema/table/column renaming?
• Map names to upper/lower case?
• Drop data?
tungsten-installer --master-slave -a \
  --service-name=mysql2vertica \
  ... \
  --svc-extractor-filters=replicate,colnames,pkey \
  "--property=replicator.filter.replicate.do=db01.*,db02.*" \
  ...
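The `replicate.do=db01.*,db02.*` setting is an include list of `schema.table` patterns. This hypothetical sketch approximates it with shell glob matching. The real replicate filter has its own matching rules and an ignore list, so treat the sketch only as an illustration.

```bash
#!/usr/bin/env bash
# Approximate an include list like "db01.*,db02.*" with shell globs (hypothetical).
set -euo pipefail

should_replicate() {
  local table=$1 do_list=$2 pattern
  local -a patterns
  IFS=',' read -r -a patterns <<< "$do_list"
  for pattern in "${patterns[@]}"; do
    # $pattern is unquoted on purpose so "*" acts as a wildcard.
    [[ $table == $pattern ]] && return 0
  done
  return 1
}
```

With the demo's list, `db01.sbtest` and `db02.sbtest` pass and `db03.sbtest` is dropped, which matches db03 not reaching Vertica.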

List of Commonly Used Filters
• CDC -- Transform log to record of changes
• colnames -- Add column names
• dbtransform -- Change db name only
• enumtostring -- Make MySQL enums a string
• pkey -- Add primary key metadata
• rename -- Rename db/table/column
• replicate -- Replicate/don’t replicate tables
• zerodate2null -- Make MySQL zero dates ('0000-00-00') null
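Why enumtostring is needed: a row-based MySQL binlog is assumed to carry an ENUM value as its 1-based index into the column's value list, with 0 as MySQL's special empty "invalid" value. The applier therefore needs the column definition to recover the text. The function and its interface below are hypothetical, and `consumer,specialty` is just sample data.

```bash
#!/usr/bin/env bash
# Map an ENUM index back to its string (hypothetical helper).
# Assumption: the binlog carries the 1-based index; 0 means the empty error value.
set -euo pipefail

enum_to_string() {
  local index=$1 definition=$2
  local -a values
  IFS=',' read -r -a values <<< "$definition"
  if (( index == 0 )); then
    echo ""                                   # MySQL's empty-string error value
  elif (( index >= 1 && index <= ${#values[@]} )); then
    echo "${values[index-1]}"
  else
    echo "invalid enum index: $index" >&2
    return 1
  fi
}
```

Without such a step, Vertica would receive 1 and 2 rather than consumer and specialty.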

Transforming Data -- Staging Server(s)
OLTP Servers → Staging Server with Triggers/SQL → Vertica Cluster

Transforming Data -- Merge Script Hacks
# Hacked load script for Vertica--deletes always precede inserts, so
# inserts can load directly.
# Extract deleted data keys and put in temp CSV file for deletes.
!egrep '^"D",' %%CSV_FILE%% |cut -d, -f4 > %%CSV_FILE%%.delete
COPY %%STAGE_TABLE_FQN%% FROM '%%CSV_FILE%%.delete'
DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"'
# Delete rows using an IN clause. You could also set a column value to
# mark deleted rows.
DELETE FROM %%BASE_TABLE%% WHERE %%BASE_PKEY%% IN
(SELECT %%STAGE_PKEY%% FROM %%STAGE_TABLE_FQN%%)
# Load inserts directly into base table from a separate CSV file.
!egrep '^"I",' %%CSV_FILE%% |cut -d, -f4- > %%CSV_FILE%%.insert
COPY %%BASE_TABLE%% FROM '%%CSV_FILE%%.insert'
DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"'
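The sketch below runs the script's `egrep`/`cut` steps outside the replicator on a made-up batch, to show what the two temp files contain. It uses `grep -E`, which is equivalent to `egrep`. The layout, with the opcode first and the primary key in field 4, is inferred from the script's own `cut` field numbers, and `split_batch` is hypothetical.

```bash
#!/usr/bin/env bash
# Reproduce the merge script's two extraction steps (hypothetical wrapper),
# without the leading "!" and the %%...%% placeholders.
set -euo pipefail

split_batch() {
  local csv=$1
  # Keys of deleted rows (field 4 only); "|| true" tolerates batches with no deletes.
  grep -E '^"D",' "$csv" | cut -d, -f4 > "$csv.delete" || true
  # Full base-table rows for inserts (field 4 to end of line).
  grep -E '^"I",' "$csv" | cut -d, -f4- > "$csv.insert" || true
}
```

The insert file survives commas inside quoted values because `-f4-` keeps the rest of the line intact. The delete file would break if a primary key value itself contained a comma, since `cut` does not understand CSV quoting.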

Provisioning -- Using CSV
mysql> SELECT * FROM sales INTO OUTFILE 'sales.csv'
       FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"';
...
(Fix up data if necessary)
...
vsql> COPY sales FROM 'sales.csv'
DIRECT NULL 'null'
DELIMITER ',' ENCLOSED BY '"';
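Below is a sketch of the "fix up data" step between the export and the COPY. It assumes the file came from `INTO OUTFILE` with `FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'`, so MySQL wrote NULL as an unquoted `\N` and left dates unquoted, while the COPY expects `NULL 'null'`. Zero dates are also mapped to null here, in the spirit of zerodate2null. `fixup_csv` is hypothetical.

```bash
#!/usr/bin/env bash
# Rewrite MySQL NULL markers and zero dates for COPY ... NULL 'null' (hypothetical).
# Splitting on "," also splits quoted strings, but rejoining with OFS="," restores
# them; only a whole field equal to \N or a zero date is rewritten.
set -euo pipefail

fixup_csv() {
  awk -F, -v OFS=, '{
    for (i = 1; i <= NF; i++)
      if ($i == "\\N" || $i == "0000-00-00" || $i == "0000-00-00 00:00:00")
        $i = "null"
    print
  }' "$1"
}
```

Usage: `fixup_csv sales.csv > sales.fixed.csv`, then COPY from the fixed file. One limitation: a quoted string that itself contains `,0000-00-00,` would be altered.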

Provisioning Using a Sandbox Server
OLTP Server → Temporary Sandbox Server → Vertica Cluster
1. Restore a logical backup of the OLTP server on the temporary sandbox server
2. Replicate the restored transactions from the sandbox to Vertica
3. Replicate normally from the OLTP server after the restore loads

Parallel Provisioning from Sandbox
OLTP Server → Temporary Sandbox Server → Vertica Cluster
1. Restore a logical backup of the OLTP server on the temporary sandbox server
2. Replicate the restored data from the sandbox to Vertica in parallel
3. Replicate normally from the OLTP server after the restore loads

Complex Topologies: Fan-In
• Two masters, logos1 and logos2, each capture their own changes
• One slave replicator runs a service per master (logos1 and logos2) and loads both into the Vertica cluster

Wrapping Up

Tungsten University Sessions
• Load a Vertica Data Warehouse with MySQL Data (May 30 10am PDT and June 4, 4pm CEST)
Send feedback to: tu@continuent.com

Continuent Web Page:
http://www.continuent.com
Tungsten Replicator 2.0:
http://code.google.com/p/tungsten-replicator
Our Blogs:
http://scale-out-blog.blogspot.com
http://flyingclusters.blogspot.com
http://datacharmer.org/blog
http://www.continuent.com/news/blogs
560 S. Winchester Blvd., Suite 500
San Jose, CA 95128
Tel +1 (866) 998-3642
Fax +1 (408) 668-1009
e-mail: sales@continuent.com