ADDING FAST ANALYTICS TO MYSQL APPLICATIONS WITH CLICKHOUSE
Robert Hodges and Altinity Engineering Team
Introduction to Presenter
www.altinity.com
Leading software and services provider for ClickHouse
Major committer and community sponsor in US and Western Europe
Robert Hodges - Altinity CEO
30+ years on DBMS plus virtualization and security.
ClickHouse is DBMS #20
Introduction to ClickHouse
SQL optimized for analytics
Runs on bare metal to cloud
Stores data in columns
Parallel and vectorized execution
Scales to many petabytes
Is Open source (Apache 2.0)
Is WAY fast on analytic queries
[Diagram: column storage of columns a, b, c, d]
Introduction to MySQL
Full SQL implementation
Runs on bare metal to cloud
Stores data in rows
Single-threaded, concurrent query
Scales to high transaction loads
Is Open source (GPL V2)
Is WAY fast on updates & point queries
[Diagram: rows (a b c d), (e f g h), (i j k l) stored with BTree indexes]
MySQL Trade-Offs
Queries on large MySQL tables are resource-intensive and inefficient:
● Enormous I/O load due to row organization
● Careful indexing required
● Compression of limited value
● Parallel query limited/unavailable
● Highly dependent on buffer pool size
For tables with more than 100M rows, MySQL analytic queries are very slow
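To make the row-organization point concrete, here is a minimal Python sketch. It is an illustration only, not how MySQL or ClickHouse is implemented. It stores the same made-up three-column table row-wise and column-wise, then counts how many values an aggregate over a single column has to touch. The table contents and helper names are assumptions invented for the example.

```python
# Hypothetical table with three columns: request_id, customer_id, sku_id.
rows = [(i, i % 7, i % 3) for i in range(1000)]

# Row layout: each record is stored together, as in a row store.
row_store = rows

# Column layout: each column is stored contiguously, as in a column store.
column_store = {
    "request_id": [r[0] for r in rows],
    "customer_id": [r[1] for r in rows],
    "sku_id": [r[2] for r in rows],
}

def count_sku_rowwise(store, sku):
    """Count matches by reading whole rows; returns (count, values_touched)."""
    touched = 0
    count = 0
    for record in store:
        touched += len(record)  # the full row is read to reach one field
        if record[2] == sku:
            count += 1
    return count, touched

def count_sku_columnwise(store, sku):
    """Count matches by reading only the sku_id column."""
    column = store["sku_id"]
    count = sum(1 for value in column if value == sku)
    return count, len(column)

row_result = count_sku_rowwise(row_store, 1)
col_result = count_sku_columnwise(column_store, 1)
```

Both layouts return the same count, but the row layout touches every field of every row. The gap widens as the table gets wider.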
Options for ClickHouse query acceleration
Full migration to ClickHouse
Move big tables, keep dimensions in MySQL
Other combinations are possible...
Accessing MySQL Tables from ClickHouse
MySQL sample tables
Database: repl
Table: traffic (request_id*, datetime, date, customer_id, sku_id)
Table: customer (id*, name)
Table: sku (id*, name)
Accessing MySQL data from ClickHouse
MySQL Database Engine
MySQL Table Function
MySQL Table Engine
MySQL Dictionary
Selecting data from tables in MySQL
-- Select data from all tables.
SELECT
t.datetime, t.date, t.request_id,
c.name customer, s.name sku
FROM traffic t
JOIN customer c ON t.customer_id = c.id
JOIN sku s ON t.sku_id = s.id
LIMIT 10;
Access a MySQL database from ClickHouse
CREATE DATABASE mysql_repl
ENGINE=MySQL(
'127.0.0.1:3306',
'repl',
'root',
'secret')
use mysql_repl
show tables
Database engine
Navigating MySQL tables from ClickHouse
Demo
Selecting data from MySQL
SELECT
t.datetime, t.date, t.request_id,
t.name customer, s.name sku
FROM (
SELECT t.* FROM traffic t
JOIN customer c ON t.customer_id = c.id) AS t
JOIN sku s ON t.sku_id = s.id
WHERE customer_id = 5
ORDER BY t.request_id LIMIT 10
Predicate pushed down to MySQL
ClickHouse performance beats MySQL!
Transferring data from MySQL Engine
-- Create a ClickHouse table from MySQL.
CREATE TABLE traffic AS mysql_repl.traffic
ENGINE = MergeTree
PARTITION BY toYYYYMM(datetime)
ORDER BY (customer_id, date)
-- Pull in MySQL data.
INSERT INTO traffic SELECT *
FROM mysql_repl.traffic
SELECT count(*) FROM traffic
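The PARTITION BY and ORDER BY clauses above decide how rows are grouped and sorted on disk. The following Python sketch only mimics that arrangement on a few made-up rows. The sample data and the helper `to_yyyymm` are assumptions for illustration, and real MergeTree storage (parts, merges, sparse index) is considerably more involved.

```python
from collections import defaultdict
from datetime import datetime

def to_yyyymm(ts):
    """Mimic ClickHouse toYYYYMM: 2019-03-15 -> 201903."""
    return ts.year * 100 + ts.month

# Hypothetical traffic rows: (request_id, datetime, customer_id).
traffic = [
    (1, datetime(2019, 3, 15, 10, 0), 5),
    (2, datetime(2019, 4, 1, 9, 30), 2),
    (3, datetime(2019, 3, 2, 23, 59), 2),
]

# PARTITION BY toYYYYMM(datetime): one bucket per calendar month.
partitions = defaultdict(list)
for row in traffic:
    partitions[to_yyyymm(row[1])].append(row)

# ORDER BY (customer_id, date): rows are kept sorted inside each partition.
for rows in partitions.values():
    rows.sort(key=lambda r: (r[2], r[1].date()))
```

A query that filters on a month only needs the matching partition, and a filter on customer_id can skip most of the sorted rows inside it.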
Accessing data using MySQL table function
SELECT t.datetime, t.date, t.request_id, c.name customer
FROM traffic t
JOIN
mysql('127.0.0.1:3306', 'repl', 'customer', 'root',
'secret') c
ON t.customer_id = c.id
WHERE t.customer_id = 5
ORDER BY t.request_id LIMIT 10
Predicate pushdown works from base table
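Why pushdown matters can be shown with a small Python sketch. An in-memory SQLite database stands in for the remote MySQL server, which is an assumption made so the example runs anywhere. The function names and sample rows are hypothetical as well. The sketch compares how many rows cross the "network" with and without the predicate sent to the remote side.

```python
import sqlite3

# In-memory SQLite stands in for the remote MySQL server (an assumption
# made so the sketch is self-contained).
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
remote.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1, 101)],
)

def fetch_without_pushdown(conn, customer_id):
    """Pull every row from the remote side, then filter locally."""
    transferred = conn.execute("SELECT id, name FROM customer").fetchall()
    matches = [row for row in transferred if row[0] == customer_id]
    return matches, len(transferred)

def fetch_with_pushdown(conn, customer_id):
    """Send the predicate to the remote side so only matches come back."""
    transferred = conn.execute(
        "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
    ).fetchall()
    return transferred, len(transferred)

slow = fetch_without_pushdown(remote, 5)
fast = fetch_with_pushdown(remote, 5)
```

Both calls return the same matching row. Only the number of rows transferred differs.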
Accessing data using MySQL table engine
CREATE TABLE mysql_customer (
id Int32,
name String
)
ENGINE = MySQL('127.0.0.1:3306', 'repl', 'customer', 'root',
'secret')
SELECT t.datetime, t.date, t.request_id, c.name customer
FROM traffic t
JOIN mysql_customer c ON t.customer_id = c.id
ORDER BY t.request_id LIMIT 10
Access a MySQL table using a Dictionary
<yandex>
<dictionary><name>mysql_sku</name>
<source> <mysql>
<host>localhost</host> <port>3306</port><user>root</user>
<password>********</password><db>repl</db> <table>sku</table>
</mysql> </source>
<layout> <hashed/> </layout>
<structure>
<id> <name>id</name> </id>
<attribute>
<name>name</name> <type>String</type> <null_value></null_value>
</attribute>
</structure>
<lifetime>0</lifetime>
</dictionary>
</yandex>
Select local, remote, and dictionary data
SELECT
t.datetime,
t.date,
t.request_id,
c.name AS customer,
dictGetOrDefault('mysql_sku', 'name',
toUInt64(sku_id), 'NOT FOUND') AS sku
FROM traffic AS t
INNER JOIN mysql_customer AS c ON t.customer_id = c.id
ORDER BY t.request_id ASC
LIMIT 10
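The dictionary lookup in the query above behaves roughly like a hash-map lookup with a fallback value. Here is a minimal Python sketch of that behavior. The function `dict_get_or_default`, the sample SKU names, and the sample rows are all hypothetical. This is not ClickHouse code.

```python
# Hypothetical contents of the MySQL `sku` table, loaded into memory,
# which is roughly what a hashed dictionary layout does.
mysql_sku = {1: {"name": "widget"}, 2: {"name": "gadget"}}

def dict_get_or_default(dictionary, attribute, key, default):
    """Return the attribute for key, or default when the key is absent."""
    entry = dictionary.get(key)
    if entry is None:
        return default
    return entry[attribute]

# Hypothetical traffic rows: (request_id, sku_id).
traffic = [(100, 1), (101, 3), (102, 2)]
result = [
    (request_id, dict_get_or_default(mysql_sku, "name", sku_id, "NOT FOUND"))
    for request_id, sku_id in traffic
]
```

Rows whose sku_id has no dictionary entry still appear in the result with the default value. An INNER JOIN would drop them.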
Figuring out what’s happening on MySQL
-- Enable MySQL query log
set global general_log=1;
(MySQL query log)
16 Connect root@localhost on repl using TCP/IP
16 Query SET NAMES utf8
16 Query SELECT `id`, `name` FROM `repl`.`sku` WHERE `id` = 3
Automatic Propagation of Changes
Replicate MySQL Data to ClickHouse
SELECT from MySQL Table
MySQL Row Replication
Kafka
SELECT Method: Setup
-- Create a tracking table on MySQL side.
CREATE TABLE last_request_id (
id bigint
);
INSERT INTO last_request_id VALUES (-1);
Ensure table is visible in ClickHouse
Load initial value
SELECT Method: Change Propagation
INSERT INTO traffic SELECT *
FROM mysql_repl.traffic
WHERE request_id >
(
SELECT max(id)
FROM mysql_repl.last_request_id
)
INSERT INTO mysql_repl.last_request_id SELECT max(request_id)
FROM traffic
Select new rows
Update tracking value
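The two statements above form a simple high-water-mark loop: copy rows above the tracking value, then record the new maximum. Below is a runnable Python sketch of that loop. Two in-memory SQLite databases stand in for MySQL and ClickHouse, which is an assumption made so the sketch is self-contained. The `propagate` function and the reduced two-column `traffic` table are hypothetical.

```python
import sqlite3

# Two in-memory SQLite databases stand in for MySQL and ClickHouse
# (an assumption so the sketch is self-contained).
mysql = sqlite3.connect(":memory:")
clickhouse = sqlite3.connect(":memory:")

mysql.execute(
    "CREATE TABLE traffic (request_id INTEGER PRIMARY KEY, customer_id INTEGER)"
)
mysql.execute("CREATE TABLE last_request_id (id INTEGER)")
mysql.execute("INSERT INTO last_request_id VALUES (-1)")
clickhouse.execute("CREATE TABLE traffic (request_id INTEGER, customer_id INTEGER)")

def propagate():
    """Copy rows newer than the tracking value, then record the new maximum."""
    (high_water,) = mysql.execute("SELECT max(id) FROM last_request_id").fetchone()
    new_rows = mysql.execute(
        "SELECT request_id, customer_id FROM traffic WHERE request_id > ?",
        (high_water,),
    ).fetchall()
    if not new_rows:
        return 0
    clickhouse.executemany("INSERT INTO traffic VALUES (?, ?)", new_rows)
    (copied_max,) = clickhouse.execute(
        "SELECT max(request_id) FROM traffic"
    ).fetchone()
    # A new tracking row is appended each time, so readers use max(id).
    mysql.execute("INSERT INTO last_request_id VALUES (?)", (copied_max,))
    return len(new_rows)

mysql.executemany("INSERT INTO traffic VALUES (?, ?)", [(1, 5), (2, 7)])
first = propagate()   # copies both existing rows
mysql.execute("INSERT INTO traffic VALUES (3, 5)")
second = propagate()  # copies only the new row
third = propagate()   # nothing new to copy
total = clickhouse.execute("SELECT count(*) FROM traffic").fetchone()[0]
```

This approach only picks up inserts with increasing request_id values. Updates and deletes on the MySQL side are not propagated.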
MySQL Replication Method: Setup
my.cnf:
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 100M
binlog-format = row
(1) Ensure MySQL table(s) have primary keys
(2) Enable row replication
(3) Install and run clickhouse-mysql
https://github.com/Altinity/clickhouse-mysql-data-reader
MySQL Replication: Change Propagation
Demo
Kafka Method: Discussion
[Diagram: three Binlog sources and a Kafka Queue]
Food for thought
● ClickHouse works best on wide tables
● Consider triggers to add dimension info to base rows on MySQL
● Replication methods are complex to operate
● Use Kafka when many MySQL instances generate data
● Best approach is to migrate large tables completely to ClickHouse
○ No replication to manage
○ MySQL runs faster and requires fewer resources
ClickHouse MySQL Client Support
ClickHouse supports MySQL clients??!
<?xml version="1.0"?>
<yandex>
...
<!-- Enable MySQL wire protocol. -->
<mysql_port>33306</mysql_port>
...
</yandex>
Here’s the proof!
mysql -h127.0.0.1 -P33306 -udefault --password=''
...
mysql> use mysql_repl
mysql> SELECT
-> t.datetime, t.date, t.request_id,
-> t.name customer, s.name sku
-> FROM (
-> SELECT t.* FROM traffic t
-> JOIN customer c ON t.customer_id = c.id) AS t
-> JOIN sku s ON t.sku_id = s.id
-> WHERE customer_id = 5
-> ORDER BY t.request_id LIMIT 10;
...
Wrap-up
Key Takeaways
● ClickHouse has multiple ways to access data in MySQL
● Use replication to pull changes, if you have to
● ClickHouse supports MySQL Protocol so clients can connect directly
● Keep approach as simple as possible for maximum joy
ClickHouse can query MySQL data faster than MySQL!*
*Your mileage may vary
Thank you!
Special Offer: Contact us for a 1-hour consultation!
Contacts: info@altinity.com
Visit us at: https://www.altinity.com
ClickHouse-MySQL: https://github.com/Altinity/clickhouse-mysql-data-reader
Free Consultation: https://blog.altinity.com/offer
