Inexpensive Datamasking for MySQL with ProxySQL: Data Anonymization for Developers
René Cannaò
Who we are
René Cannaò
Founder of ProxySQL
MySQL SRE at Dropbox
thanks to:
Frédéric Descamps
MySQL Community Manager
Other Sessions
273. ProxySQL, MaxScale, MySQL Router and other database traffic
managers / Petr Zaitsev (Percona)
155. ProxySQL Use Case Scenario / Alkin Tezuysal (Percona)
Agenda
● Database overview
● What is ProxySQL
● Features overview
● Data masking
● Rules
● Masking rules
● Obfuscation with mysqldump
● Examples
Overview of ProxySQL
Application and Database layers
APPLICATIONS
DATABASES
Main motivations
empower the DBAs
Improves manageability
understand and improve performance
High performance and High Availability
create a proxy layer to shield the database
Database as a Service (layered)
APPLICATIONS
DATABASES + MANAGER(s)
DAAS – REVERSE PROXY
What is ProxySQL?
The MySQL data stargate
How to deploy
ProxySQL Features (short list)
High Availability and Scalability
seamless failover
firewall
query throttling
query timeout
query mirroring
runtime reconfiguration
Scheduler
Support for Galera/PXC and
Group Replication
on-the-fly rewrite of queries
caching reads outside the database
connection pooling and multiplexing
complex query routing and r/w split
load balancing
real time statistics
monitoring
Data masking
Multiple instances on same ports
Native Clustering
Support for ClickHouse
Data Masking
Data masking, or data obfuscation, is the process of hiding original
data by replacing it with random characters or other data.
The main reason for masking a data field is to protect data classified
as personally identifiable, personally sensitive, or commercially
sensitive. The data must nevertheless remain usable for valid test cycles.
Why use ProxySQL as a data masking solution?
Open source and free as in beer
Other solutions are expensive or do not work
No worse than the other solutions, since currently none is perfect
The best solution would be to implement this feature in the server,
just after the handler API
Query Rules
instructions to "program" ProxySQL behavior
matching criteria
actions
flow control and chains
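The three ingredients above (matching criteria, actions, flow control) can be sketched as a toy model. This is a simplified illustration in Python, not ProxySQL's implementation: the `Rule` class, `process` function, and `QueryBlocked` exception are hypothetical names, Python's `re` stands in for ProxySQL's regex engine, and the sketch stops at the first matching rule that carries an `error_msg`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    rule_id: int
    match_pattern: str                     # matching criterion
    replace_pattern: Optional[str] = None  # action: rewrite the query
    error_msg: Optional[str] = None        # action: reject the query
    flag_in: int = 0                       # considered only if the current flag equals this
    flag_out: Optional[int] = None         # if set, becomes the current flag (chaining)
    apply: int = 0                         # 1 = stop evaluating further rules


class QueryBlocked(Exception):
    """Raised when a matching rule carries an error_msg (hypothetical helper)."""


def process(query: str, rules: List[Rule]) -> str:
    """Evaluate rules in rule_id order, as a simplified model of a rule chain."""
    flag = 0
    for rule in sorted(rules, key=lambda r: r.rule_id):
        if rule.flag_in != flag:
            continue
        if not re.search(rule.match_pattern, query, re.IGNORECASE):
            continue
        if rule.error_msg is not None:
            raise QueryBlocked(rule.error_msg)
        if rule.replace_pattern is not None:
            query = re.sub(rule.match_pattern, rule.replace_pattern,
                           query, flags=re.IGNORECASE)
        if rule.flag_out is not None:
            flag = rule.flag_out
        if rule.apply:
            break
    return query


# Illustrative rules on a made-up column named `secret`.
demo_rules = [
    Rule(1, r"`secret`", "secret"),                   # strip backticks, keep going
    Rule(2, r"\bsecret\b", "'***' secret", apply=1),  # mask and stop
    Rule(3, r"\*\*\*", "never reached"),              # skipped because rule 2 applied
]
print(process("SELECT id, `secret` FROM t", demo_rules))
```

The point of the sketch is the control flow: a rule with apply 0 lets later rules refine the already rewritten query, while apply 1 ends the chain.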
Query Rewrite
Dynamically rewrite queries sent by the application/client
without the client being aware
on the fly
using ProxySQL query rules
rules defined using regular expressions, s/match/replace/
The concept
We use regular expressions to modify the client's SQL statements
and replace the column(s) we want to hide with masking characters or
generated fake data.
We split the solution into two use cases:
● Provide developers with access to the database
● Generate a dump to populate a database that can be shared
Only statements from the defined users (a developer in our example)
are modified.
The concept (2)
We also define two categories:
● data masking
● data obfuscation
Data Masking
Here we simply mask all or part of the column value with a generic
character.
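As a minimal sketch of what masking does to a single value, the two Python helpers below mirror the SQL expressions used later in the rules: `CONCAT(LEFT(col,2),REPEAT('X',10))` and `CONCAT(LEFT(col,2),REPEAT('x',LENGTH(col)-2))`. The function names and the sample value are illustrative assumptions.

```python
def mask_fixed(value: str, keep: int = 2, width: int = 10, fill: str = "X") -> str:
    """Keep the first `keep` characters, then append a fixed run of `fill`.

    Mirrors CONCAT(LEFT(col, keep), REPEAT(fill, width)): the output length
    does not reveal the length of the original value.
    """
    return value[:keep] + fill * width


def mask_same_length(value: str, keep: int = 2, fill: str = "x") -> str:
    """Keep the first `keep` characters and replace the rest with `fill`.

    Mirrors CONCAT(LEFT(col, keep), REPEAT(fill, LENGTH(col) - keep)):
    the output has the same length as the original value.
    """
    return value[:keep] + fill * max(len(value) - keep, 0)


print(mask_fixed("Georgi"))        # GeXXXXXXXXXX
print(mask_same_length("Georgi"))  # Gexxxx
```

The fixed-width variant hides the original length; the same-length variant keeps the data closer to its real shape, which can matter for test cycles.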
Data Obfuscation
Here we replace the value of the column with random data of the same
type; in other words, we create fake data.
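To make the idea concrete, here is a small Python sketch of the two obfuscation expressions that appear later in the rules: `floor(rand() * 50000) + 10000` for a salary and `CONCAT(LEFT(birth_date,2), FLOOR(RAND()*50)+10, RIGHT(birth_date,6))` for a date. The function names and the sample date are assumptions made for illustration.

```python
import random


def fake_salary(rng: random.Random) -> int:
    """Mirror floor(rand() * 50000) + 10000: an integer in 10000..59999."""
    return int(rng.random() * 50000) + 10000


def fake_birth_date(birth_date: str, rng: random.Random) -> str:
    """Mirror CONCAT(LEFT(d,2), FLOOR(RAND()*50)+10, RIGHT(d,6)).

    Keeps the century and the month-day part of a 'YYYY-MM-DD' string and
    replaces the two year digits with a random value in 10..59.
    """
    return birth_date[:2] + str(int(rng.random() * 50) + 10) + birth_date[-6:]


rng = random.Random()
print(fake_salary(rng))
print(fake_birth_date("1953-09-02", rng))
```

The result still looks like a salary or a date, so it remains valid test data, but it no longer matches the real record.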
Access
Create a user for masking:
INSERT INTO mysql_users
(username, password, active, default_hostgroup)
VALUES ('devel','devel',1,1);
Create a user for backups:
INSERT INTO mysql_users
(username, password, active, default_hostgroup)
VALUES ('backup','dumpme',1,1);
Rules
Avoid SELECT *
For the developer, we need to create rules that block any
SELECT * variant on the table.
If the column is part of many tables, we need to do so for each
of them.
Rules (2)
Mask or obfuscate the field
When the field appears in the selected columns, we need to:
● replace the column by showing its first 2 characters followed by a
certain number of Xs, or generate a random string
● keep the column name
● for mysqldump, allow SELECT * but mask and/or
obfuscate sensitive values
Rules overview
rule_id: 1
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: `*first_name*`
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: first_name
apply: 0
Rule #1
rule_id: 2
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: (\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n])
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: \1CONCAT(LEFT(\2first_name,2),REPEAT('X',10))\3 first_name\4
apply: 0
Rule #2
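The effect of this masking rule can be checked offline before loading it. The sketch below replays the match and replace patterns with Python's `re` module, which is an assumption made for convenience: it is not the engine ProxySQL uses, so results for unusual queries may differ. The pattern is written with the regex escapes (`\(`, `\w`, `\.`, `\n`) restored, and the function name `rule2` is hypothetical.

```python
import re

# Match: optional "(", optional "table." prefix, the column, optional ")",
# then a space, comma or newline.
MATCH = r"(\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n])"
# Replace: keep the captured context, mask the value, re-alias it as first_name.
REPLACE = r"\1CONCAT(LEFT(\2first_name,2),REPEAT('X',10))\3 first_name\4"


def rule2(query: str) -> str:
    """Apply the rule with 'caseless,global' semantics.

    Requires Python 3.5+, where an unmatched optional group used in the
    replacement expands to an empty string.
    """
    return re.sub(MATCH, REPLACE, query, flags=re.IGNORECASE)


for q in (
    "SELECT emp_no, last_name, first_name FROM employees;",
    "select t1.first_name from employees.employees as t1;",
    "select emp_no, concat(first_name), last_name from employees;",
):
    print(rule2(q))
```

Note that the rewritten column always receives the alias `first_name`, which is why follow-up rules are needed when the query supplies its own alias.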
Let's imagine we want to provide fake numbers for the `salaries`.`salary` column.
Instead of the previous rule, we could use this one:
rule_id: 158
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: (\(?)(`?\w+`?\.)?salary(\)?)([ ,\n])
negate_match_pattern: 0
re_modifiers: CASELESS,GLOBAL
flagOUT: NULL
replace_pattern: \1CONCAT( floor(rand() * 50000) + 10000,'')\3 salary\4
Rule #2 - obfuscating
rule_id: 3
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: \)(\)?) first_name\s+(\w),
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: )\1 \2,
apply: 1
Rule #3
rule_id: 4
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: \)(\)?) first_name\s+(.*)\s+from
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: )\1 \2 from
apply: 1
Rule #4
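Rules #3 and #4 clean up after Rule #2 when the query brings its own alias: Rule #2 always appends the alias `first_name`, so a query such as `first_name as fred` would end up with two aliases. The Python sketch below replays the chain under the same caveat as any offline check (Python's `re` is a stand-in for ProxySQL's regex engine, and `rewrite` is a hypothetical helper). The patterns are written with their regex escapes restored.

```python
import re

FLAGS = re.IGNORECASE

RULE2 = (r"(\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n])",
         r"\1CONCAT(LEFT(\2first_name,2),REPEAT('X',10))\3 first_name\4")
RULE3 = (r"\)(\)?) first_name\s+(\w),", r")\1 \2,")
RULE4 = (r"\)(\)?) first_name\s+(.*)\s+from", r")\1 \2 from")


def rewrite(query: str) -> str:
    """Rule 2 has apply=0, so evaluation continues; rules 3 and 4 have
    apply=1, so the first of them that matches ends the chain."""
    query = re.sub(RULE2[0], RULE2[1], query, flags=FLAGS)
    for pattern, repl in (RULE3, RULE4):
        if re.search(pattern, query, FLAGS):
            return re.sub(pattern, repl, query, flags=FLAGS)
    return query


print(rewrite("select emp_no, first_name as fred from employees;"))
print(rewrite("select first_name fred, last_name from employees;"))
print(rewrite("select emp_no, first_name from employees;"))
```

In this replay the redundant `first_name` alias is dropped when the query has its own alias and kept when it has none, so the result set keeps the column name the client asked for.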
rule_id: 5
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECT\s+\*.*FROM.*employees
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive information, please contact dba@acme.com
apply: 0
Rule #5
rule_id: 6
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECT\s+employees\.\*.*FROM.*employees
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive information, please contact dba@acme.com
apply: 0
Rule #6
rule_id: 7
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECT\s+(\w+)\.\*.*FROM.*employees\s+(as\s+)?(\1)
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive information, please contact dba@acme.com
apply: 0
Rule #7
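The three blocking patterns can be exercised against sample queries with a short Python check. This is a sketch under the assumption that Python's `re` behaves like ProxySQL's engine for these patterns; `is_blocked` is a hypothetical helper, and the patterns are written with their regex escapes restored.

```python
import re

BLOCK_PATTERNS = [
    r"^SELECT\s+\*.*FROM.*employees",                        # SELECT * FROM employees
    r"^SELECT\s+employees\.\*.*FROM.*employees",             # SELECT employees.* ...
    r"^SELECT\s+(\w+)\.\*.*FROM.*employees\s+(as\s+)?(\1)",  # SELECT a.* ... employees [AS] a
]


def is_blocked(query: str) -> bool:
    """Return True if any blocking pattern matches (caseless)."""
    return any(re.search(p, query, re.IGNORECASE) for p in BLOCK_PATTERNS)


for q in (
    "SELECT * FROM employees;",
    "select employees.* from employees;",
    "select a.* from employees.employees a;",
    "select a.* from employees.employees as a;",
    "select emp_no, first_name from employees;",
    "/* */ select * from employees;",
):
    print(is_blocked(q), q)
```

In this check the last query is not caught, because the patterns are anchored with `^SELECT` and the statement starts with a comment. It is a reminder that regex-based blocking needs testing against many query shapes.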
Rules for mysqldump
To provide a dump that might be used by developers, Q/A or
support, we need to:
● generate valid data
● obfuscate sensitive information
● rewrite SQL statements issued by mysqldump
● only for tables and columns with sensitive data
mysqldump rules
rule_id: 8
active: 1
username: backup
schemaname: employees
flagIN: 0
match_pattern: ^/\*!40001 SQL_NO_CACHE \*/ \* FROM `salaries`
replace_pattern: SQL_NO_CACHE emp_no, ROUND(RAND()*100000), from_date, to_date FROM salaries
flagOUT: NULL
apply: 1
Rule #8
mysqldump rules
rule_id: 9
active: 1
username: backup
schemaname: employees
flagIN: 0
match_pattern: \* FROM `employees`
replace_pattern: emp_no, CONCAT(LEFT(birth_date,2),
    FLOOR(RAND()*50)+10,
    RIGHT(birth_date,6)) birth_date,
    CONCAT(LEFT(first_name,2),
    REPEAT('x',LENGTH(first_name)-2)) first_name,
    CONCAT(LEFT(last_name,3),
    REPEAT('x',LENGTH(last_name)-3)) last_name,
    gender, hire_date FROM employees
flagOUT: NULL
apply: 1
Rule #9
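To see what Rule #9 does to a dump query, the sketch below replays it in Python. It assumes that mysqldump reads a table with a statement of the form ``SELECT /*!40001 SQL_NO_CACHE */ * FROM `employees` ``, and it uses Python's `re` as a stand-in for ProxySQL's regex engine; `rewrite_dump_query` is a hypothetical helper.

```python
import re

# The "*" column list and the table name, with the literal "*" escaped.
MATCH = r"\* FROM `employees`"

# Explicit column list with masked or obfuscated expressions (no regex
# backreferences are needed in the replacement).
REPLACE = (
    "emp_no, CONCAT(LEFT(birth_date,2), FLOOR(RAND()*50)+10, "
    "RIGHT(birth_date,6)) birth_date, "
    "CONCAT(LEFT(first_name,2), REPEAT('x',LENGTH(first_name)-2)) first_name, "
    "CONCAT(LEFT(last_name,3), REPEAT('x',LENGTH(last_name)-3)) last_name, "
    "gender, hire_date FROM employees"
)


def rewrite_dump_query(query: str) -> str:
    """Swap SELECT * on `employees` for the masked column list."""
    return re.sub(MATCH, REPLACE, query, flags=re.IGNORECASE)


print(rewrite_dump_query("SELECT /*!40001 SQL_NO_CACHE */ * FROM `employees`"))
```

Because the columns are emitted in the table's original order, the rows that come back still line up with the table definition, while the dump contains only masked names and shifted birth dates.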
Limitations
● better support in ProxySQL >= 1.4.x
○ RE2 and PCRE regexes
● all fields with the same name will be masked, whatever the
name of the table, within the same schema
● the regexes can always turn out to be insufficient
● block any query not matching whitelisted SQL statements
● the dump via ProxySQL solution seems to be the best
Make it easy
This is not really easy, is it?
You can use this small bash script
(https://github.com/lefred/maskit) to generate the rules:
# ./maskit.sh -c first_name -t employees -d employees
column: first_name
table: employees
schema: employees
let's add the rules...
Examples
Easy ones:
SELECT * FROM employees;
SELECT emp_no, last_name, first_name FROM employees;
Examples (2)
More difficult:
select emp_no, concat(first_name), last_name from employees;
select emp_no, first_name, first_name from employees.employees;
select emp_no, `first_name` from employees;
select emp_no, first_name
-> from employees; (*)
Examples (3)
More difficult:
select t1.first_name from employees.employees as t1;
select emp_no, first_name as fred from employees;
select emp_no, first_name rene from employees;
select emp_no, first_name `as` from employees;
select first_name as `as`, last_name from employees;
select `t1`.`first_name` from employees.employees as t1;
Examples (4)
More difficult:
select first_name fred, last_name from employees;
select emp_no, first_name /* first_name */ from employees.employees;
/* */ select last_name, first_name from employees;
select CUSTOMERS.* from myapp.CUSTOMERS;
select a.* from employees.employees a;
We need you!
Thank you!
Questions?
E: rene@proxysql.com
