Our Year in Google
Carmen Mason :: VitalSource Technologies
Allan Mason :: Pythian Group, Inc.
2
Who we are
Allan and Carmen Mason
Digital learning platforms that
drive outcomes.
3
We Provide Solutions For:
4
Institutions Corporate
Learning
Associations &
Certifying Bodies
Executive
Education
Campus
Stores
Publishers
Key Statistics: Last 12 Months
5
15 Million STUDENTS SERVED
7 Thousand INSTITUTIONS
22 Million BOOKS AND COURSES DELIVERED
6
VitalSource Global Scale
8
HELPING
BUSINESSES
BECOME MORE
DATA-DRIVEN
9
PYTHIAN
A global IT company that helps businesses leverage disruptive technologies to better compete.
Our services and software solutions unleash the power of cloud, data and analytics to drive better
business outcomes for our clients.
Our 20 years in data, commitment to hiring the best talent, and our deep technical and business expertise
allow us to meet our promise of using technology to deliver the best outcomes faster.
10
20 Years in Business
400+ Pythian Experts in 35 Countries
350+ Current Clients Globally
We’re Hiring!
https://www.pythian.com/careers/
12
Agenda
Things we’ll cover today:
• Motivations
• Considerations that drove us
• Decisions we faced
• Our current stack
• What’s next
Motivations
14
Local Data Center HA Configuration
Solution Provides
• Saved Binary Logs
• Differential Relay Logs
• VIP Handling
• Notification
15
Failover in the Data Center
• 10 – 20 seconds
• Automatically assigns the most up-to-date replica as the new master
• VIP tied to CNAME
• Replicas are slaved to the new master
• A CHANGE MASTER statement is logged to bring the old master back in line
once it’s fixed (see the sketch below).
• Alerts by email and notifications in Slack, detailing the decisions made
during the failover.
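For reference, the logged statement looks roughly like this; the host, user, and binary log coordinates below are illustrative placeholders, not values MHA actually emitted:
-- Run on the old master once it is healthy again, using the coordinates
-- recorded in the failover log (placeholders shown here):
CHANGE MASTER TO
  MASTER_HOST = 'new-master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'replica_password',
  MASTER_LOG_FILE = 'mysql-bin.000123',
  MASTER_LOG_POS = 4;
START SLAVE;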
16
Ownership
• DNS – Managed by Parent Company’s Windows team.
• Load Balancer – Managed by Parent Company’s Network team.
• App Servers – VMs managed by Parent Company’s VMware team.
• Covers – An expensive Isilon managed by Parent Company’s Storage
team.
• All our stuff – Network managed by Parent Company’s Network team.
17
A Lazy Day in May
18
“Firemen just ran into our datacenter…
with a hose.”
19
Do things the right way
Decisions
21
Why The Cloud?
• Flexibility to scale
• “Always on” mentality
• Easy to manage
22
23
Why Google Cloud Platform
• Google Cloud Platform regions are connected using Google’s private
network.
• They wrote Kubernetes; they support it well.
• Excellent, hands on support.
• Better stability and performance, in our experience.
24
Cost Comparison
Cloud provider / instance type / vCPU / RAM / SSD / max IOPS / max throughput / cost per month:
Google Cloud, n1-standard-16: 16 cores, 60G RAM, 1024G SSD; max IOPS 32,000 based on storage, 25,000 based on vCPU; max throughput 480 at IO block sizes of 256 KB; $ 562.44 / € 489.88
AWS, m4.4xlarge: 16 cores, 64G RAM, 1024G SSD; max IOPS 30,000 based on storage; max throughput 320; $ 2,368.11 / € 2.062,68
as of 2018-10-22
Google Compute Engine
vs Google CloudSQL
26
Self Managed Instances vs. Google CloudSQL
Google Compute Engine
(GCE)
CloudSQL
27
Self Managed Instances: The Pros
• Customizable instances
• Full control of the OS and installed packages
• Full control of MySQL variables, up to MySQL’s built-in maximums
• e.g. max_connections (see the sketch below)
• Choose Percona Server, MariaDB, etc.
Google Compute Engine
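As a minimal sketch of that control, assuming we want to go past CloudSQL’s fixed 4,000-connection cap (the value below is illustrative, not our production setting):
-- On a self-managed GCE instance the ceiling is ours to set:
SET GLOBAL max_connections = 8000;
-- Persist it under [mysqld] in my.cnf (max_connections = 8000) so it
-- survives a restart.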
28
Self Managed Instance: The Cons
• We manage the backups
• We manage patches and upgrades
• We manage failovers
• Requires more time on Systems Administration
• Leaving less time for Database work
Google Compute Engine
29
CloudSQL: The Pros
• Routine patching
• Automatic failovers
• No resource management needed.
Google CloudSQL
30
CloudSQL: The Cons
• Can't customize CloudSQL instance types
• Configurable Limits vs. Fixed Limits
• Limitations based on Machine Type:
• Maximum Concurrent Connections: up to 4,000 concurrent users (see the check below).
• Storage Limits:
• 10,230 GB on standard and high memory machine types.
• 3,062 GB on micro and small machine types.
• https://cloud.google.com/sql/docs/quotas
• https://cloud.google.com/sql/docs/mysql/known-issues
Google CloudSQL
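A quick way to judge whether those fixed limits would bite is to check an existing master with plain MySQL queries (nothing CloudSQL-specific here):
-- Peak connection usage since the last restart vs. the 4,000-connection cap:
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
-- Data + index size per schema vs. the 10,230 GB storage limit:
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb
FROM information_schema.tables
GROUP BY table_schema
ORDER BY size_gb DESC;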
31
CloudSQL:
High Availability Options
Failover Requirements
• Unresponsive for ~60 seconds or zone failure.
• Failover replica must be in the same region, in a different zone
• Create your own Region 2 slave!
• Replication lag must be < 10 minutes (see the check below).
Actions of a Failover
• Master fails or is unresponsive.
• Or Zone failure.
• Failover replication catches up.
• Replica is promoted. (name and IP moved)
• New failover replica created.
• Read replica recreated in same zone as new master.
(IP moved)
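Because the failover only fires when the failover replica is less than 10 minutes behind, lag is worth watching; on a plain MySQL replica the equivalent check is:
-- Seconds_Behind_Master should stay well under 600 seconds for the
-- failover guarantee to hold.
SHOW SLAVE STATUS\G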
32
Cost Comparisons – Compute Engine
Compute Engine Costs
2 x Database (Master, Replica): n1-standard-16, 1460 total hours per month – $776.72 / € 662.00
1 x Failover: n1-standard-16, 730 total hours per month – $388.36 / € 331.00
SSD Persistent disk: SSD Storage 3072 GB – $522.24 / € 445.11
Total: $1,687.32 / € 1.438,10
CloudSQL Costs
db-n1-standard-16, 2048 GB, 730 total hours per month – $2,274.80 / € 1.938,81
db-n1-standard-16, 1024 GB, 730 total hours per month – $963.32 / € 821.04
Total: $3,238.12 / € 2.759,85
as of 2018-10-22
33
Our Winner?
Compute Engine
Our Current Stack
35
ProxySQL - Why
• Written specifically for MySQL.
• Monitors topology, quickly recognizes changes, and forwards queries accordingly.
• Provides functionality that we liked, such as:
- Rewriting queries on the fly – bad developers!
- Query routing – read / write splitting
- Load balancing – balance reads between replicas
- Query throttling – If you won’t throttle the API…
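As a sketch of the last two items, both can be expressed as rows in ProxySQL’s mysql_query_rules table via the admin interface; the table name and patterns below are hypothetical, not our actual rules:
-- Rewrite on the fly: point queries at the renamed table while the code catches up.
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, replace_pattern, apply)
VALUES (10, 1, 'FROM books_old', 'FROM books', 1);
-- Throttle: add a 100 ms delay to a chatty query digest.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, delay, apply)
VALUES (11, 1, '^SELECT .* FROM api_events', 100, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;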
36
Current HA Configuration:
Google Container Engine (GKE) & ProxySQL
select * from mysql_servers;
hostgroup_id hostname port status
0 app1-db-p01-use1b 3306 ONLINE
1 app1-db-p01-use1b 3306 ONLINE
1 app1-db-p02-use1c 3306 ONLINE
37
ProxySQL - Configuration
hostgroup_id hostname port status
0 app1-db-p01-use1b 3306 OFFLINE_HARD
1 app1-db-p01-use1b 3306 ONLINE
1 app1-db-p02-use1c 3306 ONLINE
• Monitor user in MySQL.
• GRANT REPLICATION CLIENT ON *.* TO 'monitor'@'%' IDENTIFIED BY 'password';
• ProxySQL monitors the read_only flag on the server to determine hostgroup.
• In our configuration:
• hostgroup 0 is writable.
• hostgroup 1 is read-only.
hostgroup = 0 (writers)
hostgroup = 1 (readers)
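Under the hood this mapping is a single row in ProxySQL’s mysql_replication_hostgroups table; a minimal sketch via the admin interface (the comment text is arbitrary):
-- Backends with read_only=0 land in hostgroup 0 (writers),
-- backends with read_only=1 land in hostgroup 1 (readers).
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
VALUES (0, 1, 'app1 read/write split');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;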
38
ProxySQL - Troubleshooting
Install a mysql client so we can look at proxysql admin interface:
apt-get update -qq ; apt-get install -y lsof netcat vim-tiny mysql-client > /dev/null 2>&1
Log in to proxysql admin interface:
mysql -h127.0.0.1 -P6032 -uuser -p@pass
Get a list of servers and their statuses:
select * from mysql_servers;
See monitoring checks for read_only:
SELECT * FROM monitor.mysql_server_read_only_log ORDER BY time_start_us DESC ;
See monitoring checks for response time:
SELECT * FROM monitor.mysql_server_ping_log ORDER BY time_start_us DESC;
Show current connection stats:
SELECT * FROM stats.stats_mysql_connection_pool;
39
A Little Mystery, a Little Drama…
40
WHERE clause?
Who needs a WHERE clause?
• Long Running Query Alerts
• Disk Space Alerts
• Missing WHERE Clauses
• Forced Developer Re-education Programs
• Kill it with FIRE
41
Finger Pointing – Bad MySQL Optimizer!
1. It MUST be MySQL!
2. It HAS to be ProxySQL!
3. It MUST be the ORM!
42
ProxySQL - CYA
SELECT * FROM table WHERE id = 12;
43
ProxySQL - CYA
SELECT * FROM table WHERE id = 12;
44
ProxySQL - CYA
UPDATE table SET column = 'new_value' WHERE id = 12;
45
ProxySQL - CYA
DELETE FROM table WHERE id = 12;
46
ProxySQL to the Rescue!
47
ProxySQL – CYA – Block Query Rules
48
ProxySQL – CYA – Block Query Rules
mysql_query_rules: (
{ rule_id=1 active=1 match_digest="^UPDATE" flagIN=0 flagOUT=1 log=0 comment="flag UPDATEs for chain 1" },
{ rule_id=2 active=1 match_digest="^DELETE" flagIN=0 flagOUT=1 log=0 comment="flag DELETEs for chain 1" },
{ rule_id=3 active=1 match_digest="^SELECT" flagIN=0 flagOUT=1 log=0 comment="flag SELECTs for chain 1" },
{ rule_id=4 active=1 match_pattern="WHERE " flagIN=1 apply=1 log=0 comment="if there's a WHERE, apply it" },
{ rule_id=5 active=1 match_pattern="LIMIT " flagIN=1 apply=1 log=0 comment="if there's no WHERE but there is a LIMIT, apply it" },
{ rule_id=6 active=1 error_msg="All UPDATE/DELETE/SELECT queries *MUST* include a WHERE or a LIMIT!" flagIN=1 apply=1
log=1 comment="if we're here, we're missing WHERE and LIMIT - throw an error and log it" }
)
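The same chain can also be loaded at runtime through the admin interface rather than the config file; a sketch using the standard mysql_query_rules columns (values mirror the rules above):
INSERT INTO mysql_query_rules (rule_id, active, match_digest, flagIN, flagOUT, log, comment)
VALUES (1, 1, '^UPDATE', 0, 1, 0, 'flag UPDATEs for chain 1');
INSERT INTO mysql_query_rules (rule_id, active, match_digest, flagIN, flagOUT, log, comment)
VALUES (2, 1, '^DELETE', 0, 1, 0, 'flag DELETEs for chain 1');
INSERT INTO mysql_query_rules (rule_id, active, match_digest, flagIN, flagOUT, log, comment)
VALUES (3, 1, '^SELECT', 0, 1, 0, 'flag SELECTs for chain 1');
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, flagIN, apply, log, comment)
VALUES (4, 1, 'WHERE ', 1, 1, 0, 'if there''s a WHERE, apply it');
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, flagIN, apply, log, comment)
VALUES (5, 1, 'LIMIT ', 1, 1, 0, 'if there''s no WHERE but there is a LIMIT, apply it');
INSERT INTO mysql_query_rules (rule_id, active, error_msg, flagIN, apply, log, comment)
VALUES (6, 1, 'All UPDATE/DELETE/SELECT queries *MUST* include a WHERE or a LIMIT!', 1, 1, 1, 'missing WHERE and LIMIT - throw an error and log it');
-- Make the rules live and persist them:
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;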
49
ProxySQL – CYA – Show Rule Hits
SELECT hits,
mysql_query_rules.rule_id,
digest,
active,
username,
match_digest,
match_pattern,
replace_pattern,
cache_ttl,
apply
FROM mysql_query_rules
natural JOIN stats.stats_mysql_query_rules
ORDER BY mysql_query_rules.rule_id;
50
Any Ideas?
???
51
The Big Reveal – Maybe?
Latest idea:
NewRelic for Google Cloud combined with Ruby’s ActiveRecord
52
Current HA Configuration:
Master High Availability
53
Master High Availability (MHA) – Why?
• Easy to automate the configuration and installation
• Very fast failovers: 7 to 11 seconds of downtime, proven repeatedly in
production.
• Brings replication servers current using the logs from the most up to date
slave, or the master (if accessible).
54
Current HA Configuration
Solution Provides:
• Failover between Zones
• Failover between Regions (manual)
• Saved Binary Logs
• Differential Relay Logs…
• Notification
Things We’ve Learned
56
Why not MHA?
• Non-atomic relay log entries in RBR make relay log use scary.
• https://dev.mysql.com/doc/refman/5.6/en/replication-solutions-unexpected-slave-halt.html
• Cannot be used to fail over to the central slave with our current configuration.
• Requires manually updating config files.
• Can create a split-brain scenario.
• A network issue between zones means the MHA manager can’t see the master, so it will
perform a failover.
MHA
57
Big Disk = More Performance
Per-instance limits:
Instance vCPU #: Sustained Random IOPS (Read Max at IO block sizes <8 KB / Write), Sustained Throughput in MB/s (Read Max at IO block sizes >256 KB / Write)
>15 vCPUs: 15,000 / 15,000 IOPS; 240 / 240 MB/s
<32 vCPUs: 60,000 / 30,000 IOPS; 1,200 / 400 MB/s
Per-volume limits:
Volume Size: Sustained Random IOPS (Reads / Writes), Sustained Throughput in MB/s (Reads / Writes)
10 GB: 300 / 300 IOPS; 4.8 / 4.8 MB/s
2048 GB: 60,000 / 30,000 IOPS; 983 / 400 MB/s
4096 GB: 60,000 / 30,000 IOPS; 1,200 / 400 MB/s
SSD Persistent Disk Performance: Minimums and Maximums
58
IP Aliases
• VIP for MHA?
• “If you remove an alias IP range from one VM and assign it to another VM, it might
take up to a minute for the transfer to complete.”
- cloud.google.com/vpc/docs/configure-alias-ip-ranges
• Pods become natively routable on the GCP network.
• Regular network costs vs. egress bandwidth charges.
• Reduced latency.
• Improved Security
In The Works
60
Different Proof of Concepts in Progress
• CloudSQL
• Google Spanner – Google’s “NewSQL” Database (DBaaS)
• Orchestrator with ProxySQL
Beam me up,
Scotty!!!
61
Carmen Mason
Senior Database Administrator
VitalSource Technologies, LLC
https://www.vitalsource.com
@CarmenMasonCIC
62
Allan Mason
Database Consultant
Pythian Group
https://pythian.com/
@_digitalknight
63
Watkins
Mascot
VitalSource Technologies
https://www.vitalsource.com/
@vitalsource
64
Thank you!
Editor's Notes
  • #2: Carmen In this talk, we're going to review how we got to where we are now, and the challenges that we've faced outside of the datacenter.  We will talk about the decisions that we've made for our high availability, DR solution, and database hosting. 
  • #3: Carmen First, a little about us… We have been working together for over 15 years. Those years working together as a team have given us an understanding of each other's strengths. Today, we work for different companies, though it doesn't always seem like it. We share ideas, work through issues together, and assist when there's an emergency. There in the back is the next generation DBA.
  • #4: Carmen I work at VitalSource Technologies as the sole Database Administrator. This talk is specific to our experiences. A little about the company I work for before we get into it.
  • #5: Carmen Vitalsource provides digital learning platforms for kindergarten through 12th grade, universities, as well as corporate learning solutions. Most people that have used e-books for training of any kind have used our book format.
  • #6: Carmen In the last 12 months, we served over 15 million students worldwide, over 7 thousand institutions, and 22 million books and courses were delivered through our platform.
  • #7: Carmen We have offices around the world.
  • #8: Carmen Here is a heatmap that shows our users. More than 15 million active users, opening more than 1.5 million books a day, generating more than 75 million engagement activities a week, from 241 countries and territories, in 37 languages. And now, a little about the company that my co-host works for…
  • #9: Allan I work for The Pythian Group, and we help companies love their data.
  • #10: Allan We help companies compete. Providing innovative solutions to their needs.
  • #11: Allan We have been in business for over 20 years, have over 400 experts in over 35 Countries and we have over 350 clients globally. We have experience in a lot of environments solving every imaginable problem.
  • #12: Allan And we are hiring!
  • #13: Allan Let’s get to it. We're going to review our initial move to Google Cloud Platform and the motivations behind that move. We'll also discuss the considerations necessary to make informed decisions for this move, including those that were a surprise to us after the move. Then we'll share our current infrastructure with you, and the new challenges that we are facing. Finally we'll chat about our plans for the near term.
  • #14: Carmen Our motivations for moving to the cloud are pretty common. Let’s go over a little back story.
  • #15: Carmen In the local data center, we used MHA with MHA-Helper to provide a high availability solution for our databases. Application servers connected to the cname of the database master, which was set to the Virtual IP Address that MHA-helper was managing for us. This solution provided us with differential relay logs to update all slaves, saved binary logs when possible, as well as Virtual IP handling. We would be notified of the decisions that MHA made in email and chat. This solution provided us with 4 9’s uptime on the databases. Failovers typically took less than 11 seconds. Let’s talk about those failovers.
  • #16: Carmen In our parent company’s data center, we had a failover time of 10 – 20 seconds. MHA determined who would be master, based on the most up to date replica. The Virtual IP Address was removed from the old master and applied to the new one. MHA then slaved the Replicas to the new master. A CHANGE MASTER statement was then logged for me so that I could update the old master when it was ready to be brought back into replication as a new slave. Finally, we were notified in chat and emailed all of the decisions that were made during the failover. Until the end of last year, our entire infrastructure was housed in our parent company’s data center in Tennessee. Now, only our Vertica servers remain on their network.
  • #17: Carmen Another reason behind the move… Most of our architecture was owned and managed by our parent company. A change management process that often involved multiple teams from two different companies can be excruciatingly slow. This arrangement made maintenance windows a nightmare. Our development moves at start up speed and we need the freedom that the Cloud offers us.
  • #18: Carmen Our kick in the pants to finally move to the cloud came one lazy day in May. There was an announcement in the chat panic room…
  • #19: Carmen There's a fire alarm at the corporate headquarters. The UPS room is smoking. Next thing we know, we're being told that we need to stand by to switch to the DR site. There's a possible fire in the datacenter. This is a photo of the firemen in front of the entrance to our datacenter. Allan LEAKY HOSE. DATA CENTER. What could possibly go wrong?? This moved up our timeline for migrating to the Cloud. We were already discussing the move. This was just additional motivation. Other considerations were: The ability to manage our own resources and deployment chain The flexibility of developing in the cloud, such as quickly adding additional resources, creating test environments that can easily be discarded and rebuilt to fit new needs.
  • #20: Carmen Moving to the cloud gave us the opportunity to look at the way that we were doing things. We adopted a continuous deployment model in development, and our devs have thrived on it. With this chance to reevaluate how we did things, we decided to quickly adopt a lot of new technologies, including Kubernetes, Docker, Jenkins, all things Google Cloud, and more. There was a lot of learning, trial and error, for a small team of sys admins and one DBA in a very short amount of time. The pressure from the Business side was intense.
  • #21: Allan Let's talk about the decisions we needed to make.
  • #22: Allan Why move to the cloud? Flexibility to scale… is a pretty well known pro for moving to any cloud. Adding resources is a quick and easy process. Spinning up new clusters of servers is point and click, or a few lines of code. I can stand up an entire stack from client to application to database and back in the time it takes a Developer to write a Jira ticket and Business to throw money at it.  No more purchase orders, waiting for shipments, etc. Always on mentality… No one turns off the internet. No one at Google is going to tell us, hey we need to replace the F5, and it’s going to take 3 hours. Their mentality is that they are always on. Other options don’t even exist. Easy to manage… A few clicks of a button, a few lines of code, and suddenly we have an entire application stack to serve our platform. ----- Meeting Notes (11/6/18 12:27) ----- No manually hard drives, swapping out flakey server blades, replacing UPS batteries, or finding a bad network cable.
  • #23: Allan Quick show of hands. Who is here is using Amazon Web Services? Microsoft Azure? Google Cloud Platform? What about on premises? Anyone still in a local data center?
  • #24: Allan Let’s talk about why we chose Google Cloud. The connection between GCP regions is over Google’s private network, and it’s crazy fast.  They wrote Kubernetes, and they support it really well. Their support has been excellent. Our account manager is very responsive. We have met with the lead engineers for Kubernetes and Spanner. The help is there if we need it, and their engineers are excellent. We used AWS for years, on some of our non-core applications, giving us plenty of experience with it. With Google, our stability and performance overall has been better.
  • #25: Allan Google Cloud Platform seems noticeably less expensive than Amazon Web Services. Comparing apples to apples as much as possible on the two platforms, with near equal resources, we went with our standard GCE database build, an n1-standard-16 with 16 vCPUs, and 60G of ram. We added a 1Terabyte persistent SSD to get the best IOPS and throughput possible. This costs us a little over $562.44 When we compare that to a similar AWS offering, the price difference is significant. Now, we knew that we were moving to the cloud, and we knew which cloud.
  • #26: Carmen Google Compute Engine versus Google CloudSQL.
  • #27: Carmen Let’s talk about the Pros and Cons of going with <click> Self Managed instances with Percona Server installed <click> vs. <click> CloudSQL 2nd Gen. instances with MySQL.
  • #28: Carmen If the default instance types do not match your needs, Compute Engine allows you to customize the number of virtual CPU cores, and the amount of RAM. Compute Engine (basically) gives full control of the instance. It’s possible to tweak the operating system to improve performance, and meet needs, such as increasing the ulimit for backups. This includes having the full list of MySQL configuration options and values available, such as max concurrent connections, etc. This provides the option to use different flavors of MySQL such as Percona Server or MariaDB. So, we can benefit from the improvements, such as performance, that any of these offer. ----- Meeting Notes (11/6/18 12:27) ----- Change from improvements, such as perf To improvements and perf
  • #29: Carmen The cons are all about deciding where you want to spend your time. You have to manage the backups, patching the operating system, implementing and managing a failover solution. As a Database Administrator: You end up spending more time as a SysAdmin. But you have complete control
  • #30: Carmen The pros of CloudSQL include routine patching, automatic failovers, and no resource management needed. You specify the time that you are ok with your database being restarted for maintenance, and they do it all for you. Resources are added to the instance based on need. When the disk is within 5GB of being full, and if the option was chosen, the storage space for your instance will automatically be increased.
  • #31: Carmen The cons are significant for us: GCP has what they call “configurable limits”, or soft limits and “fixed limits”, which are hard limits. A configurable limit, you can just ask them to “pretty please” bump up the limit, using the correct forms. An example of this would be the number of instances per project. Max concurrent Connections and Storage limits on CloudSQL are both examples of fixed limits, which cannot be changed Max concurrent connections depends on machine type with a max of 4k concurrent users. During our peak times of the year, it’s not uncommon for us to have 7 thousand or more connections. The storage limits are based on the machine type as well, which means CloudSQL is not a good solution for us for some of our databases. For standard and high memory machine types, you are looking at about a 10 Terabyte limitation, and 3 terabytes on micro and small machine types.
  • #32: Allan The initial interest was to use CloudSQL, which is a Google Cloud managed database solution, similar to RDS We initially figured the "Second generation HA with CloudSQL.” would replace our current high availability solution without impacting our uptime. It doesn’t, though it comes close. It requires up to 60 seconds of the master being unresponsive, or a complete zone failure to initiate a failover. The failover replica needs to be in the same region, just a different zone. The configuration uses semi-synchronous replication. Which means replication lag can affect your master’s performance. You can help the lag, by adding memory or increasing storage to get the higher I/O throughput on the failover replica. ----- Meeting Notes (11/6/18 12:27) ----- That's a whole minute just to start the failover process.
  • #33: Allan Comparing apples to apples as much possible. Our standard database build is an N1-standard-16, which is 16 virtual cores and 60G of RAM, with three instances. A master, and two slaves, one of which is the disaster recovery failover candidate. These are the numbers we get.
  • #34: Allan Reviewing the costs, level of control, and failover scenarios, we decided to move our databases to Google Cloud Compute Engine.
  • #35: Allan Let's talk about our current stack
  • #36: Allan Why do we use ProxySQL? ProxySQL is written specifically for MySQL. The author, René, is quick to respond to bug reports, and requests for commits. It gives us what we need: It monitors our topology, and quickly recognizes changes to it. ProxySQL also provides other functionality, such as: Rewrites queries for when our developers need time to fix their code. Splits reads and writes between the master and slave. Load balances the heavy read loads during school registration season, and during finals week for our Notes app. Not to be left out, query throttling, for when the universities hit us hard with all of their student registrations at once.
  • #37: Allan We made a lot of changes to our infrastructure since the move. Our applications are now managed by Google Container Engine Kubernetes pods which auto scale to fit our needs. ProxySQL is installed in each node. We decided to install it in the same container as the application. This is known as a Tiled Approach. It gives us the lowest latency possible between the application and ProxySQL. There is no single point of failure with this solution.  The configuration for ProxySQL is mounted as a Kubernetes secret. The application connects locally to pass queries to ProxySQL, which in turns passes the queries to the databases depending the query. Our masters are in both hostgroup 1 and hostgroup 0, and we rely on Ruby’s ActiveSlave to decide which server to use.
  • #38: Allan ProxySQL only needs a database user with the Replication Client Grant, to monitor the replication status. ProxySQL’s Monitor then watches for the read_only flag on the database, which MHA handles for us. The “writers” and “readers” refers to the collection of instances that are writable or not. ProxySQL will automatically shun a database if it’s offline,
  • #39: Allan Troubleshooting ProxySQL is pretty easy. Here’s a few queries we use to check things. We first install the MySQL client, in order to work with the ProxySQL Admin interface. After that’s out of the way, the commands here give us some good information. We can loop through and keep checking server statuses. Get a list of servers and their statuses. Check the read_only flags, response times, and connection stats.
  • #40: Carmen ProxySQL became extremely important covering our butts while we were troubleshooting a huge issue that we were having.
  • #41: Carmen Let’s really talk about this… We started having a serious problem when we first moved to GCP. We began to get a couple of really nasty long running queries alerts. I realized I knew this query. This was a commonly run, complicated report. Looking at the database, I saw that it was missing it’s WHERE clause. I quickly killed the query, and contacted the Project Lead for that application, and said Dude, have you lost your mind?? You’re killing my database, and filling the disk. We need to talk about your query writing skills. The query was running a full table scan against some of our largest database tables. As a result, it was creating massive temp files. Believing this to be a one off, the developer did a some work to investigate, but it quickly got swept aside as a low priority. However, about a week later, I noticed another long running query. This time it was an UPDATE that was updating everyone’s bookmark to be on the same page for the same book for every single student in the database, because it was missing a WHERE clause… AGAIN. I killed it, and recovered from backup because this one had already started making changes to the data.
  • #42: The initial knee jerk reaction by everyone else, was that it was the database. But, last I checked MySQL’s Query Optimizer doesn’t randomly drop WHERE clauses that it doesn’t like. So, we looked to ProxySQL. It’s also an unlikely candidate, but was easier for us to remove from the equation than MySQL! After testing without ProxySQL, we were able to prove that it also wasn’t at fault.
  • #43: Carmen The problem was that we were having queries with dropped WHERE clauses. This isn’t that big a deal with a SELECT statement.
  • #44: Carmen Our long running query alerts will pop up before these are ever an issue. Worst case, it will create a monstrous temp file before we get to it.
  • #45: Carmen When this is a serious issue is with this… Which updates all of the rows with that column data
  • #46: Carmen If that doesn't make you feel a little nervous, let's consider this… How about inadvertently wiping out all the data in the table? This might be a great time to test your backups!
  • #47: With ProxySQL we were able to implement Pattern Matching to block this potentially dangerous situation.
  • #48: Carmen Here’s a flowchart to show the rules, and how they work for us. So, how do we do this? We use ProxyES-QUE-ELL’s mysql query rules. These rules check to see if the query is an UPDATE, DELETE, or a SELECT. If not, the query will be sent to the database. If yes, Does it have a WHERE clause? If yes… it goes through to the database, if no… Does it have a LIMIT? Yes? Free pass, no? 86 with an error message.
  • #49: Carmen Here’s what we have in the config file to set up these rules in ProxySQL. In this case, ProxySQL has been a great line of defense against this potentially catastrophic bug. ----- Meeting Notes (11/6/18 12:27) ----- Any questions so far?
  • #50: Carmen We use this query to show us how many times these rules have been applied to a query on this pod. In this example, when reviewing the results, the last "rule_id" is the one we cared about. It represented the number of queries being blocked by our block query rules.
  • #51: Anyone have any ideas as to what the problem might have been?
  • #52: After a lot of digging by some of VitalSource’s brightest minds, the problem seems to be NewRelic for Google Cloud. Our thinking is NewRelic’s hooks into Ruby’s ActiveRecord is somehow causing us to lose the WHERE clause. After removing NewRelic completely we have not had a single reoccurrence of the issue.
  • #53: Allan This is our MHA solution. We have two database instances in the East region, each in a separate zone, and one database in Central in case East goes down in a fiery inferno. This gives us failover options for both zone and region failure. It’s important to note for central to be master, the failover would be manual using a script we wrote. We currently have Central set to no-master in the config file. This is because of the approximately 10ms latency between our application cluster in East and the database in central.
  • #54: Allan We have a few years of experience with MHA. Therefore we decided to stick with it when we moved to GCP. It’s complicated to set up, but using automation for the setup resolves this nicely. It’s proven itself for us in the local data center repeatedly. At the time this decision was made, it was the only known solution to bring replication slaves up to date with the latest slave. We'll come back to this later, as it's a little complicated.
  • #55: Allan Here’s the bigger picture. Each pod in our kubernetes cluster has ProxySQL installed locally. ProxySQL uses the monitor user to monitor the read_only flag to determine if a host Is writable or not. This solution gives us Failovers between zones and regions. Saved bin logs from the master, if they’re available. Differential relay logs from the most up to date slave, which we’ll talk more about later. And notifications of failover to our Slack and email.
  • #56: These are a few of the things that we’ve learned since moving to Google Cloud
  • #57: Carmen We have decided to move away from MHA recently. Why? Well, there are a few reasons…. The possibility of non-atomic relay log entries in RBR makes relay log use scary. https://dev.mysql.com/doc/refman/5.6/en/replication-solutions-unexpected-slave-halt.html We cannot use it to fail over to the central slave with our current configuration. Requires manually updating config files. We can create a split brain scenario pretty easily with our current configuration. If there's a network issue between zones, and the MHA manager can't see the master, it will perform a failover. This can be remedied by implementing a secondary network check from the Central Region. ----- Meeting Notes (11/6/18 12:27) ----- And MHA doesn't seem to be supported any longer. Remove URL in our notes
  • #58: Allan In Google Cloud, larger disks, more vCPU, means better IOPS and Throughput. To understand i/o needs, you have to look at how you’re going to use the storage. Small reads and writes will be limited by input/output operations per second, or IOPS. Large reads and writes are limited by throughput. Since we’re talking databases, let’s stick to SSDs. The IOPS will scale until you reach either the limits of the volume or the limit of the Compute Engine instance. To get the most IOPS from persistent SSD, you need to start with vCPU. Change the machine type to increase the per-vm limits. This will cap out at 32 virtual CPU. Resize the disk to increase IOPS and throughput per disk. This will cap out at 2 Terabyte for Sustained Random IOPS and 4 Terabyte for sustained throughput.
  • #59: Carmen Initially we thought that we would be able to use IP Aliases to hack in a virtual IP in Google Cloud. Bit, it didn’t work. Briefly looking into it revealed that it would not be able to match our current failover times. More digging into it revealed that we knew nothing about IP Aliases. These things were awesome for our Kubernetes clusters. We have since rebuilt our kubernetes clusters to use them. What does that mean for us? Now, pods are natively routable on the GCP network, which means saving money! We’re looking at regular network costs now instead of paying egress bandwidth charges, which applied even within the same project. Native means reduced latency. Pods can now directly access hosted services without going through a NAT gateway. We no longer need to disable anti spoof protection on the VMs. We can now validate the IP of the source and destination Pod.
  • #60: We know we still have a lot of work to do.
  • #61: Allan We have a few proof of concepts in the works. We're considering CloudSQL for some of our databases, to lighten the administrative load. We're also considering Google Spanner during a rewrite of our large, high read/write database. And for the databases that we do not move to CloudSQL, we need a replacement for MHA as we have since revoked its awesome status. For this we're planning to use Orchestrator, which is a solid choice for many reasons. ----- Meeting Notes (11/6/18 12:27) ----- Orchestrator is very actively maintained and tested by its author, Shlomi.
  • #62: Here's my info
  • #63: And here's Allan's information
  • #64: And, of course, our very special guest star, Watkins
  • #65: Thanks everyone. Does anyone have any questions.