MySQL Latency
Jeff Freund, CTO, Clickability
End of a long day: I am the last stop between you and ….. (slide image labels: “~6 hours 42 mins left”, “Yippee!”, “Future CTO”)
Software-as-a-Service Web CMS
  • True multi-tenant SaaS platform from the ground up
  • Integrated solution of all services required to run a sophisticated business website
  • HQ in San Francisco, 8+ years old, 60+ employees
  • Global leader in on-demand web content management
 
250+ million pages delivered per month
Proven open source building blocks: Linux, Apache, MySQL, Java, Tomcat
  • Scale out horizontally
  • Distributed infrastructure, including multiple data centers
  • Multiple layers of caching for performance
  • Loose coupling of applications around data
[Diagram: masters M1 and M2 and slaves S1 to S6, spread across Data Center 1 and Data Center 2 and linked by a VPN tunnel]
Application code intelligently splits queries between masters and slaves
    con = db.getReadWriteConnection();
    con = db.getReadOnlyConnection();
    con = db.getSafeReadConnection();
  • Inserts/updates/deletes sent to the master
  • Most reads sent to slaves
  • “Safe” reads sent to masters: zero tolerance for latency
  • Manual code updates to implement the split
  • 6+ months in production to find all “Safe” reads
[Diagram: RO Connection Manager in front of the slave, RW Connection Manager in front of the master]
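The split described above can be sketched as a small router that chooses a host by query intent. This is only an illustration in Python, not Clickability's Java connection managers: the `ConnectionRouter` class, its method names, the host names, and the round-robin choice of slave are all assumptions.

```python
from itertools import cycle
from typing import List


class ConnectionRouter:
    """Hypothetical router that picks a database host by query intent."""

    def __init__(self, masters: List[str], slaves: List[str]) -> None:
        self.masters = masters
        # Round-robin over slaves is an assumption; the slides do not say
        # how a slave is chosen.
        self._slave_cycle = cycle(slaves)

    def read_write(self) -> str:
        """Inserts, updates, and deletes always go to a master."""
        return self.masters[0]

    def read_only(self) -> str:
        """Ordinary reads tolerate replication latency, so use a slave."""
        return next(self._slave_cycle)

    def safe_read(self) -> str:
        """Reads with zero tolerance for latency go to a master."""
        return self.masters[0]


router = ConnectionRouter(masters=["M1"], slaves=["S1", "S2"])
print(router.read_write())  # M1
print(router.read_only())   # S1
print(router.read_only())   # S2
print(router.safe_read())   # M1
```

The hard part reported on the slide is not the router but deciding, call site by call site, which reads must be “safe”.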
Replication latency: the difference in time between when a transaction is committed on one database and when it is subsequently committed on a replicated database. Latency can be either “slowness” or “breakage”.
7… Hardware maintenance / recovery
6… Schema updates / DB maintenance
5… Elevated transaction rates (e.g. bulk loads)
4… High query load on slaves
3… Network bottlenecks / loss of connectivity
2… “Slave errors” (e.g. duplicate keys, deadlocks)
 
# csh loop: log the slave's Seconds_Behind_Master once per second
while ( 1 )
    echo "show slave status \G;" | mysql -u USER --password=PASSWORD | grep Seconds_Behind_Master >> replication.log
    sleep 1
end
[Chart: axis label “Seconds”]
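A minimal sketch of pulling the number out of that status text, assuming the usual `Field: value` layout of `SHOW SLAVE STATUS \G`. The function name and the sample text are made up for illustration.

```python
import re
from typing import Optional


def seconds_behind_master(status_text: str) -> Optional[int]:
    """Return Seconds_Behind_Master as an int, or None if absent or NULL."""
    match = re.search(r"Seconds_Behind_Master:\s*(\S+)", status_text)
    if match is None or match.group(1) == "NULL":
        # MySQL reports NULL here when replication is not running.
        return None
    return int(match.group(1))


# Illustrative excerpt only, not real output from the systems in the talk.
sample = """
        Slave_IO_Running: Yes
       Slave_SQL_Running: Yes
   Seconds_Behind_Master: 3
"""
print(seconds_behind_master(sample))  # 3
```

The value is reported in whole seconds, so sub-second latency is invisible to this method.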
[Diagram: the same topology (Data Center 1 and Data Center 2 linked by a VPN tunnel), with M1, M2, S4, and S6 called out alongside S1, S2, S3, and S5]
CREATE TABLE `replTest` (
  `timecol` bigint(20) default NULL,
  KEY `idx_timecol` (`timecol`)
);

Loop:
  $val = current timestamp in epoch milliseconds
  M2:  INSERT INTO replTest (timecol) VALUES ($val);
  M1:  SELECT $val - MAX(timecol) FROM replTest;
  S4:  SELECT $val - MAX(timecol) FROM replTest;
  S6:  SELECT $val - MAX(timecol) FROM replTest;
[Diagram: the INSERT goes to M2 and replicates to M1, S4, and S6, with the VPN tunnel between data centers]
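The measurement loop above can be tried without a MySQL cluster. The sketch below uses two in-memory SQLite databases as stand-ins for M2 and one replica, with a hypothetical `replicate` helper playing the part of MySQL replication. Timestamps are fixed numbers so the result is repeatable; a real loop would use the current epoch milliseconds for `$val`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the heartbeat table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE replTest (timecol INTEGER)")
    db.execute("CREATE INDEX idx_timecol ON replTest (timecol)")
    return db


def write_heartbeat(master: sqlite3.Connection, val: int) -> None:
    """Insert the timestamp on the transaction source."""
    master.execute("INSERT INTO replTest (timecol) VALUES (?)", (val,))
    master.commit()


def measure_lag_ms(db: sqlite3.Connection, val: int):
    """Same query as the slide: $val minus the newest timestamp seen."""
    row = db.execute("SELECT ? - MAX(timecol) FROM replTest", (val,)).fetchone()
    return row[0]  # None if no heartbeat has arrived yet


def replicate(master: sqlite3.Connection, replica: sqlite3.Connection) -> None:
    """Stand-in for MySQL replication: copy rows the replica lacks."""
    last = replica.execute(
        "SELECT COALESCE(MAX(timecol), -1) FROM replTest").fetchone()[0]
    rows = master.execute(
        "SELECT timecol FROM replTest WHERE timecol > ?", (last,)).fetchall()
    replica.executemany("INSERT INTO replTest (timecol) VALUES (?)", rows)
    replica.commit()


master, replica = make_db(), make_db()
write_heartbeat(master, 1000)
replicate(master, replica)
write_heartbeat(master, 1250)         # committed on master, not yet replicated
print(measure_lag_ms(master, 1250))   # 0
print(measure_lag_ms(replica, 1250))  # 250
replicate(master, replica)
print(measure_lag_ms(replica, 1250))  # 0
```

Unlike `Seconds_Behind_Master`, this reports lag in milliseconds.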
  • All DBs are 1 replication hop away from the transaction source
  • All hardware is roughly equal
  • Remote location is ~60 miles away
  • Data taken from 100,000 samples over an hour of standard operations

Database  Characteristics        Average Latency  Max Latency
M2        Transaction source     N/A              N/A
M1        Local; moderate load   ~6 ms            ~315 ms
S4        Local; high load       ~190 ms          ~12 seconds
S6        Remote; minimal load   ~5 ms            ~400 ms
[Chart: S4 database replication latency, in milliseconds]
95% of the time, replication latency will be 1 second or less.
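A statement like “95% of the time, 1 second or less” comes from summarising the logged samples. The sketch below shows one way to compute such figures using a nearest-rank percentile. The sample values are invented for illustration and are not the talk's data; the helper names are assumptions.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample covering pct% of the data."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fraction_within(samples: Sequence[float], limit_ms: float) -> float:
    """Share of samples at or below the given latency limit."""
    return sum(1 for s in samples if s <= limit_ms) / len(samples)


# Made-up latency samples in milliseconds, with one long-tail outlier.
samples_ms = [5, 8, 12, 40, 90, 150, 300, 700, 950, 12000]

print(sum(samples_ms) / len(samples_ms))  # 1425.5
print(max(samples_ms))                    # 12000
print(percentile(samples_ms, 90))         # 950
print(fraction_within(samples_ms, 1000))  # 0.9
```

A single outlier dominates both the average and the maximum, which is why the distribution says more than either number alone.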
Now what?
Assume that replication latency will happen in the course of standard operations. Build the application to accommodate it. If you do, your Ops team will love you for it.
Reliable cache clearing messages
  • Local ehcache on application servers
  • Distributed object cache (memcached)
  • Need to clear all caches effectively on object updates
[Diagram: Pub 1, Pub 2, and Pub 3, each with a local cache, plus the distributed object cache]
Multicast notification bus for “clear cache” messages
  • The race is on! If the message arrives before the transaction is replicated, a stale object may be reloaded.
  • Frequently accessed objects are most susceptible to problems
[Diagram: CMS, Pub, DB1, DB2]
Multicast notification bus with tuning parameters
  • The race is on again! But the database transaction gets a tunable head start: 0.5 sec, 1 sec, 2 secs, 5 secs
  • Better: lasted for years, but in the end 99.99+% still wasn’t reliable enough (remember the long tail on the chart?)
[Diagram: CMS, PUB, DB1, DB2]
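The race can be stated as a single comparison: the message loses (correctly) only if it arrives after the transaction has replicated. The sketch below is an illustration, with a hypothetical function name and a simplified timing model. It checks the head starts from the slide against the average (~190 ms) and maximum (~12 seconds) latencies reported for S4.

```python
HEAD_STARTS_MS = [500, 1000, 2000, 5000]  # head starts listed on the slide


def reloads_stale_object(replication_lag_ms: int,
                         head_start_ms: int,
                         network_delay_ms: int = 0) -> bool:
    """True if the clear-cache message beats the replicated transaction.

    Simplified model: the message is sent head_start_ms after the commit
    and takes network_delay_ms to arrive; the data needs
    replication_lag_ms to reach the database the publisher reads from.
    """
    message_arrives_ms = head_start_ms + network_delay_ms
    return message_arrives_ms < replication_lag_ms


for head_start in HEAD_STARTS_MS:
    typical = reloads_stale_object(190, head_start)
    worst = reloads_stale_object(12000, head_start)
    print(head_start, typical, worst)
# 500 False True
# 1000 False True
# 2000 False True
# 5000 False True
```

Every fixed head start is safe at typical latency and still loses to the long tail, which matches the slide's conclusion.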
Database queue table for messages
  • Messages are committed after data, injecting them into the replication data stream
  • All apps poll the database queue table once per second
  • Guaranteed that data will arrive before the message!
[Diagram: CMS, PUB, DB1, DB2, Queue Poller]
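The ordering guarantee can be sketched with in-memory SQLite databases standing in for the master and one replica. Everything here is an assumption made for illustration: the table names, the `Master` and `Replica` classes, and the `log` list that mimics a replication stream applied in commit order.

```python
import sqlite3

SCHEMA = """
CREATE TABLE objects (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE cache_queue (
    msg_id INTEGER PRIMARY KEY AUTOINCREMENT,
    object_id INTEGER
);
"""


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


class Master:
    def __init__(self) -> None:
        self.db = make_db()
        self.log = []  # statements in commit order (stand-in for replication)

    def _commit(self, sql: str, params: tuple) -> None:
        self.db.execute(sql, params)
        self.db.commit()
        self.log.append((sql, params))

    def update_object(self, object_id: int, body: str) -> None:
        # Data first, then the clear-cache message, so the message can
        # never be replicated ahead of the data it refers to.
        self._commit("INSERT OR REPLACE INTO objects (id, body) VALUES (?, ?)",
                     (object_id, body))
        self._commit("INSERT INTO cache_queue (object_id) VALUES (?)",
                     (object_id,))


class Replica:
    def __init__(self) -> None:
        self.db = make_db()
        self.applied = 0      # how many log entries have been replayed
        self.last_msg_id = 0  # newest queue message already handled

    def apply(self, log: list, count: int = 1) -> None:
        """Replay the next `count` statements in commit order."""
        for sql, params in log[self.applied:self.applied + count]:
            self.db.execute(sql, params)
            self.applied += 1
        self.db.commit()

    def poll(self) -> list:
        """Return object ids whose caches should be cleared now."""
        rows = self.db.execute(
            "SELECT msg_id, object_id FROM cache_queue "
            "WHERE msg_id > ? ORDER BY msg_id", (self.last_msg_id,)).fetchall()
        if rows:
            self.last_msg_id = rows[-1][0]
        return [object_id for _, object_id in rows]


master, replica = Master(), Replica()
master.update_object(7, "new headline")
replica.apply(master.log, count=1)  # only the data row has arrived
print(replica.poll())               # []
replica.apply(master.log, count=1)  # now the message arrives
print(replica.poll())               # [7]
print(replica.db.execute(
    "SELECT body FROM objects WHERE id = 7").fetchone()[0])  # new headline
```

In this model a poller can never see a message before the row it refers to, however slow replication is; the cost is up to one polling interval of extra delay.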
  • If you don’t need to replicate it, don’t!
  • Split data functionally (e.g. separate large blog storage from relational transactions to keep the pipes clear)
  • Build the appropriate recovery tools: our “rewind button”
  • Masters in multiple data centers
  • Greater geographic distance between data centers
  • MySQL load balancing: will messaging still be reliable?
[email_address] Questions?  Feedback?
