The MySQL Availability Company
Tungsten Replicator Master Class
Intermediate: Replicator Monitoring & Troubleshooting
Chris Parker, Customer Success Director, EMEA & APAC
Topics
In this short course, we will
• Discuss Monitoring & Troubleshooting
• Review Common Issues
• Triggers
• Finding and understanding Log Files
• Handling (and recovering from) Failures
• Skipping Transactions
• Resetting
• Scripts for Monitoring
Most Common Issues
Common Incidents:
• Network Failures
• Disk Capacity
• OOM
• MySQL Crashes
• MySQL Max Connections
• Human Error
• Deletion of THL
• Incorrect Shutdown Procedures
• Writing direct to a replica
Things to check:
• Set up Monitoring / Alerts
• Check status outputs
• Check Log Files
• Check OS
• Check MySQL state
Triggers
• Triggers are an issue in all MySQL-to-MySQL replication topologies
• The combination of the binlog_format (ROW vs MIXED) and the TRIGGER type (DETERMINISTIC vs NONDETERMINISTIC) will result in triggers behaving differently.
• Either:
  • Avoid Triggers altogether if possible, or
  • Code Triggers to intelligently fire ONLY on Primary hosts
• Read the docs for a full explanation:
  • https://docs.continuent.com/tungsten-clustering-6.1/troubleshooting-known-issues-triggers.html
Log Files
• /opt/continuent/service_logs/
  • trepsvc.log for complete replicator log
  • Use logrotate to manage
• `replicator dump`
  • Dumps memory stack to log file
  • Can be useful in support cases
  • Automatically issued by `tpm diag`
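As a first pass over trepsvc.log, a small script can pull out only the lines that report problems. The sketch below is illustrative rather than a Continuent tool: it assumes the log marks severity with whitespace-delimited tokens such as ` ERROR ` and ` FATAL `, and the function names are hypothetical. The log path is the one given on the slide.

```python
from typing import Iterable, List, Sequence

# Path taken from the slide; adjust if your installation directory differs.
TREPSVC_LOG = "/opt/continuent/service_logs/trepsvc.log"


def find_problem_lines(lines: Iterable[str],
                       levels: Sequence[str] = ("ERROR", "FATAL")) -> List[str]:
    """Return the lines that contain one of the given severity markers.

    Assumption: the severity appears as a space-delimited token, e.g. " ERROR ".
    """
    wanted = tuple(f" {level} " for level in levels)
    return [line.rstrip("\n") for line in lines
            if any(marker in line for marker in wanted)]


def scan_log(path: str = TREPSVC_LOG) -> List[str]:
    """Read the replicator log from disk and return its problem lines."""
    with open(path, "r", errors="replace") as handle:
        return find_problem_lines(handle)
```

Calling `scan_log()` on a replicator host returns the matching lines in file order, so the most recent error is the last element.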
Recovering from Failures
• Enable auto-recovery properties
• Review log files to understand errors
• Simply try restarting the replicator. Yes, this sometimes works!
• Which node is experiencing an error?
• If a statement clashes, is skipping safe?
• If the database crashed, have you lost binary logs?
• Have you lost THL files?
• Worst case: reset and re-provision
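To answer "which node is experiencing an error?", it can help to read each replicator's status programmatically. The sketch below assumes `trepctl status` is on the PATH and prints `name : value` lines, including a `state` field that reads `ONLINE` when healthy; the function names and the sample text in the usage note are hypothetical.

```python
import subprocess
from typing import Dict


def parse_status(text: str) -> Dict[str, str]:
    """Turn 'name : value' lines into a dict; lines without a colon are skipped."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as URIs stay intact.
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields


def needs_attention(fields: Dict[str, str]) -> bool:
    """True when the replicator does not report the ONLINE state."""
    return fields.get("state", "UNKNOWN") != "ONLINE"


def local_status() -> Dict[str, str]:
    """Run `trepctl status` on this host (assumes trepctl is on the PATH)."""
    result = subprocess.run(["trepctl", "status"],
                            capture_output=True, text=True, check=True)
    return parse_status(result.stdout)
```

For example, `needs_attention(parse_status("state : OFFLINE:ERROR"))` is true, which would be the cue to review the log files before deciding whether to restart, skip, or re-provision.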
Monitoring Scripts
• Nagios-compatible scripts are located in $CONTINUENT_ROOT/tungsten/cluster-home/bin/check_tungsten_*
• These scripts will only check the local host. They do not check other hosts.
• check_tungsten_online: reports OK if all services are online
• check_tungsten_services -r: reports OK if replicator services are healthy
• check_tungsten_latency: used for Nagios-style warning and critical replicator latency thresholds
• Write your own! New API coming in v7
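For anyone writing their own check, the core of a Nagios-style latency check is a threshold comparison mapped to the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is a hypothetical example, not the implementation of check_tungsten_latency; how the latency value is obtained is left to the caller.

```python
from typing import Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_latency(latency_seconds: Optional[float],
                  warn: float, crit: float) -> Tuple[int, str]:
    """Map a replicator latency reading to a Nagios exit code and message."""
    if latency_seconds is None:
        return UNKNOWN, "UNKNOWN: replicator latency not available"
    # Test the critical threshold first so it takes precedence over warning.
    if latency_seconds >= crit:
        return CRITICAL, f"CRITICAL: replicator latency {latency_seconds}s"
    if latency_seconds >= warn:
        return WARNING, f"WARNING: replicator latency {latency_seconds}s"
    return OK, f"OK: replicator latency {latency_seconds}s"
```

A wrapper script would print the message and pass the code to `sys.exit()`, which is how Nagios reads the result. As with the bundled scripts, a check built this way only describes the host it runs on.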
Summary
What we have learnt today
• Reviewed Common Issues
• Discussed Triggers
• Looked at Log Files
• Discussed Failures
• Skipping Transactions
• Resetting
• Reviewed Scripts for Monitoring
Next Steps
In the next session we will
• Explore Filtering!
THANK YOU FOR LISTENING
continuent.com

Training Slides: 252 - Monitoring & Troubleshooting
