MariaDB High Availability
MariaDB Corp Roadshow 2017
MariaDB architecture (diagram)
• Application, connecting through MariaDB Connectors (C, JDBC, ODBC)
• MariaDB MaxScale
• MariaDB Server and MariaDB Multi-Master Cluster, with Replicas supporting asynchronous, semi-sync & synchronous replication
• KERNEL EXTENSIBILITY (columns: Replication, Kernel, Production Plugins): Parallel Slave, GTID, BinLog API, Multi-Source, SQL Parser, Cache/Buffer, Optimiser, Connection Pool, NoSQL CRUD API, Temporal, PL/SQL, Audit, AWS KMS, Authentication, Handler Socket, etc. (40+ Plugins)
• STORAGE LAYER EXTENSIBILITY (groups: SQL, Lightweight, Transactional, Interoperability, Performance & Scalability, Graph & Search, Analytics): InnoDB, XtraDB, Aria, Memory, MyISAM, CONNECT, ColumnStore, Spider, OQGRAPH, MyRocks, Mroonga
• Legend: Original Core MariaDB, MariaDB Engineering, Community Contribution
• OPERATING SYSTEM / FILE SYSTEM / SAN / CLOUD
High Availability Defined
In information technology, high availability refers to a system or component that is continuously operational for a desirably long length of time.
Availability (Wikipedia): up time / total time
Approach to HA
Strategies for High Availability
1. Backup / Restore: < 99.9% availability, 3.7 days / year of downtime
2. Simple replication / manual failover: ~ 99.9% availability, 8.8 hours / year of downtime
3. Replication / Automatic failover: ~ 99.99% availability, 52.6 min / year of downtime
4. Galera Cluster: ~ 99.999% availability, 5.3 min / year of downtime
5. Other
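The downtime figures on this slide follow directly from the availability percentages. A minimal sketch of that arithmetic, assuming an average year of 365.25 days (the function name is illustrative, not part of the deck):

```python
# Convert an availability percentage into the downtime it allows per year.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumption: average year length


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, implied by an availability figure."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(
            f"{pct}% -> {minutes:.1f} min/year "
            f"({minutes / 60:.1f} hours, {minutes / 1440:.1f} days)"
        )
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why the strategies further down the list need increasingly automated recovery.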
An average of 80 percent of mission-critical application service
downtime is directly caused by people or process failures. The
other 20 percent is caused by technology failure, environmental
failure or a disaster
Gartner Research
High Availability Background
• High Availability isn’t always equal to long Uptime
– A system is “up” but it might not be accessible
– A system that is “down” just once, but for a long time, is NOT highly available
• High Availability rather means
– Long Mean Time Between Failures (MTBF)
– Short Mean Time To Recover (MTTR)
• High availability is:
– a system design protocol and associated implementation that ensures a certain degree of
operational continuity during a given measurement period.
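The link between MTBF, MTTR and availability can be made concrete with the commonly used steady-state relation availability = MTBF / (MTBF + MTTR). This relation is not stated on the slide; the sketch below assumes it, and the numbers are hypothetical:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Same failure rate, different recovery times (hypothetical values).
    for mttr in (8.0, 1.0, 0.1):
        print(f"MTBF=1000h, MTTR={mttr}h -> availability {availability(1000.0, mttr):.5f}")
```

With the failure rate held constant, shortening recovery time is what moves the availability figure, which is the point of the MTTR bullet above.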
High Availability Components
High availability is a system design protocol and associated implementation that ensures a certain degree of operational continuity during a measurement period.
• Data Redundancy
– For stateful services, we need to make sure that data is made redundant. It is not a replacement for backups!
• Failover or Switchover Solution
– Some mechanism to redirect traffic from the failed server or data center to a working one
• Monitoring and Management
– Availability of the services needs to be monitored, so that action can be taken when there is a failure, or even to prevent failures
Some Dictionary
General Terms
• Single Point of Failure (SPOF)
– An element is a SPOF when its failure results in a full stop of the service as no other element
can take over (storage, WAN connection, replication channel)
– It is important to evaluate the costs for eliminating the SPOF, the likelihood that it may fail,
and the time required to bring it into service again
• Downtime
– The period of time a service is down, whether planned or unplanned. Planned downtime also counts toward the overall availability figure
• Shared vs. Local Storage
– Shared storage systems like SANs can provide built-in high availability, though this comes with equally high costs
– Shared storage is not really suitable for disaster recovery scenarios across multiple data centers
– Local storage comes at low cost, but we need to implement replication/mirroring ourselves
General Terms
• Switchover
– When a manual process is used to switch from one system to a redundant or standby system in
case of a failure
• Failover
– Automatic switchover, without human intervention
• Failback
– An (often underestimated) task: handling the recovery of a failed system and failing back to the original system after recovery
Data
Redundancy
HA for MariaDB
Replication Scheme
• Synchronous Replication
– All nodes are masters, and applications can read and write from any node
• Semi-Synchronous Replication
– The master does not confirm transactions to the client application until at least one slave has copied the change to its relay log and flushed it to disk
• Asynchronous Replication
– The master does not wait for the slaves: it writes events to its binary log, and slaves request them when they are ready
HA Begins from Data Replication
• Replication enables data from one MariaDB server (the master) to be replicated to one
or more MariaDB servers (the slaves).
• MariaDB Replication:
– is very easy to set up
– is used to scale out read workloads
– provides a first level of high availability and geographic redundancy
– offloads backups and analytic jobs.
Asynchronous Replication
• MariaDB Replication is asynchronous by default.
• Slave determines how much to read and from which point in the binary log
• Slave can be behind master in reading and applying changes.
– Single threaded vs parallel replication
• If the master crashes, transactions might not have been transmitted to any slave
• Asynchronous replication is great for read scaling as adding more replicas does not
impact replication latency
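Because a slave can fall behind the master, monitoring usually inspects the output of SHOW SLAVE STATUS. A minimal sketch, assuming one status row has already been fetched into a column-to-value mapping; the 30-second threshold and the function name are illustrative assumptions:

```python
from typing import Mapping, Optional


def replication_health(
    status: Mapping[str, Optional[str]], max_lag_seconds: int = 30
) -> str:
    """Classify one row of SHOW SLAVE STATUS given as a column -> value mapping."""
    # Both replication threads must be running for the slave to make progress.
    if status.get("Slave_IO_Running") != "Yes" or status.get("Slave_SQL_Running") != "Yes":
        return "broken"
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return "unknown"
    return "lagging" if int(lag) > max_lag_seconds else "ok"


if __name__ == "__main__":
    row = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Seconds_Behind_Master": "120",
    }
    print(replication_health(row))
```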
Asynchronous Replication – Switch Over
1. The master server is taken down, or our monitoring detects a fault
2. The slave server is updated to the last position in the relay log
3. The clients point at the designated slave server
4. The designated slave server becomes the master server
5. All steps are manual
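The manual steps above can be sketched as a small planning helper that picks the most advanced slave and lists the SQL an operator would run on each server. This is an illustration under stated assumptions, not the procedure from the deck: it assumes GTID-based replication (MASTER_USE_GTID=slave_pos), binary log file names of equal zero-padded width so that they compare correctly as strings, and hypothetical host names. It does not wait for the candidate to finish applying its relay log (step 2), which must happen before promotion.

```python
from typing import Dict, List, Tuple

# (Relay_Master_Log_File, Exec_Master_Log_Pos) as reported by SHOW SLAVE STATUS.
Position = Tuple[str, int]


def pick_candidate(slaves: Dict[str, Position]) -> str:
    """Return the slave that has applied the most of the old master's binary log."""
    return max(slaves, key=lambda name: slaves[name])


def switchover_plan(slaves: Dict[str, Position]) -> Dict[str, List[str]]:
    """Map each server to the statements an operator would run on it."""
    new_master = pick_candidate(slaves)
    plan: Dict[str, List[str]] = {
        # The promoted slave stops replicating and starts accepting writes.
        new_master: ["STOP SLAVE;", "RESET SLAVE ALL;", "SET GLOBAL read_only = OFF;"]
    }
    for name in slaves:
        if name != new_master:
            # The remaining slaves are re-pointed at the new master.
            plan[name] = [
                "STOP SLAVE;",
                f"CHANGE MASTER TO MASTER_HOST='{new_master}', MASTER_USE_GTID=slave_pos;",
                "START SLAVE;",
            ]
    return plan


if __name__ == "__main__":
    status = {"db2": ("mariadb-bin.000012", 4000), "db3": ("mariadb-bin.000012", 5200)}
    for server, statements in switchover_plan(status).items():
        print(server, statements)
```

Clients still have to be pointed at the new master (step 3); that part lives outside the database and is not modelled here.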
[Diagram: Master and Slaves, with ReadOnly Slaves]
Async Replication Topologies
• Master and Slaves (ReadOnly Slaves)
• Master with Relay Slave
• Circular Replication
Semi-synchronous Replication
• MariaDB supports semi-synchronous replication:
– the master does not confirm transactions to the client application until at least one slave has
copied the change to its relay log, and flushed it to disk.
– In semi-synchronous replication, only after the events have been written to the relay log and
flushed does the slave acknowledge receipt of a transaction's events
– Semi-synchronous replication is a practical solution for many cases where high availability and no data loss are important.
– When a commit returns successfully, it is known that the data exists in at least two places (on the master and on at least one slave).
– Semi-synchronous replication has a performance impact due to the additional round trip.
MariaDB Enhanced Semi-synchronous Replication
• One or more slaves can be defined as working semi-synchronously.
• For these slaves, the master waits until the I/O thread on one or more of the semi-sync slaves has flushed the transaction to disk.
• This ensures that all committed transactions are at least stored in the relay log of the slave.
• If no semi-sync slave can acknowledge the transaction, the master downgrades to asynchronous replication after waiting for a timeout period. Once a semi-sync slave comes back online, the master switches back to semi-sync replication.
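The timeout-and-downgrade behaviour can be illustrated with a toy model. This is a simulation only, not MariaDB code: the class and method names are hypothetical, and the 10-second default is chosen to mirror the server's rpl_semi_sync_master_timeout setting, which is expressed in milliseconds.

```python
from typing import Iterable, Optional


class SemiSyncMaster:
    """Toy model of a master that waits for one slave acknowledgement per commit."""

    def __init__(self, timeout_ms: int = 10_000) -> None:
        self.timeout_ms = timeout_ms
        self.semi_sync_active = True

    def commit(self, ack_delays_ms: Iterable[Optional[int]]) -> str:
        """ack_delays_ms holds, per semi-sync slave, the milliseconds until its
        acknowledgement arrives, or None if it never arrives."""
        in_time = [d for d in ack_delays_ms if d is not None and d <= self.timeout_ms]
        if in_time:
            # At least one slave has flushed the events to its relay log.
            self.semi_sync_active = True
            return "acknowledged"
        if self.semi_sync_active:
            # No acknowledgement within the timeout: fall back to asynchronous mode.
            self.semi_sync_active = False
            return "timed out, downgraded to async"
        return "async"


if __name__ == "__main__":
    master = SemiSyncMaster()
    print(master.commit([None]))  # slave is down
    print(master.commit([None]))  # still down, no waiting any more
    print(master.commit([5]))     # slave is back, semi-sync resumes
```

While the model is in its asynchronous state, a successful commit no longer implies that the data exists on a second server, which is the trade-off the timeout introduces.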
Semi-synchronous Replication – Switch Over
• The steps for a failover are the same as with standard replication
• but in Step 2, the slave should be chosen from among those (if there are several) that are semi-synchronous with the master
[Diagram: Master and Slaves, with one Semi-Sync Slave and Async Slaves]
Semi-Sync Replication Topologies
• Semi-synchronous replication is used between the master and the backup master
• Semi-sync replication has a performance impact, but the risk of data loss is minimized.
• This topology works well when performing master failover
– The backup master acts as a warm-standby server
– It has the highest probability of having up-to-date data compared to the other slaves.
[Diagram: semi-sync replication from the master to a ReadOnly Backup Master, asynchronous replication to ReadOnly slaves]
MariaDB Multi-Source Replication
• It enables a slave to receive transactions from
multiple sources simultaneously.
• It can be used to back up multiple servers to a single server, to merge table shards, and to consolidate data from multiple servers onto a single server.
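In MariaDB, multi-source replication is configured by giving each source its own named connection. A minimal sketch that only builds the statements; the connection names and hosts are hypothetical, credentials and ports are omitted, and GTID-based replication is assumed (which in turn requires each master to use a distinct GTID domain ID):

```python
from typing import Dict, List


def multi_source_statements(masters: Dict[str, str]) -> List[str]:
    """Build one named replication connection per master (connection name -> host)."""
    statements: List[str] = []
    for connection, host in masters.items():
        statements.append(
            f"CHANGE MASTER '{connection}' TO MASTER_HOST='{host}', MASTER_USE_GTID=slave_pos;"
        )
        statements.append(f"START SLAVE '{connection}';")
    return statements


if __name__ == "__main__":
    sources = {"shard1": "master1.example", "shard2": "master2.example"}
    for statement in multi_source_statements(sources):
        print(statement)
```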
[Diagram: Master 1, Master 2 and Master 3 replicating to a single Slave]
Combining MariaDB Replication Features
• Replication features can be combined to form more
resilient configurations
• Example:
– Implement semi-sync circular replication to increase data
resilience
– Use GTID to avoid duplicate transactions
– Use read-only slaves for read scale out
– Use MaxScale:
• Transactions will go to active master
• Reads will be offloaded to slaves
• Fast failover
[Diagram: semi-sync replication to a Backup Master, asynchronous replication to ReadOnly slaves]
Synchronous Replication (Galera)
• Galera Replication is a synchronous multi-master
replication plug-in that enables a true master-master
setup for InnoDB.
• Every component of the cluster (node) is a shared-nothing server
• All nodes are masters and applications can read and
write from any node
• A minimal Galera cluster consists of 3 nodes:
– A proper cluster needs to reach a quorum (i.e. the
majority of the nodes of the cluster)
• Transactions are synchronously committed on all
nodes.
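The three-node minimum follows from the majority rule. A simplified sketch, assuming every node has equal weight and that nodes fail rather than leave gracefully (Galera's actual quorum calculation is more detailed):

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """A partition keeps quorum only if it holds a strict majority of the cluster."""
    return reachable_nodes > cluster_size / 2


def tolerated_failures(cluster_size: int) -> int:
    """Largest number of nodes that can fail while the rest still form a majority."""
    return (cluster_size - 1) // 2


if __name__ == "__main__":
    for size in (2, 3, 5):
        print(f"{size} nodes: survives {tolerated_failures(size)} failure(s)")
```

With two nodes, losing either one leaves the survivor with exactly half of the cluster, which is not a majority, so a two-node cluster tolerates no failures.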
[Diagram: three MariaDB nodes]
Synchronous Replication (Galera)
• PROS
– A high availability solution with synchronous
replication, failover and resynchronization
– No loss of data
– All servers have up-to-date data (no slave lag)
– Read scalability
– 'Pretty good' write scalability
Synchronous Replication (Galera)
• CONS
– It only supports InnoDB
– The transaction rollback rate, and hence the transaction latency, can increase with the number of cluster nodes
– The cluster performs only as well as its least performing node
• an overloaded master affects the performance of the Galera cluster
– Network latency affects transaction throughput
MaxScale for HA
MDBE Cluster Failover
• Clustered nodes cooperate to remain in sync
• With multiple master nodes, reads and updates both scale*
• Synchronous replication with optimistic locking delivers high availability with little overhead
• Fast failover because all nodes remain synchronized
[Diagram: Application / App Server, Load Balancing and Failover, three MariaDB nodes]
MaxScale Use Case
Master/Slaves Async Replication
Master/Slaves + R/W split routing
MaxScale monitors a MariaDB Topology
1. Master failure
2. MaxScale Monitor detects the master_down event
3. If configured, MaxScale launches a Failover Script that promotes a slave as the new Master
4. MaxScale monitor automatically detects the new replication topology after the switch
[Diagram: MaxScale in front of the MariaDB servers; the master_down event triggers the Failover Script, which promotes a slave as master]
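The failover script in step 3 is supplied by the operator. The sketch below shows one possible shape for such a script; the command-line arguments (--event, --live-nodes) and the "first node in sorted order" policy are assumptions made for illustration, not MaxScale's interface, and the script only prints what it would do instead of promoting anything.

```python
import argparse
from typing import List, Optional


def choose_new_master(event: str, live_nodes: List[str]) -> Optional[str]:
    """Return the node to promote, or None if no action is needed."""
    if event != "master_down" or not live_nodes:
        return None
    # Placeholder policy: a real script would compare replication positions.
    return sorted(live_nodes)[0]


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Hypothetical failover hook")
    parser.add_argument("--event", default="")
    parser.add_argument("--live-nodes", default="", help="comma-separated host:port list")
    args = parser.parse_args(argv)

    nodes = [node for node in args.live_nodes.split(",") if node]
    candidate = choose_new_master(args.event, nodes)
    if candidate is None:
        print("no action")
    else:
        print(f"would promote {candidate}")
    return 0


if __name__ == "__main__":
    main()
```

For example, running it with `--event master_down --live-nodes db3:3306,db2:3306` reports that it would promote db2:3306.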
MaxScale Use Case
MDBE Cluster Synchronous Replication
Galera Cluster + R/W split routing
• Each application server uses only 1 connection
• MaxScale selects one node as “master” and the other nodes as “slaves”
• If the “master” node fails, a new one can be elected immediately
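The idea behind R/W split routing can be shown with a toy router: writes go to the node currently treated as master, reads are spread over the others. This is a deliberately naive sketch with hypothetical names; a real router such as MaxScale also has to account for transactions, session state and statements like SELECT ... FOR UPDATE.

```python
import itertools
from typing import List


class ReadWriteSplitter:
    """Send SELECT statements to slaves in round-robin order, everything else to the master."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        return self.master


if __name__ == "__main__":
    router = ReadWriteSplitter("node1", ["node2", "node3"])
    for statement in ("SELECT 1", "INSERT INTO t VALUES (1)", "select 2"):
        print(statement, "->", router.route(statement))
```

Because the application talks only to the router, electing a different “master” node changes the routing target without any change on the application side.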
MariaDB HA: MaxScale
tter Detects Active Master
• Re-route traffic between master and slave(s)
• Does not manage servers
• Failover / slave promotion is an external process
Binary Log Server
• Implemented for Booking.com
• Part of MaxScale release
• All slaves are in sync, easy to promote any slave
HA / Scalability with MaxScale 2.1
Legend: Existing in MaxScale 2.0 / New in MaxScale 2.1
• Aurora Cluster Monitor
• Multi-master and Failover Mode for MySQL Monitor
• Read-write Splitting with Master Pinning
• Transaction Scaling to support user growth and simplify applications
– MariaDB Master/Slave and MariaDB Galera Cluster
– Load balancing
– Database aware dynamic query routing
– Traffic profile based routing
• Replication Scaling to support web-scale applications’ user base
– Binlog Server for horizontal scaling of slaves in Master/Slave architecture
• Multi-tenant database scaling to transparently grow tenants and data volume
– Schema sharding
– Connection Rate Limitation
Thank you
