Synchronous multi-master clusters
          with MySQL:
    an introduction to Galera

                         Henrik Ingo

               OUGF Harmony conference
                 Aulanko, 2012-05-31



              Please share and reuse this presentation
       licensed under Creative Commons Attribution license
             http://creativecommons.org/licenses/by/3.0/
Agenda



* MySQL replication issues
* Galera internal architecture
* Installing and configuring it
* Synchronous multi-master clustering, what does it mean?
* Load balancing and other options
* How network partitioning is handled
* WAN replication

How does it perform?
* In memory workload
* Scale-out for writes - how is it possible?
* Disk bound workload
* NDB shootout
So what is Galera all about?
Created by Codership Oy



●   Participated in 3 MySQL
    cluster developments, since
    2003
●   Started Galera work 2007
●   Galera is free, open source.
    Codership offers support and
    consulting
●   Percona XtraDB Cluster
    based on Galera, launched
    2012
●   Is (and can be) integrated into
    other MySQL and non-MySQL
    products
MySQL replication challenges




●   Asynchronous = You will lose data
●   MySQL 5.5 semi-sync:
    Better, but falls back to
    asynchronous when in trouble...
●   Single-threaded => slave lag => 50-80% performance penalty
●   Master-slave: read-write splitting,
    failovers, diverged slaves
●   Low level: manually provision DB to
    slave, configure binlog positions, ...

(Diagram: Master and Slave, with a question mark.)




So what about DRBD?




●   Synchronous, yay!
●   Cold standby
●   No scale-out
    (But you can combine with
    MySQL replication...)
●   50% performance penalty
●   (SAN based HA has
    these same issues btw)

(Diagram: Primary and Cold standby, each with its own Disk, linked by DRBD, with a question mark.)



Galera in a nutshell



●   True multi-master:
    Read & write to any node
●   Synchronous replication
●   No slave lag
●   No integrity issues
●   No master-slave failovers or
    VIP needed
●   Multi-threaded slave, no
    performance penalty
●   Automatic node provisioning

(Diagram: three Master nodes on top of Galera.)

Ok, let's take a closer look...
A MySQL Galera cluster is...


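(Diagram, components as labelled:)
●   MySQL with Patches, storage engines MyISAM and InnoDB
●   Replication API and Wsrep API between MySQL and the Galera group comm library
●   Galera group comm library connecting to the other MySQL nodes
●   State Snapshot Transfer: mysqldump, rsync, xtrabackup, etc...
●   Monitoring: SHOW STATUS LIKE "wsrep%", SHOW VARIABLES ...
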
        http://www.codership.com/downloads/download-mysqlgalera
Starting your first cluster



Let's assume we have nodes 10.0.0.1, 10.0.0.2 and 10.0.0.3.

On node 1, set this in my.cnf:
wsrep_cluster_address="gcomm://"
Then start the first node:
/etc/init.d/mysql start
Important! Now change this in my.cnf to point to any of the other nodes. Don't leave it with an empty
gcomm string! With an empty gcomm string the node would start a new cluster of its own on the next restart.
wsrep_cluster_address="gcomm://10.0.0.3"

On node 2, set this in my.cnf:
wsrep_cluster_address="gcomm://10.0.0.1"
And start it:
/etc/init.d/mysql start
Node 2 now connects to node 1. Then do the same on node 3:
wsrep_cluster_address="gcomm://10.0.0.1" # or .2
/etc/init.d/mysql start
Node 3 connects to node 1, after which it establishes communication with all nodes currently in the
cluster.

wsrep_cluster_address should be set to any one node that is currently part of the cluster. After
startup it no longer has any meaning: all nodes connect to all others all the time.

Use one of these to observe SST:
ps aux|grep mysqldump
ps aux|grep xtrabackup
ps aux|grep rsync

Tip: Galera outputs a lot of info to the MySQL error log, especially when nodes join or leave the cluster.

Checking that nodes are connected and ready



SHOW STATUS LIKE "wsrep%";
...
wsrep_local_state              4
wsrep_local_state_comment      Synced (6)
...
wsrep_cluster_conf_id          54                       Increments when a node joins or leaves the cluster.
wsrep_cluster_size             3                        Nr of nodes connected.
wsrep_cluster_state_uuid       3108998a-67d4-11e1-...   UUID generated when first node is started, all nodes
                                                        share same UUID. Identity of the clustered database.
wsrep_cluster_status           Primary                  Primary or Non-Primary cluster component?
...

Q: Can you tell me which nodes are connected to this node/component?
A: See MySQL error log.
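
The status values above can be turned into a simple readiness check. The following is a minimal sketch, assuming the status rows have already been fetched into a dictionary by some MySQL client; the function name `node_is_ready` and the `sample` dictionary are hypothetical, and the meaning of state 4 as "Synced" is taken from the sample output on this slide.

```python
def node_is_ready(status, expected_size):
    """Return True if the node looks synced and part of a full primary component.

    `status` maps wsrep status names to their values (as strings),
    e.g. the rows returned by SHOW STATUS LIKE "wsrep%".
    """
    return (
        int(status.get("wsrep_local_state", 0)) == 4          # 4 = Synced on this slide
        and status.get("wsrep_cluster_status") == "Primary"   # not a Non-Primary component
        and int(status.get("wsrep_cluster_size", 0)) == expected_size
    )


# Values copied from the slide (UUID omitted because it is elided there).
sample = {
    "wsrep_local_state": "4",
    "wsrep_local_state_comment": "Synced (6)",
    "wsrep_cluster_conf_id": "54",
    "wsrep_cluster_size": "3",
    "wsrep_cluster_status": "Primary",
}

print(node_is_ready(sample, expected_size=3))
```

A check like this could be what a load balancer health probe evaluates before sending traffic to a node.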
Your first query



# mysql -uroot -prootpass -h 10.0.0.1
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.1.53 wsrep_0.8.0

mysql> create table t (k int primary key auto_increment, v blob);

mysql> show create table t;
| Table | Create Table |
| t | CREATE TABLE `t` (
`k` int(11) NOT NULL auto_increment,
`v` blob,
PRIMARY KEY (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
1 row in set (0.00 sec)

mysql> insert into t (v) values ("first row");

mysql> insert into t (v) values ("second row");

How it looks from another node



# mysql -uroot -prootpass -h 10.0.0.2
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.1.53 wsrep_0.8.0

mysql> show create table t;
| Table | Create Table |
| t | CREATE TABLE `t` (
`k` int(11) NOT NULL auto_increment,
`v` blob,
PRIMARY KEY (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

mysql> select * from t;
| k | v |
| 1 | first row |
| 4 | second row |

Galera automatically sets auto_increment_increment and auto_increment_offset.
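
The jump from k=1 to k=4 is what you would expect if the increment equals the cluster size (3) and this node has offset 1. The sketch below models that rule in plain Python; the values 3 and 1 are an assumption that happens to match the output above, and the helper names are hypothetical.

```python
def next_auto_increment(current_max, increment, offset):
    """Smallest value greater than current_max of the form offset + n * increment."""
    if current_max < offset:
        return offset
    steps = (current_max - offset) // increment + 1
    return offset + steps * increment


def first_values(increment, offset, count):
    """Generate the first `count` keys a node with this offset would hand out."""
    values, current = [], 0
    for _ in range(count):
        current = next_auto_increment(current, increment, offset)
        values.append(current)
    return values


# Three nodes, one offset each: the key ranges never overlap.
for node_offset in (1, 2, 3):
    print(node_offset, first_values(increment=3, offset=node_offset, count=3))
```

Because every node draws from its own residue class, concurrent inserts on different nodes cannot generate the same key.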


More interesting Galera (Wsrep) status vars



|   wsrep_local_send_queue                |   34                                            |
|   wsrep_local_send_queue_avg            |   0.824589                                      |
|   wsrep_local_recv_queue                |   30                                            |
|   wsrep_local_recv_queue_avg            |   1.415460                                      |
|   wsrep_flow_control_paused             |   0.790793                                      |
|   wsrep_flow_control_sent               |   9                                             |
|   wsrep_flow_control_recv               |   52                                            |
|   wsrep_cert_deps_distance              |   105.201550                                    |
|   wsrep_apply_oooe                      |   0.349014                                      |
|   wsrep_apply_oool                      |   0.012709                                      |
|   wsrep_apply_window                    |   2.714530                                      |


wsrep_flow_control_paused: Fraction of time that Galera replication was
halted due to slave(s) overloaded. Bigger than 0.1 means you have a problem!

Tip: These variables reset every time you do SHOW STATUS.
So do it once, wait 10 sec, then do it again and read the second values.
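
The "read once, wait, read again" tip can be scripted. This is a minimal sketch, assuming a hypothetical `read_status` callable that runs the status query through whatever MySQL client you use and returns the rows as a dictionary; the 10 second wait and the 0.1 threshold come from the slide.

```python
import time


def flow_control_problem(read_status, wait_seconds=10, threshold=0.1, sleep=time.sleep):
    """Sample the wsrep status twice and judge the second wsrep_flow_control_paused value.

    read_status: hypothetical callable returning {status_name: value}.
    sleep: injectable so the function can be exercised without really waiting.
    """
    read_status()                 # first read is discarded, it only starts a fresh interval
    sleep(wait_seconds)
    second = read_status()
    return float(second["wsrep_flow_control_paused"]) > threshold


# Stand-in for a real server: two canned samples, the second one from the slide.
samples = iter([
    {"wsrep_flow_control_paused": "0.000000"},
    {"wsrep_flow_control_paused": "0.790793"},
])
print(flow_control_problem(lambda: next(samples), sleep=lambda seconds: None))
```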

Did you say parallel slave threads?


mysql> show variables\G
....
*************************** 296. row ***************************
Variable_name: wsrep_slave_threads
Value: 32

mysql> SHOW PROCESSLIST;
...
| 31 | system user | | NULL | Sleep | 32 | committed 933 | NULL
| 32 | system user | | NULL | Sleep | 21 | committed 944 | NULL

●   MySQL replication allows master to proceed and leaves slave lagging.
●   Galera clusters are tightly coupled: Master throughput will degrade if any
    slave can't keep up.
●   For best throughput, use 4-64 slave threads. (Esp disk-bound workloads.)
●   Generic solution:
    –   works with any application, schema.
    –   Commit order is preserved.
    –   Also possible: Out-of-order-commits. Small additional benefit, unsafe for most apps.

MySQL options with Galera friendly values



# (This must be substituted by wsrep_format)
binlog_format=ROW

# Currently only InnoDB storage engine is supported
default-storage-engine=innodb

# No need to sync to disk when replication is synchronous
sync_binlog=0
innodb_flush_log_at_trx_commit=0
innodb_doublewrite=0

# to avoid issues with 'bulk mode inserts' using autoinc
innodb_autoinc_lock_mode=2

# This is a must for parallel applying
innodb_locks_unsafe_for_binlog=1

# Query Cache is not supported with wsrep
query_cache_size=0
query_cache_type=0

Setting WSREP and Galera options



# Most settings belong to wsrep api = part of MySQL
#
# State Snapshot Transfer method
wsrep_sst_method=mysqldump
#
# SST authentication string. This will be used to send SST to joining nodes.
# Depends on SST method. For mysqldump method it is root:<root password>
wsrep_sst_auth=root:password

# Use of Galera library is opaque to MySQL. It is a "wsrep provider".
# Full path to wsrep provider library or 'none'
#wsrep_provider=none
wsrep_provider=/usr/local/mysql/lib/libgalera_smm.so

# Provider specific configuration options
# Here we increase window size for a WAN setup
wsrep_provider_options="evs.send_window=512; evs.user_send_window=512;"


Synchronous Multi-Master Clustering
Multi-master = you can write to any node & no failovers needed



What do you mean no failover???
●   Use a load balancer
●   Application sees just one IP
●   Write to any available
    node, round-robin
●   If node fails, just write to
    another one
●   What if load balancer fails?
    -> Turtles all the way down

(Diagram: one LB in front of three MySQL nodes on top of Galera.)
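
The "round-robin, and skip a failed node" behaviour can be sketched in a few lines. This is only an illustration of the idea, not how any particular load balancer is implemented; the class name `RoundRobinBalancer` and its methods are hypothetical.

```python
class RoundRobinBalancer:
    """Hand out nodes in turn, skipping any node currently marked as failed."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.failed = set()
        self._next = 0

    def mark_failed(self, node):
        self.failed.add(node)

    def mark_recovered(self, node):
        self.failed.discard(node)

    def pick(self):
        # Try each node at most once per call, starting where we left off.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node not in self.failed:
                return node
        raise RuntimeError("no available nodes")


lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.pick() for _ in range(4)])
lb.mark_failed("10.0.0.2")
print([lb.pick() for _ in range(3)])
```

Since every Galera node accepts writes, skipping a failed node needs no promotion step: the next node in the list is already a master.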

Protip: JDBC, PHP come with built-in load balancing!




●   No Single Point of Failure
●   One less layer of network
    components
●   Is aware of MySQL
    transaction states and
    errors

jdbc:mysql:loadbalance://10.0.0.1,10.0.0.2,10.0.0.3/<database>?loadBalanceBlacklistTimeout=5000

(Diagram: two LBs, three MySQL nodes, Galera.)

Load balancer per each app node




●   Also no Single Point of
    Failure
●   LB is an additional layer,
    but localhost = pretty fast
●   Need to manage more
    load balancers
●   Good for languages other
    than Java, PHP

(Diagram: two LBs, three MySQL nodes, Galera.)

You can still do VIP based failovers. But why?
                => You can run a 2 node Galera cluster like this.




(Diagram: before and after a failover. In both states a VIP, managed by a
clustering framework, points at one of three MySQL nodes on top of Galera.)

Understanding the transaction sequence in Galera



Master:
    BEGIN, SELECT, UPDATE (user transaction)
    COMMIT -> group communication => GTID
    Certification -> COMMIT or ROLLBACK   (commit delay)
    InnoDB commit
    commit return

Slave:
    Certification -> COMMIT or discard
    Apply
    InnoDB commit

●   Optimistic locking between nodes = Risk for deadlocks
●   Certification = deterministic
●   Virtual synchrony = Committed events written to InnoDB after small delay
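
To make "certification = deterministic" concrete, here is a toy model of certification-based conflict detection (first committer wins). It is a simplified illustration under stated assumptions, not Galera's actual implementation: the class `Certifier`, the row-key strings and the `last_seen` bookkeeping are all hypothetical. The point it shows is that every node, fed the same totally ordered stream of write sets, reaches the same commit or rollback decision without talking to the others.

```python
class Certifier:
    """Toy certification: decide commit/rollback from the ordered write-set stream alone."""

    def __init__(self):
        self.seqno = 0           # position in the global total order
        self.last_writer = {}    # row key -> seqno of the last certified write to it

    def certify(self, keys, last_seen):
        """keys: rows the transaction wrote.
        last_seen: highest seqno the originating node had applied when the
        transaction executed. Returns True for commit, False for rollback/discard.
        """
        self.seqno += 1
        # Conflict if any of these rows was changed after the transaction's snapshot.
        if any(self.last_writer.get(key, 0) > last_seen for key in keys):
            return False
        for key in keys:
            self.last_writer[key] = self.seqno
        return True


# The same ordered stream of (written rows, last_seen) delivered to two nodes.
stream = [({"t:1"}, 0), ({"t:1"}, 0), ({"t:2"}, 0), ({"t:1"}, 2)]
master, slave = Certifier(), Certifier()
print([master.certify(keys, seen) for keys, seen in stream])
print([slave.certify(keys, seen) for keys, seen in stream])
```

The second write set loses because another transaction updated the same row after its snapshot. This is the "deadlock" an application sees when two nodes optimistically update the same row.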
How network partitioning is handled
               aka
   How split brain is prevented
Preventing split brain



●   If part of the cluster can't be
    reached, it means
    –   The node(s) has crashed
    –   Nodes are fine and it's a network
        connectivity issue
        = network partition
    –   A node cannot know which of the two
        has happened
    –   Network partition may lead to split
        brain if both parts continue to commit
        transactions.
●   Split brain leads to 2 diverging
    clusters, 2 diverged datasets
●   Clustering SW must ensure there
    is only 1 cluster partition active at
    all times

(Diagram: two LBs, three MySQL nodes, Galera.)
Quorum



●   Galera uses quorum
    based failure handling:
    –   When cluster partitioning is
        detected, the majority
        partition "has quorum" and
        can continue
    –   A minority partition cannot
        commit transactions, but
        will attempt to re-connect to
        primary partition
●   A load balancer will notice
    the errors and remove
    failed node from its pool

(Diagram: two LBs, three MySQL nodes, Galera.)
What is majority?



●   50% is not majority
●   Any failure in 2 node cluster
    = both nodes must stop
●   4 node cluster split in half =
    both halves must stop
●   pc.ignore_sb exists but don't
    use it
●   You can
    manually/automatically
    enable one half by setting
    wsrep_cluster_address
●   Use 3, 5, 7... nodes

(Diagram: a 2 node cluster and a 4 node cluster of MySQL nodes.)
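
The majority rule above is a strict inequality, which is why even node counts buy nothing. A minimal sketch that simply counts nodes (the function name `has_quorum` is hypothetical, and real quorum calculation may take more into account than a head count):

```python
def has_quorum(reachable, cluster_size):
    """A partition may continue only with strictly more than half of the nodes."""
    return 2 * reachable > cluster_size


# (nodes still reachable, cluster size) for the cases on the slide.
for reachable, size in [(1, 2), (2, 4), (2, 3), (3, 5)]:
    print(f"{reachable} of {size}: {'continue' if has_quorum(reachable, size) else 'stop'}")
```

With 2 nodes any single failure leaves 1 of 2, and with 4 nodes an even split leaves 2 of 4. Neither is a majority, so an odd number of nodes is the practical choice.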
WAN replication




●   Works fine
●   Use higher timeouts and
    send windows
●   No impact on reads
●   No impact within a
    transaction
●   Adds 100-300 ms to
    commit latency

(Diagram: three MySQL nodes.)


WAN with MySQL asynchronous replication



●   You can mix Galera replication and
    MySQL replication
    –   But it can give you a headache :-)
●   Good option on slow WAN link
    (China, Australia)
●   You'll possibly need more nodes
    than in pure Galera cluster
●   Remember to watch out for slave
    lag, etc...
●   If binlog position is lost (e.g. due to
    node crash) must reprovision whole
    cluster.
●   Mixed replication also useful when
    you want an asynchronous slave
    (such as time-delayed, or filtered).

(Diagram: several groups of MySQL nodes.)
Failures & WAN replication



●   Q: What does 50% rule
    mean when I have 2
    datacenters?

(Diagram: groups of DBMS nodes.)

Pop Quiz



●   Q: What does 50% rule
    mean when I have 2
    datacenters?
●   A: Better have 3 data
    centers too.

(Diagram: groups of DBMS nodes.)

WAN replication with uneven node distribution



●   Q: What does 50% rule mean when you have uneven
    amount of nodes per data center?
●   A: Better distribute nodes evenly.
       (We will address this in future release.)

(Diagram: three sites with 3, 1 and 1 DBMS nodes.)
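
Both quiz answers follow from asking: if one whole data center disappears, do the remaining nodes still form a strict majority? A minimal sketch, counting nodes only; the function name `survives_site_loss` and the site names are hypothetical.

```python
def survives_site_loss(nodes_per_site):
    """For each site, report whether the rest of the cluster keeps a
    strict majority of all nodes when that site is lost."""
    total = sum(nodes_per_site.values())
    return {
        site: 2 * (total - count) > total
        for site, count in nodes_per_site.items()
    }


print(survives_site_loss({"dc1": 3, "dc2": 3}))              # two data centers
print(survives_site_loss({"dc1": 2, "dc2": 2, "dc3": 2}))    # three, evenly spread
print(survives_site_loss({"dc1": 1, "dc2": 3, "dc3": 1}))    # three, uneven
```

With two equal data centers, losing either one leaves exactly 50%, so the cluster stops. With the uneven 1/3/1 layout, losing the large site leaves 2 of 5 nodes, which is why even distribution is recommended.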
Benchmarks!
Baseline: Single node MySQL (sysbench oltp, in-memory)




●   Red, Blue: Constrained by InnoDB group commit bug
    –   Fixed in Percona Server 5.5, MariaDB 5.3 and MySQL 5.6
●   Brown: InnoDB syncs, binlog doesn't
●   Green: No InnoDB syncing either
●   Yellow: No InnoDB syncs, Galera wsrep module enabled
http://openlife.cc/blogs/2011/august/running-sysbench-tests-against-galera-cluster
3 node Galera cluster (sysbench oltp, in memory)




http://openlife.cc/blogs/2011/august/running-sysbench-tests-against-galera-cluster
Comments on 3 node cluster (sysbench oltp, in memory)




●   Yellow, Red are equal
    -> No overhead or bottleneck from Galera replication!
●   Green, Brown = writing to 2 and 3 masters
    -> scale-out for read-write workload!
     –   top shows 700% CPU util (8 cores)
http://openlife.cc/blogs/2011/august/running-sysbench-tests-against-galera-cluster
Sysbench disk bound (20GB data / 6GB buffer), tps



●   EC2 w local disk
    –   Note: pretty poor I/O here
●   Blue vs red: turning off
    innodb_flush_log_at_trx_commit gives 66%
    improvement
●   Scale-out factors:
    2N = 0.5 x 1N
    4N = 0.5 x 2N
●   5th node was EC2
    weakness. Later test
    scaled a little more up to
    8 nodes
http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
Sysbench disk bound (20GB data / 6GB buffer), latency



●   As before
●   Not syncing InnoDB
    decreases latency
●   Scale-out decreases
    latency
●   Galera does not add
    latency overhead




http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
Galera and NDB shootout: sysbench "out of the box"



●   Galera is 4x better
Ok, so what does this really mean?
●   That Galera is better...
    –   For this workload
    –   With default settings
        (Severalnines)
    –   Pretty user friendly and
        general purpose
●   NDB
    –   Excels at key-value and
        heavy-write workloads (which
        sysbench is not)
    –   Would benefit here from
        PARTITION BY RANGE
http://codership.com/content/whats-difference-kenneth
Summary


Many MySQL replication idioms go away:
synchronous, multi-master, no slave lag, no
binlog positions, automatic node
provisioning.

Many LB and VIP architectures possible,
JDBC/PHP load balancer is best.

Also for WAN replication. Adds 100-300 ms
to commit.

Quorum based: Majority partition wins.
Minimum 3 nodes. Minimum 3 data centers.

Great benchmark results: more
performance, not less, with replication!

Galera is good where InnoDB is good:
general purpose, easy to use HA cluster

Compared to other alternatives:

Galera is better than MySQL replication:
- synchronous
- parallel slave
- multi-master
- easier

Galera is better than DRBD:
- no performance penalty
- active/active
- easier

Galera is better than NDB:
- based on good old InnoDB
- more general purpose
- easier



Questions?




Thank you for listening!
 Happy Clustering :-)

More Related Content

PDF
Galera cluster for high availability
PPT
腾讯大讲堂06 qq邮箱性能优化
PDF
MariaDB Galera Cluster - Simple, Transparent, Highly Available
PDF
MongoDB performance
PPTX
Disaster Recovery Planning for MySQL & MariaDB
PDF
MariaDB Galera Cluster presentation
PPT
Galera Cluster Best Practices for DBA's and DevOps Part 1
PPTX
A Technical Introduction to WiredTiger
Galera cluster for high availability
腾讯大讲堂06 qq邮箱性能优化
MariaDB Galera Cluster - Simple, Transparent, Highly Available
MongoDB performance
Disaster Recovery Planning for MySQL & MariaDB
MariaDB Galera Cluster presentation
Galera Cluster Best Practices for DBA's and DevOps Part 1
A Technical Introduction to WiredTiger

Introduction to Galera

  • 1. Synchronous multi-master clusters with MySQL: an introduction to Galera
     Henrik Ingo
     OUGF Harmony conference, Aulanko, 2012-05-31
     Please share and reuse this presentation, licensed under the Creative Commons Attribution license:
     http://creativecommons.org/licenses/by/3.0/
  • 2. Agenda
     * MySQL replication issues
     * Galera internal architecture
     * Installing and configuring it
     * Synchronous multi-master clustering, what does it mean?
     * Load balancing and other options
     * How network partitioning is handled
     * WAN replication
     How does it perform?
     * In memory workload
     * Scale-out for writes - how is it possible?
     * Disk bound workload
     * NDB shootout
  • 3. So what is Galera all about?
  • 4. Created by Codership Oy
     ● Participated in 3 MySQL cluster developments, since 2003
     ● Started Galera work 2007
     ● Galera is free, open source. Codership offers support and consulting
     ● Percona XtraDB Cluster based on Galera, launched 2012
     ● Is (and can be) integrated into other MySQL and non-MySQL products
  • 5. MySQL replication challenges
     ● Asynchronous = You will lose data
     ● MySQL 5.5 semi-sync: Better, but falls back to asynchronous when in trouble...
     ● Single-threaded => slave lag => 50-80% performance penalty
     ● Master-slave: read-write splitting, failovers, diverged slaves
     ● Low level: manually provision DB to slave, configure binlog positions, ...
     [Diagram: Master and Slave]
  • 6. So what about DRBD?
     ● Synchronous, yay!
     ● Cold standby
     ● No scale-out (But you can combine with MySQL replication...)
     ● 50% performance penalty
     ● (SAN based HA has these same issues btw)
     [Diagram: Primary and Cold standby, each with its own disk, connected by DRBD]
  • 7. Galera in a nutshell
     ● True multi-master: Read & write to any node
     ● Synchronous replication
     ● No slave lag
     ● No integrity issues
     ● No master-slave failovers or VIP needed
     ● Multi-threaded slave, no performance penalty
     ● Automatic node provisioning
     [Diagram: three Master nodes on top of Galera]
  • 8. Ok, let's take a closer look...
  • 9. A MySQL Galera cluster is...
     [Architecture diagram]
     ● MySQL with patches; storage engines: MyISAM, InnoDB
     ● Monitoring: SHOW STATUS LIKE "wsrep%", SHOW VARIABLES ...
     ● State Snapshot Transfer: mysqldump, xtrabackup, rsync, etc...
     ● Replication API: Wsrep API
     ● Galera group comm library
     http://www.codership.com/downloads/download-mysqlgalera
  • 10. Starting your first cluster
     Let's assume we have nodes 10.0.0.1, 10.0.0.2 and 10.0.0.3.
     On node 1, set this in my.cnf:
         wsrep_cluster_address="gcomm://"
     Then start the first node:
         /etc/init.d/mysql start
     Important! Now change this in my.cnf to point to any of the other nodes. Don't leave it with an empty gcomm string!
         wsrep_cluster_address="gcomm://10.0.0.3"
     On node 2, set this in my.cnf:
         wsrep_cluster_address="gcomm://10.0.0.1"
     And start it:
         /etc/init.d/mysql start
     Node 2 now connects to node 1. Then do the same on node 3:
         wsrep_cluster_address="gcomm://10.0.0.1" # or .2
         /etc/init.d/mysql start
     Node 3 connects to node 1, after which it establishes communication with all nodes currently in the cluster.
     wsrep_cluster_address should be set to any one node that is currently part of the cluster. After startup it has no further meaning: all nodes connect to all others all the time.
     Use one of these to observe SST:
         ps aux|grep mysqldump
         ps aux|grep xtrabackup
         ps aux|grep rsync
     Tip: Galera outputs a lot of info to the MySQL error log, especially when nodes join or leave the cluster.
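The bootstrap order on this slide can be summarized as a small sketch. The function name `bootstrap_addresses` and the choice of the last node as the first node's permanent peer are illustrative assumptions taken from the slide's example, not part of Galera itself.

```python
def bootstrap_addresses(nodes):
    """Return (node, address at first start, address to leave in my.cnf).

    Hypothetical helper: the first node starts with an empty gcomm://
    string, which creates a new cluster. Every other node points at the
    first node. After startup the first node's my.cnf is changed to
    point at some other member so a restart joins the existing cluster.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to form a cluster")
    first = nodes[0]
    plan = []
    for index, node in enumerate(nodes):
        if index == 0:
            at_start = "gcomm://"
            afterwards = f"gcomm://{nodes[-1]}"
        else:
            at_start = f"gcomm://{first}"
            afterwards = at_start
        plan.append((node, at_start, afterwards))
    return plan


for node, at_start, afterwards in bootstrap_addresses(
    ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
):
    print(f'{node}: start with wsrep_cluster_address="{at_start}", '
          f'then keep "{afterwards}"')
```

Any current cluster member would work as the peer address; the sketch only mirrors the addresses used on the slide.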
  • 11. Checking that nodes are connected and ready
     SHOW STATUS LIKE "wsrep%";
     ...
     wsrep_local_state          4
     wsrep_local_state_comment  Synced (6)
     ...
     wsrep_cluster_conf_id      54                     Increments when a node joins or leaves the cluster.
     wsrep_cluster_size         3                      Nr of nodes connected.
     wsrep_cluster_state_uuid   3108998a-67d4-11e1-... UUID generated when first node is started, all nodes share same UUID. Identity of the clustered database.
     wsrep_cluster_status       Primary                Primary or Non-Primary cluster component?
     ...
     Q: Can you tell me which nodes are connected to this node/component?
     A: See MySQL error log.
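A health check built on these status variables might look like the following minimal sketch. It assumes the output of SHOW STATUS LIKE "wsrep%" has already been loaded into a plain dict of strings; the function name `node_is_ready` and the `expected_size` parameter are hypothetical.

```python
def node_is_ready(status, expected_size):
    """Decide whether a node looks connected and ready.

    `status` is assumed to be a dict built from SHOW STATUS LIKE "wsrep%".
    The node counts as ready when it reports Synced, belongs to the
    Primary component, and sees the expected number of nodes.
    """
    synced = status.get("wsrep_local_state_comment", "").startswith("Synced")
    primary = status.get("wsrep_cluster_status") == "Primary"
    full_size = int(status.get("wsrep_cluster_size", 0)) == expected_size
    return synced and primary and full_size


# Values taken from the slide.
slide_status = {
    "wsrep_local_state": "4",
    "wsrep_local_state_comment": "Synced (6)",
    "wsrep_cluster_conf_id": "54",
    "wsrep_cluster_size": "3",
    "wsrep_cluster_status": "Primary",
}
print(node_is_ready(slide_status, expected_size=3))
```

A load balancer health check could call something like this per node and drop nodes that are not in the Primary component.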
  • 12. Your first query
     # mysql -uroot -prootpass -h 10.0.0.1
     Welcome to the MySQL monitor. Commands end with ; or \g.
     Your MySQL connection id is 11
     Server version: 5.1.53 wsrep_0.8.0
     mysql> create table t (k int primary key auto_increment, v blob);
     mysql> show create table t;
     | Table | Create Table |
     | t | CREATE TABLE `t` (
         `k` int(11) NOT NULL auto_increment,
         `v` blob,
         PRIMARY KEY (`k`)
       ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
     1 row in set (0.00 sec)
     mysql> insert into t (v) values ("first row");
     mysql> insert into t (v) values ("second row");
  • 13. How it looks from another node
     # mysql -uroot -prootpass -h 10.0.0.2
     Welcome to the MySQL monitor. Commands end with ; or \g.
     Your MySQL connection id is 11
     Server version: 5.1.53 wsrep_0.8.0
     mysql> show create table t;
     | Table | Create Table |
     | t | CREATE TABLE `t` (
         `k` int(11) NOT NULL auto_increment,
         `v` blob,
         PRIMARY KEY (`k`)
       ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
     mysql> select * from t;
     | k | v          |
     | 1 | first row  |
     | 4 | second row |
     Galera automatically sets auto_increment_increment and auto_increment_offset.
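The jump from k=1 to k=4 follows from how auto_increment_increment and auto_increment_offset interleave key ranges between nodes. A minimal sketch, assuming an increment of 3 (one slot per node in the 3-node cluster) and an offset of 1 for the node that took the inserts; the function name is illustrative.

```python
from itertools import count, islice


def auto_increment_values(offset, increment):
    """Yield the keys one node hands out: offset, offset + increment, ..."""
    return count(start=offset, step=increment)


# Assumed settings for a 3-node cluster: increment 3, offsets 1, 2, 3.
node1 = list(islice(auto_increment_values(1, 3), 3))
node2 = list(islice(auto_increment_values(2, 3), 3))
node3 = list(islice(auto_increment_values(3, 3), 3))
print(node1, node2, node3)  # [1, 4, 7] [2, 5, 8] [3, 6, 9]
```

Because the three sequences never overlap, concurrent inserts on different nodes cannot collide on the generated key.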
  • 14. More interesting Galera (Wsrep) status vars
     | wsrep_local_send_queue     | 34         |
     | wsrep_local_send_queue_avg | 0.824589   |
     | wsrep_local_recv_queue     | 30         |
     | wsrep_local_recv_queue_avg | 1.415460   |
     | wsrep_flow_control_paused  | 0.790793   |
     | wsrep_flow_control_sent    | 9          |
     | wsrep_flow_control_recv    | 52         |
     | wsrep_cert_deps_distance   | 105.201550 |
     | wsrep_apply_oooe           | 0.349014   |
     | wsrep_apply_oool           | 0.012709   |
     | wsrep_apply_window         | 2.714530   |
     wsrep_flow_control_paused: Fraction of time that Galera replication was halted because slave(s) were overloaded. Bigger than 0.1 means you have a problem!
     Tip: These variables reset every time you do SHOW STATUS. So do it once, wait 10 sec, then do it again and read the second values.
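The 0.1 rule of thumb for wsrep_flow_control_paused is easy to turn into a check. This is a sketch only: the function name and the dict input are assumptions, and the sample passed in should be the second reading described in the tip.

```python
def flow_control_problem(second_sample, threshold=0.1):
    """Return True when replication was paused for too large a fraction of time.

    `second_sample` is assumed to be a dict of wsrep status values read
    about 10 seconds after a first, discarded reading.
    """
    return float(second_sample["wsrep_flow_control_paused"]) > threshold


# Value from the slide: replication was halted about 79% of the time.
print(flow_control_problem({"wsrep_flow_control_paused": "0.790793"}))
```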
  • 15. Did you say parallel slave threads?
     mysql> show variables\G
     ....
     *************************** 296. row ***************************
     Variable_name: wsrep_slave_threads
     Value: 32
     mysql> SHOW PROCESSLIST;
     ...
     | 31 | system user | | NULL | Sleep | 32 | committed 933 | NULL
     | 32 | system user | | NULL | Sleep | 21 | committed 944 | NULL
     ● MySQL replication allows master to proceed and leaves slave lagging.
     ● Galera clusters are tightly coupled: Master throughput will degrade if any slave can't keep up.
     ● For best throughput, use 4-64 slave threads. (Esp. disk-bound workloads.)
     ● Generic solution:
        – Works with any application, schema.
        – Commit order is preserved.
        – Also possible: Out-of-order commits. Small additional benefit, unsafe for most apps.
  • 16. MySQL options with Galera friendly values
     # (This must be substituted by wsrep_format)
     binlog_format=ROW
     # Currently only InnoDB storage engine is supported
     default-storage-engine=innodb
     # No need to sync to disk when replication is synchronous
     sync_binlog=0
     innodb_flush_log_at_trx_commit=0
     innodb_doublewrite=0
     # to avoid issues with 'bulk mode inserts' using autoinc
     innodb_autoinc_lock_mode=2
     # This is a must for parallel applying
     innodb_locks_unsafe_for_binlog=1
     # Query Cache is not supported with wsrep
     query_cache_size=0
     query_cache_type=0
  • 17. Setting WSREP and Galera options
     # Most settings belong to wsrep api = part of MySQL
     #
     # State Snapshot Transfer method
     wsrep_sst_method=mysqldump
     #
     # SST authentication string. This will be used to send SST to joining nodes.
     # Depends on SST method. For mysqldump method it is root:<root password>
     wsrep_sst_auth=root:password
     # Use of Galera library is opaque to MySQL. It is a "wsrep provider".
     # Full path to wsrep provider library or 'none'
     #wsrep_provider=none
     wsrep_provider=/usr/local/mysql/lib/libgalera_smm.so
     # Provider specific configuration options
     # Here we increase window size for a WAN setup
     wsrep_provider_options="evs.send_window=512; evs.user_send_window=512;"
  • 19. Multi-master = you can write to any node & no failovers needed
     What do you mean no failover???
     ● Use a load balancer
     ● Application sees just one IP
     ● Write to any available node, round-robin
     ● If node fails, just write to another one
     ● What if load balancer fails? -> Turtles all the way down
     [Diagram: one LB in front of three MySQL nodes on Galera]
  • 20. Protip: JDBC, PHP come with built-in load balancing!
     ● No Single Point of Failure
     ● One less layer of network components
     ● Is aware of MySQL transaction states and errors
     jdbc:mysql:loadbalance://10.0.0.1,10.0.0.2,10.0.0.3/<database>?loadBalanceBlacklistTimeout=5000
     [Diagram: two LBs (one per application) in front of three MySQL nodes on Galera]
  • 21. Load balancer per each app node
     ● Also no Single Point of Failure
     ● LB is an additional layer, but localhost = pretty fast
     ● Need to manage more load balancers
     ● Good for languages other than Java, PHP
     [Diagram: two LBs in front of three MySQL nodes on Galera]
  • 22. You can still do VIP based failovers. But why?
     => You can run a 2 node Galera cluster like this.
     [Diagram: a VIP fails over between MySQL nodes managed by a clustering framework, on top of Galera]
  • 23. Understanding the transaction sequence in Galera
     [Sequence diagram: Master and Slave]
     ● Master: user transaction (BEGIN, SELECT, UPDATE, COMMIT)
     ● Optimistic locking between nodes = Risk for deadlocks
     ● COMMIT goes through group communication => GTID (commit delay)
     ● Certification runs on both master and slave. Certification = deterministic
     ● Master: COMMIT or ROLLBACK, then InnoDB commit and commit return
     ● Slave: COMMIT or discard, then apply and InnoDB commit
     ● Virtual synchrony = Committed events written to InnoDB after small delay
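Why certification is deterministic can be illustrated with a toy model. This is a heavily simplified sketch and not Galera's actual implementation: the `Certifier` class, the string keys, and the `snapshot_seqno` parameter are all hypothetical. The idea is that every node receives the same write sets in the same order, so each node reaches the same commit-or-discard verdict without asking the others.

```python
class Certifier:
    """Toy certification: first committer wins, decided from ordered input."""

    def __init__(self):
        self.seqno = 0          # global order assigned by group communication
        self.last_writer = {}   # row key -> seqno of last certified write

    def certify(self, keys, snapshot_seqno):
        """Return (ok, seqno) for a write set touching `keys`.

        The write set fails if any of its keys was changed by a write set
        certified after the transaction took its snapshot.
        """
        self.seqno += 1
        for key in keys:
            if self.last_writer.get(key, 0) > snapshot_seqno:
                return False, self.seqno
        for key in keys:
            self.last_writer[key] = self.seqno
        return True, self.seqno


# Two nodes replay the same ordered stream and reach identical verdicts.
stream = [({"t:1"}, 0), ({"t:1"}, 0), ({"t:2"}, 0), ({"t:1"}, 1)]
master, slave = Certifier(), Certifier()
master_verdicts = [master.certify(keys, snap) for keys, snap in stream]
slave_verdicts = [slave.certify(keys, snap) for keys, snap in stream]
print(master_verdicts)
```

The second write set loses because it touched row `t:1` from a snapshot older than the first write set. On the originating node that surfaces as a rollback, which is the deadlock risk noted on the slide.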
  • 24. How network partitioning is handled aka How split brain is prevented
  • 25. Preventing split brain
     ● If part of the cluster can't be reached, it means either
        – The node(s) has crashed
        – Nodes are fine and it's a network connectivity issue = network partition
        – A node cannot know which of the two has happened
        – Network partition may lead to split brain if both parts continue to commit transactions.
     ● Split brain leads to 2 diverging clusters, 2 diverged datasets
     ● Clustering SW must ensure there is only 1 cluster partition active at all times
  • 26. Quorum
     ● Galera uses quorum based failure handling:
        – When cluster partitioning is detected, the majority partition "has quorum" and can continue
        – A minority partition cannot commit transactions, but will attempt to re-connect to primary partition
     ● A load balancer will notice the errors and remove failed node from its pool
  • 27. What is majority?
     ● 50% is not majority
     ● Any failure in 2 node cluster = both nodes must stop
     ● 4 node cluster split in half = both halves must stop
     ● pc.ignore_sb exists but don't use it
     ● You can manually/automatically enable one half by setting wsrep_cluster_address
     ● Use 3, 5, 7... nodes
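The majority rule reduces to one strict inequality. A minimal sketch, with a hypothetical function name, assuming every node carries equal weight:

```python
def has_quorum(reachable, total):
    """A partition may continue only with strictly more than half the nodes."""
    return 2 * reachable > total


print(has_quorum(1, 2))  # False: any failure in a 2 node cluster stops both
print(has_quorum(2, 4))  # False: a 4 node cluster split in half stops both halves
print(has_quorum(2, 3))  # True: a 3 node cluster survives losing one node
```

With an odd node count a clean split can never produce two equal halves, which is why 3, 5, 7... nodes are recommended.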
  • 29. WAN replication
     ● Works fine
     ● Use higher timeouts and send windows
     ● No impact on reads
     ● No impact within a transaction
     ● Adds 100-300 ms to commit latency
  • 30. WAN with MySQL asynchronous replication
     ● You can mix Galera replication and MySQL replication
        – But it can give you a headache :-)
     ● Good option on slow WAN link (China, Australia)
     ● You'll possibly need more nodes than in pure Galera cluster
     ● Remember to watch out for slave lag, etc...
     ● If binlog position is lost (e.g. due to node crash) must reprovision whole cluster.
     ● Mixed replication also useful when you want an asynchronous slave (such as time-delayed, or filtered).
  • 31. Failures & WAN replication
     ● Q: What does 50% rule mean when I have 2 datacenters?
     [Diagram: DBMS nodes split across data centers]
  • 32. Pop Quiz
     ● Q: What does 50% rule mean when I have 2 datacenters?
     ● A: Better have 3 data centers too.
     [Diagram: DBMS nodes split across data centers]
  • 33. WAN replication with uneven node distribution
     ● Q: What does 50% rule mean when you have uneven amount of nodes per data center?
     [Diagram: five DBMS nodes]
  • 34. WAN replication with uneven node distribution
     ● Q: What does 50% rule mean when you have uneven amount of nodes per data center?
     ● A: Better distribute nodes evenly. (We will address this in future release.)
     [Diagram: five DBMS nodes]
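Both quiz answers follow from applying the majority rule to the loss of a whole data center. The sketch below uses hypothetical layouts (the node counts per data center are illustrative, not read from the slide diagrams) and assumes equally weighted nodes.

```python
def survives_loss_of(dc, nodes_per_dc):
    """True if the nodes left after losing data center `dc` still hold a majority."""
    total = sum(nodes_per_dc.values())
    remaining = total - nodes_per_dc[dc]
    return 2 * remaining > total


def surviving_losses(nodes_per_dc):
    """Map each data center to whether the cluster survives losing it."""
    return {dc: survives_loss_of(dc, nodes_per_dc) for dc in nodes_per_dc}


two_dcs = {"dc1": 3, "dc2": 3}            # losing either leaves exactly 50%
three_dcs = {"dc1": 2, "dc2": 2, "dc3": 2}
uneven = {"dc1": 3, "dc2": 1, "dc3": 1}   # dc1 alone holds the majority

print(surviving_losses(two_dcs))
print(surviving_losses(three_dcs))
print(surviving_losses(uneven))
```

With two data centers no single-site outage is survivable, with three evenly loaded sites every one is, and in the uneven layout losing the largest site still takes the cluster down.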
  • 36. Baseline: Single node MySQL (sysbench oltp, in-memory)
     ● Red, Blue: Constrained by InnoDB group commit bug
        – Fixed in Percona Server 5.5, MariaDB 5.3 and MySQL 5.6
     ● Brown: InnoDB syncs, binlog doesn't
     ● Green: No InnoDB syncing either
     ● Yellow: No InnoDB syncs, Galera wsrep module enabled
     http://openlife.cc/blogs/2011/august/running-sysbench-tests-against-galera-cluster
  • 37. 3 node Galera cluster (sysbench oltp, in memory)
     http://openlife.cc/blogs/2011/august/running-sysbench-tests-against-galera-cluster
  • 38. Comments on 3 node cluster (sysbench oltp, in memory)
     ● Yellow, Red are equal -> No overhead or bottleneck from Galera replication!
     ● Green, Brown = writing to 2 and 3 masters -> scale-out for read-write workload!
        – top shows 700% CPU util (8 cores)
     http://openlife.cc/blogs/2011/august/running-sysbench-tests-against-galera-cluster
  • 39. Sysbench disk bound (20GB data / 6GB buffer), tps
     ● EC2 with local disk
        – Note: pretty poor I/O here
     ● Blue vs red: turning off innodb_flush_log_at_trx_commit gives 66% improvement
     ● Scale-out factors:
        2N = 0.5 x 1N
        4N = 0.5 x 2N
     ● 5th node was EC2 weakness. Later test scaled a little more up to 8 nodes
     http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
  • 40. Sysbench disk bound (20GB data / 6GB buffer), latency
     ● As before
     ● Not syncing InnoDB decreases latency
     ● Scale-out decreases latency
     ● Galera does not add latency overhead
     http://codership.com/content/scaling-out-oltp-load-amazon-ec2-revisited
  • 41. Galera and NDB shootout: sysbench "out of the box"
     ● Galera is 4x better
     Ok, so what does this really mean?
     ● That Galera is better...
        – For this workload
        – With default settings (Severalnines)
        – Pretty user friendly and general purpose
     ● NDB
        – Excels at key-value and heavy-write workloads (which sysbench is not)
        – Would benefit here from PARTITION BY RANGE
     http://codership.com/content/whats-difference-kenneth
  • 42. Summary
     Many MySQL replication idioms go away: synchronous, multi-master, no slave lag, no binlog positions, automatic node provisioning.
     Many LB and VIP architectures possible, JDBC/PHP load balancer is best.
     Also for WAN replication. Adds 100-300 ms to commit.
     Quorum based: Majority partition wins. Minimum 3 nodes. Minimum 3 data centers.
     Great benchmark results: more performance, not less, with replication!
     Galera is good where InnoDB is good: general purpose, easy to use HA cluster
     Compared to other alternatives:
     Galera is better than MySQL replication:
        - synchronous
        - parallel slave
        - multi-master
        - easier
     Galera is better than DRBD:
        - no performance penalty
        - active/active
        - easier
     Galera is better than NDB:
        - based on good old InnoDB
        - more general purpose
        - easier
  • 43. Questions? Thank you for listening! Happy Clustering :-)