Methods of Sharding MySQL
            Percona Live NYC 2012
Who are Palomino?
Bespoke Services: we work with and like you.
Production Experienced: senior DBAs, admins, and engineers.
24x7: globally-distributed on-call staff.
Short-term no-lock-in contracts.
Professional Services (DevOps):
 ➢ Chef,

 ➢ Puppet,

 ➢ Ansible.


Big Data Cluster Administration (OpsDev):
 ➢ MySQL, PostgreSQL,

 ➢ Cassandra, HBase,

 ➢ MongoDB, Couchbase.
Methods of Sharding MySQL
               Percona Live NYC 2012
Who am I?
Tim Ellis
CTO/Principal Architect, Palomino

Achievements:
 ➢ Palomino Big Data Strategy.

 ➢ Datawarehouse Cluster at Riot Games.

 ➢ Back-end Storage Architecture for Firefox Sync.

 ➢ Led DB teams at Digg for four years.

 ➢ Harassed the Reddit team at one of their parties.


Ensured Successful Business for:
 ➢ Digg, Friendster,

 ➢ Riot Games,

 ➢ Mozilla,

 ➢ StumbleUpon.
Methods of Sharding MySQL
         What is this Talk?
Large cluster admin: when one DB isn't enough.
 ➢ What is a shard?

 ➢ What shard types can I choose?

 ➢ How to build a large DB cluster.

 ➢ How to administer that giant mess of DBs.




Types of large clusters:
 ➢ Just a bunch of databases.

 ➢ Distributed database across machines.
Methods of Sharding MySQL
         Where the Focus will Lie
12% – Sharding theory/considerations.

25% – Building a Cluster to administer (tutorial):
 ➢ Palomino Cluster Tool.




50% – Flexible large-cluster administration (tutorial):
 ➢ Tumblr's Jetpants.




13% – Other sharding technologies (talk-only):
 ➢ Youtube's Vtocc (Vitess),

 ➢ Twitter's Gizzard,

 ➢ HAproxy.
Methods of Sharding MySQL
         What about the Silver Bullets?
NoSQL Distributed Databases:
➢ Promise “sharding” for free,

➢ Uptime and trivial horizontal scaling.




Reality:
➢ RDBMS is 40-yr-old tech,

➢ NoSQL is 10-yr-old tech,

➢ Which has been responsible for more high-profile
  downtimes in the past 10 years?
➢ Evaluate the alternatives without illusions.
Methods of Sharding MySQL
                      What is a Shard?
A location for a subset of data:
➢ Itself made of pieces.

➢ Typically itself redundant.



   [Diagram: three example shards (User Data, Logging Data, Posts
   Data), each a Master replicating to three Slaves.]
Methods of Sharding MySQL
         What are the Sharding Method Choices?
By-Function:
➢ Move busy tables onto new shard.

➢ Writes of busiest tables on new hardware.

➢ Writes of remaining tables on current.


By-Columns:
➢ Split table into chunks of related columns,

  store each set on its own Master/Slaves shard.
By-Rows:
➢ A table is split across N shards; each shard gets
  a subset of the rows of the table.
Methods of Sharding MySQL
         Shard Method Choices
By-function and By-Column Methods:
➢ Much easier.

➢ Can get you through months to years.

➢ Eventually you run out of options here.




By-Row Method:
➢ The hardest to do.

➢ Requires new ways of accessing data.

➢ Often requires sophisticated cache strategies.

➢ Itself can be done several ways.
Methods of Sharding MySQL
         By-Function Sharding
Picking a Functional Split:
 ➢ A subset of tables commonly joined.

 ➢ Tables outside this subset nearly never joined.

 ➢ One of them responsible for many writes.




Every JOIN that crosses the subset boundary must be
rewritten as code-based multi-SELECTs.

Once subset of tables moved onto their own
server, writes are distributed.
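The JOIN rewrite above can be sketched like this (a hypothetical example, not Palomino's code; sqlite3 stands in for the two MySQL pools, and the table and column names are invented; a MySQL driver would use %s placeholders instead of ?):

```python
import sqlite3

# After moving "posts" onto its own shard, this JOIN can no longer
# run server-side:
#   SELECT u.name, p.title FROM users u JOIN posts p ON p.user_id = u.id
# Instead the application issues one SELECT per pool and joins in code.
users_db = sqlite3.connect(":memory:")
users_db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
users_db.execute("INSERT INTO users VALUES (1, 'alice')")

posts_db = sqlite3.connect(":memory:")
posts_db.execute("CREATE TABLE posts (user_id INTEGER, title TEXT)")
posts_db.execute("INSERT INTO posts VALUES (1, 'hello'), (1, 'world')")

def user_with_posts(user_id):
    # First SELECT hits the users pool...
    row = users_db.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return None
    # ...second SELECT hits the posts pool; the "join" happens here.
    titles = [t for (t,) in posts_db.execute(
        "SELECT title FROM posts WHERE user_id = ?", (user_id,))]
    return {"name": row[0], "posts": titles}
```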
Methods of Sharding MySQL
         By-Column Sharding (Vertical Partition)
Identifying candidate table:
 ➢ Many columns (“users” anyone?),

 ➢ Many updates,

 ➢ Many indexes.




Required: even split of columns/indexes by
update frequency. Attempt: logical grouping.

JOINs are neither possible nor desirable: write multi-
SELECT code in the application DAL.
Methods of Sharding MySQL
         Row-based Sharding Choices
Range-based Sharding:
➢ Easy to understand.

➢ Each shard gets a range of rows.

➢ Oft-times some shards are “hot.”

➢ Hot shards are split into separate shards.

➢ Cold shards are joined into a single shard.

➢ Juggling shard load is a frequent process.




Typically the best solution. Shortcomings have
known work-arounds.
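Range-based routing can be sketched in a few lines (shard names and boundaries here are invented for illustration):

```python
import bisect

# Each shard owns a contiguous, non-overlapping range of row keys.
SHARD_RANGES = [
    (1,      300000,    "shard-a"),
    (300001, 999999999, "shard-b"),
]
UPPER_BOUNDS = [hi for _, hi, _ in SHARD_RANGES]

def shard_for(row_id):
    # Find the first range whose upper bound is >= row_id.
    i = bisect.bisect_left(UPPER_BOUNDS, row_id)
    lo, hi, name = SHARD_RANGES[i]
    assert lo <= row_id <= hi, "row key outside every range"
    return name
```

Splitting a hot shard is then just replacing one entry in the list with two; merging cold shards is the reverse. That is why the juggling is routine rather than painful.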
Methods of Sharding MySQL
         Row-based Sharding Choices
Modulus/Hash-based Sharding:
➢ Row key is hashed to integer modulo number

  of shards, then placed on that shard.
➢ Only rarely are some shards “hot.”

➢ Shard splitting is difficult to implement.




Also a common method of sharding. We hope
not to split shards often (or ever).

When we do, it's a multi-week process.
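A hedged sketch of modulus/hash routing, and of why splitting hurts (the hash choice is illustrative; real systems pick any stable hash):

```python
import hashlib

def shard_for(row_key, n_shards):
    # Stable hash of the key, reduced modulo the shard count.
    digest = hashlib.md5(str(row_key).encode()).hexdigest()
    return int(digest, 16) % n_shards

# Why splits are a multi-week project: growing from 4 to 5 shards
# remaps most keys, so nearly every row has to move (roughly 80%
# of keys change shards under a uniform hash).
moved = sum(1 for k in range(10000) if shard_for(k, 4) != shard_for(k, 5))
```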
Methods of Sharding MySQL
         Row-based Sharding Choices
Lookup Table-based Sharding:
 ➢ Easy to understand.

 ➢ Row key mapped to shard in a lookup table.

 ➢ Easy to move load off hot shards.

 ➢ Lookup table method is problematic:

   ➢ Single point of failure.
   ➢ Performance bottleneck.


   ➢ Billions of rows, itself may need sharding.
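Lookup-table routing in miniature (a sketch with invented names; a dict stands in for the lookup table, which in production is itself a database):

```python
# Explicit key -> shard mapping: moving load off a hot shard is a
# one-row update. The flip side: this table is a single point of
# failure, a bottleneck, and can itself grow to billions of rows.
shard_map = {}
DEFAULT_SHARD = "shard-a"

def shard_for(user_id):
    return shard_map.get(user_id, DEFAULT_SHARD)

def move_user(user_id, new_shard):
    # In real life: copy the rows to new_shard first, then flip
    # the pointer.
    shard_map[user_id] = new_shard
```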
Prerequisite: Build a Large Cluster
         Allocating the Hardware
Getting Hardware – your own company's:
➢ Can be politically-charged.

➢ Get a small batch first.

➢ Build small demonstration cluster.

➢ Get everyone on-board with the demo.


Renting/Leasing Hardware – the Cloud:
➢ Allocate hardware in EC2 or elsewhere.

➢ Usually easier, but possibly harder admin:

   ➢ Hardware failure more common.
   ➢ Hardware/network flakiness more common.
Prerequisite: Build a Large Cluster
        Building the Cluster




Okay, I've got the hardware. What next?
Prerequisite: Build a Large Cluster
         Building the Cluster
Configuring the Hardware. The old dilemma:
➢ Spend days to install/configure DB software?


  Subsequent management is painful.
➢ Use SSH in “for” loops?


  Rolling your own configuration management
  tools is a lot of work.
➢ Learn a configuration management tool?


  Obvious choice in 2012. Well-documented
  tools like Chef, Puppet, Ansible.
Configuration Management Tools
         My Experience
Puppet: 6 years ago at Digg
 ➢ Manage/Deploy of hundreds of servers.

 ➢ Painful, but not as bad as hand-coding it all.


Chef: 2 years ago at Drawn to Scale and Riot
 ➢ Manage/Deploy dozens of servers.

 ➢ Learning Ruby is a “joy” of its own.


Ansible: 6 months ago at Palomino
 ➢ Manage/Deploy dozens of servers.

 ➢ First Palomino Cluster Tool subset built.
Prerequisite: Build a Large Cluster
         Configuration Management Options
Pick your Configuration Management:
 ➢ Chef: Popular, use Ruby to “code your

   infrastructure.” Must learn Ruby.
 ➢ Puppet: Mature, use data structures to “define

   your infrastructure.” Less coding.
 ➢ Ansible: Tiny and modular, similar to Puppet,

   but with ordering for deployment. Pragmatic.
Write/Get Recipes, Manifests, Playbooks?
 ➢ Writing is tedious. Can take >1 week.

 ➢ Get from internet? Often incomplete.
Prerequisite: Build a Large Cluster
               The Palomino Cluster Tool
Palomino's tool for building large DB clusters:
 ➢ Chef, Puppet, Ansible modules.

 ➢ Open-source on Github.

     ➢   https://github.com/time-palominodb/PalominoClusterTool
     ➢   Google: “Palomino Cluster Tool.”
➢   Will build a large cluster for you in hours:
     ➢ Master(s)
     ➢ Slaves – hundreds of them as easy as two.


     ➢ MHA – when master fails, a slave takes over.


➢   Previously this would take days.
The Palomino Cluster Tool
         Building the Management Node
Cluster Management Node:
➢ Will build the initial cluster.

➢ Will do subsequent cluster management.




Tool for Initial Cluster Build:
 ➢ Palomino Cluster Tool (Ansible subset).




Tool for Cluster Management:
 ➢ Jetpants (Ruby).
The Palomino Cluster Tool
           Building the Management Node
Palomino Cluster Tool (Ansible subset).

Why Ansible?
➢ No server to set up, simply uses SSH.

➢ Easy-to-understand non-code Playbooks.

➢ Use a language you know for modules.

➢ For demo purposes, obvious choice.

➢ Also production-worthy:

   ➢   Built by Michael DeHaan, long-time
       configuration management guru.
The Palomino Cluster Tool
          Building the Management Node
Management node lives alongside your cluster.
➢ We are building our cluster in EC2.

➢ Thus management node in EC2.

➢ This tutorial assumes Ubuntu 12.04.

➢ t1.micro is fine for management node.




Install basic tools:
 ➢ apt-get install git (for Ansible/P.C.T.)

 ➢ apt-get install make python-jinja2 (for Ansible)
The Palomino Cluster Tool
         Configuring the Management Node
Install Ansible:
 ➢ git clone git://github.com/ansible/ansible.git

 ➢ make install




Install Palomino Cluster Tool:
 ➢ git clone git://github.com/time-palominodb/PalominoClusterTool.git

I think we just finished the management node!
The Palomino Cluster Tool
         Allocating Shard Nodes
Shard nodes:
 ➢ m1.small or larger: at least 1.6GB RAM,

 ➢ :3306, :80, and :22 open between all (one

   security group in EC2),
 ➢ Ubuntu 12.04 (other Debian-alikes at your

   own risk – but may work!).

Do not need OS/database configuration:
➢ Ansible will configure them.
The Palomino Cluster Tool
            Building the First Shard – Step 1
  From README: edit IP addresses in cluster
  layout file (PalominoClusterToolLayout.ini):
# Alerting/Trending -----
[alertmaster]
10.252.157.110
[trendmaster]
10.252.157.110

# Servers -----
[mhamanager]
10.252.157.110


  This section identical for all Shards.
The Palomino Cluster Tool
            Building the First Shard – Step 2
  From README: edit IP addresses in cluster
  layout file (PalominoClusterToolLayout.ini):
[mysqlmasters]
10.244.17.6

[mysqlslaves]
10.244.26.199
10.244.18.178

[mysqls:vars]
master_host=10.244.17.6


  This section different for every Shard.
The Palomino Cluster Tool
            Building the First Shard – Step 3
  Run setup command to put configuration and
  SSH keys into /etc:
$ cd PalominoClusterTool/AnsiblePlaybooks/Ubuntu-12.04
$ ./00-Setup_PalominoClusterTool.sh ShardA


  Run build command – it's a wrapper around
  Ansible Playbooks:
$ ./10-MySQL_MHA_Manager.sh ShardA
The Palomino Cluster Tool
            Building the Second Shard
  Just make one shard with a master and many
  slaves. In real life, you might do something like
  this instead:
for i in ShardB ShardC ShardD ; do
  (manual step):
  vim PalominoClusterToolLayout.ini
  (scriptable steps):
  ./00-Setup_PalominoClusterTool.sh $i
  ./10-MySQL_MHA_Manager.sh $i
done


  Run them in separate terminals to save time.
Make the Cluster Real
              Data makes Shard Split Interesting
    Fill ShardA using random data script.*

    Palomino Cluster Tool includes such a tool.
     ➢ HelperScripts/makeGiantDatafile.pl



$   ssh root@sharda-master
#   cd PalominoClusterTool/HelperScripts
#   mysql -e 'create database palomino'
#   ./makeGiantDatafile.pl 1200000 3 | mysql -f palomino


    Install Jetpants, do shard split now.
    * Be sure /var/lib/mysql is on large partition!
Administering the Cluster
              Install Jetpants
    General idea: Install Ruby >=1.9.2 and
    RubyGems, then Jetpants via RubyGems.

On my systems, /etc/alternatives is always
incorrect, so ln the proper binaries for Jetpants:
#   apt-get install ruby1.9.3 rubygems libmysqlclient-dev
#   ln -sf /usr/bin/ruby1.9.3 /etc/alternatives/ruby
#   ln -sf /usr/bin/gem1.9.3 /etc/alternatives/gem
#   gem install jetpants
Administering the Cluster
              Configure Jetpants
General idea: edit /etc/jetpants.yaml, then create
the Jetpants inventory and application configuration
files and chown them to the Jetpants user:
#   vim /etc/jetpants.yaml
#   mkdir -p /var/jetpants
#   touch /var/jetpants/assets.json
#   chown jetpantsusr: /var/jetpants/assets.json
#   mkdir -p /var/www
#   touch /var/www/databases.yaml
#   chown jetpantsusr: /var/www/databases.yaml
Administering the Cluster
              Jetpants Shard Splits
  Tell Jetpants Console about your ShardA:
Jetpants> s = Shard.new(1, 999999999, '10.12.34.56',
:ready) #10.12.34.56==ShardA master
Jetpants> s.sync_configuration


  Create spares within Console for all others
  (improved workflow in Jetpants 0.7.8):
Jetpants>   topology.tracker.spares << '10.23.45.67'
Jetpants>   topology.tracker.spares << '10.23.45.68'
Jetpants>   topology.tracker.spares << '10.23.45.69'
Jetpants>   topology.write_config
Jetpants>   topology.update_tracker_data
Administering the Cluster
           Jetpants Shard Splits
Just for this tutorial:
 ➢ Create the “palomino” database,

 ➢ Break the replication on all the spares,

 ➢ Be sure spares are read/write:

     ➢ Edit my.cnf,
     ➢ service mysql restart


➢   Ensure “jetpants pools” proper:
     ➢ One master,
     ➢ Two slaves.
Administering the Cluster
            Jetpants Shard Splits
  How to perform an actual Shard Split:
$ jetpants shard_split --min-id=1 --max-id=999999999


  Notes:
  ➢ Process takes hours. Use screen or nohup.

  ➢ LeftID == parent's first, RightID == parent's

    last, no overlap/gap.
  ➢ Make children 1-300000,300001-999999999.
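The child-range rule above can be written as a quick sanity check (a sketch, not Jetpants code): the children must tile the parent exactly.

```python
def split_range(lo, hi, cut):
    # Parent [lo, hi] becomes children [lo, cut] and [cut+1, hi]:
    # contiguous, no overlap, no gap.
    assert lo <= cut < hi
    return (lo, cut), (cut + 1, hi)

left, right = split_range(1, 999999999, 300000)
```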
Jetpants Shard Splitting
                         The Gory Details
       After “jetpants shard_split”:
ubuntu@ip-10-252-157-110:~$ jetpants pools
shard-1-999999999 [3GB]
master          = 10.244.136.107 ip-10-244-136-107
standby slave 1 = 10.244.143.195 ip-10-244-143-195
standby slave 2 = 10.244.31.91 ip-10-244-31-91
shard-1-400000 (state: replicating) [2GB]
master          = 10.244.144.183 ip-10-244-144-183
shard-400001-999999999 (state: replicating) [1GB]
master          = 10.244.146.27 ip-10-244-146-27

   0   global pools
   3   shard pools
----   --------------
   3   total pools

   3   masters
   0   active slaves
   2   standby slaves
   0   backup slaves
----   --------------
   5   total nodes
Jetpants Improvements
         The Result of an Experiment
Jetpants only well-tested on RHEL/CentOS.

Palomino Cluster Tool only well-tested to build
Ubuntu 12.04 clusters.

Little effort to fix Jetpants:
 ➢ /sbin/service location different,

 ➢ service mysql status output different.
Jetpants Improvements
         The Result of an Experiment
Jetpants only well-tested on MySQL 5.1.

I built a cluster of MySQL 5.5.

A little more effort to fix Jetpants:
➢ Setting master_host=' ' is a syntax error,

➢ reset slave needs keyword “all” appended.
Jetpants Improvements
         The Result of an Experiment
Jetpants only well-tested on large datasets.

I built a cluster with only hundreds of MB.

A wee tad more effort to fix Jetpants:
➢ Some timings assumed large datasets,

➢ Edge cases for small/quick operations

  reported back to the author.
Jetpants Improvements
         OSS Collaboration and Win
Evan Elias implemented these fixes last week!
 ➢ jetpants add_pool,

 ➢ jetpants add_shard,

 ➢ jetpants add_spare (with sanity-check spare),

 ➢ Shards with 1 slave (not for prod!),

 ➢ read_only spares not fatal,

 ➢ Debian-alike (Ubuntu) fixes,

 ➢ MySQL 5.5 fixes,

 ➢ Mid-split Jetpants pools output simpler.


Really responsive ownership of project!
Twitter's Gizzard
             What is it?
A general framework for distributed databases.
➢ Hides sharding from you.

➢ Literally, it is middleware.

    ➢ Applications connect to Gizzard,
    ➢ Gizzard sends connections to proper place,


    ➢ Shard splits and hardware failure taken care of.


➢ Created at Twitter by rogue cowboys.
➢ Not completely production-ready.

    ➢   Better than rolling your own!
Twitter's Gizzard
         Why should I use it?
You've settled on row-based partition scheme:
 ➢ Master nearing I/O capacity, won't scale up,

 ➢ Can't move some tables to their own pool,

 ➢ Can't split the columns/indexes out,

 ➢ You want to keep using the DBMS you

   already know and love: Percona Server.*
 ➢ Don't want to think about fault-tolerance or

   shard splits (much),

* Actually use any storage back-end.
Twitter's Gizzard
         The Fine Print
This sounds perfect. Why not Gizzard?

Writes must follow strict diet. Must be:
➢ Idempotent*,

➢ Commutative**,

➢ Must not have tuberculosis.




* Pfizer cannot remove the idempotency
requirement of Gizzard.
** Even on evenings and weekends.
Twitter's Gizzard
         Expanding the Fine Print
Idempotency:
 ➢ Submit a write. Again. And again.

 ➢ Must be identical to doing it once.

 ➢ Bad: “update set col = col + 1”




Commutative – writes in arbitrary order:
➢ WriteA→WriteB→WriteC on Node1.

➢ WriteB→WriteC→WriteA on Node2.

➢ Bad: “update set col1 = 42”→“update set

  col2 = col1 + 5”
Twitter's Gizzard
         Expanding the Fine Print
Cluster is Eventually Consistent:
➢ May return old values for reads.

➢ Unknown when consistency will occur.




Like a politician's position on the budget:
 ➢ Might be consistent in the future.

 ➢ Just not right now.

 ➢ Or now.
Twitter's Gizzard
           Working Around the Shortcomings
Gizzard work-around:
➢ Add timestamp to every transaction.

➢ Good:

     ➢ “col1.ts=1; update set col1=42” →
     ➢ “update set col2=col1 + 5 where col1.ts=1”


➢   Implementation trickier if DBMS doesn't
    support column attributes.

Cannot escape: must radically re-think schema
and application/DBMS interaction.
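The timestamp work-around can be sketched as last-write-wins cells (a hypothetical illustration, not Gizzard's implementation): every write carries a timestamp and only lands if it is newer than what is stored, so replays (idempotency) and reordering (commutativity) converge to the same state.

```python
cells = {}  # column name -> (timestamp, value)

def write(col, ts, value):
    # Apply the write only if it is newer than the stored one.
    current = cells.get(col)
    if current is None or ts > current[0]:
        cells[col] = (ts, value)

write("col1", 1, 42)
write("col2", 2, 47)   # "col1 + 5" computed at write time, ts attached
write("col1", 1, 42)   # replayed write: no-op
```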
Twitter's Gizzard
             Trying it Out
I'm convinced! How do I begin?
 ➢ Learn Scala.

 ➢ Clone “rowz” from Github.

    ➢   https://github.com/twitter/Rowz
➢ Modify it to suit your needs.
➢ Learn how it interacts with existing tools.

➢ Write new monitoring/alerting plugins.

➢ Write unit tests!

➢ You should OSS it to help with overhead.
Twitter's Gizzard
          Trying it Out
Sounds daunting. Maybe I'll roll my own?

Learn from others' mistakes:
 ➢ Digg: 2 engineers 6 months. Code thrown

   away. Digg out of business.
 ➢ Countless identical stories in Silicon Valley.




NIHS (not-invented-here syndrome) == go out of business*.

* 8-figure R&D budgets excepted.
Youtube's Vitess/Vtocc
         What is it?
Vitess is a library. Vtocc is an implementation
using it.

Vtocc is another middleware solution.
➢ Sharding,

➢ Caching,

➢ Connection-pooling,

➢ In-use at Youtube,

➢ Built-in fail-safe features.
Youtube's Vtocc
         Why use it?
Proven high-volume sharding solution.

Interesting feature-list:
 ➢ Auto query/transaction over-limit killing.

 ➢ Better query-cache implementation.

 ➢ Query comment-stripping for query cache.

 ➢ Query consolidation.

 ➢ Zero downtime restarts.




Less coding than Gizzard (more plug-in).
Youtube's Vtocc
         Hold on, Zero Downtime Restarts?
Just start new Vtocc instance.
 ➢ Instance1 passes new requests to Instance2,

 ➢ Instance1's connections get 30s to complete,

 ➢ Instance2 kills Instance1 and takes over.




   [Diagram: Vtocc Instance 2 starts alongside Vtocc Instance 1, then
   takes over its traffic.]
Youtube's Vtocc
          The Fine Print
Requires Particular Primary Keys:
➢ varbinary datatype,

➢ Choose carefully to prevent hot-spots.




Max result-set size: larger resultsets fail.

Additional administration burden:
➢ “My query was killed. Why?”

➢ Middleware adds spooky hard-to-diagnose

  failure modes.
Youtube's Vtocc
                 Implementation Details
➢   Run Vtocc on same server as MySQL.
➢   Configure Vtocc fail-safes for expected load:
    ➢ Pool Size (connection count),

    ➢ Max Transactions (has own connection pool),

    ➢ Query Timeout (before killed),

    ➢ Transaction Timeout (before killed),

    ➢ Max Resultset Size in rows

        ➢   Go language doesn't free allocated memory, so
            pick this value carefully.
➢   More details: http://code.google.com/p/vitess/wiki/Operations
HAproxy
        Re-thinking Proxy Topology
Old-school Proxy Topology:
➢ DB Clients on one side,

➢ DB Servers on the other,

➢ Proxy in-between.




                 Single Point of Failure
HAproxy
         Re-thinking Proxy Topology
Free proxy provides new architecture option:
 ➢ Proxy on every DB client node.

 ➢ Good-bye single-point-of-failure.

 ➢ Hello configuration management for proxy.



   [Diagram: an HAproxy instance runs on every DB client node, each
   routing connections to the DB servers.]
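A minimal per-client haproxy.cfg fragment might look like this (a sketch: the listen name is invented and the IPs reuse the example addresses from earlier slides; a real deployment would tune checks and add a separate listener for slaves):

```
# App on this node connects to 127.0.0.1:3306;
# HAproxy forwards to the shard's MySQL servers.
listen mysql-shard-a
    bind 127.0.0.1:3306
    mode tcp
    balance roundrobin
    server master 10.244.17.6:3306 check
```

Because this file lives on every client node, changing the pool means pushing new config everywhere, which is exactly why the slide pairs this topology with configuration management.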
Methods of Sharding MySQL
         Q&A
Questions? Suggestions:
➢ Interesting stuff. Got a job for me?

➢ Well I got a job for you. Interested?

➢ Warn me next time so I can sleep in the back

  row.
➢ Was that a question?




Thank you! Emails to domain palominodb,
username time. Percona Live 2012 in New York
City. Enjoy the rest of the show!

More Related Content

PDF
Scaling MySQL in Amazon Web Services
PDF
High-Availability using MySQL Fabric
PPTX
MySQL Multi Master Replication
PDF
HTTP Plugin for MySQL!
PDF
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
ODP
MySQL Group Replication
ODP
MySQL 5.7 clustering: The developer perspective
PPTX
High Availability with MariaDB Enterprise
Scaling MySQL in Amazon Web Services
High-Availability using MySQL Fabric
MySQL Multi Master Replication
HTTP Plugin for MySQL!
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
MySQL Group Replication
MySQL 5.7 clustering: The developer perspective
High Availability with MariaDB Enterprise

What's hot (20)

ODP
MySQL 5.7 Fabric: Introduction to High Availability and Sharding
PPTX
MariaDB Galera Cluster
PDF
Mysql User Camp : 20-June-14 : Mysql Fabric
ODP
Data massage: How databases have been scaled from one to one million nodes
PDF
Webinar slides: Introduction to Database Proxies (for MySQL)
PDF
Building Scalable High Availability Systems using MySQL Fabric
PDF
MySQL Fabric Tutorial, October 2014
PDF
Running Galera Cluster on Microsoft Azure
PDF
MySQL Group Replication - an Overview
ODP
Built-in query caching for all PHP MySQL extensions/APIs
ODP
NoSQL in MySQL
PDF
Choosing a MySQL High Availability solution - Percona Live UK 2011
PDF
MySQL High Availability Solutions
PDF
Introduction to Galera
PPTX
MySQL Options in OpenStack
PDF
Mysql User Camp : 20th June - Mysql New Features
PPTX
Choosing between Codership's MySQL Galera, MariaDB Galera Cluster and Percona...
PPTX
Tips to drive maria db cluster performance for nextcloud
PDF
MySQL High Availability and Disaster Recovery with Continuent, a VMware company
PDF
Using MySQL in Automated Testing
MySQL 5.7 Fabric: Introduction to High Availability and Sharding
MariaDB Galera Cluster
Mysql User Camp : 20-June-14 : Mysql Fabric
Data massage: How databases have been scaled from one to one million nodes
Webinar slides: Introduction to Database Proxies (for MySQL)
Building Scalable High Availability Systems using MySQL Fabric
MySQL Fabric Tutorial, October 2014
Running Galera Cluster on Microsoft Azure
MySQL Group Replication - an Overview
Built-in query caching for all PHP MySQL extensions/APIs
NoSQL in MySQL
Choosing a MySQL High Availability solution - Percona Live UK 2011
MySQL High Availability Solutions
Introduction to Galera
MySQL Options in OpenStack
Mysql User Camp : 20th June - Mysql New Features
Choosing between Codership's MySQL Galera, MariaDB Galera Cluster and Percona...
Tips to drive maria db cluster performance for nextcloud
MySQL High Availability and Disaster Recovery with Continuent, a VMware company
Using MySQL in Automated Testing
Ad

Viewers also liked (20)

PDF
MySQL Sharding: Tools and Best Practices for Horizontal Scaling
PPTX
Distributed RDBMS: Data Distribution Policy: Part 3 - Changing Your Data Dist...
PPTX
ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!
PDF
Sharding using MySQL and PHP
PPTX
Distributed RDBMS: Data Distribution Policy: Part 2 - Creating a Data Distrib...
PDF
사례를 통해 알아보는 IoT 분석 플랫폼 요건
PDF
Getting Started with PL/Proxy
PDF
MySQL High-Availability and Scale-Out architectures
PDF
MySQL Proxy: Architecture and concepts of misuse
PPTX
MySQL Fabric: High Availability using Python/Connector
PDF
High Availability with MySQL
PDF
MySQL Performance Tuning
PDF
MySQL highav Availability
PDF
MySQL Proxy. From Architecture to Implementation
PDF
DIY: A distributed database cluster, or: MySQL Cluster
PDF
MySQL Proxy tutorial
PPTX
Distributed RDBMS: Data Distribution Policy: Part 1 - What is a Data Distribu...
KEY
Inside PyMongo - MongoNYC
PDF
MySQL HA Solutions
PDF
MySQL Proxy. A powerful, flexible MySQL toolbox.
MySQL Sharding: Tools and Best Practices for Horizontal Scaling
Distributed RDBMS: Data Distribution Policy: Part 3 - Changing Your Data Dist...
ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!
Sharding using MySQL and PHP
Distributed RDBMS: Data Distribution Policy: Part 2 - Creating a Data Distrib...
사례를 통해 알아보는 IoT 분석 플랫폼 요건
Getting Started with PL/Proxy
MySQL High-Availability and Scale-Out architectures
MySQL Proxy: Architecture and concepts of misuse
MySQL Fabric: High Availability using Python/Connector
High Availability with MySQL
MySQL Performance Tuning
MySQL highav Availability
MySQL Proxy. From Architecture to Implementation
DIY: A distributed database cluster, or: MySQL Cluster
MySQL Proxy tutorial
Distributed RDBMS: Data Distribution Policy: Part 1 - What is a Data Distribu...
Inside PyMongo - MongoNYC
MySQL HA Solutions
MySQL Proxy. A powerful, flexible MySQL toolbox.
Ad

Similar to Methods of Sharding MySQL (20)

PDF
Massively sharded my sql at tumblr presentation
PDF
Evan Ellis "Tumblr. Massively Sharded MySQL"
PDF
Scaling MySQL -- Swanseacon.co.uk
PDF
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014
ODP
MySQL And Search At Craigslist
PPTX
Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?
KEY
MongoDB vs Mysql. A devops point of view
PDF
MySQL cluster 72 in the Cloud
PPTX
Database highload solutions
KEY
Scaling MongoDB (Mongo Austin)
PPTX
Database highload solutions
PDF
How We Scaled Freshdesk To Take 65M Requests/week
PDF
Scaling-MongoDB-with-Horizontal-and-Vertical-Sharding Mydbops Opensource Data...
PDF
Scaling MongoDB with Horizontal and Vertical Sharding
ODP
MySQL? Load? Clustering! Balancing! PECL/mysqlnd_ms 1.4
ODP
MySQL HA with PaceMaker
PDF
Become a MySQL DBA - webinar series - slides: Which High Availability solution?
PDF
MySQL Conference 2011 -- The Secret Sauce of Sharding -- Ryan Thiessen
PDF
Drupal Con My Sql Ha 2008 08 29
ODP
MySQL HA Alternatives 2010
Massively sharded my sql at tumblr presentation
Evan Ellis "Tumblr. Massively Sharded MySQL"
Scaling MySQL -- Swanseacon.co.uk
Scaling MySQl 1 to N Servers -- Los Angelese MySQL User Group Feb 2014
MySQL And Search At Craigslist
Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?
MongoDB vs Mysql. A devops point of view
MySQL cluster 72 in the Cloud
Database highload solutions
Scaling MongoDB (Mongo Austin)
Database highload solutions
How We Scaled Freshdesk To Take 65M Requests/week
Scaling-MongoDB-with-Horizontal-and-Vertical-Sharding Mydbops Opensource Data...
Scaling MongoDB with Horizontal and Vertical Sharding
MySQL? Load? Clustering! Balancing! PECL/mysqlnd_ms 1.4
MySQL HA with PaceMaker
Become a MySQL DBA - webinar series - slides: Which High Availability solution?
MySQL Conference 2011 -- The Secret Sauce of Sharding -- Ryan Thiessen
Drupal Con My Sql Ha 2008 08 29
MySQL HA Alternatives 2010

More from Laine Campbell (10)

PDF
Recruiting for diversity in tech
PDF
Database engineering
PDF
Velocity pythian operational visibility
PPTX
Pythian operational visibility
PDF
RDS for MySQL, No BS Operations and Patterns
PDF
Running MySQL in AWS
PDF
An Introduction To Palomino
PDF
Hybrid my sql_hadoop_datawarehouse
PDF
CouchConf SF 2012 Lightning Talk - Operational Excellence
PPT
Understanding MySQL Performance through Benchmarking
Recruiting for diversity in tech
Database engineering
Velocity pythian operational visibility
Pythian operational visibility
RDS for MySQL, No BS Operations and Patterns
Running MySQL in AWS
An Introduction To Palomino
Hybrid my sql_hadoop_datawarehouse
CouchConf SF 2012 Lightning Talk - Operational Excellence
Understanding MySQL Performance through Benchmarking

Recently uploaded (20)

PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
NewMind AI Monthly Chronicles - July 2025
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Modernizing your data center with Dell and AMD
PDF
KodekX | Application Modernization Development
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Empathic Computing: Creating Shared Understanding
PPT
Teaching material agriculture food technology
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
NewMind AI Monthly Chronicles - July 2025
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Advanced methodologies resolving dimensionality complications for autism neur...
Dropbox Q2 2025 Financial Results & Investor Presentation
Review of recent advances in non-invasive hemoglobin estimation
Modernizing your data center with Dell and AMD
KodekX | Application Modernization Development
Unlocking AI with Model Context Protocol (MCP)
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Understanding_Digital_Forensics_Presentation.pptx
Encapsulation_ Review paper, used for researhc scholars
Diabetes mellitus diagnosis method based random forest with bat algorithm
Empathic Computing: Creating Shared Understanding
Teaching material agriculture food technology

Methods of Sharding MySQL
         What is a Shard?
A location for a subset of data:
 ➢ Itself made of pieces.
 ➢ Typically itself redundant.

Diagram: three shards (User Data, Logging Data, Posts Data), each a Master with three Slaves.
Methods of Sharding MySQL
         What are the Sharding Method Choices?
By-Function:
 ➢ Move busy tables onto a new shard.
 ➢ Writes of the busiest tables go to new hardware.
 ➢ Writes of the remaining tables stay on current hardware.

By-Columns:
 ➢ Split a table into chunks of related columns; store each set on its own Master/Slaves shard.

By-Rows:
 ➢ A table is split into N shards; each shard gets a subset of the rows of the table.
Methods of Sharding MySQL
         Shard Method Choices
By-Function and By-Column Methods:
 ➢ Much easier.
 ➢ Can get you through months to years.
 ➢ Eventually you run out of options here.

By-Row Method:
 ➢ The hardest to do.
 ➢ Requires new ways of accessing data.
 ➢ Often requires sophisticated cache strategies.
 ➢ Itself can be done several ways.
Methods of Sharding MySQL
         By-Function Sharding
Picking a Functional Split:
 ➢ A subset of tables commonly joined.
 ➢ Tables outside this subset nearly never joined.
 ➢ One of them responsible for many writes.

Every JOIN against a table outside the subset must be rewritten as multiple SELECTs in application code.

Once the subset of tables is moved onto its own server, writes are distributed.
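The JOIN-rewriting step can be sketched in application code. A minimal, hypothetical Python sketch: the two dicts stand in for separate MySQL servers after a by-function split; in production each lookup would be a SELECT over its own connection. All names and data here are illustrative, not from the deck.

```python
# Two "shards" after a by-function split: users and posts now live on
# different servers, so SQL can no longer JOIN them.
users_shard = {1: {"name": "alice"}, 2: {"name": "bob"}}
posts_shard = {101: {"user_id": 1, "title": "hello"},
               102: {"user_id": 2, "title": "world"}}

def posts_with_authors():
    # First "SELECT" hits the posts shard...
    posts = list(posts_shard.values())
    # ...then a second "SELECT ... WHERE id IN (...)" hits the users shard,
    # and the JOIN happens here in application code instead of in MySQL.
    user_ids = {p["user_id"] for p in posts}
    names = {uid: users_shard[uid]["name"] for uid in user_ids}
    return [{"title": p["title"], "author": names[p["user_id"]]} for p in posts]
```

The same two-step pattern applies wherever a cross-shard JOIN used to be.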
Methods of Sharding MySQL
         By-Column Sharding (Vertical Partitioning)
Identifying a candidate table:
 ➢ Many columns (“users,” anyone?),
 ➢ Many updates,
 ➢ Many indexes.

Required: an even split of columns/indexes by update frequency.
Attempt: logical grouping.

JOINs are neither possible nor desirable: write multi-SELECT code in the application DAL.
Methods of Sharding MySQL
         Row-based Sharding Choices
Range-based Sharding:
 ➢ Easy to understand.
 ➢ Each shard gets a range of rows.
 ➢ Oft-times some shards are “hot.”
 ➢ Hot shards are split into separate shards.
 ➢ Cold shards are merged into a single shard.
 ➢ Juggling shard load is a frequent process.

Typically the best solution. Shortcomings have known work-arounds.
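Range-based routing can be sketched in a few lines. A hypothetical Python sketch (the boundaries and shard names are illustrative, not from any Palomino tooling); splitting a hot shard is just inserting a new boundary:

```python
import bisect

# Each shard owns a contiguous range of row keys. `boundaries` holds the
# inclusive upper end of each shard's range, kept sorted.
boundaries = [300000, 999999999]
shards = ["shard-1-300000", "shard-300001-999999999"]

def shard_for(row_id):
    # bisect_left finds the first boundary >= row_id, i.e. the owning shard.
    return shards[bisect.bisect_left(boundaries, row_id)]
```

Because routing is an ordered search rather than arithmetic, resizing one shard's range leaves every other shard's rows where they are.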
Methods of Sharding MySQL
         Row-based Sharding Choices
Modulus/Hash-based Sharding:
 ➢ Row key is hashed to an integer modulo the number of shards; the row is placed on that shard.
 ➢ Only rarely are some shards “hot.”
 ➢ Shard splitting is difficult to implement.

Also a common method of sharding. We hope not to split shards often (or ever). When we do, it's a multi-week process.
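A minimal sketch of the hash-modulo placement (md5 is just a stable stand-in; any uniform hash works). It also shows why splitting is painful: changing the shard count remaps most keys, which is what turns a split into a multi-week data move.

```python
import hashlib

def shard_for(row_key, num_shards):
    # Hash the key to a big integer, then take it modulo the shard count.
    h = int(hashlib.md5(str(row_key).encode()).hexdigest(), 16)
    return h % num_shards

def keys_remapped(keys, old_n, new_n):
    # How many keys land on a different shard after changing the modulus.
    return sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
```

With a range scheme, adding a shard moves only the split-off range; here, going from N to N+1 shards reshuffles roughly N/(N+1) of all keys.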
Methods of Sharding MySQL
         Row-based Sharding Choices
Lookup Table-based Sharding:
 ➢ Easy to understand.
 ➢ Row key mapped to shard in a lookup table.
 ➢ Easy to move load off hot shards.
 ➢ The lookup table itself is problematic:
   ➢ Single point of failure.
   ➢ Performance bottleneck.
   ➢ Billions of rows; may itself need sharding.
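The lookup-table idea in miniature, as a hypothetical Python sketch (in production the map would itself be a table, which is exactly the SPOF/bottleneck the slide warns about):

```python
# Explicit key -> shard map. Moving a hot key off a busy shard is a single
# map update (after the rows have been copied over).
shard_map = {"user:1": "shardA", "user:2": "shardA", "user:3": "shardB"}

def shard_for(row_key):
    return shard_map[row_key]

def move_key(row_key, new_shard):
    # Assumes the data copy has already happened; this just flips the pointer.
    shard_map[row_key] = new_shard
```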
Prerequisite: Build a Large Cluster
         Allocating the Hardware
Getting Hardware – your own company's:
 ➢ Can be politically charged.
 ➢ Get a small batch first.
 ➢ Build a small demonstration cluster.
 ➢ Get everyone on board with the demo.

Renting/Leasing Hardware – the Cloud:
 ➢ Allocate hardware in EC2 or elsewhere.
 ➢ Usually easier, but possibly harder admin:
   ➢ Hardware failure more common.
   ➢ Hardware/network flakiness more common.
Prerequisite: Build a Large Cluster
         Building the Cluster
Okay, I've got the hardware. What next?
Prerequisite: Build a Large Cluster
         Building the Cluster
Configuring the Hardware. The old dilemma:
 ➢ Spend days to install/configure DB software? Subsequent management is painful.
 ➢ Use SSH in “for” loops? Rolling your own configuration management tools is a lot of work.
 ➢ Learn a configuration management tool? Obvious choice in 2012. Well-documented tools like Chef, Puppet, Ansible.
Configuration Management Tools
         My Experience
Puppet: 6 years ago at Digg.
 ➢ Managed/deployed hundreds of servers.
 ➢ Painful, but not as bad as hand-coding it all.

Chef: 2 years ago at Drawn to Scale and Riot.
 ➢ Managed/deployed dozens of servers.
 ➢ Learning Ruby is a “joy” of its own.

Ansible: 6 months ago at Palomino.
 ➢ Managed/deployed dozens of servers.
 ➢ First Palomino Cluster Tool subset built.
Prerequisite: Build a Large Cluster
         Configuration Management Options
Pick your Configuration Management:
 ➢ Chef: Popular; use Ruby to “code your infrastructure.” Must learn Ruby.
 ➢ Puppet: Mature; use data structures to “define your infrastructure.” Less coding.
 ➢ Ansible: Tiny and modular, similar to Puppet, but with ordering for deployment. Pragmatic.

Write or get Recipes, Manifests, Playbooks?
 ➢ Writing is tedious. Can take >1 week.
 ➢ Get them from the internet? Often incomplete.
Prerequisite: Build a Large Cluster
         The Palomino Cluster Tool
Palomino's tool for building large DB clusters:
 ➢ Chef, Puppet, Ansible modules.
 ➢ Open-source on Github:
   ➢ https://github.com/time-palominodb/PalominoClusterTool
   ➢ Google: “Palomino Cluster Tool.”
 ➢ Will build a large cluster for you in hours:
   ➢ Master(s),
   ➢ Slaves – hundreds of them as easily as two,
   ➢ MHA – when the master fails, a slave takes over.
 ➢ Previously this would take days.
The Palomino Cluster Tool
         Building the Management Node
Cluster Management Node:
 ➢ Will build the initial cluster.
 ➢ Will do subsequent cluster management.

Tool for Initial Cluster Build:
 ➢ Palomino Cluster Tool (Ansible subset).

Tool for Cluster Management:
 ➢ Jetpants (Ruby).
The Palomino Cluster Tool
         Building the Management Node
Palomino Cluster Tool (Ansible subset). Why Ansible?
 ➢ No server to set up; simply uses SSH.
 ➢ Easy-to-understand non-code Playbooks.
 ➢ Use a language you know for modules.
 ➢ For demo purposes, the obvious choice.
 ➢ Also production-worthy:
   ➢ Built by Michael DeHaan, long-time configuration management guru.
The Palomino Cluster Tool
         Building the Management Node
The management node lives alongside your cluster:
 ➢ We are building our cluster in EC2,
 ➢ Thus the management node is in EC2.
 ➢ This tutorial assumes Ubuntu 12.04.
 ➢ t1.micro is fine for the management node.

Install basic tools:
 ➢ apt-get install git (for Ansible/P.C.T.)
 ➢ apt-get install make python-jinja2 (for Ansible)
The Palomino Cluster Tool
         Configuring the Management Node
Install Ansible:
 ➢ git clone git://github.com/ansible/ansible.git
 ➢ make install

Install Palomino Cluster Tool:
 ➢ git clone git://github.com/time-palominodb/PalominoClusterTool.git

I think we just finished the management node!
The Palomino Cluster Tool
         Allocating Shard Nodes
Shard nodes:
 ➢ m1.small or larger: at least 1.6GB RAM,
 ➢ Ports 3306, 80, and 22 open between all (one security group in EC2),
 ➢ Ubuntu 12.04 (other Debian-alikes at your own risk – but may work!).

They do not need OS/database configuration:
 ➢ Ansible will configure them.
The Palomino Cluster Tool
         Building the First Shard – Step 1
From the README: edit IP addresses in the cluster layout file (PalominoClusterToolLayout.ini):

# Alerting/Trending -----
[alertmaster]
10.252.157.110

[trendmaster]
10.252.157.110

# Servers -----
[mhamanager]
10.252.157.110

This section is identical for all Shards.
The Palomino Cluster Tool
         Building the First Shard – Step 2
From the README: edit IP addresses in the cluster layout file (PalominoClusterToolLayout.ini):

[mysqlmasters]
10.244.17.6

[mysqlslaves]
10.244.26.199
10.244.18.178

[mysqls:vars]
master_host=10.244.17.6

This section is different for every Shard.
The Palomino Cluster Tool
         Building the First Shard – Step 3
Run the setup command to put configuration and SSH keys into /etc:

$ cd PalominoClusterTool/AnsiblePlaybooks/Ubuntu-12.04
$ ./00-Setup_PalominoClusterTool.sh ShardA

Run the build command – it's a wrapper around Ansible Playbooks:

$ ./10-MySQL_MHA_Manager.sh ShardA
The Palomino Cluster Tool
         Building the Second Shard
Just make one shard with a master and many slaves. In real life, you might do something like this instead:

for i in ShardB ShardC ShardD ; do
    (manual step): vim PalominoClusterToolLayout.ini
    (scriptable steps):
    ./00-Setup_PalominoClusterTool.sh $i
    ./10-MySQL_MHA_Manager.sh $i
done

Run them in separate terminals to save time.
Make the Cluster Real
         Data makes Shard Split Interesting
Fill ShardA using a random data script.* The Palomino Cluster Tool includes such a tool:
 ➢ HelperScripts/makeGiantDatafile.pl

$ ssh root@sharda-master
# cd PalominoClusterTool/HelperScripts
# mysql -e 'create database palomino'
# ./makeGiantDatafile.pl 1200000 3 | mysql -f palomino

Install Jetpants, do the shard split now.

* Be sure /var/lib/mysql is on a large partition!
Administering the Cluster
         Install Jetpants
General idea: install Ruby >=1.9.2 and RubyGems, then Jetpants via RubyGems. On my systems, /etc/alternatives is always incorrect; ln the proper binaries for Jetpants.

# apt-get install ruby1.9.3 rubygems libmysqlclient-dev
# ln -sf /usr/bin/ruby1.9.3 /etc/alternatives/ruby
# ln -sf /usr/bin/gem1.9.3 /etc/alternatives/gem
# gem install jetpants
Administering the Cluster
         Configure Jetpants
General idea: edit /etc/jetpants.yaml, then create the Jetpants inventory and application configuration files and chown them to the Jetpants user:

# vim /etc/jetpants.yaml
# mkdir -p /var/jetpants
# touch /var/jetpants/assets.json
# chown jetpantsusr: /var/jetpants/assets.json
# mkdir -p /var/www
# touch /var/www/databases.yaml
# chown jetpantsusr: /var/www/databases.yaml
Administering the Cluster
         Jetpants Shard Splits
Tell the Jetpants Console about your ShardA:

Jetpants> s = Shard.new(1, 999999999, '10.12.34.56', :ready)  # 10.12.34.56 == ShardA master
Jetpants> s.sync_configuration

Create spares within the Console for all others (improved workflow in Jetpants 0.7.8):

Jetpants> topology.tracker.spares << '10.23.45.67'
Jetpants> topology.tracker.spares << '10.23.45.68'
Jetpants> topology.tracker.spares << '10.23.45.69'
Jetpants> topology.write_config
Jetpants> topology.update_tracker_data
Administering the Cluster
         Jetpants Shard Splits
Just for this tutorial:
 ➢ Create the “palomino” database,
 ➢ Break the replication on all the spares,
 ➢ Be sure spares are read/write:
   ➢ Edit my.cnf,
   ➢ service mysql restart
 ➢ Ensure “jetpants pools” output is proper:
   ➢ One master,
   ➢ Two slaves.
Administering the Cluster
         Jetpants Shard Splits
How to perform an actual Shard Split:

$ jetpants shard_split --min-id=1 --max-id=999999999

Notes:
 ➢ The process takes hours. Use screen or nohup.
 ➢ Left child's first ID == parent's first; right child's last ID == parent's last; no overlap, no gap.
 ➢ Make the children 1-300000 and 300001-999999999.
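The no-overlap/no-gap rule for child ranges can be expressed as a small check. A hypothetical Python helper (not part of Jetpants) over inclusive (min_id, max_id) tuples:

```python
def valid_split(parent, children):
    """True if the child (min_id, max_id) ranges exactly tile the parent:
    first child starts at parent's min, last child ends at parent's max,
    and each child starts right after the previous one ends."""
    children = sorted(children)
    if children[0][0] != parent[0] or children[-1][1] != parent[1]:
        return False
    return all(children[i][1] + 1 == children[i + 1][0]
               for i in range(len(children) - 1))
```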
Jetpants Shard Splitting
         The Gory Details
After “jetpants shard_split”:

ubuntu@ip-10-252-157-110:~$ jetpants pools
shard-1-999999999 [3GB]
  master          = 10.244.136.107  ip-10-244-136-107
  standby slave 1 = 10.244.143.195  ip-10-244-143-195
  standby slave 2 = 10.244.31.91    ip-10-244-31-91
shard-1-400000 (state: replicating) [2GB]
  master          = 10.244.144.183  ip-10-244-144-183
shard-400001-999999999 (state: replicating) [1GB]
  master          = 10.244.146.27   ip-10-244-146-27

   0 global pools
   3 shard pools
----  --------------
   3 total pools

   3 masters
   0 active slaves
   2 standby slaves
   0 backup slaves
----  --------------
   5 total nodes
Jetpants Improvements
         The Result of an Experiment
Jetpants is only well-tested on RHEL/CentOS. The Palomino Cluster Tool is only well-tested building Ubuntu 12.04 clusters.

Little effort to fix Jetpants:
 ➢ /sbin/service location differs,
 ➢ service mysql status output differs.
Jetpants Improvements
         The Result of an Experiment
Jetpants is only well-tested on MySQL 5.1. I built a cluster of MySQL 5.5.

A little more effort to fix Jetpants:
 ➢ Setting master_host='' is a syntax error,
 ➢ RESET SLAVE needs the keyword “ALL” appended.
Jetpants Improvements
         The Result of an Experiment
Jetpants is only well-tested on large datasets. I built a cluster with only hundreds of MB.

A wee tad more effort to fix Jetpants:
 ➢ Some timings assumed large datasets,
 ➢ Edge cases for small/quick operations reported back to the author.
Jetpants Improvements
         OSS Collaboration and Win
Evan Elias implemented these fixes last week!
 ➢ jetpants add_pool,
 ➢ jetpants add_shard,
 ➢ jetpants add_spare (with sanity-checking of the spare),
 ➢ Shards with 1 slave (not for prod!),
 ➢ read_only spares not fatal,
 ➢ Debian-alike (Ubuntu) fixes,
 ➢ MySQL 5.5 fixes,
 ➢ Mid-split “jetpants pools” output simpler.

Really responsive ownership of the project!
Twitter's Gizzard
         What is it?
A general framework for a distributed database:
 ➢ Hides sharding from you.
 ➢ Literally, it is middleware:
   ➢ Applications connect to Gizzard,
   ➢ Gizzard sends connections to the proper place,
   ➢ Shard splits and hardware failure are taken care of.
 ➢ Created at Twitter by rogue cowboys.
 ➢ Not completely production-ready.
 ➢ Better than rolling your own!
Twitter's Gizzard
         Why should I use it?
You've settled on a row-based partition scheme:
 ➢ Master nearing I/O capacity, won't scale up,
 ➢ Can't move some tables to their own pool,
 ➢ Can't split the columns/indexes out,
 ➢ You want to keep using the DBMS you already know and love: Percona Server,*
 ➢ Don't want to think about fault-tolerance or shard splits (much).

* Actually, use any storage back-end.
Twitter's Gizzard
         The Fine Print
This sounds perfect. Why not Gizzard?

Writes must follow a strict diet. They must be:
 ➢ Idempotent*,
 ➢ Commutative**,
 ➢ Must not have tuberculosis.

* Pfizer cannot remove the idempotency requirement of Gizzard.
** Even on evenings and weekends.
Twitter's Gizzard
         Expanding the Fine Print
Idempotency:
 ➢ Submit a write. Again. And again.
 ➢ The result must be identical to doing it once.
 ➢ Bad: “update set col = col + 1”

Commutative – writes apply in arbitrary order:
 ➢ WriteA→WriteB→WriteC on Node1.
 ➢ WriteB→WriteC→WriteA on Node2.
 ➢ Bad: “update set col1 = 42”→“update set col2 = col1 + 5”
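Why “col = col + 1” is bad can be made concrete. A hypothetical Python sketch: under Gizzard-style retries a write may be replayed, so an idempotent write (plain assignment) leaves the row unchanged on replay, while an increment does not:

```python
def increment(row):
    # NOT idempotent: every replay changes the stored value again.
    row["col"] = row["col"] + 1

def set_col(row, value):
    # Idempotent: replaying this write any number of times is a no-op.
    row["col"] = value
```

The commutativity requirement is the same idea across replicas: two nodes applying the same writes in different orders must end up with identical rows.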
Twitter's Gizzard
         Expanding the Fine Print
The cluster is Eventually Consistent:
 ➢ May return old values for reads.
 ➢ Unknown when consistency will occur.

Like a politician's position on the budget:
 ➢ Might be consistent in the future.
 ➢ Just not right now.
 ➢ Or now.
Twitter's Gizzard
         Working Around the Shortcomings
Gizzard work-around:
 ➢ Add a timestamp to every transaction.
 ➢ Good:
   ➢ “col1.ts=1; update set col1=42” →
   ➢ “update set col2=col1 + 5 where col1.ts=1”
 ➢ Implementation is trickier if your DBMS doesn't support column attributes.

Cannot escape: you must radically re-think your schema and application/DBMS interaction.
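The timestamp work-around amounts to last-writer-wins. A hypothetical Python sketch (the ".ts" key stands in for a per-column timestamp attribute): a write is applied only if it is newer than what the row already holds, so replays and reorderings converge to the same state:

```python
def apply_write(row, column, value, ts):
    # Apply the write only if its timestamp beats the stored one;
    # a replayed or out-of-order older write becomes a no-op.
    if ts > row.get(column + ".ts", -1):
        row[column] = value
        row[column + ".ts"] = ts
```

Two replicas can now receive the same set of writes in any order, with duplicates, and still agree.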
Twitter's Gizzard
         Trying it Out
I'm convinced! How do I begin?
 ➢ Learn Scala.
 ➢ Clone “rowz” from Github:
   ➢ https://github.com/twitter/Rowz
 ➢ Modify it to suit your needs.
 ➢ Learn how it interacts with existing tools.
 ➢ Write new monitoring/alerting plugins.
 ➢ Write unit tests!
 ➢ You should OSS it to help with overhead.
Twitter's Gizzard
         Trying it Out
Sounds daunting. Maybe I'll roll my own?

Learn from others' mistakes:
 ➢ Digg: 2 engineers, 6 months. Code thrown away. Digg out of business.
 ➢ Countless identical stories in Silicon Valley.

A NIHS attitude == going out of business.*

* 8-figure R&D budgets excepted.
Youtube's Vitess/Vtocc
         What is it?
Vitess is a library. Vtocc is an implementation using it.

Vtocc is another middleware solution:
 ➢ Sharding,
 ➢ Caching,
 ➢ Connection-pooling,
 ➢ In use at Youtube,
 ➢ Built-in fail-safe features.
Youtube's Vtocc
         Why use it?
A proven high-volume sharding solution with an interesting feature list:
 ➢ Automatic killing of over-limit queries/transactions.
 ➢ A better query-cache implementation.
 ➢ Query comment-stripping for the query cache.
 ➢ Query consolidation.
 ➢ Zero-downtime restarts.

Less coding than Gizzard (more plug-in).
Youtube's Vtocc
         Hold on, Zero-Downtime Restarts?
Just start a new Vtocc instance:
 ➢ Instance1 passes new requests to Instance2,
 ➢ Instance1's connections get 30s to complete,
 ➢ Instance2 kills Instance1 and takes over.

Diagram: Vtocc Instance 1 handing off to Vtocc Instance 2.
Youtube's Vtocc
         The Fine Print
Requires particular Primary Keys:
 ➢ varbinary datatype,
 ➢ Choose carefully to prevent hot-spots.

Max result-set size: larger result sets fail.

Additional administration burden:
 ➢ “My query was killed. Why?”
 ➢ Middleware adds spooky, hard-to-diagnose failure modes.
Youtube's Vtocc
         Implementation Details
 ➢ Run Vtocc on the same server as MySQL.
 ➢ Configure Vtocc fail-safes for expected load:
   ➢ Pool Size (connection count),
   ➢ Max Transactions (has its own connection pool),
   ➢ Query Timeout (before it is killed),
   ➢ Transaction Timeout (before it is killed),
   ➢ Max Resultset Size in rows:
     ➢ The Go language doesn't free allocated memory, so pick this value carefully.
 ➢ More details: http://code.google.com/p/vitess/wiki/Operations
HAproxy
         Re-thinking Proxy Topology
Old-school Proxy Topology:
 ➢ DB Clients on one side,
 ➢ DB Servers on the other,
 ➢ Proxy in between: a Single Point of Failure.
HAproxy
         Re-thinking Proxy Topology
A free proxy provides a new architecture option:
 ➢ Proxy on every DB client node.
 ➢ Good-bye, single point of failure.
 ➢ Hello, configuration management for the proxy.

Diagram: an HAproxy instance running on every client node.
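A client-local listener along these lines might look like the following sketch. All names, addresses, and the check user are illustrative assumptions, not from the deck; verify the directives against your HAproxy version's documentation before use:

```
# Sketch: every application host runs its own HAproxy bound to localhost,
# so there is no central proxy left to fail.
listen mysql-shardA
    bind 127.0.0.1:3306
    mode tcp
    option mysql-check user haproxy_check
    server shardA-master 10.244.17.6:3306 check
    server shardA-backup 10.244.26.199:3306 check backup
```

Applications then connect to 127.0.0.1:3306, and rolling out this file across all client nodes is exactly the configuration-management job the slide warns about.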
Methods of Sharding MySQL
         Q&A
Questions? Suggestions:
 ➢ Interesting stuff. Got a job for me?
 ➢ Well, I got a job for you. Interested?
 ➢ Warn me next time so I can sleep in the back row.
 ➢ Was that a question?

Thank you! Emails to domain palominodb, username time.

Percona Live 2012 in New York City. Enjoy the rest of the show!