Retaining globally distributed high availability
Art van Scheppingen
Head of Database Engineering
Overview
1.  Who is Spil Games?
2.  Theory
3.  Spil Storage Platform
4.  Questions?
Who are we?
Who is Spil Games?
  
	
  
Facts
•  Company founded in 2001
•  350+ employees worldwide
•  180M+ unique visitors per month
•  45 portals in 19 languages
•  Casual games
•  Social games
•  Real time multiplayer games
•  Mobile games
•  35+ MySQL clusters
•  60k queries per second (3.5 billion qpd)
Geographic Reach
180 Million Monthly Active Users(*)
Source: (*) Google Analytics, August 2012
•  Over 45 localized portals in 19 languages
•  Multi platform: web, mobile, tablet
•  Focus on casual and social games
•  180M MAU (30M YoY growth)
•  Over 50M registered users
  
Brands
Girls, Teens and Family
spielen.com
juegos.com
gamesgames.com
games.co.uk
Foundations
The exciting theory
  
	
  
Retaining globally distributed HA
•  What exactly does it mean?
What is high availability?
Wikipedia:
High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.
Oracle:
•  Availability of resources in a computer system
How do we reach HA with MySQL?
•  Master with (many) slave(s)
    Diagram: one master with three slaves
•  Multi Master
    Diagram: two masters and two slaves
•  Clustering
    Diagram: two mysqld nodes, five ndbd nodes and one mgmt node
•  Geographical redundancy
    Diagram: master in the local DC, with slaves in the local DC, Asia and the US
What if we keep growing?
•  Scale up
    •  Vertical
    •  Faster CPU/Memory/disks
    •  Expensive
    •  Costs multiply at the same rate as the number of nodes
•  Scale out
    •  Horizontal
    •  More (small) machines
    •  Inexpensive
    •  Partitioning/federating (sharding)
Scale out
•  Functional
    •  Shard your database functionally
•  Reads
    •  Add more slaves (keep them coming!)
•  Writes
    •  More disks
    •  Horizontal partitioning
    •  Federated partitions
Horizontal partitioning
•  Breaking up tables into small parts on the same host
•  Partitioned on a column
•  Infinite growth (as long as you add disk space)
•  Less used data to slower (cheaper) disks
•  No stored procedures, functions, etc.
•  Uneven usage of partitions (hash partitioning may help)
•  Once written, data remains on the partition
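The remark that hash partitioning may help against uneven partition usage can be illustrated with a small sketch. It is written in Python purely for illustration. The partition count, the helper names `range_partition` and `hash_partition` and the sample workload are assumptions and are not taken from the talk. The modulo rule only approximates how a database hashes an integer column.

```python
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count, for illustration only


def range_partition(user_id: int, width: int = 1000) -> int:
    """Range partitioning: ids 0..999 land in partition 0, 1000..1999 in 1, etc.
    Ids beyond the last range are clamped into the final partition."""
    return min(user_id // width, NUM_PARTITIONS - 1)


def hash_partition(user_id: int) -> int:
    """Hash partitioning on an integer column, modelled here as a modulo."""
    return user_id % NUM_PARTITIONS


# Hypothetical workload: only the 400 most recently created ids are active.
active_ids = range(3600, 4000)

range_usage = Counter(range_partition(i) for i in active_ids)
hash_usage = Counter(hash_partition(i) for i in active_ids)

print("range:", dict(range_usage))  # all activity hits a single partition
print("hash: ", dict(hash_usage))   # activity is spread over every partition
```

With range partitioning the recent, busy ids all sit in the last partition, while the modulo rule spreads the same ids over all four.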
Federated partitions (sharding)
•  Breaking up your table into parts on multiple hosts
•  Partitioned on a column
•  Infinite growth (as long as you add hosts)
•  Less used data on slower hosts
•  Not supported in (standard) MySQL
    •  Partitioning on application level (or proxy)
    •  Alternatively: NDB
•  Uneven usage of partitions
•  Once written, data (mostly) remains on the partition
•  Parallel queries to retrieve data from all shards
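Application-level sharding with parallel queries over all shards could look roughly like the sketch below. It is a Python illustration under stated assumptions: the in-memory `SHARDS` list stands in for separate MySQL hosts, and the modulo routing rule and all function names are hypothetical rather than the platform's actual code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for separate hosts; each dict is one shard.
SHARDS = [dict() for _ in range(3)]


def shard_for(gid: int) -> dict:
    """Application-level routing: derive the shard from the partition column."""
    return SHARDS[gid % len(SHARDS)]


def write(gid: int, record: dict) -> None:
    shard_for(gid)[gid] = record


def read(gid: int):
    """A lookup on the shard key touches exactly one shard."""
    return shard_for(gid).get(gid)


def find_all(predicate):
    """A query that is not on the shard key has to ask every shard.
    The shards are scanned in parallel and the partial results merged."""
    def scan(shard):
        return [row for row in shard.values() if predicate(row)]

    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(scan, SHARDS))
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda row: row["gid"])


for gid, gender in [(1, "f"), (2, "m"), (3, "f"), (4, "f")]:
    write(gid, {"gid": gid, "gender": gender})

print(read(2))
print(find_all(lambda row: row["gender"] == "f"))
```

The merge step waits for every shard to answer, so the slowest shard determines the response time of the whole query.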
Amdahl's law
•  Parallel execution of sequential jobs
•  Limited by the weakest link
•  As fast as the slowest node
•  Fix: nonsequential (asynchronous) execution
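The slide paraphrases Amdahl's law loosely. In its usual form, the speedup from running a job on n workers is 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel. The Python sketch below evaluates that formula. The function name and the sample value p = 0.9 are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# With 90% parallel work the speedup can never exceed 1 / (1 - 0.9) = 10,
# no matter how many workers are added.
for n in (1, 2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

The sequential remainder is what caps the gain, which is why the slide suggests moving work out of the sequential path and executing it asynchronously.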
Typical LAMP stack
Diagram labels: Client, Loadbalancer, Webserver (x2), PHP (x2), Memcache, MySQL
  
A-typical LAMP stack
Diagram labels: Client, Loadbalancer, Webserver (x2), PHP (x2), Memcache, MySQL, MQ, Jobs
  
Spil Storage Platform
Abstracting the storage layer
  
	
  
What was our wishlist?
•  Dependent on one storage platform
    •  No more platform-specific query language
•  Differentiate writes
    •  Optimistic (asynchronous)
    •  Pessimistic (synchronous)
•  Shard data better
    •  Partition on user and function
    •  Cluster information by users, not by function
•  Global expansion
    •  Partition on geographic location
•  Solve uneven usage of data storage
    •  Move data from shard to shard
•  Anything may/could/will fail eventually
    •  Not designed for the "happy" flow
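The distinction between optimistic (asynchronous) and pessimistic (synchronous) writes can be sketched as follows. This is a toy Python model, not the platform's implementation: the `Store` class, its method names and the use of a plain in-process queue in place of a message queue are all assumptions made for illustration.

```python
from collections import deque


class Store:
    """Toy storage front end; `persisted` stands in for the database."""

    def __init__(self):
        self.persisted = {}
        self.pending = deque()  # plays the role of the message queue

    def write_pessimistic(self, key, value) -> bool:
        """Synchronous: the caller gets an answer only once the data is stored."""
        self.persisted[key] = value
        return True

    def write_optimistic(self, key, value) -> bool:
        """Asynchronous: acknowledge immediately and persist later."""
        self.pending.append((key, value))
        return True

    def run_jobs(self) -> int:
        """Background job that drains the queue; returns the writes applied."""
        applied = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.persisted[key] = value
            applied += 1
        return applied


store = Store()
store.write_pessimistic("profile:1", {"name": "Roy"})
store.write_optimistic("activity:1", {"event": "played a game"})

print("activity:1" in store.persisted)  # queued, not stored yet
store.run_jobs()
print("activity:1" in store.persisted)  # stored once the job has run
```

The optimistic path keeps the caller fast at the price of a window in which the data is acknowledged but not yet durable, so it suits data where that window is acceptable.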
Old architecture overview
New architecture overview
Diagram labels: Server API, Application Model, Storage platform, Client-side API, Presentation layer, Physical storage
Our building blocks
•  Everything written in Erlang
•  Piqi as protocol
    •  binary
    •  JSON
    •  XML
•  SSP utilizes local caching (memcache)
•  Flexible (persistent) storage layer
    •  MySQL (various flavors)
    •  Membase/Couchbase
    •  Could be any other storage product
•  MQs (DWH updates)
Why choose MySQL?
•  Predictable
•  Reliable
•  Decent performance
•  Easy to comprehend
•  Excellent ecosystem
    •  Libraries
    •  Monitoring tools
    •  Knowledge
Why Erlang?
•  Functional language
•  High availability: designed for telecom solutions
•  Excels at concurrency, distribution, fault tolerance
•  Do more with less!
•  Other companies using Erlang:
How do we shard?
•  What is the bucket model?
•  Each record has one unique owner attribute (GID)
•  GID (Global IDentifier) identifying different types
•  Bucket(s) per functionality
•  Bucket is structured data
•  Attributes contain data of records
•  Attributes do not have to correspond to schema
Example bucket
$ curl -X POST -H 'Accept: application/json' -H 'Content-Type: application/json' \
    --data-binary '{"gid": 288511851128422401}' http://127.0.0.1:8777/demobucket/get
{
  "records": [
    {
      "gid": 288511851128422401,
      "given_name": "g",
      "registered_on": 1,
      "email": "mail1",
      "gender": "m",
      "birthdate": { "year": 1963, "month": 6, "day": 21 }
    }
  ],
  "meta_info": { "total_ct": 1 }
}
Example bucket MySQL 1
CREATE TABLE demobucket (
  gid bigint(20) unsigned not null,
  given_name varchar(64) not null,
  registered_on tinyint(3) unsigned default 0,
  email varchar(255) not null,
  gender enum('m', 'f', 'u') not null default 'm',
  birthdate date not null,
  PRIMARY KEY(gid)
);
Example bucket MySQL 2
CREATE TABLE demobucket (
  gid bigint(20) unsigned not null,
  user_name varchar(64) not null,
  user_register timestamp on update CURRENT_TIMESTAMP(),
  user_emailaddress varchar(255) not null,
  user_gender char(1) not null default 'm',
  user_dob varchar(10) not null,
  PRIMARY KEY(gid)
);
Example bucket Cassandra
CREATE COLUMNFAMILY demobucket (
  gid int PRIMARY KEY,
  given_name varchar,
  registered_on timestamp,
  email varchar,
  gender varchar,
  birth_date varchar
);
Example Erlang filters
demobucket:get(#demobucket_get_input{gid = 12345, filters = [
    #filter{attr = <<"gender">>, op = <<"=">>, parms = {string, <<"f">>}},
    #filter{attr = <<"registered_on">>, op = <<"sort">>, parms = asc},
    #filter{attr = <<"gid">>, op = <<"limit">>, parms = {int, 10}}
]})
Pipeline flow of a bucket

Global distribution
•  Nearest datacenter (DC) to the end user
•  Satellite DC
    •  Processing and caching
    •  Do not own/store data
•  Storage DC
    •  Processing, caching and persistent storage
•  Store all data of the same user in the same DC
•  Partition on user globally
•  Global IDentifier per user
The lookup server
•  Contains GIDs and their master DC
•  A GID's master DC is predefined
•  Migrated GIDs get updated
How does this work?
•  Globally sharded on GID
•  (local) GID Lookup
Diagram labels: GID lookup, Shard 1, Shard 2, Persistent storage
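The lookup step can be sketched as a two-stage resolution: first find the master DC of a GID, then pick the shard inside that DC. The Python below is only an illustration. The DC names, the contents of the tables and the modulo shard rule are assumptions, since the talk does not state how a GID maps to a shard.

```python
# Hypothetical lookup table: which storage DC is the master for each GID.
GID_MASTER_DC = {1234: "eu", 1235: "eu", 1241: "us"}

# Hypothetical persistent storage: two shards per storage DC.
STORAGE = {
    "eu": [{1234: {"name": "Roy"}}, {1235: {"name": "Moss"}}],
    "us": [{}, {1241: {"name": "Patricia"}}],
}


def shard_index(gid: int, shard_count: int) -> int:
    """Assumed shard rule (modulo); the real rule is not given in the talk."""
    return gid % shard_count


def get_record(gid: int, local_dc: str):
    """Resolve the master DC through the lookup table, then read its shard.
    Returns the record, the master DC and whether the read left the local DC."""
    master_dc = GID_MASTER_DC[gid]
    shards = STORAGE[master_dc]
    record = shards[shard_index(gid, len(shards))].get(gid)
    return record, master_dc, master_dc != local_dc


print(get_record(1234, "eu"))
print(get_record(1241, "eu"))
```

A satellite DC would always take the remote branch, because it caches and processes data but never owns it.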
Master/Satellite DC example

Why do we need data migration?
•  Spread data evenly over shards
•  Migration of buckets between shards
•  GID migration between DCs
•  Creating a new storage DC needs data migration
•  Users will automatically be migrated after visiting another DC many times
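Automatic migration after repeated visits to another DC could be driven by a simple counter, as in the Python sketch below. The threshold of three visits, the `LookupServer` class and the choice to keep the counters in the lookup server are all assumptions. The talk only says that migration happens after "many" visits.

```python
from collections import defaultdict

MIGRATION_THRESHOLD = 3  # assumed value; the talk only says "many times"


class LookupServer:
    """Toy lookup server that tracks the master DC per GID."""

    def __init__(self, master_dc_by_gid):
        self.master_dc = dict(master_dc_by_gid)
        self.remote_visits = defaultdict(int)  # (gid, dc) -> visit count

    def record_visit(self, gid: int, visited_dc: str) -> str:
        """Count visits to a non-master DC and migrate at the threshold.
        Returns the master DC of the GID after this visit."""
        if visited_dc == self.master_dc[gid]:
            return visited_dc
        self.remote_visits[(gid, visited_dc)] += 1
        if self.remote_visits[(gid, visited_dc)] >= MIGRATION_THRESHOLD:
            # A real system would copy the user's data before switching.
            self.master_dc[gid] = visited_dc
            for key in [k for k in self.remote_visits if k[0] == gid]:
                del self.remote_visits[key]
        return self.master_dc[gid]


lookup = LookupServer({1234: "eu"})
history = [lookup.record_visit(1234, "asia") for _ in range(3)]
print(history)
```

Once the GID has moved, the lookup entry points at the new master DC, which matches the earlier statement that migrated GIDs get updated in the lookup server.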
Seamless schema upgrades
•  Versioning on bucket definitions
•  GIDs are assigned to a bucket version
•  Data in old bucket versions remains (read only)
•  New data only gets written to the new bucket version
•  Updates migrate data to the new bucket version
•  Migrations can be triggered
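Seamless schema upgrades

State 1: all records in Demobucket v1, Demobucket v2 empty
Demobucket v1
GID   name
1234  Roy
1235  Moss
1236  Jen
1237  Douglas
1238  Denholm
1239  Richmond
Demobucket v2
GID   name      gender
(empty)

State 2: new record 1241 appears in v2
Demobucket v2
GID   name      gender
1241  Patricia  f

State 3: record 1235 has moved from v1 to v2
Demobucket v1
GID   name
1234  Roy
1236  Jen
1237  Douglas
1238  Denholm
1239  Richmond
Demobucket v2
GID   name      gender
1241  Patricia  f
1235  Moss      m

State 4: record 1236 has moved from v1 to v2
Demobucket v1
GID   name
1234  Roy
1237  Douglas
1238  Denholm
1239  Richmond
Demobucket v2
GID   name      gender
1241  Patricia  f
1235  Moss      m
1236  Jen       f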
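The versioned-bucket behaviour described for seamless schema upgrades can be modelled in a few lines. The Python sketch below is an assumption-laden illustration and not the platform's code: the `VersionedBucket` class, its method names and the default value "u" for the new gender attribute are hypothetical.

```python
class VersionedBucket:
    """Toy model of a bucket with an old (read-only) and a new version."""

    def __init__(self, old_rows, new_defaults):
        self.v1 = dict(old_rows)          # old version, only read or drained
        self.v2 = {}                      # new version, receives all writes
        self.new_defaults = new_defaults  # values for attributes added in v2

    def get(self, gid):
        """Read from the new version first, then fall back to the old one."""
        if gid in self.v2:
            return self.v2[gid]
        return self.v1.get(gid)

    def insert(self, gid, row):
        """New data is only written to the new version."""
        self.v2[gid] = {**self.new_defaults, **row}

    def update(self, gid, changes):
        """Updating a record that still lives in v1 migrates it to v2."""
        if gid in self.v1:
            self.v2[gid] = {**self.new_defaults, **self.v1.pop(gid)}
        elif gid not in self.v2:
            raise KeyError(gid)
        self.v2[gid].update(changes)


bucket = VersionedBucket(
    {1234: {"name": "Roy"}, 1235: {"name": "Moss"}, 1236: {"name": "Jen"}},
    new_defaults={"gender": "u"},
)
bucket.insert(1241, {"name": "Patricia", "gender": "f"})
bucket.update(1235, {"gender": "m"})
bucket.update(1236, {"gender": "f"})

print(sorted(bucket.v1))  # records that have not been touched yet
print(sorted(bucket.v2))  # new and migrated records
```

Records that are never updated stay in the old version until a migration is triggered explicitly, so the upgrade needs no bulk rewrite at deployment time.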
  
	
  
	
  
	
  
Multi Master writes
•  Every cluster (two masters) will contain two shards
•  Data is written interleaved
•  HA for both shards
•  No warmup needed
•  Both masters active and "warmed up"
•  Slaves added (other DC) for HA and backup
Diagram labels: SSP, Shard 1, Shard 2
  
	
  	
  
	
  	
  
	
  	
  
	
  	
  
	
  	
  
	
  	
  
	
  	
  
	
  	
  
	
  	
  
Where do we stand now?
•  SPAPI is in place
•  SSP is (mostly) running in shadow mode
•  GID buckets running in production
•  Activity feed system first to production
•  Satellite DC in early 2013!
Questions?

Thank you!
•  Presentation can be found at: http://spil.com/perconalondon2012
•  If you wish to contact me: art@spilgames.com
•  Don't forget to rate my talk!

