Scaling social games
   “the order of magnitude
          challenge”
  Paolo Negri @hungryblank
Order of magnitude
[Chart: DAU (daily active users), July to December; y-axis from 0 to 1,000,000]
Social Games
Flash client (game)    HTTP API
Photo: http://www.flickr.com/photos/stars6/4381851322
Social Games
Flash client
• Game actions need to be persisted and validated
• Each client makes 1 API call every few seconds
Social Games
HTTP API
• 5000 HTTP reqs/sec
• More than 90% writes
• 60K queries/sec
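These figures can be turned into a few ratios. The sketch below is back-of-the-envelope arithmetic. It assumes the request rate and the query rate describe the same peak period. The 3-second call interval is a hypothetical stand-in for "every few secs".

```ruby
# Figures from the slides; SECONDS_BETWEEN_CALLS is hypothetical.
REQS_PER_SEC          = 5_000
QUERIES_PER_SEC       = 60_000
WRITE_SHARE           = 0.90 # "more than 90% writes", so a lower bound
SECONDS_BETWEEN_CALLS = 3    # assumption: "every few secs"

# Average number of SQL queries issued per HTTP request.
def queries_per_request(queries_per_sec, reqs_per_sec)
  queries_per_sec.fdiv(reqs_per_sec)
end

# Lower bound on write requests per second.
def min_write_reqs_per_sec(reqs_per_sec, write_share)
  (reqs_per_sec * write_share).round
end

# Clients needed to produce the request rate if each client calls
# the API once every `interval` seconds.
def concurrent_clients(reqs_per_sec, interval)
  reqs_per_sec * interval
end

queries_per_request(QUERIES_PER_SEC, REQS_PER_SEC)      # => 12.0
min_write_reqs_per_sec(REQS_PER_SEC, WRITE_SHARE)       # => 4500
concurrent_clients(REQS_PER_SEC, SECONDS_BETWEEN_CALLS) # => 15000
```

At roughly 12 queries per request, removing even one or two queries per request takes a sizeable share of load off the database.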
July 2010
• ~170,000 daily users
• Plain Ruby on Rails app
• Persistence 100% SQL
[Diagram: HAproxy → Ruby on Rails → MySQL]
July 2010
• 1 HAproxy server
• Multiple RoR servers
• 4 MySQL servers (sharded dataset)
[Diagram: HAproxy → Ruby on Rails → MySQL]
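The slides do not say how the dataset was split across the 4 MySQL servers. The sketch below shows one common scheme. It assumes every row belongs to a user and that users are assigned to shards by `user_id` modulo the shard count. The class and host names are hypothetical.

```ruby
# Hypothetical shard router: users are spread over the shards by
# user_id modulo the number of shards (assumed scheme).
class ShardRouter
  def initialize(hosts)
    raise ArgumentError, "need at least one shard" if hosts.empty?

    @hosts = hosts.dup.freeze
  end

  # Index of the shard that holds all rows of this user.
  def shard_index(user_id)
    user_id % @hosts.size
  end

  # Host name of that shard.
  def host_for(user_id)
    @hosts[shard_index(user_id)]
  end
end

router = ShardRouter.new(%w[mysql-1 mysql-2 mysql-3 mysql-4]) # hypothetical hosts
router.host_for(123) # => "mysql-4"
```

With plain modulo routing, changing the shard count remaps most users. That is one reason resharding under load tends to mean large data migrations.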
July 2010
[Diagram: HAproxy → Ruby on Rails → MySQL; MySQL marked "Slow down"]
July 2010
[Diagram: HAproxy → Ruby on Rails → MySQL; Ruby on Rails marked "High queries/request ratio", MySQL marked "Slow down"]
Queries/request


• Which code is triggering extra queries?
• Why is the ratio lower in our test environment than on the live system?
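Both questions require measuring how many queries each request issues, in the same way in test and on the live system. The sketch below does this in plain Ruby. It uses a hypothetical `FakeConnection#execute` in place of the real database adapter. In a Rails app the counter would wrap the adapter's query path instead.

```ruby
# Hypothetical stand-in for a database adapter.
class FakeConnection
  def execute(_sql)
    [] # a real adapter would send the SQL to MySQL here
  end
end

# Per-thread query counter (assumes one request per thread).
module QueryCounter
  def self.count
    Thread.current[:query_count] ||= 0
  end

  def self.increment
    Thread.current[:query_count] = count + 1
  end

  def self.reset
    Thread.current[:query_count] = 0
  end

  # Prepended to the connection class so every execute call is counted.
  module Instrumentation
    def execute(sql)
      QueryCounter.increment
      super
    end
  end
end

FakeConnection.prepend(QueryCounter::Instrumentation)

# Runs the block as one "request" and returns how many queries it issued.
def queries_for_request
  QueryCounter.reset
  yield
  QueryCounter.count
end

conn = FakeConnection.new
queries_for_request do
  conn.execute("SELECT * FROM users WHERE id = 123")
  conn.execute("UPDATE users SET coins = coins + 1 WHERE id = 123")
end # => 2
```

Logging this count per API action in both environments shows which actions diverge.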
Queries/request
Running code of the live system:
[Diagram: Application + Plugins + Ruby on Rails]
Queries/request
Source of extra queries: Plugins
• Sharding plugin “breaks” the standard Rails query cache
• Flash wire protocol plugin generates extra queries
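A query cache only helps if every query passes through it. The slides do not explain how the sharding plugin breaks the Rails query cache. The sketch below is a simplified model, not Rails' actual implementation. In it, a connection either caches identical SELECTs for the duration of a request or bypasses the cache entirely.

```ruby
# Simplified per-request query cache model (not Rails' implementation).
class CachingConnection
  attr_reader :db_hits

  def initialize(cache_enabled: true)
    @cache_enabled = cache_enabled
    @cache = {}
    @db_hits = 0
  end

  # Identical SELECTs are answered from memory when the cache is on.
  def select(sql)
    return run(sql) unless @cache_enabled

    @cache.fetch(sql) { @cache[sql] = run(sql) }
  end

  # Writes clear the cache, as a conservative query cache would.
  def update(sql)
    @cache.clear
    run(sql)
  end

  private

  # Stand-in for a round trip to MySQL.
  def run(sql)
    @db_hits += 1
    [sql]
  end
end

# One request that reads the same row three times, then updates it.
def simulate_request(conn)
  3.times { conn.select("SELECT * FROM users WHERE id = 123") }
  conn.update("UPDATE users SET coins = coins + 10 WHERE id = 123")
  conn.db_hits
end

simulate_request(CachingConnection.new)                       # => 2
simulate_request(CachingConnection.new(cache_enabled: false)) # => 4
```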
Plugins

• Deceptive “feature for free”
• Might provide the right feature
• But might not meet scaling needs
Plugins

• Instant legacy code, even for new projects!
• Once added, it’s your code
• Even if it’s maintained, will it evolve with your needs?
Plugins


• Assess code quality when you add it
• Can you afford to maintain/change it?
Plugins


• We fixed it!
• Queries cut by up to 40% on some requests
Early August
[Chart: query time in ms (0 to 30), 6:00 to 8:10]
• The MySQL hiccup
• Every 70 minutes, query time spikes 7x
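Before hunting for the culprit, it helps to confirm the period from monitoring data. The sketch below assumes query times are available as `[minute, milliseconds]` pairs sampled every minute. The 5x threshold is an arbitrary choice, not taken from the talk.

```ruby
# Median of a non-empty list of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Minutes whose query time is at least `factor` times the median.
def spike_minutes(samples, factor: 5)
  baseline = median(samples.map { |_, ms| ms })
  samples.select { |_, ms| ms >= factor * baseline }.map(&:first)
end

# First minute of each spike; spiky samples within `step` minutes of
# each other are merged into one spike.
def spike_starts(samples, factor: 5, step: 1)
  spike_minutes(samples, factor: factor)
    .chunk_while { |a, b| b - a <= step }
    .map(&:first)
end

# Minutes between consecutive spikes.
def spike_intervals(samples, factor: 5, step: 1)
  spike_starts(samples, factor: factor, step: step).each_cons(2).map { |a, b| b - a }
end
```

If the intervals cluster tightly around one value, the cause is periodic: a scheduled job, an external event, or a periodic internal activity of the database itself.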
Hiccup causes
Who is periodically blocking MySQL?

• Code (app + plugins + Rails)?
• Some periodic job?
• The devil (AWS)?
Hiccup quick fix
• We shard out the most queried table
  (40% of all queries)
[Diagram: MySQL servers, shard 1 to shard 4]
Hiccup quick fix
• We shard out the most queried table
  (40% of all queries)
[Diagram: top table shards 1 to 4 on their own servers, separate from other tables shards 1 to 4]
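With the top table on its own servers, routing depends on the table as well as on the user. The sketch below is hypothetical: the table and host names are made up. It assumes both sets of shards are split by `user_id` modulo 4.

```ruby
# Hypothetical name of the most queried table.
TOP_TABLE = "user_items"

# Two independent sets of 4 shards (hypothetical host names).
CLUSTERS = {
  top:   %w[top-1 top-2 top-3 top-4],
  other: %w[mysql-1 mysql-2 mysql-3 mysql-4]
}.freeze

# Picks the shard set by table, then the shard by user.
def host_for(table, user_id)
  hosts = CLUSTERS.fetch(table == TOP_TABLE ? :top : :other)
  hosts[user_id % hosts.size]
end

host_for("user_items", 123) # => "top-4"
host_for("users", 123)      # => "mysql-4"
```

Moving the top table's share of the query load to separate servers relieves the original shards without re-splitting the users.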
Hiccup quick fix
• MySQL likes it
• The “top table” shards will go a long way in the
  scaling process
[Diagram: top table shards 1 to 4 on their own servers, separate from other tables shards 1 to 4]
Hiccup causes
Who is periodically blocking MySQL?

• Code (app + plugins + Rails)?
• Some periodic job?
• The devil (AWS)?
             None of the Above
Hiccup real cause

• A MySQL internal behavior that only emerges at high volume
• MySQL periodically flushes its buffer
• Under heavy write IO, the flush blocks
Hiccup solution

• Percona MySQL patches (XtraDB) avoid the
  blocking behavior
• Query time profile becomes smooth
• The IO capacity limit now shows up as gradual
  performance decay
Write through cache

• Memcache in front of MySQL
• Evaluated before sharding
• Was discarded
• Because of our read/write ratio
Write through cache


 90% of the time, we read data
     in order to modify it
Write through cache

  This means that 90% of the time we:

      1. read cache
      2. write cache
      3. write SQL
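The pattern above can be sketched with plain Ruby hashes standing in for memcache and MySQL. These are hypothetical in-memory stores, not a real client library.

```ruby
# Write-through cache sketch: @cache plays memcache, @sql plays MySQL.
class WriteThroughStore
  attr_reader :sql_reads, :sql_writes

  def initialize
    @cache = {}
    @sql = {}
    @sql_reads = 0
    @sql_writes = 0
  end

  # Reads hit SQL only on a cache miss.
  def read(key)
    @cache.fetch(key) do
      @sql_reads += 1
      @cache[key] = @sql[key]
    end
  end

  # Writes go to the cache and synchronously to SQL ("write through").
  def write(key, value)
    @cache[key] = value
    @sql_writes += 1
    @sql[key] = value
  end

  # The common game-API case: read a value in order to modify it.
  def modify(key)
    write(key, yield(read(key)))
  end
end

store = WriteThroughStore.new
store.write("user:123:coins", 100)
10.times { store.modify("user:123:coins") { |coins| coins + 1 } }
store.read("user:123:coins") # => 110
store.sql_writes             # => 11
```

The cache absorbs every read, but each modification still costs a synchronous SQL write. With a read-to-modify workload, the cache therefore does little to relieve MySQL.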
Write through cache
Performance is bound to:
• Read heavy: memcache performance
• Write heavy: MySQL writes (unless async). Is the
  write-through lib optimized for writes?
MySQL

• Sharding SQL is a painful way to scale
• Data migrations at high load imply
  downtime
• ACID benefits are lost, either because of sharding
  or in the name of performance
Redis
• A persistent cache
• Fast: 60,000 qps on AWS hardware
• Interesting data structures, not only key/value
• Already some small-scale experience in
  house
Redis adoption

• Which data to start from?
• How do we migrate without downtime?
• Which library should map Ruby objects to Redis structures?
Redis adoption
• Which data to start from?
• Best data fit for Redis hashes
• The 3rd most queried table
• A collection of integer fields that only need
  increment/decrement
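Integer fields that are only incremented or decremented map naturally onto a Redis hash, with one field per column, updated with `HINCRBY`. The sketch below assumes a hypothetical key layout (`user:<id>:counters`). It uses a `FakeRedis` stub that implements just `hincrby` and `hgetall`, so it runs without a server. The redis-rb gem's client exposes methods with the same names.

```ruby
# Hypothetical in-memory stand-in for a Redis client (hashes only).
class FakeRedis
  def initialize
    @data = Hash.new { |hash, key| hash[key] = {} }
  end

  # Like HINCRBY: add `by` to an integer field (a missing field counts
  # as 0) and return the new value. Real Redis does this atomically.
  def hincrby(key, field, by)
    @data[key][field] = @data[key].fetch(field, 0) + by
  end

  # Like HGETALL: Redis returns field values as strings.
  def hgetall(key)
    @data.fetch(key, {}).transform_values(&:to_s)
  end
end

# Hypothetical key layout: one hash per user.
def counters_key(user_id)
  "user:#{user_id}:counters"
end

redis = FakeRedis.new
redis.hincrby(counters_key(123), "coins", 50)
redis.hincrby(counters_key(123), "coins", -20)
redis.hincrby(counters_key(123), "energy", 3)
redis.hgetall(counters_key(123)) # contains "coins" => "30", "energy" => "3"
```

Because `HINCRBY` applies the change on the server, the application no longer has to read a value before modifying it.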
Redis adoption
• How do we migrate without downtime?
• Migrate one user at a time
• Use a Redis set to track which users have
  been migrated
• No downtime, transparent to users
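The per-user migration can be sketched as a read path that migrates each user on first access. Hashes and a `Set` stand in for MySQL, the Redis data, and the Redis set of migrated users. These are all hypothetical stand-ins; with real Redis, the check and the bookkeeping would be `SISMEMBER` and `SADD`.

```ruby
require "set"

class LazyMigrator
  attr_reader :migrated

  def initialize(mysql_rows)
    @mysql = mysql_rows  # stands in for MySQL: user_id => row hash
    @redis = {}          # stands in for the Redis data
    @migrated = Set.new  # stands in for the Redis set of migrated ids
  end

  # All reads go through here, so users migrate transparently.
  def fetch_user(user_id)
    migrate(user_id) unless @migrated.include?(user_id)
    @redis.fetch(user_id)
  end

  private

  def migrate(user_id)
    row = @mysql.fetch(user_id) # 1. read original data from MySQL
    @redis[user_id] = row.dup   # 2. write migrated data to Redis
    @migrated << user_id        # 3. only then mark the user as migrated
  end
end

migrator = LazyMigrator.new({ 123 => { "coins" => 10 }, 456 => { "coins" => 7 } })
migrator.fetch_user(123) # migrates user 123 on first access
migrator.migrated.to_a   # => [123]
```

Data is written before the user is marked as migrated, so a crash in between only causes that user to be migrated again. The sketch ignores concurrent requests for the same user, which a production version would have to handle.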
Redis adoption
• How do we migrate without downtime?
[Diagram: User 123 → RoR Server, which is connected to MySQL and Redis]
Redis adoption
• How do we migrate without downtime?
[Diagram: the RoR Server handling User 123 reads original data from MySQL]
Redis adoption
• How do we migrate without downtime?
[Diagram: the RoR Server handling User 123 writes migrated data to Redis]
Redis adoption
• How do we migrate without downtime?

• Migration might never complete (users who never
  come back are never migrated)
• Use SQL + Redis set information to generate a
  final batch migration
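The final batch can be derived by set difference: all user ids known to SQL, minus the members of the migrated set. In the sketch below, the inputs are plain Ruby collections (an assumption). In production they would come from a SQL query and `SMEMBERS`. The batch size is hypothetical.

```ruby
require "set"

# Users known to SQL that are not yet in the migrated set.
def remaining_users(sql_user_ids, migrated_ids)
  Set.new(sql_user_ids) - Set.new(migrated_ids)
end

# Sorted batches so the final migration can run in chunks.
def migration_batches(sql_user_ids, migrated_ids, batch_size: 1_000)
  remaining_users(sql_user_ids, migrated_ids).to_a.sort.each_slice(batch_size).to_a
end

migration_batches(1..10, [2, 3, 5, 7], batch_size: 4)
# => [[1, 4, 6, 8], [9, 10]]
```

Users keep migrating lazily while the batch runs. Each batch step should therefore re-check membership before copying a user.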
Redis 1st result

10% of the query load on the 4 MySQL servers
is moved to 1 Redis server
Redis server load is 0.05
Redis


• Becomes the tool to use
• Migration plan for all write-intensive data
• Migrate one “class” at a time
Redis honeymoon end

• Memory usage grows faster than the data
• Snapshot to disk causes spikes in query
  time
• Starting new slaves eats memory on the
  master node
Redis honeymoon end
           Russian Roulette Feeling



• Redis machines sized with generous spare
  RAM
• Rigorous plan for starting slaves and masters
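Spare RAM helps because Redis forks whenever it writes a snapshot (`BGSAVE`) or prepares data for a new slave. Every memory page modified while the child process runs is then copied. The sketch below is a back-of-the-envelope sizing estimate. The rewritten fraction and the memory overhead factor are assumptions to be measured, and replication buffers are ignored.

```ruby
# Rough peak RAM for a Redis master while a forked child is alive.
# dataset_gb:         raw size of the data (assumption to measure)
# overhead:           memory used per byte of data (assumption)
# rewritten_fraction: share of pages modified while the child runs
def peak_ram_gb(dataset_gb, rewritten_fraction:, overhead: 1.0)
  unless (0.0..1.0).cover?(rewritten_fraction)
    raise ArgumentError, "rewritten_fraction must be between 0 and 1"
  end

  resident = dataset_gb * overhead
  resident * (1 + rewritten_fraction) # copied pages come on top
end

peak_ram_gb(10, rewritten_fraction: 1.0) # => 20.0, the worst case
peak_ram_gb(10, rewritten_fraction: 0.5) # => 15.0
```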
Redis


• Redis team acknowledges persistence/replication
  problems
• Redis 2.4 diskstore plan starts
1,000,000

And counting...
1,000,000
[Diagram: HAproxy → Ruby on Rails → Persistence; annotated "painless scaling"]
1,000,000
[Diagram: HAproxy → Ruby on Rails → Persistence; Ruby on Rails annotated "just add servers as load grows"]
1,000,000
[Diagram: HAproxy → Ruby on Rails → Persistence; Persistence annotated "Painful and troublesome"]
Infrastructure

• AWS
• Chef - through Scalarium
• Ganglia
Thanks
  ...
wooga is looking for a
Business Intelligence Engineer

   http://wooga.com/jobs
