The Web Scale
Tuenti architecture to withstand
1500+ million pageviews / day
                           Guillermo Pérez - bisho@tuenti.com
                    Security & Backend Architecture Tech Lead
What is a scalable
    system?
What is scalability
Some Tuenti stats
Tuenti Stats


        13M users
     REALLY ACTIVE
   50%+ active weekly
  >1h browsing per DAY!
Tuenti Stats

 - Each month, over:
    40,000 M pageviews
    50,000 M requests
    100 M new photos
    2,000+ TB of photos served
 - On peaks:
    1,600 million pageviews/day
    35,000 requests/second
    6,000 million served photos/day
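
As a rough sanity check, the monthly totals above can be turned into averages and compared with the peak figures. This is only a sketch: the 30-day month is an assumption, and since the slide says "over", the averages are lower bounds.

```python
# Figures copied from the slide; DAYS_PER_MONTH = 30 is an assumption.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30

monthly_pageviews = 40_000e6
monthly_requests = 50_000e6
peak_pageviews_per_day = 1_600e6
peak_requests_per_second = 35_000

avg_pageviews_per_day = monthly_pageviews / DAYS_PER_MONTH
avg_requests_per_second = monthly_requests / (DAYS_PER_MONTH * SECONDS_PER_DAY)

pageview_peak_ratio = peak_pageviews_per_day / avg_pageviews_per_day
request_peak_ratio = peak_requests_per_second / avg_requests_per_second

print(f"average pageviews/day:   {avg_pageviews_per_day / 1e6:,.0f} M")
print(f"average requests/second: {avg_requests_per_second:,.0f}")
print(f"peak vs average pageviews/day:   {pageview_peak_ratio:.2f}x")
print(f"peak vs average requests/second: {request_peak_ratio:.2f}x")
```

Capacity has to be sized for the peak figures, not for the monthly averages.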
Tuenti Stats

 - 1200+ servers
    ~500 FEs
    ~300 DBs
    ~100 MCs
    ~100 image servers
    Others: Chat, HBase, Queues, Processors...
How to scale?
No silver bullet
Monitor
Know your tools
Evolve, iterate
    Learn
Monitoring

 - Your crystal ball!
    Glimpse of the future
    Answer questions
 - Detect bottlenecks
 - Detect what needs to be optimized
    The 90/10 Rule
    No premature optimization
 - Detect bad usages
 - Detect browser patterns
 - Detect changes, issues
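
One way to apply the 90/10 rule to monitoring data is to rank code paths by total time spent and keep only the few that account for most of it. The sketch below assumes per-endpoint timings have already been collected; the endpoint names and numbers are made up for illustration.

```python
def hot_spots(total_time_by_name, share=0.90):
    """Return the fewest names (largest first) that cover `share` of total time."""
    total = sum(total_time_by_name.values())
    ranked = sorted(total_time_by_name.items(), key=lambda item: item[1], reverse=True)
    picked, covered = [], 0.0
    for name, spent in ranked:
        if covered >= share * total:
            break
        picked.append(name)
        covered += spent
    return picked

# Hypothetical totals in milliseconds, e.g. aggregated from request logs.
timings_ms = {"photo_list": 7200, "wall": 1900, "profile": 400, "search": 300, "settings": 200}
print(hot_spots(timings_ms))
```

Anything outside the returned list is a poor optimization target, however slow it looks in isolation.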
  
Monitor
Know your tools
 Evolve, iterate
     Learn
Know your tools

 -   Stop reading blogs
 -   Read internals documentation
 -   Test software
 -   Test hardware
 -   Experiment
  
Know your tools

 - MySQL (InnoDB) IS fast
    photos table (photo_id, user_id, ...)
       Option 1: PK photo_id, KEY user_id
       Option 2: PK (user_id, photo_id), KEY photo_id
       Usage: select * from photos where user_id=X
    sorting
    covering index
    Even NoSQL :)
    Hardware limits, replication
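
InnoDB stores rows in primary-key order, so with PK (user_id, photo_id) all photos of one user sit together and come back already sorted by photo_id. The sketch below illustrates the schema and the main query using Python's built-in sqlite3 module as a stand-in (an assumption made so the example runs anywhere); SQLite's WITHOUT ROWID tables are also clustered by primary key, but this is not MySQL and says nothing about InnoDB's actual performance. The `title` column and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Clustered by (user_id, photo_id): one user's photos are stored together.
    CREATE TABLE photos (
        user_id  INTEGER NOT NULL,
        photo_id INTEGER NOT NULL,
        title    TEXT,
        PRIMARY KEY (user_id, photo_id)
    ) WITHOUT ROWID;
    -- Secondary key for the rarer lookups by photo_id alone.
    CREATE INDEX photos_by_photo_id ON photos (photo_id);
""")
conn.executemany(
    "INSERT INTO photos (user_id, photo_id, title) VALUES (?, ?, ?)",
    [(2, 30, "c"), (1, 20, "b"), (2, 10, "a"), (1, 40, "d")],
)

def photos_of(user_id):
    # The dominant query: every photo of one user, ordered by photo_id.
    cursor = conn.execute(
        "SELECT photo_id, title FROM photos WHERE user_id = ? ORDER BY photo_id",
        (user_id,),
    )
    return cursor.fetchall()

print(photos_of(2))
```

The point of the design is to pick the primary key for the most frequent query rather than for the most "natural" identifier.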
Know your tools

 - Memcache
    Tons of persistent TCP connections eat your RAM
       UDP performance issues
          Single thread for UDP
          Multiport patch
       proxies
    Stresses the network to the max
       Driver issues, configuration
       Variable performance with net devices
Know your tools

 - NoSQL
    Not magic!
    Good for heavy write loads
    Good for data processing
    Still needs tweaking: partitioning, schemas
Monitor
Know your tools
Evolve, iterate
    Learn
Evolve, iterate

 - All architectures scale up to a certain point
 - Then you must rethink everything
    Then, and only then!
    Remember premature optimization?
    Scalable != efficient
    Future is hard to predict
  
  
Monitor
Know your tools
Evolve, iterate
    Learn
Learn

        Learn from:
        Experience
          Failure
          Others
Architecture
Architecture

 - Basic rules:
    Static: Add layers (easy caching)
    Dynamic: Move responsibility to edges
    General: Decentralize, redundancy
  
Architecture

 - Design for failure:
    Support disabling
    Nice degradation, fallbacks
    Controlled launches
 - Test with dark launches
 - Think about storage operations
 - Be able to migrate live
 - Focus on your core, use CDNs
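
A minimal sketch of "support disabling", fallbacks and controlled launches: a percentage-based switch decides per user whether a feature runs, and any failure degrades to a fallback. The class and function names are hypothetical, and the in-memory dictionary stands in for whatever shared configuration a real deployment would use.

```python
import zlib

class FeatureFlags:
    """Per-feature rollout percentages (0 disables, 100 enables for everyone)."""

    def __init__(self):
        self._percent = {}

    def set_percent(self, name, percent):
        self._percent[name] = percent

    def enabled(self, name, user_id):
        # CRC32 gives a stable bucket in 0..99, so a user stays in or out
        # of the rollout across requests and across servers.
        bucket = zlib.crc32(f"{name}:{user_id}".encode()) % 100
        return bucket < self._percent.get(name, 0)

def with_fallback(flags, name, user_id, primary, fallback):
    """Run `primary` when the feature is on; use `fallback` when off or failing."""
    if not flags.enabled(name, user_id):
        return fallback()
    try:
        return primary()
    except Exception:
        return fallback()

flags = FeatureFlags()
flags.set_percent("new_feed", 10)  # controlled launch: roughly 10% of users
print(with_fallback(flags, "new_feed", 42, lambda: "new feed", lambda: "old feed"))
```

Setting the percentage back to 0 disables the feature without a deploy, which is what makes a launch controllable.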
Architecture

 - Move work to the browser:
   Request routing
   Templates
   Cache
   Prefetch
 - Move remaining to your FEs:
   Data relations
   Consistency
   Privacy, access check
   Live migrations
   Knowledge of the storage infrastructure
Architecture

 - All teams involved
   Frontend
      Good JS, templating, caching, prefetching
   Backend
      Data design, parallelization, optimizations
   Systems
      Iron benchmarks, tuning, networking
Dynamic site example
Scaling a website

 -   Setup: 1 server
 -   Bottleneck: CPU
  
 -   Solution: Add frontends
 -   Changes: Share sessions
Scaling a website

 -   Setup: N frontends, 1 DB
 -   Bottleneck: DB Reads
  
 -   Solution: Add DB slaves
 -   Changes: Split reads to slaves or DB proxy
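
Splitting reads to slaves can be sketched as a small router that sends writes to the master and rotates reads across the slaves. The class name and the string "connections" are placeholders, and the SELECT check is deliberately naive; a real router also has to deal with transactions and replication lag.

```python
import itertools

class ReplicatedDB:
    """Routes writes to the master and spreads reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master when no slave is configured.
        self._slaves = itertools.cycle(slaves or [master])

    def connection_for(self, sql):
        # Naive classification: only statements starting with SELECT are reads.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._slaves) if is_read else self.master

db = ReplicatedDB("master", ["slave1", "slave2"])
print(db.connection_for("SELECT * FROM photos WHERE user_id = 1"))
print(db.connection_for("UPDATE photos SET title = 'x' WHERE photo_id = 2"))
```

A read issued right after a write may reach a slave that has not replicated it yet, so reads that must see fresh data still belong on the master.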
Scaling a website

 -   Setup: N frontends, 1 DB Master + N Slaves
 -   Bottleneck: Limited # of slaves, so DB Reads
  
 -   Solution: Chain replication / Add cache layer
 -   Changes: Big ones!
      Adding some caches in certain places is easy
      But for a dynamic app, using Memcache as storage
      makes your DB non-relational
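
The cache layer is commonly built as a cache-aside pattern: read from the cache, fall back to the DB on a miss, and invalidate on write. The sketch below uses plain dictionaries as stand-ins for Memcache and the database (an assumption for runnability); the class name is hypothetical.

```python
class CacheAsideStore:
    """Key-based reads through a cache, with invalidation on write."""

    def __init__(self, db, cache):
        self.db = db          # dict standing in for the database
        self.cache = cache    # dict standing in for Memcache
        self.db_reads = 0     # counter to show how many reads reach the DB

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        self.db[key] = value
        # Invalidate instead of updating, so the next read reloads from the DB.
        self.cache.pop(key, None)

store = CacheAsideStore(db={"user:1": "ana"}, cache={})
store.get("user:1")
store.get("user:1")
print(store.db_reads)
```

Note that every access goes through a key: joins and ad-hoc queries do not pass through this layer, which is why the data model stops being relational.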
Scaling a website

 -   Setup: N FEs, 1 DB Master + N Slaves, Caches
 -   Bottleneck: DB Writes
  
 -   Solution: Split tables into DB clusters
 -   Changes: Add some DB abstraction
Scaling a website

 -   Setup: N FEs, N DB clusters, Caches
 -   Bottleneck: DB Writes on certain table
  
 -   Solution: Partition tables
 -   Changes: DB abstraction and big changes
      DB no longer relational, more key based
      Partition key limits queries
      Denormalization, duplication
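
A minimal sketch of the DB abstraction for split and partitioned tables: each table maps to its own list of shards, and the partition key selects one of them. The shard names and the modulo scheme are assumptions for illustration; modulo partitioning in particular makes adding shards later expensive.

```python
# Hypothetical layout: each table lives on its own cluster, split into shards.
SHARDS = {
    "photos": ["photos_db_0", "photos_db_1", "photos_db_2", "photos_db_3"],
    "users": ["users_db_0"],
}

def shards_for(table, partition_key=None):
    """Return the shards a query must touch.

    With the partition key, exactly one shard is needed. Without it, the
    query has to fan out to every shard of the table.
    """
    shards = SHARDS[table]
    if partition_key is None:
        return list(shards)
    return [shards[partition_key % len(shards)]]

print(shards_for("photos", partition_key=10))  # photos of user 10
print(shards_for("photos"))                    # no key: every shard
```

This is the sense in which the partition key limits queries: anything that cannot supply the key pays for a full fan-out or needs a denormalized copy keyed differently.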
       
Scaling a website

 -   Setup: N FEs, N partitioned DBs, Caches
 -   Bottleneck: Disk space, DB cost
  
 -   Solution: Archive tables
 -   Changes: DB abstraction + migration scripts
Scaling a website

 - Setup: N FEs, N partition+archive DBs, Cache
 - Bottleneck: Internal network traffic
  
 - Solution: 2-level caches, split services, cache
 affinity
 - Changes: Cache abstraction, browsers
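
A two-level cache can be sketched as a small in-process LRU in front of the shared cache, so repeated reads of hot keys never cross the network. The `remote` object is a dictionary-like stand-in for Memcache, and the class names are hypothetical.

```python
from collections import OrderedDict

class TwoLevelCache:
    """Small local LRU (level 1) in front of a shared remote cache (level 2)."""

    def __init__(self, remote, local_capacity=1000):
        self.remote = remote
        self.local = OrderedDict()
        self.local_capacity = local_capacity

    def get(self, key):
        if key in self.local:
            self.local.move_to_end(key)      # mark as most recently used
            return self.local[key]
        value = self.remote.get(key)         # network round trip in real life
        if value is not None:
            self.local[key] = value
            if len(self.local) > self.local_capacity:
                self.local.popitem(last=False)  # evict least recently used
        return value

class CountingRemote(dict):
    """Dict that counts lookups, standing in for a remote cache server."""
    lookups = 0

    def get(self, key, default=None):
        self.lookups += 1
        return super().get(key, default)

remote = CountingRemote({"user:1": "ana"})
cache = TwoLevelCache(remote, local_capacity=2)
cache.get("user:1")
cache.get("user:1")
print(remote.lookups)
```

The trade-off is staleness: local copies are not invalidated when the remote value changes, so this fits data that tolerates short-lived inconsistency.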
Scaling a website

 - Setup: N FEs, N partition+archive DBs,
 multilayered Cache, services
 - Bottleneck: Datacenter
  
 - Solution:
    Split services
    Partition user data
 - Changes: Big ones!
    Greater replication lags, inconsistencies
The Tuenti Backend
    Framework
Backend Framework

 - Our mission:
    Provide an easy to use, productive, easy to
    debug, testable, fast, extensible,
    customizable, deterministic, reusable,
    instrumented (stats) framework and
    tools to ease developers' daily work and
    manage the infrastructure.
Backend Framework

 - From Request routing to Storage
 - Simple layers, clean responsibilities
 - Clean, organized codebase
 - Using:
    convention over configuration
    configuration over coding
 - Queuing system for async execution
 - Gathering stats from all levels
Backend Framework

 - Request routing:
    Multiple entry points
    Fast request parsers route to Agents
    Data centric agents
    Printers
Backend Framework

 - Domain Api:
    Expose top-level business actions
    Clean, semantic Api
    No state, no magic, all data in params
    Check privacy (the right place!)
     
Backend Framework

 - Domain Backend:
    Implement public/internal business actions
    Clean, semantic Api
    No state, no magic, all data in params
    Coordinate transactions
    No privacy
     
Backend Framework

 - Domain Storages (ORM like)
    Configure storage access for a table
      Fields, validation, partitioning, primary
      key, caching techniques, custom queries.
    Provide access to storage via standard apis:
      CRUD actions
      Cached Lists
      Cached Queries
      + Custom
    Data container
       
Backend Framework

 - Storage Strategies
    CRUD
    Cached Lists
    Cached Queries
    CUD Observers for custom actions
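
CUD observers can be read as callbacks fired after every create, update or delete. The sketch below is one possible shape, not the framework's actual API: all names are hypothetical, and the example observer drops a cached photo list whenever a photo of that user changes.

```python
class ObservedStorage:
    """CRUD storage that notifies observers after create, update and delete."""

    def __init__(self):
        self.rows = {}
        self._observers = []

    def subscribe(self, observer):
        self._observers.append(observer)

    def _notify(self, action, key, row):
        for observer in self._observers:
            observer(action, key, row)

    def create(self, key, row):
        self.rows[key] = row
        self._notify("create", key, row)

    def update(self, key, row):
        self.rows[key] = row
        self._notify("update", key, row)

    def delete(self, key):
        row = self.rows.pop(key)
        self._notify("delete", key, row)

# Hypothetical cached list: user_id -> photo ids.
photo_list_cache = {7: [101, 102]}

def drop_cached_photo_list(action, key, row):
    # Custom action: any change to a photo invalidates its owner's list.
    photo_list_cache.pop(row["user_id"], None)

storage = ObservedStorage()
storage.subscribe(drop_cached_photo_list)
storage.create(103, {"user_id": 7, "title": "beach"})
print(photo_list_cache)
```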
        
     
Backend Framework

 - Storage Service
    Provides access to the different storage
    services:
       MySQL, Memcache, HBase...
    Coordinates transactions
    Abstracts the infrastructure complexities:
       partitioning, read/write, weights, hosts
    Handles transactions
     
Backend Framework

 - Storage Services (concrete ones)
    Abstract the infrastructure complexities:
       partitioning, read/write, weights, hosts
    Api close to real one:
       Memcache: set, get, cas...
       MySQL: insert, select, update...
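
Hiding "weights, hosts" from callers could look like the sketch below: the service holds a host-to-weight map and picks a host in proportion to its weight. The class name, host names and weights are hypothetical; the deck does not describe how the real services choose hosts.

```python
import random

class HostPool:
    """Picks a host at random, in proportion to its configured weight."""

    def __init__(self, weights, rng=None):
        self._hosts = list(weights)
        self._weights = [weights[host] for host in self._hosts]
        self._rng = rng or random.Random()

    def pick(self):
        return self._rng.choices(self._hosts, weights=self._weights, k=1)[0]

# db3 has weight 0, e.g. drained for maintenance, so it is never picked.
pool = HostPool({"db1": 3, "db2": 1, "db3": 0})
print(pool.pick())
```

Keeping the weights in configuration lets operations shift or drain traffic without touching application code.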
Backend Framework

 - Storage Drivers (concrete ones)
    Read config
    Manage PHP drivers
    Enhance API
Love challenges?
We are hiring!
      http://jobs.tuenti.com




      Stay tuned for our
  d...
An Tuenti Challenge 2!
     http://contest.tuenti.net
?
                                              Thanks!
                                    Guillermo Pérez - bisho@tuenti.com
                            Security & Backend Architecture Tech Lead
                                     Images Creative Commons from flickr:
heydanielle, eschipul, deanfotos66, nrbelex, mikolski, fdecomite, guldfisken
