SCALING DJANGO FOR X FACTOR
             MALCOLM BOX, DJUGL OCTOBER 2012
WHAT I’M TALKING ABOUT
  Scaling Django to >10K requests/s
  Caching, Counting and Cassandra
  Toolbox
ME
 Malcolm Box, CTO & Co-Founder

 @malcolmbox

 malcolm@tellybug.com

 http://tellybug.com
Making TV more entertaining
  Live interaction
  Highly social
  Unique content
WHO ARE YOU?
  Technical?


  Running Django?


  Scale?
THE CHALLENGE
  Millions of people watch the shows we work with
  TV tells them to buzz/clap/score...
  A giant DDOS is launched against our servers
HOW BIG?
  Peak loads of 10,000 requests/s
  Read/write mix
    Write-heavy workload - lots of user interactions
  10K requests/s is 25,920,000,000 requests/month
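The monthly figure can be checked with a few lines of arithmetic. This is a sketch that assumes the peak rate is sustained for a full 30-day month, which is what the slide's number corresponds to; real traffic would peak only around broadcasts.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_MONTH = 30             # assumption: a 30-day month


def requests_per_month(requests_per_second: int) -> int:
    """Total requests if the given rate were sustained for a whole month."""
    return requests_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH


print(f"{requests_per_month(10_000):,}")  # prints 25,920,000,000
```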
ARCHITECTURE
  Entirely cloud based
  Nodes come and go - frequently!
  Automatic deployment direct from Github via Chef
  [Diagram, hosted in Amazon AWS eu-west-1. Labels: The Internet, Static assets, HAProxy layer, Web layer, Cache, Cassandra Cluster, RDS MySQL, Chef, Monitor, Task Server, Logs, backups, Amazon S3]
CACHING
  Cache as speedup or Cache as mission-critical?
  Use Django cache framework
    Pylibmc - consistent hashing and server death patches
  Problems as you scale up...
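As a rough illustration of "Django cache framework plus pylibmc", a settings fragment might look like the sketch below. The host names and timeout are placeholders, not values from the talk, and the patches mentioned on the slide are not represented. The nested `behaviors` form is the one used by Django 1.11 and later; older releases pass behaviors differently.

```python
# settings.py fragment (sketch). Hosts and timeout are hypothetical.
CACHES = {
    "default": {
        # Django's built-in backend for the pylibmc client
        "BACKEND": "django.core.cache.backends.memcached.PyLibMCCache",
        "LOCATION": [
            "cache1.example.internal:11211",
            "cache2.example.internal:11211",
        ],
        "TIMEOUT": 300,  # seconds
        "OPTIONS": {
            "behaviors": {
                "ketama": True,       # consistent hashing across servers
                "tcp_nodelay": True,
            },
        },
    },
}
```

With consistent hashing, losing or adding one cache node remaps only a fraction of the keys rather than nearly all of them, which matters when nodes come and go frequently.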
CACHE PROBLEMS
  Cache miss behaviour         value = cache.get(key)
                               if value is None:
                                 try:
    Thundering herds are bad       lock = cache.add(lock_key(key))
                                   if lock:
  Key overload                       # Do something expensive
                                     new_value = calculate_new_value()
                                     cache.set(key, new_value)
  Server overload                    return new_value
                                 finally:
  Dualcache - https://             if lock:
                                     cache.delete(lock_key(key)
  gist.github.com/953524
                               return value
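The lock-on-miss pattern from this slide can be exercised without a memcached server. The sketch below uses a small in-memory class as a stand-in for the cache client; `LocalCache`, `get_or_compute` and the key names are hypothetical, while `lock_key` and `calculate_new_value` follow the names on the slide.

```python
import threading


class LocalCache:
    """In-memory stand-in for a memcached client (hypothetical, for the demo)."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def set(self, key, value):
        with self._mutex:
            self._data[key] = value

    def add(self, key, value):
        """Store only if the key is absent; return True on success."""
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


cache = LocalCache()
calls = []  # records how often the expensive function runs


def calculate_new_value():
    calls.append(1)
    return "expensive result"


def lock_key(key):
    return "lock:" + key


def get_or_compute(key):
    value = cache.get(key)
    if value is not None:
        return value
    if not cache.add(lock_key(key), 1):
        return None  # another worker is computing; the caller must cope
    try:
        new_value = calculate_new_value()
        cache.set(key, new_value)
        return new_value
    finally:
        cache.delete(lock_key(key))


first = get_or_compute("leaderboard")   # miss: computes and stores
second = get_or_compute("leaderboard")  # hit: served from cache
cache.add(lock_key("other"), 1)         # simulate another worker holding the lock
blocked = get_or_compute("other")       # miss while locked: returns None
print(first, second, blocked, len(calls))
```

Only one caller recomputes on a miss; everyone else gets None until the value is set, so upstream code has to handle that case. Against a real memcached, the lock key would normally also carry a timeout so that a crashed worker cannot hold it forever.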
COUNTING
  Hard to count a few things very fast
  And have real-time access to the latest result
  Things we tried:
    memcache
    Cassandra counters
  Final solution: Sharded counters
SHARDED COUNTERS
  Implemented in about 350 lines of Python
  To provide two basic operations!
    incr()
    get()
  Uses a combination of two layers of memcache and
  Cassandra to provide real-time, scalable counters
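The general idea behind a sharded counter can be sketched in a few lines. This is not the 350-line implementation from the talk, which layers memcache and Cassandra; here a plain dict stands in for the backing store, and the class name, shard count and key format are assumptions made for illustration.

```python
import random


class ShardedCounter:
    """Spread increments over several keys so no single key becomes a hot spot."""

    def __init__(self, name, store, num_shards=16):
        self.name = name
        self.store = store            # dict stand-in for the real backing store
        self.num_shards = num_shards  # hypothetical default

    def _shard_key(self, index):
        return f"{self.name}:shard:{index}"

    def incr(self, amount=1):
        # Each write lands on a randomly chosen shard.
        key = self._shard_key(random.randrange(self.num_shards))
        self.store[key] = self.store.get(key, 0) + amount

    def get(self):
        # A read sums every shard to recover the total.
        return sum(
            self.store.get(self._shard_key(i), 0)
            for i in range(self.num_shards)
        )


store = {}
claps = ShardedCounter("claps", store)
for _ in range(1000):
    claps.incr()
print(claps.get())  # prints 1000
```

Writes stay cheap because they touch one shard; reads cost one lookup per shard, which is why a cache in front of get() is a natural companion.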
CASSANDRA
  Core piece of our infrastructure
  Highly write-scalable
  Reads scaled from cache
  Using Acunu Cassandra for virtual nodes
  “Fake” Django ORM classes to make it feel more natural
    But no automatic join support
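What a "fake" ORM class over a key-to-columns store could look like is sketched below. Everything here is hypothetical: `ColumnFamilyStub` is an in-memory stand-in rather than a Cassandra client, and `Vote`, `Manager` and the field names are invented to show the Django-like `objects.get()` / `save()` feel. The talk does not show its actual classes.

```python
class ColumnFamilyStub:
    """In-memory stand-in for a column family: row key -> dict of columns."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, columns):
        self.rows.setdefault(key, {}).update(columns)

    def get(self, key):
        return dict(self.rows[key])  # raises KeyError if the row is missing


class Manager:
    """Minimal imitation of a Django model manager."""

    def __init__(self, model, column_family):
        self.model = model
        self.column_family = column_family

    def get(self, pk):
        try:
            columns = self.column_family.get(pk)
        except KeyError:
            raise self.model.DoesNotExist(pk)
        return self.model(pk, **columns)


class Vote:
    """Hypothetical model; note there is no join support, only key lookups."""

    class DoesNotExist(Exception):
        pass

    _column_family = ColumnFamilyStub()

    def __init__(self, pk, **fields):
        self.pk = pk
        self._fields = dict(fields)
        for name, value in fields.items():
            setattr(self, name, value)

    def save(self):
        self._column_family.insert(self.pk, self._fields)


Vote.objects = Manager(Vote, Vote._column_family)

Vote("user-1", act="act-7", score="5").save()
loaded = Vote.objects.get("user-1")
print(loaded.act, loaded.score)
```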
TOOLBOX
  Development
    Django Extensions, Celery, Piston (heavily forked), IPython, pycassa
    Tsung (load testing tool)
  Deployment
    Fabric, Chef, Boto
  Operations
    Sentry, Gargoyle
THINGS THAT STILL SUCK
  Monitoring
Q&A
  AND YES, WE’RE HIRING SO IF YOU’RE INTERESTED IN BUILDING EXTREMELY LARGE DJANGO SITES THEN GET IN TOUCH
  MALCOLM@TELLYBUG.COM



Editor's Notes

  • #5: XFactor 2012 app. Also Switch, BGT, Arab Voice, Unzipped...
  • #6: Questions for audience:
      - Technical?
      - Running Django in production
      - Scale - 10 ... 100 ... 1000 ... 10000 ... 100000 req/s
  • #7 to #12: XFactor - over 1M installs, 260 million boos/claps
      BGT - 250K simultaneous users
  • #14: cf. Google serving 34K searches/s worldwide
  • #16: Cache is either a speedup for your site, or it is mission critical. The deciding factor is whether your DB can handle the load if the cache fails.
      At > 500 req/s, MySQL on AWS can’t keep up - hence cache is critical
  • #17: Discuss the code:
      - what happens if you return None? How does that affect upstream bits of code?
      - occasional latency problems if the value expires - everything fails for as long as calculate_new_value() takes to return
      Ghetto locking - if using to protect e.g. DB writes, the key itself can end up as a problem
  • #19: Describe how sharded counters work
      - and the very interesting challenge of debugging!
  • #20: Used for write performance rather than data size - still more data in MySQL than Cassandra
  • #22: Mini rant - trouble finding any tool that copes with a highly scalable infrastructure that scales up and down
      Tried: Zabbix, Nagios, Cloudwatch, New Relic, Sensu, librato ... and probably some others
      Now building our own :(