THE CHALLENGES OF
SUPPORTING ONLINE LIVE
EVENTS WITH TV
PARTICIPATION NUMBERS
A STARTUP PERSPECTIVE

  Presentation for B.Sc students from IDC
  By Guy Tomer, November 2011
Hello
• I’m Guy Tomer
 • Founding and working in start-ups for the last
   13 years
 • Founder & CTO of attracTV for the last 4 years


• This Presentation is about
 • Building a scalable system for “a lot” of users
   • More specifically for handling usage peaks of live TV
     events on the internet
     • Even more specifically – how we tackle it as a small start-up
attracTV




Web based self-service solution and tools for
    managing viewers’ engagement and
      interaction on the online screen
 Social    Information   Advertisement   eCommerce
Our Use Case – MTV European Music
Awards
• One of the biggest online live streams ever
• Can’t expose precise numbers but
   • 7 digits ( > 1,000,000) – number of streams
   • 6 digits (> 100,000) –
     number of concurrent users
   • 5 digits (> 10,000) –
     number of users joining
     every minute at peak
• In addition
   • International event, 20
     sites, viewers from
     >150 countries
   • 9 languages
What Are The Challenges
1. Scaling for these numbers
2. Handling very steep ramp-up
3. Big data
4. High availability
5. Testing & preparing for such numbers
6. The cost of the above – how to do it and still make
   money

We’ll mainly discuss 1, 5 & 6
Some Big “Internet Scale” Examples
• Google Uses About 900,000 Servers
• (Map-Reduce) Google sorted a ten-petabyte input set in
    6 hours and 27 minutes on 8,000 computers.
•   Facebook serves 1 trillion pages per month
•   (2010) 30 billion – Pieces of content (links, notes, photos,
    etc.) shared on Facebook per month.
•   (2010) 2 billion – The number of videos watched per day
    on YouTube.
•   Akamai, the “CDN to the stars”, has 95,811 servers (Q2 2011)
    in 1,000 networks and 70 countries
Challenge 1 – Handling The Scale
• We are prepared for 400,000 concurrent viewers
• HTTP polling every 10<=N<=30 seconds
• This means ~20,000 HTTP R/S (requests per
  second)
• For comparison
 • Stack Overflow recently reported 800 R/S
 • Sify.com (leading portal in India)
   reported 3900 R/S
 • Jobs' death resulted in a record
   breaking 10,000 tweets/s
  (they do have a lot more requests,
  that’s just to feel the scale)
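The ~20,000 R/S figure follows from the viewer count and the polling interval. A minimal sketch of that arithmetic, assuming polls are spread evenly over time and taking 20 seconds as the midpoint of the 10 to 30 second range (the function name is ours, not part of the original system):

```python
def requests_per_second(concurrent_viewers: int, poll_interval_s: float) -> float:
    """Average request rate when every viewer polls once per interval."""
    return concurrent_viewers / poll_interval_s


VIEWERS = 400_000  # planned capacity from the slide

# Assumption: 20 s is the midpoint of the 10..30 s polling range.
typical = requests_per_second(VIEWERS, 20)
# Worst case: every client polls at the fastest allowed interval.
worst = requests_per_second(VIEWERS, 10)
# Best case: every client polls at the slowest allowed interval.
best = requests_per_second(VIEWERS, 30)

print(f"typical: {typical:,.0f} R/S, worst: {worst:,.0f} R/S, best: {best:,.0f} R/S")
```

The spread between the best and worst case shows why the polling interval is a useful knob: lengthening it under load cuts the request rate without dropping viewers.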
What Is Scalability
• From Wikipedia: “Scalability is the ability of a
 system, network, or process, to handle growing
 amounts of work in a graceful manner or its
 ability to be enlarged to accommodate that
 growth.”
            Performance ≠ Scalability
The fact that your code runs very fast for
X users doesn’t mean your architecture
supports 100*X users.
Vertical Scalability (scale up)
• “Get a bigger server”
• “Use faster CPUs”
• Cons
  • Can only help so much (with bad scale/$ value).
  • A server twice as fast is more than
    twice as expensive
• Pros
  • Easier to manage fewer computers
  • Can use virtualization
Horizontal Scalability (scale out)
• “Just add another box” (or another thousand or
  ...)
• Plan the architecture right first, do micro
  optimizations later
• Pros
 • Theoretically unlimited
 • Works well with the cloud services elasticity
• Cons
  • More complex to manage
  • More complex programming models
Challenge #2 – Steep Ramp-up
• Live Event - Everyone comes at the same time
 [Charts: steep ramp-up of a live event vs. a standard website example (Wikimedia)]

• Top speed ≠ acceleration: the fact that a car can drive 250 km/h
  doesn’t mean it can do 0-100 km/h in 4 seconds
Challenge #3 – Big Data
• From Wikipedia:

 “Big data are datasets that grow so large
  that they become awkward to work with
using on-hand database management tools”
• One of the biggest hypes in the industry today
• During this event we had ~10,000,000 records written to
  our analytics system per hour
• We’re not “Big Data” yet but
  it’s coming
Challenge #4 – High Availability
   “High availability refers to a system or
 component that is continuously operational
    for a desirably long length of time.”

• We need to meet a Service Level of 99.9%
• Backup and failover systems
  are expensive
• The cloud helps us here
[Figure: High availability in the cloud]
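To get a feel for what a 99.9% service level allows, the downtime budget can be computed directly. This is a small sketch, not part of the original deck; the 12-hour event window is borrowed from the cost slide and is an assumption here, as is measuring the service level over a 365-day year.

```python
def downtime_budget_seconds(service_level: float, period_seconds: float) -> float:
    """Maximum downtime allowed in a period while still meeting the service level."""
    return (1.0 - service_level) * period_seconds


HOUR = 3600
YEAR = 365 * 24 * HOUR  # assumption: a 365-day year

per_year_hours = downtime_budget_seconds(0.999, YEAR) / HOUR
# Assumption: the live event window is 12 hours long.
per_event_seconds = downtime_budget_seconds(0.999, 12 * HOUR)

print(f"99.9% over a year allows about {per_year_hours:.2f} hours of downtime")
print(f"99.9% over a 12-hour event allows about {per_event_seconds:.0f} seconds")
```

Measured over a single live event the budget shrinks to well under a minute, which is why failover has to be automatic rather than manual.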
Challenge #5 – Testing
• Simulating 100s of thousands of concurrent
  users… not trivial
• Requires 10s of strong servers
• Very difficult to collect the data
• The cloud helps us here
Challenge #6 – Handling The Costs
Of Such Event (Hint- Elasticity)
• For production we used ~50 servers, each with 4 cores
  at 2 GHz and 15 GB RAM (m1.xlarge)
• Some options (rough estimation) for this are:
  • Buy - ~$3,500 per box = $175,000. Not for us…

  • Dedicated server for a month - ~$1,000 per instance = $50,000
  • VPS (virtual private server) monthly - ~$300 per box = $15,000

• Solution: Cloud on-demand (Amazon AWS) - ~$500 per
 instance = $25,000 for a month…. BUT
 … no need to take it for a month,
 we activate it on demand for 12 hours
 and it costs $416!
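The $416 figure can be reproduced from the numbers on this slide. A rough sketch, assuming a 30-day month (720 hours) and strictly linear hourly billing; real cloud pricing has more moving parts (data transfer, storage, partial-hour rounding), so treat this as an estimate only:

```python
HOURS_PER_MONTH = 30 * 24  # assumption: 30-day month


def on_demand_cost(monthly_price_per_instance: float, instances: int, hours: float) -> float:
    """Cost of running `instances` machines for `hours`, billed pro rata by the hour."""
    hourly_price = monthly_price_per_instance / HOURS_PER_MONTH
    return hourly_price * instances * hours


SERVERS = 50

# Full-month options from the slide (price per box times 50 boxes).
options = {
    "buy": 3500 * SERVERS,
    "dedicated (month)": 1000 * SERVERS,
    "VPS (month)": 300 * SERVERS,
    "cloud (month)": 500 * SERVERS,
}
for name, total in options.items():
    print(f"{name:>18}: ${total:,}")

# Elastic option: pay only for the 12 hours of the event.
event_cost = on_demand_cost(500, SERVERS, 12)
print(f"{'cloud (12 hours)':>18}: ${event_cost:,.0f}")
```

The per-month cloud price is higher than a VPS, but paying only for the hours actually used is what makes the event affordable.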
Our #1 Lesson - Think Horizontal!
• Why not vertical?
  • We don’t want it to be our business’s bottleneck at any
    point in time
  • We don’t want to buy giant servers
  • We wanted a cheap start
  • We want elasticity
  • We don’t want to buy anything at this point
• How? (deserves a separate lecture)
  • Everything is in the architecture
  • No state shared between the web/app
    servers
  • No relation between the # of users
    and the load on the Database
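The deck does not say how database load was decoupled from the number of users. One common way to get that property is to have each app server refresh a local snapshot on a fixed schedule and answer every poll from it. The sketch below illustrates that idea only; `PeriodicSnapshot`, `load_from_db`, and the 10-second refresh interval are hypothetical names and values, not attracTV's implementation, and the class is not thread-safe as written.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class PeriodicSnapshot(Generic[T]):
    """Serve a cached value, reloading it at most once per `ttl_s` seconds.

    The number of loader (database) calls depends only on elapsed time,
    not on how many requests call get().
    """

    def __init__(
        self,
        loader: Callable[[], T],
        ttl_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl_s:
            self._value = self._loader()  # the only place the database is touched
            self._loaded_at = now
        return self._value  # type: ignore[return-value]


db_queries = 0


def load_from_db() -> dict:
    """Hypothetical stand-in for a real database query."""
    global db_queries
    db_queries += 1
    return {"poll_question": "Best video?", "version": db_queries}


snapshot = PeriodicSnapshot(load_from_db, ttl_s=10)
for _ in range(100_000):  # simulate 100,000 viewer polls hitting one app server
    snapshot.get()
print(f"database queries issued: {db_queries}")
```

Because each server holds only a read-only copy that it can rebuild at any time, no state needs to be shared between servers, and adding another box adds one more periodic query rather than one query per user.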
Lesson #2 KISS
• Keep It Simple Stupid
• Your system architecture
• Your code
• Your features
[Image caption: Hug out all the complexity in your system]
• Your business model


• If you don’t,
 you won’t scale
 (from personal
 experience)
Lesson #3 – Load Test Everything, Focus
    On Real World Usage Patterns
• We did massive stress testing
• We launched tens of servers just for stress testing
• Automated with JMeter and monitored the same way as
    production
                                Why?
•   The only way to test your scaling capabilities
•   Reading the code and running manual tests are irrelevant here
•   Measure the capacity of a single app server
•   Test the specific ramp-up scenario because
•   Example: if 1 app server = 5,000 users and we need
    to support 200,000 users, we need to
    prepare at least 40 servers
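The sizing rule in the last bullet is a ceiling division of the target load by the measured capacity of one server. A minimal sketch; the `headroom` parameter is our addition for illustration and is not mentioned in the deck:

```python
import math


def servers_needed(target_users: int, users_per_server: int, headroom: float = 0.0) -> int:
    """Number of app servers required, rounded up.

    `headroom` is an optional safety margin (0.25 means 25% spare capacity);
    it is an assumption of this sketch, not a figure from the presentation.
    """
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    return math.ceil(target_users * (1.0 + headroom) / users_per_server)


# Figures from the slide: one server handles 5,000 users, target is 200,000.
print(servers_needed(200_000, 5_000))        # the slide's "at least 40"
print(servers_needed(200_000, 5_000, 0.25))  # with a 25% safety margin
```

The per-server capacity must come from a load test of the real ramp-up pattern; a number measured under a gentle ramp will overstate what each server can take when everyone arrives at once.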
Lesson #4 – S*t Happens, Don’t Save On
  Real-Time Monitoring and Support
• We had a series of successful big events before this one
• We launched tens of servers just for the stress testing
• And yet we had two problems during the event
                                Why?
• Murphy is always (eventually) right…
• Because of a feature no one uses (see lesson #2 - KISS)
  that wasn’t active in the stress tests
• The specific usage of 9 languages caused unexpected load
  (see lesson #3 – stress real world scenarios)
Luckily the whole team was in
monitoring mode and the issues
were quickly handled on the fly.
Lesson #5 – Use The Cloud (startups)
• It’s Elastic, pay on demand
• Flexible when you don’t know your parameters
• Solution for affordable High Availability & Testing
• Focus on development
• I am not getting paid by Amazon – check
 others as well!
Summary - What To Remember?
• Scalability is the ability of a system to handle a growing
    amount of work with additional resources
•   Think horizontal
•   Keep It Simple (Stupid) – everything
•   Stress test everything, focus on real world scenarios
•   Monitor and provide real-time support
•   Cloud is great for start-ups
The End

• Questions? Comments? Consulting           Preguntas?
  问题 ?
• Just shy? Think you should be working at attracTV?
  Contact me:
guy.tomer@gmail.com
www.guytomer.com
Special Thanks (presentations, websites I “borrowed” from)
• Ask Bjørn Hansen
    (http://groups.google.com/group/scalable)
•   High Scalability blog http://highscalability.com/
•   http://royal.pingdom.com
•   Google images
•   Entourage (http://www.hbo.com/entourage/index.html)


Editor's Notes

  • #17: If you only remember one thing from this presentation, this should be it
  • #18: If you only remember two things from this presentation, this should be it
  • #19: If you only remember two things from this presentation, this should be it
  • #20: If you only remember two things from this presentation, this should be it