Varnish at the BBC
Winning Gold in the London 2012 Olympic Games

Graham Lyons

Varnish at the BBC
● First deployed in 2009
   ○ Specifically as the caching layer for iPlayer
   ○ On the new dynamic Platform
● The Platform has grown to hundreds of applications

How do we scale Varnish across the Platform?

(It served LOTS of traffic during the Olympics)
In the BBC Infrastructure
● bbc.co.uk is made up of lots of applications
● A load balancer sits in front
● It sends each request to Varnish
● Varnish sends the request on to a second load balancer
● This second layer of load balancing distributes load across the application servers
    ○ All applications are installed on all servers
How it looks
Routing
● The first load balancer adds a header naming a pool of servers
● Varnish forwards the header on
● The second load balancer uses the header to route the request
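Here is a minimal sketch of this two-tier routing, written in Python purely for illustration. The header name `X-Pool`, the path-to-pool table and the server names are all hypothetical. In practice this logic lives in load balancer configuration, and the real pool names and rules aren't given here.

```python
import itertools

# Hypothetical mapping from URL path prefix to pool name.
POOL_BY_PREFIX = {
    "/sport": "pool-sport",
    "/news": "pool-news",
}
DEFAULT_POOL = "pool-default"

# Hypothetical pools. All apps are installed on all servers, so pools
# are just named groups of servers and may overlap.
POOLS = {
    "pool-sport": ["app1", "app2"],
    "pool-news": ["app2", "app3"],
    "pool-default": ["app1", "app2", "app3"],
}

# One round-robin cursor per pool, used by the second tier.
_cursors = {name: itertools.cycle(servers) for name, servers in POOLS.items()}


def first_tier(path, headers):
    """First load balancer: add a header naming the pool for this request."""
    tagged = dict(headers)
    tagged["X-Pool"] = DEFAULT_POOL
    for prefix, pool in POOL_BY_PREFIX.items():
        if path == prefix or path.startswith(prefix + "/"):
            tagged["X-Pool"] = pool
            break
    return tagged


def varnish_forward(headers):
    """Varnish passes the pool header through without interpreting it."""
    return dict(headers)


def second_tier(headers):
    """Second load balancer: pick the next server from the named pool."""
    pool = headers.get("X-Pool", DEFAULT_POOL)
    cursor = _cursors.get(pool, _cursors[DEFAULT_POOL])
    return next(cursor)
```

In this sketch Varnish only has to avoid stripping the header, which keeps the routing decision out of VCL.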
How do we use Varnish?
● General HTTP cache
● Make use of header manipulation for more
  efficient caching, e.g.
  ○ GeoIP
  ○ Device detection
  ○ Cookie decomposition
In 2009...
● Application logic in VCL
● Only a very small number of applications, so it was manageable
Where should we take it?
● BBC Platform HTTP cache
● Platform-wide features
● Different requirements from an application-specific Varnish
...2012 (What we changed)
● Removed application logic (mostly)
● Added general-purpose features
  ○ e.g. GeoIP, device detection
● Features on by default: no special configuration
● Try to stay vanilla and RFC 2616-compliant(ish)
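To show what "vanilla, RFC 2616(ish)" means for cache lifetimes, here is a sketch of the freshness precedence that RFC 2616 (section 13.2.4) gives a shared cache: `s-maxage`, then `max-age`, then `Expires` minus `Date`. It illustrates the spec only. It is not the BBC's VCL or Varnish's own TTL code, and it deliberately skips heuristic freshness.

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control header into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers):
    """Seconds a shared cache may serve the response without revalidating.

    Returns None when the headers give no explicit lifetime (a real cache
    might then fall back to a default TTL). Expires is only used when a
    Date header is also present.
    """
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    # Conservatively treat private, no-store and no-cache (even with
    # field-name arguments) as never fresh for a shared cache.
    if any(d in cc for d in ("private", "no-store", "no-cache")):
        return 0
    # s-maxage (shared caches only) beats max-age, which beats Expires.
    for directive in ("s-maxage", "max-age"):
        if cc.get(directive) is not None:
            try:
                return max(0, int(cc[directive]))
            except ValueError:
                return 0
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
        except (TypeError, ValueError):
            return 0  # RFC 2616: an invalid Expires means "already expired"
        return max(0, int((expires - date).total_seconds()))
    return None
```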
Features? What features?
● GeoIP lookup
● Device meta information
● Cookie decomposition
  ○ 'Signed in' header

All exposed as headers added to the request

Companion PHP libraries manage header access and the Vary header on the response
Geo and Device Information
● Looked up via an HTTP call to the respective services
● Logic in a C library
● Cached locally (in-process, in-memory cache)
  ○ 70% hit rate for GeoIP
  ○ >95% hit rate for device data
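The in-process cache could look something like the sketch below: a bounded LRU map in front of the remote lookup, counting hits and misses so the hit rate can be reported. The real thing is a C library. This Python version is an assumption for illustration, as are its `max_entries` bound and the `fetch` callable, which stands in for the HTTP call to the GeoIP or device service, keyed by client IP or User-Agent. Expiry is left out.

```python
from collections import OrderedDict


class LookupCache:
    """In-process LRU cache in front of a slow lookup (e.g. an HTTP call)."""

    def __init__(self, fetch, max_entries=10000):
        self._fetch = fetch              # callable: key -> value (the remote lookup)
        self._max = max_entries
        self._entries = OrderedDict()    # insertion order doubles as recency order
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)       # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = value
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)    # evict least recently used
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

One plausible reading of the gap between 70% and >95%: the device key (User-Agent) has far fewer distinct values than client IP addresses do.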
Cookies?
● The incoming Cookie header is split into one header per cookie
● e.g. Cookie: UID=4321...
   ○ ...becomes: X-Cookie-UID: 4321

In practice this only operates on cookies with particular prefixes (introduced for the Great EU Cookie Debacle)
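Here is a sketch of the decomposition, assuming the prefix filter applies to cookie names. The prefix list is hypothetical, since the real prefixes aren't given. In Varnish itself this would be done in VCL or a C extension rather than Python.

```python
# Hypothetical allow-list of cookie-name prefixes; the real ones aren't given.
ALLOWED_PREFIXES = ("UID", "PREF_")


def decompose_cookies(headers):
    """Return a copy of the request headers with one X-Cookie-<name> header
    added for each cookie whose name starts with an allowed prefix."""
    out = dict(headers)
    for pair in headers.get("Cookie", "").split(";"):
        name, sep, value = pair.strip().partition("=")
        if not sep or not name:
            continue  # skip empty or malformed fragments
        if name.startswith(ALLOWED_PREFIXES):
            out["X-Cookie-" + name] = value
    return out
```

Because each value gets its own header, a response can vary on just the cookie it actually uses (e.g. `X-Cookie-UID`) rather than on the whole Cookie header.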
'Signed in' header
● Boolean
   ○ Signed in
   ○ Not signed in
● Allows a page to be cached for the 'not signed in' state
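Here is a sketch of how the header might be derived, assuming that "signed in" simply means a particular identity cookie is present. The cookie name `IDENTITY` and the header name `X-Signed-In` are hypothetical. A real check might validate the cookie rather than just look for it.

```python
# Hypothetical name of the cookie that marks an authenticated session.
IDENTITY_COOKIE = "IDENTITY"


def add_signed_in_header(headers):
    """Collapse the identity cookie into a two-valued X-Signed-In header."""
    out = dict(headers)
    names = {
        pair.strip().partition("=")[0]
        for pair in headers.get("Cookie", "").split(";")
    }
    out["X-Signed-In"] = "1" if IDENTITY_COOKIE in names else "0"
    return out
```

With only two possible values, varying on this header yields at most two copies of a page, and every signed-out user shares one of them.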
Cache Variations
All these features allow more efficient cache
variations.

Can cache variations based on:
● where the user is
● what type of device they're using
● any personalisations

e.g. a Norwegian Android user who loves EastEnders gets served straight from the cache
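To see why these headers make variations cheap, here is a sketch of a cache key built from the URL plus the request headers that a response names in `Vary`. The header names `X-Country` and `X-Device-Type` are hypothetical stand-ins for the GeoIP and device headers. This is a simplification: Varnish itself keeps the variants of a URL together under one hash and compares `Vary` at lookup time, but the effect on hits and misses is the same.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus each request header named in Vary.

    Header names are compared case-insensitively; a missing header is
    treated as an empty value.
    """
    names = sorted({h.strip().lower() for h in vary.split(",") if h.strip()})
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return "|".join([url] + ["%s=%s" % (n, lowered.get(n, "")) for n in names])


# Hypothetical normalised headers added by Varnish.
VARY = "X-Country, X-Device-Type"

norwegian_android_1 = {"X-Country": "no", "X-Device-Type": "android", "User-Agent": "UA-1"}
norwegian_android_2 = {"X-Country": "no", "X-Device-Type": "android", "User-Agent": "UA-2"}
uk_desktop = {"X-Country": "gb", "X-Device-Type": "desktop", "User-Agent": "UA-3"}
```

The two Norwegian Android users get the same key even though their User-Agent strings differ, because only the normalised headers go into it.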
Response to outside world
● External caches don't know about the request headers Varnish adds
● So responses have to be downgraded to privately cacheable
● GeoIP is the exception
  ○ the lookup is done on the last step outside our infrastructure
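Here is a sketch of the outbound rewrite, under stated assumptions. The internal header names are hypothetical. The GeoIP exception is modelled on one reading of the slide: the lookup sees the address of the last hop outside the BBC (such as an external cache), so everyone behind that cache gets the same answer and the response can stay publicly cacheable.

```python
# Hypothetical names for the request headers Varnish adds internally.
INTERNAL_HEADERS = {"x-country", "x-device-type", "x-signed-in"}
# Assumed name of the GeoIP header, the one exception on the slide.
GEO_HEADER = "x-country"


def prepare_for_outside(response_headers):
    """Rewrite caching headers on a response leaving our infrastructure."""
    out = dict(response_headers)
    vary = [h.strip() for h in out.get("Vary", "").split(",") if h.strip()]
    internal = {h.lower() for h in vary} & INTERNAL_HEADERS

    if internal - {GEO_HEADER}:
        # External caches can't see these headers, so only the browser's
        # private cache may keep the response: drop public/private/s-maxage
        # and put a single "private" in front of what remains.
        directives = [d.strip() for d in out.get("Cache-Control", "").split(",") if d.strip()]
        kept = [
            d for d in directives
            if d.lower() not in ("public", "private")
            and not d.lower().startswith("s-maxage")
        ]
        out["Cache-Control"] = ", ".join(["private"] + kept)

    # Internal header names mean nothing outside, so remove them from Vary.
    outside_vary = [h for h in vary if h.lower() not in INTERNAL_HEADERS]
    if outside_vary:
        out["Vary"] = ", ".join(outside_vary)
    else:
        out.pop("Vary", None)
    return out
```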
Vary: Cookie?
● Originally planned to send this out on responses that use X-Cookie-... headers
● But there's an analytics cookie on the site
● Its value changes on each page...
● So these responses are sent out as uncacheable
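A toy replay shows why `Vary: Cookie` doesn't help. It assumes one user requests the same URL ten times while an analytics cookie changes on every page view; the cookie name and values are made up. Keying on the whole Cookie header never hits, while keying on the decomposed UID value does.

```python
def replay_hit_ratio(requests, key_fn):
    """Replay requests for one URL through an idealised, never-evicting
    cache and return the fraction served as hits."""
    seen = set()
    hits = 0
    for headers in requests:
        key = key_fn(headers)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(requests) if requests else 0.0


# Hypothetical traffic: one user (same UID) requesting the same URL ten
# times, with an analytics cookie whose value changes on every view.
requests = [
    {"Cookie": "UID=4321; analytics=%d" % n, "X-Cookie-UID": "4321"}
    for n in range(10)
]

# What an external cache honouring Vary: Cookie would key on.
whole_cookie = replay_hit_ratio(requests, lambda h: h.get("Cookie", ""))
# What Varnish can key on after cookie decomposition.
uid_only = replay_hit_ratio(requests, lambda h: h.get("X-Cookie-UID", ""))
```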
Setting a Unique Cookie
● Previously set by the backend
● Now the unique ID cookie is generated in Varnish
● So the cookie can be set and the content still served from the cache
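This matters because Varnish's default VCL treats a backend response carrying `Set-Cookie` as uncacheable, so a cookie set by the backend forces a miss. The sketch below shows the alternative: the cookie is added per client on delivery, so the cached object stays shareable. The cookie name `UID` is borrowed from the cookie example earlier; both it and the use of a random UUID are assumptions.

```python
import uuid

# Assumed cookie name, borrowed from the UID example on the Cookies slide.
UID_COOKIE = "UID"


def has_cookie(request_headers, name):
    """True if the request's Cookie header contains a cookie called name."""
    for pair in request_headers.get("Cookie", "").split(";"):
        if pair.strip().partition("=")[0] == name:
            return True
    return False


def deliver(request_headers, cached_response):
    """Attach a per-client Set-Cookie to a response taken from the shared cache.

    The cached object itself never carries Set-Cookie, so it stays shareable;
    the cookie is added per request, on the way out.
    """
    response = dict(cached_response)  # copy: never mutate the cached object
    if not has_cookie(request_headers, UID_COOKIE):
        response["Set-Cookie"] = "%s=%s; Path=/" % (UID_COOKIE, uuid.uuid4().hex)
    return response
```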
Feedback features...
● How well is the cache being used?
● Record per-application hit/miss ratios
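A per-application counter is simple to sketch. How a request is attributed to an application is an assumption here; one option is the pool-name header the first load balancer adds. On the Varnish side, one common way to tell a hit from a miss in Varnish 3 is to check `obj.hits > 0` in `vcl_deliver`.

```python
from collections import defaultdict


class AppCacheStats:
    """Count cache hits and misses per application."""

    def __init__(self):
        self._counts = defaultdict(lambda: [0, 0])  # app -> [hits, misses]

    def record(self, app, hit):
        self._counts[app][0 if hit else 1] += 1

    def ratios(self):
        """Hit ratio per application (only apps seen via record())."""
        return {
            app: hits / (hits + misses)
            for app, (hits, misses) in self._counts.items()
        }
```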
Big Sporting Event, 2012
"Don't f*** up the Olympics..."
Olympic Requirements
● UK and non-UK versions
● Mobile and Desktop versions
● Traffic served by multiple applications

I think we can handle this...
Special preparations?

Reduced number of Varnish servers
Crosstown Traffic
Olympics Daily Peak:
● 10.4 million browsers visited bbc.co.uk/sport
● 8 million from the UK
● 2.4 million international
● (Record numbers)
Device Split
Mainly mobile and desktop
So that went well
What didn't work for us?
Varnish and HD Streaming
● 24 HD streams
● Planned to use Varnish at the front
● Cached very, very well
● Needed to be highly available
● The HA layer didn't hold up
● Had to use a load balancer instead, and do the caching there
What else has hurt?
ESI
● Increase in complexity
● Working out 'best practice'
● Seg faults!
  ○ Overflow of sess_workspace
However...
● Synthetic end point generated in Varnish
● Included as ESI
● Very good performance...
  ○ Handled almost 4 times the previous load
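Here is a sketch of why this pattern can be cheap. The page is cached once with an ESI include, and the include resolves to a fragment built from request headers alone, so no backend is contacted. The `/esi/signed-in` path, the `X-Signed-In` header and the fragment markup are hypothetical, since the source doesn't say what the synthetic end point returns. The regex only handles the self-closing `<esi:include src="..."/>` form.

```python
import re

# Matches the self-closing form <esi:include src="..."/> only.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(page, fetch_fragment):
    """Replace each ESI include in a cached page with the fragment for its src."""
    return ESI_INCLUDE.sub(lambda m: fetch_fragment(m.group(1)), page)


def synthetic_fragment(request_headers):
    """Hypothetical synthetic end point: built from request headers alone,
    so no backend request is needed."""
    if request_headers.get("X-Signed-In") == "1":
        return '<a href="/signout">Sign out</a>'
    return '<a href="/signin">Sign in</a>'


def render(cached_page, request_headers):
    """Serve one cached page, personalised per request via the synthetic include."""
    def fetch(src):
        if src == "/esi/signed-in":
            return synthetic_fragment(request_headers)
        return ""  # other includes are out of scope for this sketch
    return assemble(cached_page, fetch)


CACHED_PAGE = '<header><esi:include src="/esi/signed-in"/></header><p>Story</p>'
```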
Other pains
● No Saint mode
  ○ We load balance behind Varnish, with multiple apps per backend
● Network bandwidth
  ○ We use as few boxes as possible
Next?
● Everywhere!
  ○ A ubiquitous caching layer
  ○ Already have most of the big players
● More monitoring
● Varnish 3
  ○ VMODs?
● Make it simpler
  ○ Remove anything we can
tl;dr
Took Varnish from an application-specific component to a Platform-wide essential
Questions?
(Yes, we're hiring...)

Graham Lyons
