The Stack Exchange 
Infrastructure 
Vroom Vroom
inet.perf.profile 
• SRE Generalist @ Stack Exchange 
• @GABeech 
• http://brokenhaze.com
• http://stackexchange.com
A brief Overview 
• 560 Million Page Views a Month 
• 34TB of Data transferred a Month
• 1665 rps (2250 peak) across the web farm
• WISC(HER)
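
To put the monthly totals on the same footing as the per-second figure, the arithmetic can be sketched as below. The 30-day month and decimal terabytes are assumptions, and the requests-per-page-view ratio only means something if the page-view and rps figures cover the same traffic, which the slide does not say.

```python
# Figures taken from the overview slide.
PAGE_VIEWS_PER_MONTH = 560_000_000
BYTES_PER_MONTH = 34 * 10**12      # assumption: 1 TB = 10^12 bytes
FARM_RPS_AVG = 1665

# Assumption: a 30-day month.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def per_second(monthly_total: float) -> float:
    """Convert a monthly total into an average per-second rate."""
    return monthly_total / SECONDS_PER_MONTH


page_views_per_second = per_second(PAGE_VIEWS_PER_MONTH)
bytes_per_page_view = BYTES_PER_MONTH / PAGE_VIEWS_PER_MONTH
requests_per_page_view = FARM_RPS_AVG / page_views_per_second

if __name__ == "__main__":
    print(f"average page views/s:  {page_views_per_second:.0f}")
    print(f"average KB per view:   {bytes_per_page_view / 1000:.1f}")
    print(f"requests per view:     {requests_per_page_view:.1f}")
```

Read this way, the farm serves several HTTP requests for every counted page view, which is why the rps figure is much larger than the page-view rate.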
Our First Priority is Performance
Nobody likes a slow site, least of all us.
When your site is slow, people leave.
Make your site fast, and the people will stay.
Good write-up on moz.com:
http://moz.com/blog/site-speed-are-you-fast-does-it-matter
The Performance Toolkit
• Mini Profiler
• OpServer (https://github.com/opserver/Opserver)
• Client Timings (http://teststackoverflow.com/)
Mini Profiler
OpServer
OpServer HAproxy
OpServer Redis
OpServer SQL
Client Timings
You can’t be fast if you are not up
• Highly redundant network
• Datacenter, ISP, Edge, Core, Server, Port
Stack Exchange Infrastructure - LISA 14
Load Balancers
• HAProxy 
• 2 Servers (Hot/Standby) 
• Multiple Tiers (HAProxy Processes)
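
The editor's notes describe the request flow through the tiers roughly as "is it HTTP? yes: servers; no: terminate HTTPS, then it is HTTP". A minimal sketch of that decision follows. The tier names and the `Request` type are hypothetical, and this models only the routing decision, not HAProxy configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    scheme: str  # "http" or "https"
    path: str


def route(request: Request) -> List[str]:
    """Return the hops a request passes through (hypothetical tier names)."""
    hops: List[str] = []
    if request.scheme == "https":
        # HTTPS is terminated first; after that the request is plain HTTP.
        hops.append("tls termination tier")
        request = Request("http", request.path)
    if request.scheme != "http":
        raise ValueError(f"unsupported scheme: {request.scheme}")
    hops.append("http tier")
    hops.append("web servers")
    return hops


if __name__ == "__main__":
    print(route(Request("http", "/questions")))
    print(route(Request("https", "/questions")))
```

Keeping the tiers as separate processes is what the notes credit for granular restarts and fault segregation: the termination step can be restarted without touching the plain-HTTP path.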
SSL Termination 
• Terminated at LB 
• Feature added to HAProxy 1.5 
• See: http://brokenhaze.com/blog/2014/03/25/how-stack-exchange-gets-the-most-out-of-haproxy/
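
The editor's notes for this slide mention source port exhaustion, resolved by using 127.0.0.0/8. One reading, offered as an assumption: a single source address has a limited pool of source ports toward a given destination, so spreading connections over several loopback addresses multiplies the pool. The sketch below works through that arithmetic. The usable port range is an assumption, and the 700,000 figure is borrowed from the load balancer notes purely as an illustrative input.

```python
import ipaddress
import itertools
import math
from typing import List

# Assumption: ports 1024-65535 are usable as source ports on each address.
USABLE_PORTS_PER_SOURCE_IP = 65535 - 1024 + 1


def source_ips_needed(concurrent_connections: int,
                      ports_per_ip: int = USABLE_PORTS_PER_SOURCE_IP) -> int:
    """Source addresses needed to hold this many connections to one ip:port."""
    return math.ceil(concurrent_connections / ports_per_ip)


def loopback_sources(count: int) -> List[str]:
    """First `count` host addresses in 127.0.0.0/8."""
    network = ipaddress.ip_network("127.0.0.0/8")
    return [str(ip) for ip in itertools.islice(network.hosts(), count)]


if __name__ == "__main__":
    needed = source_ips_needed(700_000)
    print(f"source addresses needed: {needed}")
    print(loopback_sources(needed))
```

The /8 offers far more addresses than this needs, so the pool can be widened well beyond the current connection count.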
Web Servers 
• IIS 
• 9 Production (2 Test/Dev) 
• Dell R610s
• 32GB Memory 
• 2xE5-5640
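
The editor's notes for this slide give 185 req/s (250 peak) and 15% CPU (20% peak), which match the farm-wide 1665 rps (2250 peak) spread evenly over nine servers. The sketch below reproduces that division and estimates CPU after losing servers. The linear scaling of CPU with load is an assumption for illustration, not something the talk states.

```python
FARM_RPS_AVG = 1665
FARM_RPS_PEAK = 2250
PRODUCTION_SERVERS = 9
CPU_PEAK_PERCENT = 20.0  # per-server peak CPU from the editor's notes


def per_server(farm_rps: float, servers: int) -> float:
    """Even share of farm traffic handled by each server."""
    return farm_rps / servers


def cpu_after_failures(cpu_percent: float, servers: int, failed: int) -> float:
    """Estimated per-server CPU if `failed` servers drop out.

    Assumption: CPU usage scales linearly with requests per server.
    """
    remaining = servers - failed
    if remaining <= 0:
        raise ValueError("no servers left")
    return cpu_percent * servers / remaining


if __name__ == "__main__":
    print(per_server(FARM_RPS_AVG, PRODUCTION_SERVERS))
    print(per_server(FARM_RPS_PEAK, PRODUCTION_SERVERS))
    for failed in range(0, 8, 2):
        cpu = cpu_after_failures(CPU_PEAK_PERCENT, PRODUCTION_SERVERS, failed)
        print(f"{failed} servers down: ~{cpu:.0f}% peak CPU")
```

Under that assumption the farm has a lot of headroom: even with several servers out, the estimated peak CPU stays well below saturation.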
Data Tier 
• MS SQL Server 
• 4 Servers 
• 2 Always-On Clusters 
• Each Cluster 1 RW, 1 RO
Caching Tier 
• Redis 
• 2 Servers 
• Hot / Standby configuration
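
The editor's notes give per-day totals and per-second peaks for the data and caching tiers. Converting the daily totals into average rates shows how spiky each tier is. The sketch below does that conversion. It assumes the load is spread over a full 24 hours, and it reads "SO" and "SE" as the Stack Overflow and Stack Exchange clusters.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# (operations per day, peak operations per second), from the editor's notes.
TIERS = {
    "SQL (SO)": (343_000_000, 7_500),
    "SQL (SE)": (216_000_000, 3_200),
    "Redis": (3_650_000_000, 60_000),
}


def average_per_second(per_day: int) -> float:
    """Average rate, assuming load is spread over the whole day."""
    return per_day / SECONDS_PER_DAY


def peak_to_average(per_day: int, peak_per_second: int) -> float:
    """How many times higher the peak rate is than the daily average."""
    return peak_per_second / average_per_second(per_day)


if __name__ == "__main__":
    for name, (per_day, peak) in TIERS.items():
        avg = average_per_second(per_day)
        ratio = peak_to_average(per_day, peak)
        print(f"{name}: avg {avg:,.0f}/s, peak {peak:,}/s, ratio {ratio:.2f}")
```

All three ratios come out below 2, so under these assumptions the peaks are modest relative to the daily average.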
Tag Engine 
• Our Special index of SO 
• Tagging is hard 
• Written by Marc Gravell 
• http://blog.marcgravell.com/2014/04/technical-debt-case-study-tags.html
Elastic Search 
• 203GB Index 
• 3 Machines 
• 42M searches/day
Deployment 
• Git 
• TeamCity 
• Custom PowerShell Scripts
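
The editor's notes describe the flow as: a Dev build is made from the Git repository and deployed to Meta, and once it is verified, the Prod build copies its artifacts from the Dev build. A minimal model of that promotion rule is sketched below. The `Build` type, function names, and artifact naming are all hypothetical; the real pipeline is TeamCity plus PowerShell scripts, and this is not their API.

```python
from dataclasses import dataclass


@dataclass
class Build:
    commit: str
    artifacts: str          # hypothetical artifact identifier
    verified: bool = False


def dev_build(commit: str) -> Build:
    """Build from a commit; per the notes this is what gets deployed to Meta."""
    return Build(commit=commit, artifacts=f"artifacts-{commit}")


def prod_build(dev: Build) -> Build:
    """Promote a verified dev build by reusing its artifacts, not rebuilding."""
    if not dev.verified:
        raise RuntimeError("dev build has not been verified")
    return Build(commit=dev.commit, artifacts=dev.artifacts, verified=True)


if __name__ == "__main__":
    dev = dev_build("abc123")   # hypothetical commit id
    dev.verified = True         # stands in for verification on Meta
    prod = prod_build(dev)
    print(prod.artifacts == dev.artifacts)
```

Copying artifacts rather than rebuilding means the bits that reach production are the same bits that were verified.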
So what does this get you?
• 52 ms homepage render time 
• 33 ms questions page render time
Always See Our Performance
• http://stackexchange.com/performance
Thank YOU! 
Contact: 
@GABeech 
george@stackoverflow.com 
Office Hours: 
Wednesday, November 12th 
(today…) 
2:00pm - 3:30pm 
LISA Lab



Editor's Notes

  • #4: Windows, IIS, SQL Server, C#, HAProxy, Elastic Search, Redis
  • #5: Why do I bring up performance in an infra talk? Simple: it drives our design decisions.
  • #7: Shown to every Dev/SRE on every page. Oneboxed in our chat system.
  • #8: Bubbles up problems
  • #12: How well are we actually doing when _you_ load the page?
  • #13: The actual design starts now.
  • #14: 4 different providers, selected for different characteristics. Router redundancy: hot/standby; HSRP/BGP on “T2”; full BGP tables and HSRP on T1.
  • #15: 4B requests/month; 3,000 req/sec peak; 10% CPU (18% peak). Between 600k and 700k concurrent connections (EST, TIME_WAIT, etc.). Multiple processes allow for granular restarts and segregation of faults. SSL termination is done on the LB. WebSockets are the weird connection: long-lived TCP, not HTTP.
  • #16: Request flow: a request comes in. Is it HTTP? If yes, it goes to the servers. If no, HTTPS is terminated, after which it is HTTP.
  • #17: Source port exhaustion: use 127.0.0.0/8 to resolve. Server only running at ~12% CPU. We don’t run full SSL everywhere yet.
  • #18: 185 req/s (250 peak); 15% CPU usage (20% peak)
  • #19: (SO) 343M queries per day, peak of 7,500 queries/second. (SE) 216M queries per day, peak of 3,200 queries/second. CPU use: SO 8% (peak 15%); SE 10% (peak 20%).
  • #20: 3.65B operations a day; peak 60,000/s; 3% CPU usage
  • #21: 3 servers, 32 GB RAM. 3,644 req/s; 3% CPU (10% peak). Replaced full-text search in SQL Server. Spins up a full copy of SO/SE. Cool thing: can be upgraded with 0 downtime.
  • #22: 2 others, not prod: machine learning and Logstash (300TB)
  • #23: TeamCity monitors our development Git repository. Dev auto-builds (deploy to Meta). When the build is verified, Dev triggers the Prod build. The Prod build copies artifacts from the Dev build.