Infrastructure
Building Physical in a Virtual World
Who am I?
Infrastructure Operations @ HootSuite
Chris Maxwell
Lead Operations Engineer
@WrathOfChris
chris.maxwell@hootsuite.com
Previously
Coral Princess, 2010
Left: bow thrusters, core network
Right: improvised cooling
Princess Cruises – Drydock / datacenter refit team
Why should I listen to you?
Just a guy who’s been in the trenches a long time.
•  Learned to code in C long ago. BSD kernel hacking, secure messaging, managed security appliances, nomadic file systems.
•  >1000 wireless access points deployed to 14 cruise ships
•  6 Cisco core network replacements from Nortel Passport
•  First live-voyage core network replacement (Diamond Princess)
•  Built 22 broadband wireless towers (of 75)
•  Regional Voice-over-IPX (DSP on OS/2 over Novell!)
Why HootSuite went physical
“Unique” workload:
•  95% write
•  12TB dimension
•  I/O bound
•  Noisy neighbours
•  Pre-PIOPS (AWS 100 IO/vol)
•  Need >68GB
•  No lock-in
What is “cloud”
Not a cloud definition slide
•  Just datacenter best practices from 1998 (infrastructures.org)
•  Gold disk deploy - AMI
•  Version Control - config mgmt
•  Automate everything - APIs
“Cloud is like cutting your legs off at the knee - stop trying to walk somewhere, just clone a new server in place” – me.
Compromising
Balancing best vs. budget
•  We chose software routers. OpenBSD + OpenBGPD on Dell
•  We chose Cisco core switching
•  We chose software firewalls. OpenBSD + PF on Dell
•  We chose CloudStack on VMware
•  We chose SAN + iSCSI
Compromising
We chose software routers. OpenBSD + OpenBGPD on Dell
•  OpenBSD is secure, OpenBGPD is stable
•  Scales to 1.5-2 Gbps per host, depending on packet size
•  Redundant pairs instead of internally redundant (live upgrades!)
•  Ops team understands BSD tools
•  Added support for Intel X520 (82599) 10GE NICs
•  Much lower cost than hardware routers
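Why throughput depends on packet size: a software router spends most of its effort per packet, not per byte, so the same bit rate needs far more forwarding work when packets are small. The sketch below only converts the slide's quoted 2 Gbps into packets per second for two packet sizes. The function name is illustrative, and Ethernet framing overhead is ignored as a simplifying assumption.

```python
def required_pps(gbps: float, packet_bytes: int) -> float:
    """Packets per second needed to carry `gbps` at a given packet size.

    Simplifying assumption: counts packet bytes only, ignoring Ethernet
    preamble, inter-frame gap and other framing overhead.
    """
    bits_per_packet = packet_bytes * 8
    return gbps * 1e9 / bits_per_packet


if __name__ == "__main__":
    for size in (1500, 64):
        print(f"2 Gbps at {size}-byte packets: {required_pps(2.0, size):,.0f} pps")
```

Small packets push the required packet rate up by more than an order of magnitude for the same bit rate, which is why the per-host figure is quoted as a range.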
Compromising
We chose Cisco core switching
•  Cisco is solid. Cisco engineers can be hired
•  OSPF with millisecond timers = sub-second convergence
•  Wanted 10Gig in the network core
•  Needed minimal port count
•  Ops team has Cisco experience.
Compromising
We chose software firewalls. OpenBSD + PF on Dell
•  OpenBSD is secure, PF is stable
•  Scales to 1-1.5 Gbps per host, depending on states/rules (~300k)
•  CARP + pfsync is great! We run Active+Standby, alternating Masters.
•  Redundant pairs instead of internally redundant (live upgrades!)
•  Ops team understands BSD tools. Scripts sync security groups from AWS to PF tables.
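A minimal sketch of what a security-group-to-PF-table sync step could look like. This is not the team's actual script: the function names, the table name and the file path are hypothetical, and fetching the groups from AWS is left out. It only shows the PF side, which is to normalize a group's addresses into a table file and build the `pfctl -t <table> -T replace -f <file>` command that would load it.

```python
from ipaddress import ip_network


def render_pf_table(cidrs):
    """Return PF table file contents: one de-duplicated network per line."""
    nets = {ip_network(c, strict=False) for c in cidrs}
    # Sort by IP version first so IPv4 and IPv6 networks are never compared.
    ordered = sorted(nets, key=lambda n: (n.version, n.network_address, n.prefixlen))
    return "".join(f"{n}\n" for n in ordered)


def pfctl_replace_command(table, path):
    """Build (but do not run) the pfctl command that swaps in the new table."""
    return ["pfctl", "-t", table, "-T", "replace", "-f", path]


if __name__ == "__main__":
    # Hypothetical security group contents, as if already fetched from AWS.
    web_group = ["10.0.1.5", "10.0.0.0/24", "10.0.0.0/24"]
    print(render_pf_table(web_group), end="")
    print(" ".join(pfctl_replace_command("web", "/etc/pf.tables/web")))
```

Replacing the whole table in one step, rather than adding and deleting single addresses, keeps the firewall's view consistent with the source of truth on each run.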
Compromising
We chose CloudStack on VMware
•  2012: CloudStack more mature than OpenStack
•  Wanted VMware hypervisor for core data services (MySQL, Mongo)
•  We use vMotion + HA on core services
•  Did not want vendor lock-in, layered CloudStack for future options
•  Original plan was mixed VMware + XenServer, but small Ops team
Compromising
We chose SAN + iSCSI
•  We chose iSCSI for flexibility:
•  We need snapshots. Most backups are sync+snap
•  We like live migration of virtual machines
•  We tolerate latency penalty of SAN for snapshot flexibility
•  We run RAID-6 (2 parity disks)
Tolerate 2 disk failures per slice before data loss
Painful on write – 5,000 writes → 30,000 read + write
Remote equipment – time to replacement is not instant
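The 5,000 → 30,000 figure follows from the usual RAID-6 small-write penalty. Assuming each random write is a read-modify-write with no cache coalescing or full-stripe writes, the array reads the old data block and both old parity blocks, then writes the new data and both new parity blocks: 3 reads + 3 writes = 6 disk operations per host write. A small sketch of that arithmetic (the names are illustrative):

```python
# RAID-6 read-modify-write: old data + 2 old parity blocks are read,
# then new data + 2 new parity blocks are written.
RAID6_READS_PER_WRITE = 3
RAID6_WRITES_PER_WRITE = 3


def raid6_backend_ios(host_writes: int) -> int:
    """Disk operations generated by `host_writes` small random writes."""
    return host_writes * (RAID6_READS_PER_WRITE + RAID6_WRITES_PER_WRITE)


if __name__ == "__main__":
    print(raid6_backend_ios(5000))  # 30000
```

With a workload that is 95% write, nearly every host operation pays this sixfold cost at the disks.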
SJC Stack – Core Network
BGP, OSPF, PF, on OpenBSD and Cisco
Routers, switches, firewalls
SJC Stack – Private Cloud
CloudStack, VMware, iSCSI
Switches, servers, storage
Network Overview
“no default” routing
Network Overview
AS 31931
Multiple carriers, many paths
Thank You!
Chris Maxwell
@WrathOfChris
chris.maxwell@hootsuite.com
