Virtual Routers on Compute Nodes:
A (Not So) Irrational Decision?
• Neutron with OVS and VXLAN tenant networks
• Kilo release
• Virtual Routers hosted on three control nodes
• No HA routers
In the Beginning
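For context, a deployment like the one above is typically wired up with the ML2 plugin, the OVS mechanism driver, and VXLAN tenant networks. The snippet below is a minimal, illustrative sketch of that Kilo-era configuration, not TWC's actual files; the option names are standard ML2/OVS settings, the values and file locations are placeholders.

    # /etc/neutron/plugins/ml2/ml2_conf.ini (illustrative values)
    [ml2]
    type_drivers = flat,vlan,vxlan
    tenant_network_types = vxlan
    mechanism_drivers = openvswitch,l2population

    [ml2_type_vxlan]
    vni_ranges = 1:10000

    # OVS agent settings (file name varies by distro/release)
    [agent]
    tunnel_types = vxlan
    l2_population = True

    [ovs]
    local_ip = <tunnel-interface IP of this host>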
• We had major network reliability issues
• Customers were being DoS'd
• Environment was running out of capacity
• We had some misconfiguration that was hard to fix
• Network upgrade was months behind schedule
• Impact of control node failure was huge
• Need to reduce failure impact
The Problem
[Diagram: OpenStack reference architecture with dedicated network nodes]
The Dumb Idea
• Got together to brainstorm options
• Could we colocate routers with another service?
–Spread routers
–Spread load
–Reduce failure group size
• What about compute nodes?
• Is this a bad idea?
• Why don’t other people do this?
• Two dual-port 10G Intel X520 NICs
– Cross-card, two-port LACP for tenant traffic
• 1U Cisco C220 rack mount servers
– Intel E5-2650 processors
– 256 GB RAM
• OpenStack Neutron
– OVS
– VXLAN
• Legacy Virtual Router testing:
– Server Bandwidth consumed? ALL of it!
– Server RAM and CPU consumed? Negligible even with OVS
TWC Configurations and Testing
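The deck doesn't show the test harness, but one generic way to reproduce this kind of measurement is to push iperf traffic between VMs on different tenant networks (so it crosses the virtual router) while watching the router's host. The commands below are a hedged sketch of that approach, not TWC's actual test scripts; router UUIDs and durations are placeholders.

    # On a VM behind tenant network A
    iperf3 -s

    # On a VM behind tenant network B (traffic traverses the virtual router)
    iperf3 -c <server-VM-IP> -t 60 -P 4

    # On the host running the router: watch CPU and memory
    sar -u -r 1

    # Per-router traffic counters inside the router's namespace
    ip netns exec qrouter-<router-uuid> ip -s link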
• Traditional network nodes
– Gigantic sized
– Horizontally scaled smaller servers
• DVR - Distributed Virtual Router
• “VR-D” - Virtual Routers - Distributed
• Other solutions outside mainline Neutron
All The Options
Network Nodes - Overview
• Gigantic dedicated network nodes?
– Servers are generally idle
– Large failure domains
– Long rebuild times
• Why not scale dedicated network nodes horizontally?
– Servers are generally idle
• Resource Usage:
– RAM and CPU - incredibly low.
– Bandwidth!
Network Nodes - Detail
• DVR - network nodes have less responsibility overall
• FloatingIP SNAT takes place on the compute node
– L3 Agent is required on compute nodes
• FixedIP SNAT takes place on network nodes
– Still requires a Virtual Router for external gateway
– HA planned for Newton
• At TWC we were concerned about DVR’s:
– Readiness for production?
– Scale issues?
– Operational tooling changes
– Current use of many Floating IPs
– Massive customer conversion to TWC OpenStack
DVR - A Layman’s Summary
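In configuration terms, DVR is mostly a change of L3 agent mode plus a router flag. The sketch below uses standard Neutron options; it is illustrative only and not how TWC deployed.

    # neutron.conf on the controllers
    [DEFAULT]
    router_distributed = True        # new routers are created as distributed

    # l3_agent.ini on compute nodes
    [DEFAULT]
    agent_mode = dvr                 # FloatingIP NAT handled locally

    # l3_agent.ini on network nodes
    [DEFAULT]
    agent_mode = dvr_snat            # centralized SNAT for FixedIP traffic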
DVR Packet Paths
● East-West between VMs (orange)
● North-South with a FloatingIP (purple)
● North-South without a FloatingIP
(green)
● Other Cases (blue):
○ VMs on the same compute node.
○ VM and router on same compute
node.
• “VR-D” is “Virtual Routers - Distributed”
• Traditional Virtual Routers that cohabitate with VMs on
compute nodes!
• Servers running both L3 Agent and nova-compute
• Virtual Routers are either Legacy or HA
• Our current choice is not to include DHCP Agents
• What about?
– VM and Virtual Router bandwidth contention?
“VR-D” - what we made up
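By contrast, VR-D needs no DVR-specific options at all: the compute nodes simply run a standard (legacy-mode) L3 agent next to nova-compute. A minimal sketch of what each such node runs follows; service names reflect a typical Kilo/Liberty package layout and vary by distro.

    # l3_agent.ini on compute nodes that also host routers
    [DEFAULT]
    agent_mode = legacy              # plain virtual routers, no DVR

    # Services running on a VR-D compute node
    #   nova-compute
    #   neutron-openvswitch-agent
    #   neutron-l3-agent
    #   neutron-metadata-agent       # easy to forget (see Implementation slide)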
VR-D Packet Paths
● East-West between VMs (orange)
● North-South with a FloatingIP (green)
● North-South without a FloatingIP (green)
○ Chief difference between VR-D
and DVR!
● Other Cases (blue):
○ VMs on the same compute node.
○ VM and router on same compute
node (shown)
• Implementation: Surprisingly Easy
• Puppet to put l3-agent on all compute nodes
• Forgot about the Metadata agent (instances need it to boot) - see the Puppet sketch below
• Manageable Issues
Implementation and Automation
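A rough sketch of the Puppet change described above, using class names from the openstack/puppet-neutron module; parameters are abbreviated and illustrative, and the real profile would pull its values from Hiera rather than hard-coding them.

    # Compute node profile (sketch, not TWC's actual manifest)
    class { 'neutron::agents::l3':
      # legacy virtual routers co-located with nova-compute
    }

    # The part that was initially forgotten: instances need the metadata proxy
    class { 'neutron::agents::metadata':
      shared_secret => hiera('neutron_metadata_secret'),
    }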
L3 Agent Scalability (LP#1498844)
• #1 problem we’ve encountered
• L3 agent queries handled by single thread in
neutron-server
• Fixed in Mitaka, backport stalled
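The Mitaka fix spreads this service-plugin RPC load across the normal worker pool, so once on a fixed release the knobs that matter are the standard neutron-server worker counts. Shown as a hedged example only; values depend on core count and load.

    # neutron.conf on neutron-server hosts
    [DEFAULT]
    api_workers = 8      # REST API workers
    rpc_workers = 8      # RPC workers; before the fix, service-plugin RPC
                         # (including L3 agent sync) stays on a single thread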
L3 Agent Scalability (LP#1498844)
• Rabbit Queue for L3 agent falls behind
• Falls behind on status checks with ~100 L3 agents
• Restarts request full state - resource hog
• Rolling restarts had to be rate limited
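One way to rate-limit L3 agent restarts so neutron-server isn't flooded with full-sync requests is simply to restart agents in small batches with a pause in between. The shell sketch below is generic; the host list, batch size, sleep interval, and service name are placeholders, not TWC's tooling.

    # Restart neutron-l3-agent two hosts at a time, pausing between batches
    n=0
    for host in $(cat l3-agent-hosts.txt); do
      ssh "$host" sudo systemctl restart neutron-l3-agent &
      n=$((n + 1))
      if (( n % 2 == 0 )); then
        wait          # let the current batch finish its full state sync
        sleep 120
      fi
    done
    wait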
Operational Complexity
• One more thing to check when a node fails
• Tooling has to be updated
• Monitoring has to be updated
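For the "is this failed node hosting routers?" question, the stock neutron CLI of that era can answer directly; a hedged example, with agent and router IDs as placeholders.

    # Which agents run on the failed host, and are they alive?
    neutron agent-list | grep <failed-hostname>

    # Which routers are scheduled to that host's L3 agent?
    neutron router-list-on-l3-agent <l3-agent-id>

    # Going the other way: which L3 agents host a given router?
    neutron l3-agent-list-hosting-router <router-id>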
Where are we going?
Generally this “VR-D” solution is working well in
production
• HA routers
• Custom router scheduling
• Routers on all compute nodes
• DVR?
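For the HA-router step, the relevant knobs already exist in stock Neutron (Kilo and later). A hedged sketch with purely illustrative values:

    # neutron.conf on the controllers
    [DEFAULT]
    l3_ha = True                   # new routers use keepalived/VRRP HA
    max_l3_agents_per_router = 2   # bound the failure group per router
    min_l3_agents_per_router = 2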
Questions?
Clayton O’Neill
– clayton.oneill@twcable.com
– IRC: clayton
– Twitter: @clayton_oneill
Sean Lynn
– sean.lynn@twcable.com
– IRC: trad511


Editor's Notes

• #2: CLAYTON: I'm Clayton O'Neill and this is Sean Lynn. We're both Principal Engineers at Time Warner Cable. We're here to tell you about some network architecture changes we've made recently that we think are a little unusual, and we hope you'll find them interesting.
Learn the production pros and cons of operating Neutron legacy and HA routers on compute nodes in your production cloud. Not ready for DVR or third-party network overhauls? Virtual router network "hot spots" got you down? Large virtual router failure domains keeping you up late at night? Neutron reference architectures not providing a scalable routing solution? If you answered yes to any of these questions, then this talk is for you.
What can I expect to learn?
– Real-world performance and system resource use of virtual routers
– Neutron and underlay network traffic patterns, including encapsulation effects and "tromboning"
– Consideration of the impact of Open vSwitch flows on this methodology
– Required tuning of L3 agents and message queueing
– Ensuring "good neighbor" effects between resources shared for networking and compute
– Discussion of why and when this type of implementation might be a mistake
• #3: Our story starts at the beginning of 2016:
– Neutron with OVS and VXLAN tenant networks
– Kilo release
– Virtual Routers hosted on three control nodes
– No HA routers implemented. We weren't really sure how mature HA routers were, and we were using the L2 population driver, which wasn't supported on Kilo with HA routers.
• #4: So at the beginning of the year, Sean and I both had back-to-back on-call rotations.
– We had major network reliability issues; customers were being DoS'd.
– One of our environments was running out of capacity.
– We had some NIC misconfiguration that we couldn't fix without rebooting the control nodes and compute nodes, which would cause even more downtime.
– The network upgrade that would fix most of these issues was months behind schedule.
When we had capacity issues, control nodes hosting routers would get overloaded, and because we had other existing problems, sometimes they would crash. We'd lose network connectivity for a third of our customers. Everyone has nodes fail, but the impact of this particular failure mode was unacceptable. We started working on fixing all of these issues, and one of the things we did was rethink our Neutron deployment architecture. So we started thinking about what sorts of architectures Neutron supports, and "upgrading" to dedicated network nodes was an option that came up a lot.
• #5: If you go look at OpenStack reference architectures in the official documentation, you'll run across this diagram, which shows one of the deployment options: dedicated network nodes. With this approach, we would be moving our virtual routers off of our control nodes onto dedicated nodes. However, we weren't sure how much of the load on our control nodes was due to networking vs. other services. Sean will talk about this in more detail, but his benchmarking showed that these boxes would probably be idle most of the time; it turns out that pushing gigabits of traffic doesn't take much out of a modern server. So if we set up dedicated network nodes, we potentially end up with a number of servers basically sitting around idle, and what is the benefit of that? We also had questions about how many network nodes would make sense. If we implemented 3 network nodes, we'd have the same failure group size: lose a single network node and we'd still lose network connectivity for a third of our customers. If we go to 5 or even 10 network nodes, we're "wasting" even more hardware that could be hosting instances. The idea of having 3 or 5 or 10 servers sitting around mostly idle really seemed like a bad way of doing things, and we were having a hard time coming up with other ideas.
• #6: CLAYTON: We got together to discuss what our requirements really are and what our architecture options are: what are the pros and cons of each option? One of the things I started wondering was: if this virtual router service doesn't require much in the way of CPU, could we colocate it with a service other than the control plane? If we could pick a service that has a lot of nodes (more than the 3 control nodes), this would spread the routers around, spread the load even more, and also reduce our failure group size. So we had this dumb idea: what if we put them on compute nodes? We have lots of compute nodes, and a compute node failure already has a similar impact on customers, so adding virtual routers doesn't make them *more* important. Is this a bad idea? Why don't other people do this? We talked with our other team members, some other operators, and some Neutron developers. We got some feedback on issues to think about, but mostly people seemed to think it'd work OK. We started coming up with a list of pros and cons for this approach and discussing it internally; Sean is going to talk about that in more detail.
• #7: SEAN: Thanks Clayton. Before we look at all the options in detail, let's be very clear: at TWC we have some very particular server and network abilities that influence our operational decisions. We have lots of bandwidth and powerful networking, and lots of RAM and CPU! And as Clayton mentioned, we are using OVS with VXLAN overlay networking. These mean we have reduced hardware-based design constraints. We also spent some time testing the load of legacy and HA routers on a server, in both single-router and 50-router scenarios. An expected result is that Virtual Routers can consume every bit of server bandwidth. A slightly unexpected result is that Virtual Routers have very little overhead in server RAM, CPU, and overall load. Our testing quantified "very little" as a couple of percentage points of increase in each category. Yes, that's right: negligible increases in RAM, CPU, and server load caused by Virtual Routers in most scenarios.
• #8: SEAN: As Clayton explained, we had our crazy idea, but we also considered several solution options, a couple of which seem quite similar:
– First: traditional network nodes, either a small number of large-capacity servers with lots of bandwidth per server, or horizontal scaling of smaller-capacity servers.
– Second: DVR, where network nodes take on far less responsibility.
– Third: our idea, which we are calling "VR-D", which merges the idea of network and compute nodes. It's not really anything groundbreaking; it's more of a novel implementation of an existing reference architecture.
– Other: Akanda's Astara, Juniper's Contrail, and a growing number of other solutions.
To be honest, all of these architectures are useful in certain cases given all the tradeoffs. But we want mainline Neutron unless it is not providing solutions our customers need, is too unreliable, or is not scalable. We also require a clear and relatively painless upgrade path for any changes. In general, mainline Neutron provides us with an adequate solution. That leaves us with network nodes, DVR, and "VR-D", so let's discuss these in more detail.
• #9: SEAN: Let's start with a more detailed discussion of network nodes. You've seen this diagram before; this is what the reference architecture for network nodes looks like. It is essentially what we were doing at TWC, but we colocated the network node functionality with the control node functionality; they weren't dedicated network nodes. Why pick network nodes? They're easily scalable, easy for operations troubleshooting, and suit high east-west vs. low north-south traffic loads.
• #10: SEAN: Let's talk a bit about network node details. We have two general patterns to follow with respect to network nodes. Gigantic network nodes are a small number of dedicated servers with LOTS of bandwidth. Remember, our testing showed incredibly low server CPU and RAM needs. Consider hundreds of routers needing to be built simultaneously: neutron-server scheduler scalability, messaging load, and the physical time to build out hundreds of routers (better with native OVS interfaces, better in Liberty). These servers are largely idle, even with routers passing a lot of traffic. With horizontally scaled network nodes we alleviate some of the failure domain issues, but still: servers are largely idle. Our conclusion was that network nodes used so little server resource that it was a waste of capacity, or would cause the unnecessary operational headaches of a specialty node type. What do I mean by incredibly low resource usage? We tested 1-router and 50-router scenarios: RAM use was a few megabytes, and CPU use and load average were a few percentage points. The only resource that was really consumed was bandwidth.
• #11: SEAN: In addition to traditional network nodes we considered the option of DVR. At this time DVR still requires network nodes, but they have far less responsibility. In DVR the biggest functional change as compared to network nodes is that the NAT for FloatingIPs takes place on the compute nodes. But DVR still uses network nodes and virtual routers for VMs without FloatingIPs which require external network access. We worried a bit about High Availability for these network nodes; this is planned for the Newton release. At TWC we also worried a bit about DVR being production ready and about possible scaling issues. DVR would also require a massive retrofit of OpenStack at TWC, including tooling changes.
• #12: SEAN: For a complete discussion, let's consider the DVR packet paths. Remember, DVR still requires Virtual Routers for a particular traffic flow, and I've chosen to illustrate this. There are four paths:
– One: East-West traffic from VM to VM, illustrated in orange. This is the same between traditional virtual routers and DVR: the VM is on a private VLAN, down the four virtual device hops to OVS, VXLAN encapsulation, packets sent over the underlay network, OVS VXLAN de-encapsulation, private VLAN tagging, up four virtual device hops... done!
– Two: North-South traffic from a VM with a FloatingIP to the external network (internet), illustrated in purple. With DVR this NAT happens on the local compute node, so it is not the same path as a VM without a FloatingIP; this is the chief functional difference between DVR and VR-D.
– Three: North-South traffic from a VM through an external gateway (no FloatingIP). This is the same path taken in VR-D.
– Four: Other, specialty cases, such as VMs on the same compute node.
• #13: SEAN: Now back to our "dumb idea". We're labeling it VR-D simply to differentiate it from DVR; it may look in certain ways like DVR, but they are not the same. In fact our solution has nothing groundbreaking: it's a novel implementation of a reference architecture. Quite simply, this is servers running both the l3-agent Neutron service and Nova compute on the same hardware, which results in Virtual Routers on the same servers as VMs. It fit all of our requirements and it maximized our use of server resources. But wait! Some of you might be starting to wonder about potential VM and VR bandwidth contention. Yes, this is a potential problem, but Clayton will come back to that.
• #14: SEAN: First, let's consider the packet paths of our VR-D idea. There are four, so why am I only showing three? Two of them are the same! Let's walk through these.
– One: East-West traffic from VM to VM, illustrated in orange. This is the same between VR-D and DVR: the VM is on a private VLAN, down the four virtual device hops to OVS, VXLAN encapsulation, packets sent over the underlay network, OVS VXLAN de-encapsulation, private VLAN tagging, up four virtual device hops... done!
– Two: North-South traffic from a VM with a FIP to the external network (internet), illustrated in green. This is the same path as a VM without an external gateway, and that is the chief functional difference between VR-D and DVR!
– Three: "specialty" East-West traffic, illustrated in blue. Traffic here is isolated within a single compute node: VM to VM, and VM to VR.
• #15: CLAYTON: When we put this placeholder slide in, I figured this would be a few slides. It turns out actually implementing virtual routers on compute nodes was pretty easy. We added the L3 agent to our Puppet compute node profile and deployed. We figured out the hard way that we needed to deploy the metadata agent along with the L3 agent for instances to boot (for us). We fixed that, and we haven't had a *lot* of issues, but we have run into some issues that we've managed to work through and that you should be aware of.
• #16: So LP#1498844 is the biggest problem we've run into. The problem is that neutron-server has a single thread set aside for handling RPC API calls from service plugins. In Mitaka there was a change to allow handling these RPC messages in all RPC worker threads; normally you'd run one worker per CPU. Unfortunately the backport of this fix to Liberty has stalled out; it is due to be discussed in a work session at the summit.
• #17: We've seen this problem manifest as the RabbitMQ queue for the L3 agent filling up slowly as the neutron-server workers fall behind. We've seen issues with this with mostly idle L3 agents on roughly 100 compute nodes; with large numbers of L3 agents this bug can crop up even if the agents aren't actively doing anything. Because of this issue with idle L3 agents, we've limited the number of L3 agents we're running to 20 per environment, which still gives us a much smaller failure group size. The next problem we ran into was restarts. When the L3 agent starts up it asks neutron-server what it's supposed to be hosting and waits. With 20 L3 agents we've seen this problem doing deploys where we were previously running 40 compute nodes at a time; the first time we did that with a change that affected the L3 agent, none of them came back. What can happen is that all the L3 agents request their assignments, neutron-server gets overwhelmed and takes too long to service any of them, they all retry, and they keep failing to come online. Our workaround is to break compute nodes hosting routers into a separate deploy group where only two of them can be deploying at a time.
• #18: The easiest part of this was actually making the compute nodes host routers; the harder part is that it adds more operational complexity. One of the questions we now have to ask when a compute host fails is: is this node hosting routers? Unfortunately, for reasons having to do with network and rack topology, our nodes hosting routers aren't nodes 1-20; it's more like 85 through 104 and 108 through 121. This leads to our next issue: our tools had to be updated. We have a tool that automatically notifies customers on a given node that it has failed; this had to be updated to account for routers. We have a tool for evacuating all customers from a compute node; this also had to be updated to evacuate routers. Additionally, we had monitoring changes to make to ensure routers on compute nodes were properly monitored. One issue we've not yet addressed is capacity management: since we're mixing north-south virtual router traffic with east-west tenant traffic, it's harder to understand capacity usage.
• #19: Overall this "VR-D" architecture change has worked out well. We feel it is a good fit for us, and it might be a good fit for other operators too. Let's talk briefly about what we have in progress and what we think we'll be doing in the future. Currently our implementation of VR-D uses legacy routers. We upgraded Neutron to Liberty a few weeks ago and have been testing HA routers since then; soon after the summit we will be enabling HA routers in production to further reduce our failure domain and increase uptime. We're also working on a plugin for Neutron to make router scheduling topology aware: use the new Neutron Availability Zones to ensure HA routers land on separate switches, and take hints about Nova resource usage on the node. There is a similar feature in Mitaka now that we're also looking into. Routers on all compute nodes: is this a good idea? After LP#1498844 is fixed, how well will the L3 agent scale? This is something we'll look into once we can move to a version of Neutron that has the fix. Lastly, we're keeping an eye on progress with DVR. We think it may be worth looking into more deeply with the Newton release if HA router integration is completed. Long term it seems like DVR is where we'd like to be, but it's a complex architecture, so we're wary.
• #20: That's all we've got; we appreciate everyone coming. If you want to get in touch with us, here is our contact information. Hopefully we have some time for questions.