TCP Issues in Virtualized Datacenter
Networks
Hemanth Kumar Mantri
Department of Computer Science
Selected Papers
• The TCP Outcast Problem: Exposing
Unfairness in Data Center Networks.
– NSDI’12
• vSnoop: Improving TCP Throughput in
Virtualized Environments via Ack Offload.
– ACM/IEEE SC, 2010
Background and Motivation
• Data center is a shared environment
– Multi-tenancy
• Virtualization: A key enabler of cloud
computing
– Amazon EC2
• Resource sharing
– CPU/memory sharing is strictly controlled
– Network sharing is largely laissez-faire
Data Center Networks
• Flows compete via TCP
• Ideally, TCP should achieve true fairness
– All flows get equal share of link capacity
• In practice, TCP exhibits RTT-bias
– Throughput is inversely proportional to RTT
• 2 Major Issues
– Unfairness (in general)
– Low Throughput (in virtualized environments)
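To make the RTT bias concrete, here is a minimal sketch (mine, not from either paper) using the classic Mathis et al. steady-state model, which says TCP throughput scales as MSS / (RTT * sqrt(p)). At an equal loss rate p, the shorter-RTT flow gets proportionally more bandwidth. The MSS, loss rate, and RTT values below are assumptions for illustration.

```python
# Mathis-model sketch of TCP's RTT bias (illustrative values only).
from math import sqrt

MSS = 1460         # bytes per segment (assumed typical Ethernet MSS)
LOSS_RATE = 0.001  # assumed loss probability p, equal for both flows

def mathis_throughput(rtt_s: float, mss: int = MSS, p: float = LOSS_RATE) -> float:
    """Steady-state throughput estimate in bytes/s: (MSS / RTT) * sqrt(1.5 / p)."""
    return (mss / rtt_s) * sqrt(1.5 / p)

# Two flows that differ only in RTT, e.g. a short vs. a long path:
for rtt_ms in (0.2, 2.0):
    bw = mathis_throughput(rtt_ms / 1000.0)
    print(f"RTT {rtt_ms:4.1f} ms -> ~{bw / 1e6:6.1f} MB/s")
```

Under this model the 0.2 ms flow gets ten times the bandwidth of the 2 ms flow; that is the classic RTT bias the next slides show being inverted in real datacenter racks.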
Datacenter Topology (Hierarchical)
Traffic Pattern: Many to One
Key Finding: Unfairness
Inverse RTT bias: flows with low RTT get low throughput, the opposite of TCP's usual RTT bias
Further Investigation
(Figure: instantaneous and average per-flow throughput)
The 2-hop flow is consistently starved!
TCP Outcast Problem
• Some flows are 'outcast' and receive very low throughput compared to others
• Almost an order of magnitude reduction in some
cases
Experiments
• Same RTTs
• Same Hop Length
• Unsynchronized Flows
• Introduce Background Traffic
• Vary Switch Buffer Size
• Vary TCP
– Reno, MPTCP, BIC, CUBIC (+ SACK)
• Unfairness persists!
Observation
The differential in the number of flows across input ports is the culprit!
Vary #Flows at the Competing Bottleneck Switch
Reason: Port Blackout
1. Packets are roughly the same size
2. Similar inter-arrival rates (predictable timing)
Port Blackout
• Can occur on any input port
• Happens for small intervals of time
• Has a more catastrophic effect on the throughput of a small set of flows!
– Experiments showed that the same number of packet drops hurts throughput far more when spread over a few flows than over many concurrent flows (see the toy simulation below).
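The following toy simulation is my own illustration of the timing mechanism, not the paper's experimental setup: two input ports feed one tail-drop output queue, and the large flow set's packets arrive just ahead of the small set's in every round, so the small set's port eats consecutive tail drops. Queue capacity, drain rate, and flow counts are all assumed.

```python
# Toy tail-drop model of port blackout (illustrative parameters only).
from collections import Counter, deque

QUEUE_CAP = 16        # output-queue slots (assumed)
DRAIN_PER_TICK = 3    # packets the output link drains per tick (assumed)
PORT_A_FLOWS = 2      # small flow set (e.g., the nearby 2-hop senders)
PORT_B_FLOWS = 12     # large flow set arriving on the other input port

queue, drops = deque(), Counter()

for tick in range(10_000):
    # Predictable timing: port B's burst lands just before port A's each
    # tick, so A's packets tend to find the queue already full and take
    # consecutive tail drops (a blackout of that input port).
    for port in ["B"] * PORT_B_FLOWS + ["A"] * PORT_A_FLOWS:
        if len(queue) < QUEUE_CAP:
            queue.append(port)
        else:
            drops[port] += 1
    for _ in range(min(DRAIN_PER_TICK, len(queue))):
        queue.popleft()

for port, n_flows in (("A", PORT_A_FLOWS), ("B", PORT_B_FLOWS)):
    print(f"port {port}: ~{drops[port] / n_flows:.0f} drops per flow")
```

In TCP terms, the clustered drops on port A wipe out whole windows of the few flows there and force retransmission timeouts, which is why the same aggregate drop count hurts a small flow set far more.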
Conditions for TCP Outcast
(See the 'Conditions for Outcast' backup slide for the list of conditions.)
Solutions?
• Stochastic Fair Queuing (SFQ)
– Explicitly enforces fairness among flows (sketched below)
– Expensive for commodity switches
• Equal Length Routing
– All flows are forced to go through the core
– Better interleaving of packets alleviates port blackout
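For contrast with a single tail-drop FIFO, here is a minimal SFQ sketch (my illustration, not switch firmware): flows are hashed into buckets and buckets are served round-robin, so a burst arriving on one input port cannot monopolize the output. The bucket count and hash function are assumptions.

```python
# Minimal Stochastic Fair Queueing sketch (illustrative only).
from collections import deque
from typing import Optional
import zlib

NUM_QUEUES = 8   # bucket count (assumed; real switches size this per port)
queues = [deque() for _ in range(NUM_QUEUES)]
_rr = 0          # round-robin pointer across buckets

def enqueue(src: str, dst: str, sport: int, dport: int, pkt: bytes) -> None:
    """Hash the flow 4-tuple to a bucket: the 'stochastic' part of SFQ."""
    key = f"{src}:{sport}->{dst}:{dport}".encode()
    queues[zlib.crc32(key) % NUM_QUEUES].append(pkt)

def dequeue() -> Optional[bytes]:
    """Serve buckets round-robin so every flow bucket gets a turn."""
    global _rr
    for i in range(NUM_QUEUES):
        q = queues[(_rr + i) % NUM_QUEUES]
        if q:
            _rr = (_rr + i + 1) % NUM_QUEUES
            return q.popleft()
    return None
```

Per-flow queue state like this is what makes SFQ expensive at line rate on commodity switch ASICs, hence the appeal of the routing-based alternative.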
VM Consolidation
• Multiple VMs hosted by one physical host
• Multiple VMs sharing the same core
– Flexibility, scalability, and economy
(Figure: VM 1-4 on shared hardware atop a virtualization layer)
Observation: VM consolidation negatively impacts network performance!
Investigating the Problem
(Figure: a client opens TCP connections to a server hosting VM 1-3 on shared hardware behind a virtualization layer)
Effect of CPU Sharing
(Figure: RTT in ms, roughly 40-180 ms, vs. number of consolidated VMs, 2 to 5)
RTT increases in proportion to the VM scheduling slice (30 ms)
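A back-of-envelope check (my arithmetic, not the paper's model): with a 30 ms slice handed out round-robin, a packet that arrives just after its VM is descheduled waits out the other VMs' full slices before the VM can even process it, which matches the growth in the figure.

```python
# Worst-case scheduling delay per extra consolidated VM (assumed model).
SLICE_MS = 30  # VM scheduling slice from the slide
for n_vms in (2, 3, 4, 5):
    # Packet lands right after its VM yields: wait out the other slices.
    print(f"{n_vms} VMs: worst-case added delay ~{(n_vms - 1) * SLICE_MS} ms")
```

That gives 30, 60, 90, and 120 ms of added delay, in line with the measured RTTs of roughly 40 to 180 ms.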
Exact Culprit
(Figure: the sender's packets pass through the device driver in the driver domain, dom0, and wait in a per-VM buffer until the target VM, VM1-VM3, is scheduled)
Impact on TCP Throughput
(Figure: throughput of connections to dom0, plotted '+', vs. to a VM, plotted 'x')
A connection to the VM is much slower than one to dom0!
Solution: vSnoop
• Alleviates the negative effect of VM scheduling on
TCP throughput
• Implemented within the driver domain to
accelerate TCP connections
• Does not require any modifications to the VM
• Does not violate end-to-end TCP semantics
• Applicable across a wide range of VMMs
– Xen, VMware, KVM, etc.
TCP Connection to a VM
(Figure: timeline of a sender establishing a TCP connection to VM1 while VM1, VM2, and VM3 are scheduled in turn. The SYN sits in VM1's buffer in the driver domain until VM1 is next scheduled, and only then does the SYN,ACK go out, so each RTT is inflated by the VM scheduling latency.)
Key Idea: Acknowledgement Offload
(Figure: the same timeline with vSnoop. The driver domain sends the SYN,ACK on VM1's behalf as soon as the SYN lands in the shared buffer, taking the VM scheduling latency out of the RTT.)
Faster progress during TCP slow start
Challenges
• Challenge 1: Out-of-order and special packets (SYN, FIN)
– Solution: Let the VM handle these packets
• Challenge 2: Packet loss after vSnoop
– Solution: Let vSnoop acknowledge a packet only if there is room in the buffer
• Challenge 3: ACKs generated by the VM
– Solution: Suppress/rewrite ACKs already generated by vSnoop
(A sketch of this per-packet logic follows below.)
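Below is a simplified, self-contained sketch of the per-packet logic these three solutions imply. It is my reconstruction in Python, not vSnoop's actual dom0 code; Packet, Flow, and the buffer handling are invented stand-ins.

```python
# Reconstruction of vSnoop's per-packet decisions (not the real code).
from collections import deque
from dataclasses import dataclass

@dataclass
class Packet:
    seq: int
    payload: bytes
    is_syn: bool = False
    is_fin: bool = False

@dataclass
class Flow:
    expected_seq: int = 0   # next in-order byte expected from the sender
    vsnoop_acked: int = -1  # highest byte already early-acked by vSnoop

def send_early_ack(flow: Flow, ack_no: int) -> None:
    print(f"vSnoop early-ACKs up to byte {ack_no}")  # stand-in for real I/O

def vsnoop_handle(pkt: Packet, flow: Flow, buf: deque, cap: int) -> None:
    # Challenge 1: SYN/FIN and out-of-order packets go to the VM untouched.
    if pkt.is_syn or pkt.is_fin or pkt.seq != flow.expected_seq:
        if len(buf) < cap:
            buf.append(pkt)  # delivered to the VM, which acks it itself
        return
    # Challenge 2: early-ack ONLY when the buffer has room, so a packet
    # vSnoop has acknowledged can never be lost before the VM runs.
    if len(buf) < cap:
        buf.append(pkt)
        flow.expected_seq = pkt.seq + len(pkt.payload)
        flow.vsnoop_acked = flow.expected_seq
        send_early_ack(flow, flow.vsnoop_acked)
    # else: neither buffer nor ack; the sender retransmits as usual.

def forward_vm_ack(flow: Flow, ack_no: int) -> bool:
    # Challenge 3: drop the VM's own ACKs that vSnoop already covered.
    return ack_no > flow.vsnoop_acked
```

Keeping early ACKs conditional on buffer room is what preserves end-to-end TCP semantics: vSnoop never acknowledges a byte that could still be lost before reaching the VM.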
vSnoop Implementation in Xen
(Figure: in the driver domain, dom0, vSnoop sits between the software bridge and each VM's netback device, in front of the per-VM buffer; the netfront driver inside each VM is tuned as well.)
TCP Throughput Improvement
• 3 VMs consolidated; 1000 transfers of a 100 KB file
• Setups compared: vanilla Xen, Xen+tuning, Xen+tuning+vSnoop
(Figure: throughput distribution; '+' = vanilla Xen, 'x' = Xen+tuning, '*' = Xen+tuning+vSnoop)
• Median throughput: 0.192 MB/s (vanilla Xen), 0.778 MB/s (Xen+tuning), 6.003 MB/s (Xen+tuning+vSnoop): a 30x improvement
Thank You!
• References
– http://friends.cs.purdue.edu/dokuwiki/doku.php
– https://www.usenix.org/conference/nsdi12/tech-schedule/technical-sessions
• Most animations and pictures are taken from
the authors’ original slides and NSDI’12
conference talk.
BACKUP SLIDES
Conditions for Outcast
• Switches use the tail-drop queue
management discipline
• A large set of flows and a small set of
flows arriving at two different input ports
compete for a bottleneck output port at a
switch
Why does Unfairness Matter?
• Multi Tenant Clouds
– Some tenants get better performance than
others
• MapReduce Apps
– Straggler problems
– One delayed flow affects overall job
completion
State Machine Maintained Per-Flow
(Reconstruction of the flattened state diagram; a flow enters the machine on the first packet received and is tracked in one of three states)
• Active (online): issue early acknowledgements for in-order packets
– In-order packet, buffer space available: stay Active
– In-order packet, no buffer: go to No buffer
– Out-of-order packet: go to Unexpected sequence
• No buffer (offline): don't acknowledge
– Buffer space available: return to Active
– Out-of-order packet: go to Unexpected sequence
• Unexpected sequence: pass out-of-order packets to the VM without acknowledging
– In-order packet: return to Active
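The same machine, encoded as a small transition table (my reconstruction of the diagram; the state and function names are mine):

```python
# Per-flow vSnoop state machine as a transition function (reconstruction).
from enum import Enum, auto

class State(Enum):
    ACTIVE = auto()      # online: early-ack in-order packets
    NO_BUFFER = auto()   # offline: buffer full, don't acknowledge
    UNEXPECTED = auto()  # out-of-order: pass packets to the VM, no ack

def next_state(in_order: bool, has_buffer: bool) -> State:
    if not in_order:
        return State.UNEXPECTED   # any out-of-order packet derails the flow
    if not has_buffer:
        return State.NO_BUFFER    # in-order but nowhere to put it
    return State.ACTIVE           # in-order with room: early-ack again

def should_early_ack(state: State) -> bool:
    return state is State.ACTIVE
```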
vSnoop’s Impact on TCP Flows
• Slow Start
– Early acknowledgements help progress
connections faster
– Most significant benefit for short transfers that are
more prevalent in data centers
• Congestion Avoidance and Fast Retransmit
– Large flows in the steady state can also benefit
from vSnoop
– Benefit is not as large as for slow start (see the arithmetic below)
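Rough slow-start arithmetic (my back-of-envelope, not a result from the paper) shows why short transfers gain the most: a 100 KB transfer needs about six slow-start rounds, so total time is roughly six RTTs, and cutting a scheduling-inflated RTT back toward dom0-like values shrinks the transfer time by orders of magnitude. The RTTs and initial congestion window are assumed.

```python
# Slow-start rounds for a short transfer under two assumed RTTs.
MSS = 1460  # bytes per segment (assumed)

def slowstart_time(size_bytes: int, rtt_s: float, init_cwnd: int = 2) -> float:
    """Seconds to send size_bytes if cwnd starts at init_cwnd segments
    and doubles every RTT (losses and handshake ignored)."""
    cwnd, sent, rounds = init_cwnd, 0, 0
    while sent < size_bytes:
        sent += cwnd * MSS
        cwnd *= 2
        rounds += 1
    return rounds * rtt_s

for label, rtt in (("dom0-like RTT", 0.0002), ("VM RTT w/ scheduling", 0.060)):
    print(f"{label:21s}: ~{slowstart_time(100 * 1024, rtt) * 1000:6.1f} ms per 100 KB")
```

A long-lived flow in congestion avoidance spends most of its life window-limited rather than round-limited, so the relative benefit there is smaller, as the slide notes.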