Data Center Networking 
Stanford CS144 Lecture 17 
Philip Levis, 11/30/11
Computer network (5)
Low latencies: μs 
High capacity: GigE, 10 GigE 
Specialized traffic 
Centrally managed
Topology 
(picture courtesy of Al-Fares et al., “A Scalable, Commodity Data Center Network Architecture”)
Storage Workload 
(picture courtesy of Phanishayee et al., “Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems”)
Query Workload 
(picture courtesy of Alizadeh et al., “Data Center TCP (DCTCP)”)
Problems
Per-Pair Bandwidth 
(picture courtesy of Al-Fares et al., “A Scalable, Commodity Data Center Network Architecture”)
Incast 
(from Phanishayee et al., “Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems”)
Incast Details 
(from Phanishayee et al., “Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems”)
Mixed traffic 
• Low latency for short flows 
• High burst tolerance (incast) 
• High throughput for long flows
Recent Research 
• New switching topology: Al-Fares et al. 
• Fix TCP incast: Vasudevan et al. 
• Data Center TCP: Alizadeh et al.
Per-Pair Bandwidth 
(picture courtesy of Al-Fares et al., “A Scalable, Commodity Data Center Network Architecture”)
Fat Tree
Fat Tree
(figure: k-ary fat tree, annotated (k/2)², k/2, k/2, and k)
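Assuming the figure's labels follow the standard k-ary fat-tree construction from Al-Fares et al. (k pods, each with k/2 edge and k/2 aggregation switches, (k/2)² core switches, and k/2 hosts per edge switch, all built from identical k-port switches), the element counts can be sketched as below. The function name `fat_tree_size` is hypothetical.

```python
from typing import Dict


def fat_tree_size(k: int) -> Dict[str, int]:
    """Element counts for a k-ary fat tree built from k-port switches (k even)."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 edge switches per pod
        "aggregation_switches": k * half,  # k/2 aggregation switches per pod
        "core_switches": half * half,      # (k/2)^2 core switches
        "hosts": k * half * half,          # k/2 hosts per edge switch, i.e. k^3/4
    }
```

For k = 48 this gives 27,648 hosts, all reached with commodity 48-port switches; the point of the topology is that host count grows as k³ while every switch stays identical.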
Switching
Two-level routing table
Prefix          Port
10.2.0.0/24     0
10.2.1.0/24     1
0.0.0.0/0       → suffix table:
                  Suffix        Port
                  0.0.0.2/8     2
                  0.0.0.3/8     3
TCAM implementation
TCAM entries (priority order): 10.2.0.X, 10.2.1.X, X.X.X.2, X.X.X.3 → Encoder → RAM
Address   Next Hop    Port
00        10.2.0.1    0
01        10.2.1.1    1
10        10.4.1.1    2
11        10.4.1.2    3
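A minimal Python sketch of the TCAM lookup on this slide: each TCAM entry is a ternary pattern (X = don't care), entries are checked in priority order with prefix entries ahead of suffix entries, the encoder reports the index of the first match, and that index addresses the RAM holding next hop and output port. The entries are copied from the slide; `ternary` and `lookup` are hypothetical helper names, and a real TCAM compares all entries in parallel rather than looping.

```python
import ipaddress
from typing import Tuple


def ternary(pattern: str) -> Tuple[int, int]:
    """Turn a dotted pattern such as '10.2.0.X' into (value, care_mask).

    'X' octets are don't-care bits (mask 0); numeric octets must match exactly.
    """
    value, mask = 0, 0
    for octet in pattern.split("."):
        value <<= 8
        mask <<= 8
        if octet != "X":
            value |= int(octet)
            mask |= 0xFF
    return value, mask


# TCAM entries in priority order: prefix (intra-pod) entries first, then suffix entries.
TCAM = [ternary(p) for p in ("10.2.0.X", "10.2.1.X", "X.X.X.2", "X.X.X.3")]

# RAM addressed by the encoder output (00, 01, 10, 11): (next hop, output port).
RAM = [
    ("10.2.0.1", 0),
    ("10.2.1.1", 1),
    ("10.4.1.1", 2),
    ("10.4.1.2", 3),
]


def lookup(dst: str) -> Tuple[str, int]:
    addr = int(ipaddress.IPv4Address(dst))
    # Emulate the encoder serially: the lowest-index matching entry wins.
    for index, (value, mask) in enumerate(TCAM):
        if addr & mask == value:
            return RAM[index]
    raise LookupError(f"no route for {dst}")
```

Ordering matters: 10.2.1.2 matches both 10.2.1.X and X.X.X.2, and placing prefix entries first keeps traffic for the local pod's subnets on their downward ports, while everything else is spread across uplinks by host ID.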
Not Perfect
(figure: the same k-ary fat tree, annotated (k/2)², k/2, k/2, and k)
Fat-Tree Status
Incast
• RTO = SRTT + (4 × RTTVAR)
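The slide shows only the final RTO formula. This sketch fills in the smoothing rules from RFC 6298 (SRTT gain 1/8, RTTVAR gain 1/4, K = 4, first sample sets SRTT = R and RTTVAR = R/2), with the minimum RTO exposed as a parameter. The class name and the assumption that clock granularity G is negligible are ours.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtoEstimator:
    """RTT smoothing and RTO computation following RFC 6298 (all times in seconds)."""

    min_rto: float = 1.0      # RFC 6298 (2.4) floor; many stacks use 0.2 s instead
    granularity: float = 0.0  # clock granularity G, assumed negligible here
    alpha: float = 1 / 8      # gain for SRTT
    beta: float = 1 / 4       # gain for RTTVAR
    k: float = 4.0            # the "4" in SRTT + 4 x RTTVAR
    srtt: Optional[float] = None
    rttvar: Optional[float] = None

    def sample(self, r: float) -> float:
        """Feed one RTT measurement r and return the resulting RTO."""
        if self.srtt is None or self.rttvar is None:
            # First measurement: SRTT = R, RTTVAR = R/2.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Later measurements: update RTTVAR using the old SRTT, then update SRTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - r)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * r
        rto = self.srtt + max(self.granularity, self.k * self.rttvar)
        # Round small values up to the configured minimum.
        return max(rto, self.min_rto)
```

With a 100 μs RTT the formula itself yields an RTO of about 300 μs, but a 200 ms or 1 s floor overrides it; that gap is what makes a single timeout so costly inside a data center.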
Behavior 
(from Phanishayee et al., “Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems”)
RFC 6298 
(2.4) Whenever RTO is computed, if it is less than 1 second, then the RTO SHOULD be rounded up to 1 second.
- in practice, often 200 ms
RFC 2581 
The delayed ACK algorithm specified in [Bra89] SHOULD be used by a TCP receiver. When used, a TCP receiver MUST NOT excessively delay acknowledgments. Specifically, an ACK SHOULD be generated for at least every second full-sized segment, and MUST be generated within 500 ms of the arrival of the first unacknowledged packet.
- in practice, often 40 ms
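To see why these wide-area timers hurt in a data center, compare them with a data center round-trip time. The sketch assumes a hypothetical RTT of 100 μs (the lecture only says latencies are on the order of microseconds) and counts how many RTTs each wait would span.

```python
def rtts_idle(wait_s: float, rtt_s: float) -> float:
    """Number of round-trip times that fit into a wait of wait_s seconds."""
    return wait_s / rtt_s


ASSUMED_RTT = 100e-6  # hypothetical data center RTT of 100 microseconds

for label, wait in [
    ("RTO floor, RFC 6298", 1.0),
    ("RTO floor, common practice", 0.200),
    ("delayed ACK, common practice", 0.040),
]:
    print(f"{label}: {rtts_idle(wait, ASSUMED_RTT):,.0f} RTTs")
```

This prints 10,000, 2,000, and 400 RTTs respectively: a flow that hits a timeout sits idle for thousands of round trips during which the link could have carried data.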
Solutions 
• Proposal 1: Adjust RTO (Vasudevan et al.) 
• Proposal 2: DCTCP (Alizadeh et al.)
RTT
RTT 2
RTO 
• Make RTOmin 200μs 
• Timeout = RTO + (rand(0.5) × RTO)
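A sketch of the two changes on this slide, assuming rand(0.5) means a value drawn uniformly from [0, 0.5] and that the 200 μs floor is applied before randomizing. `randomized_timeout` and `RTO_MIN` are hypothetical names, not part of any TCP stack's API.

```python
import random
from typing import Optional

RTO_MIN = 200e-6  # the 200 microsecond RTOmin from the slide


def randomized_timeout(rto: float, rng: Optional[random.Random] = None) -> float:
    """Timeout = RTO + rand(0.5) x RTO, with rand(0.5) uniform on [0, 0.5]."""
    if rng is None:
        rng = random.Random()
    rto = max(rto, RTO_MIN)  # fine-grained floor instead of 200 ms or 1 s
    return rto + rng.uniform(0.0, 0.5) * rto
```

The random term matters for incast: flows that lost packets in the same burst would otherwise time out and retransmit in lockstep, overflowing the same switch buffer again. Spreading each timeout over [RTO, 1.5 × RTO] desynchronizes them.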
Improvement
Wide Area
DCTCP 
• Three goals 
• Low latency for short flows 
• High burst tolerance (incast) 
• High throughput for long flows 
• Basic approach: keep switch queues short
Queue Length 
• RTT measurements are noisy
• At high speeds, queueing delays are very small
• GigE: 10 packets is 120μs
• 10GigE: 10 packets is 12μs
• Use ECN (Explicit Congestion Notification)
• RFC 3168
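The queue-delay figures above are serialization arithmetic. Assuming full-sized 1500-byte packets (the slide does not state a size), the time to drain a queue of n packets is n × 1500 × 8 bits divided by the link rate; `queue_drain_time` is a hypothetical helper.

```python
def queue_drain_time(packets: int, link_bps: float, packet_bytes: int = 1500) -> float:
    """Seconds needed to transmit `packets` packets of `packet_bytes` each at link_bps."""
    return packets * packet_bytes * 8 / link_bps


gige = queue_drain_time(10, 1e9)        # 10 packets on GigE
ten_gige = queue_drain_time(10, 10e9)   # 10 packets on 10GigE
print(f"GigE: {gige * 1e6:.0f} us, 10GigE: {ten_gige * 1e6:.0f} us")
```

A 12 μs change in queueing is easily buried in RTT measurement noise, which is why DCTCP reads an explicit ECN signal from the switch instead of inferring queue length from delay.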
Setting ECN
(figure: switch queue with marking threshold K; label "Set ECN bit")
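A minimal sketch of threshold marking, assuming (as in the DCTCP paper) that the switch marks an arriving packet when the instantaneous queue occupancy already exceeds K. Packets are modeled as dicts with a hypothetical `ecn` key; a real switch sets the CE codepoint defined in RFC 3168 in the IP header.

```python
from collections import deque


class MarkingQueue:
    """Switch egress queue that sets the ECN bit when occupancy exceeds K."""

    def __init__(self, k: int, capacity: int):
        self.k = k                # marking threshold, in packets
        self.capacity = capacity  # buffer size, in packets
        self.queue = deque()

    def enqueue(self, packet: dict) -> bool:
        """Queue a packet; return False if it had to be dropped."""
        if len(self.queue) >= self.capacity:
            return False  # tail drop once the buffer is full
        if len(self.queue) > self.k:
            packet["ecn"] = True  # signal congestion long before the buffer overflows
        self.queue.append(packet)
        return True

    def dequeue(self) -> dict:
        return self.queue.popleft()
```

Because marking depends only on the instantaneous queue, senders learn about a building queue within one RTT, well before drops (and the timeouts discussed earlier) occur.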
Monitoring α 
• Per RTT, measure F, the fraction of packets sent that had the ECN bit set
• DCTCP ACKs copy the ECN bit of the corresponding data packets into the ECN-Echo field
• Compute α, an exponentially weighted moving average (EWMA) of F
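The slide leaves the EWMA gain unspecified. This sketch assumes the update α ← (1 − g) × α + g × F with an assumed gain g = 1/16, called once per RTT with the number of packets ACKed and the number whose ACKs carried ECN-Echo. `update_alpha` is a hypothetical helper.

```python
def update_alpha(alpha: float, acked: int, marked: int, g: float = 1 / 16) -> float:
    """One per-RTT update: alpha <- (1 - g) * alpha + g * F.

    F is the fraction of ACKed packets whose ACKs carried ECN-Echo;
    g = 1/16 is an assumed gain, not taken from the slide.
    """
    if acked == 0:
        return alpha  # nothing acknowledged this round: keep the old estimate
    f = marked / acked
    return (1 - g) * alpha + g * f


alpha = 0.0
for _ in range(50):
    alpha = update_alpha(alpha, acked=20, marked=10)  # half the packets marked each RTT
print(f"alpha after 50 RTTs: {alpha:.3f}")
```

α therefore tracks the typical fraction of marked packets, and thus roughly how congested the path's queue is, while smoothing out one-RTT spikes.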
Adjusting cwnd
• cwnd = cwnd × (1 − α/2)
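A sketch of the window cut above, with the window measured in packets and α taken as given. The helper name is hypothetical, and we assume the cut is applied at most once per window of data in which ECN-Echo was seen.

```python
def dctcp_reduce(cwnd: float, alpha: float) -> float:
    """Congestion-window cut on ECN-Echo: cwnd <- cwnd * (1 - alpha / 2)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return cwnd * (1 - alpha / 2)


for a in (1.0, 0.5, 0.1):
    print(f"alpha={a}: cwnd 100 -> {dctcp_reduce(100, a):.1f}")
```

With α = 1 (every packet marked) the window halves, matching standard TCP's reaction to congestion. With α = 0.1 it shrinks by only 5%, so long flows keep high throughput while the queue stays short, which serves all three DCTCP goals at once.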
DCTCP Caveat 
“We stress that DCTCP is designed for the data center environment. In this paper, we make no claims about suitability of DCTCP for wide area networks.”
Data Center Networks 
• Very different from the wide-area Internet
• Tiny RTTs 
• Different traffic patterns 
• Single administrative domain 
• Standards (e.g., IETF) much less important 
• Lots of novel network design
