Switch Buffering - Does Adding Intelligence Offer Any Help for Ethernet in AI Fabrics?
Some sort of intelligence, be it in our cell phones, the sensor ecosystems around us, or smart cars, either makes our lives easier or more optimised. Whenever an everyday thing becomes intelligent, it sparks people's curiosity to know more. One such non-obvious piece of intelligence, and a short lesson learnt, is what I want to share with you.
Recently I was part of a discussion on building an AI fabric with Ethernet plus some added boosters that make it lossless. By boosters I mean enhancements like Priority Flow Control (PFC) and Explicit Congestion Notification (ECN), with a flavour of WRED or AFD, on top of Ethernet. There were some arguments on buffering, intelligent buffering to be precise, which caught my attention.
After all, buffering is buffering, so what the heck is this intelligent buffering?
Moreover, can adding some intelligence on the buffering side help our Ethernet friend become a strong contender for the AI fabric? This short post will answer that. Topics like PFC, WRED and AFD will be covered in a later post or video.
Firstly, a one-liner on what a buffer is, for completeness' sake. In a switch, buffers are the memory spaces used to store data packets temporarily before they are processed or transmitted. These buffers play an important role in managing network traffic, especially during congestion or when there is a speed mismatch between incoming and outgoing ports.
You need the right amount of buffer in the switches. Too much buffer, and it takes a while for a packet to go in, sit in the buffer and come out, which adds a tremendous amount of latency. On the contrary, with too little buffer even the signalling doesn't get across the network, because you drop packets. So you have to have the right amount of buffer.
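To put some rough numbers on that, here is a small back-of-the-envelope sketch in Python. The port speeds and buffer sizes are illustrative values I picked for the example, not figures from any specific switch:

```python
# Rough sketch: worst-case queuing latency added by a full egress buffer.
# Port speeds and buffer sizes below are illustrative assumptions.

def queuing_latency_us(buffer_bytes: int, port_gbps: float) -> float:
    """Time (microseconds) to drain a completely full buffer out of one port."""
    bits = buffer_bytes * 8
    return bits / (port_gbps * 1e9) * 1e6

for buf_mb in (1, 16, 64):                # shallow vs deep buffers
    for speed in (100, 400):              # typical AI-fabric port speeds (Gbps)
        lat = queuing_latency_us(buf_mb * 1024 * 1024, speed)
        print(f"{buf_mb:>3} MB buffer @ {speed} Gbps -> ~{lat:,.0f} us added when full")
```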
When buffers get full, packets get dropped; there is no other choice, the switch needs to drop something. One way is to pick packets randomly, or from the tail end, and drop them, and we are all supposed to be happy with that. There is no intelligence in it as such.
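As a tiny illustration, here is what plain tail drop looks like as a sketch. The capacity and packet sizes are made up for the example:

```python
from collections import deque

# Minimal sketch of plain tail drop: once the buffer is full, every
# arriving packet is discarded, regardless of its size or importance.

class TailDropBuffer:
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.queue = deque()

    def enqueue(self, packet_bytes: int) -> bool:
        if self.used + packet_bytes > self.capacity:
            return False                      # buffer full -> drop, no intelligence
        self.queue.append(packet_bytes)
        self.used += packet_bytes
        return True

buf = TailDropBuffer(capacity_bytes=9000)
for pkt in (4000, 4000, 1500, 1500):          # illustrative packet sizes
    print(f"{pkt}B packet:", "queued" if buf.enqueue(pkt) else "dropped")
```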
Now look at the picture above. With some better tuning, intelligent buffering is smart enough to see that if the switch drops this one large packet, it can fit four more packets in its place. That way dropping can come down from 100% to 75%. Adding such smartness and intelligence to buffering is a good thing.
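Here is a toy sketch of that idea. To be clear, this is only an intuition-builder for size-aware dropping; it is not how AFD or WRED actually decide drops (those work on flow and queue accounting, not per-packet size), and the capacity and packet sizes are made up:

```python
# Toy illustration: when the buffer is full, evicting one large queued packet
# can make room for several small ones, so fewer packets are dropped overall.

class SizeAwareBuffer:
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.queue = []                        # queued packet sizes, in bytes

    @property
    def used(self) -> int:
        return sum(self.queue)

    def enqueue(self, packet_bytes: int) -> bool:
        if self.used + packet_bytes <= self.capacity:
            self.queue.append(packet_bytes)
            return True
        # Buffer full: drop the largest queued packet instead, if doing so
        # frees enough room for the (smaller) arriving packet.
        largest = max(self.queue, default=0)
        if largest > packet_bytes and self.used - largest + packet_bytes <= self.capacity:
            self.queue.remove(largest)
            self.queue.append(packet_bytes)
            return True
        return False                           # nothing sensible to evict -> drop arrival

smart = SizeAwareBuffer(capacity_bytes=9000)
for pkt in (8000, 1500, 1500, 1500, 1500):     # one jumbo frame, then four small packets
    print(f"{pkt}B packet:", "queued" if smart.enqueue(pkt) else "dropped")
print("queued packets:", smart.queue)
```

Running it, the jumbo frame gets evicted and all four small packets make it through, which is exactly the trade shown in the picture.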
Now coming to the AI fabric for data centers: the key traffic flow in such a fabric is GPU to GPU, and those packets are all of the same MTU size. RDMA traffic from GPU to GPU is a giant flow. So this traffic pattern doesn't exactly fit our intelligent-buffering analogy. But buffering is still essential for the fabric. Why?
Because the underlying transport infrastructure is Ethernet, mechanisms like PFC, WRED and AFD have to be in place for lossless behaviour, and for that, congestion management needs to work as a system. For example, for ECN to work, there have to be enough buffers on the switch so that packets can be ECN-marked and the resulting CNP (Congestion Notification Packet) messages can get into the priority queue before the buffers are overrun. The priority queue gives the CNP the guarantee that it goes ahead of everyone else back towards the originator, so that the originator can slow down. Now, because it takes about 3 microseconds to go from one host to another host that is 3 hops away in a leaf-spine network (worst case scenario), there has to be enough buffer on the switches to keep absorbing traffic until the originating senders actually slow down.
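To get a feel for how much buffer that means, here is a rough sketch using the roughly 3 microsecond figure above. The per-port speeds and the extra sender-reaction delay are my own assumptions for illustration:

```python
# Back-of-the-envelope sketch of the headroom a switch must absorb while the
# ECN/CNP control loop closes. The ~3 us one-way figure is from the article;
# the port speeds and sender-reaction delay are assumptions for illustration.

def headroom_bytes(port_gbps: float, control_loop_us: float) -> float:
    """Traffic still arriving at line rate until the sender actually slows down."""
    return port_gbps * 1e9 / 8 * control_loop_us * 1e-6

one_way_us = 3.0                  # host -> host across 3 hops (worst case, per the article)
loop_us = 2 * one_way_us + 2.0    # ECN mark + CNP return + assumed sender reaction time

for speed in (100, 200, 400):     # illustrative per-port speeds (Gbps)
    print(f"{speed} Gbps port needs roughly {headroom_bytes(speed, loop_us)/1024:,.0f} KB of headroom")
```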
So buffering is essential for an AI fabric built with Ethernet to provide lossless transport.
This article is part of a Data Center design enablement series that I am working on. More details to follow soon. Want to join an open webinar on Data Center Design? Sign up with the link below: