Brought to you by
Capturing NIC and Kernel
Tx/Rx Timestamps for
Packets in Go
Blain Smith
Staff Software Engineer at Rocket Science
Blain Smith
Staff Software Engineer | Professor of Rocket Systems, Rocket Science
■ 10+ years building AAA online game services (Among Us,
Unity, WB, Riot, 2K)
■ Low level networking and distributed systems
■ Competitive powerlifter and strongman
■ Multiplayer Games Engineering Specialists!
■ Led, advised, or invested in many of the industry’s largest
companies, including Unity, PUBG, Multiplay, and Vivox, among
others, as well as building solutions for the biggest games
out there.
■ Publishers around the globe trust us and our expert team to
design and deliver planet-scale online services for the
world’s biggest games such as Rocket League, League of
Legends, DOOM, and many more!
3 Basic Measurements
There are 3 basic measurements we want to know about, given any 2 computers
on a network; a small sketch of deriving them from ping samples follows the list.
■ Latency - time it takes for a packet to move from computer A to computer B
■ Jitter - variation of latency over a given timeframe
■ Packet Loss - percentage of packets that were lost over a given timeframe
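As a rough illustration (not from the slides), here is one way to derive the three
measurements in Go from a window of ping round-trip samples; the sample values and
variable names are made up.
sent := 5
rtts := []time.Duration{ // replies received; one packet assumed lost
    24600 * time.Microsecond, 22200 * time.Microsecond,
    24600 * time.Microsecond, 23000 * time.Microsecond,
}
// Latency: average RTT over the window.
var sum time.Duration
for _, rtt := range rtts {
    sum += rtt
}
latency := sum / time.Duration(len(rtts))
// Jitter: mean absolute difference between consecutive RTTs.
var jitter time.Duration
for i := 1; i < len(rtts); i++ {
    d := rtts[i] - rtts[i-1]
    if d < 0 {
        d = -d
    }
    jitter += d
}
jitter /= time.Duration(len(rtts) - 1)
// Packet loss: fraction of sent packets with no reply.
loss := float64(sent-len(rtts)) / float64(sent) * 100
log.Println(latency, jitter, loss)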
ping
> ping rocketscience.gg
PING rocketscience.gg (76.76.21.21) 56(84) bytes of data.
64 bytes from 76.76.21.21 (76.76.21.21): icmp_seq=1 ttl=119 time=24.6 ms
64 bytes from 76.76.21.21 (76.76.21.21): icmp_seq=2 ttl=119 time=22.2 ms
64 bytes from 76.76.21.21 (76.76.21.21): icmp_seq=3 ttl=119 time=24.6 ms
64 bytes from 76.76.21.21 (76.76.21.21): icmp_seq=4 ttl=119 time=23.0 ms
64 bytes from 76.76.21.21 (76.76.21.21): icmp_seq=5 ttl=119 time=23.9 ms
^C
--- rocketscience.gg ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4012ms
rtt min/avg/max/mdev = 22.184/23.666/24.618/0.935 ms
ping with Go
// using https://github.com/go-ping/ping
pinger, err := ping.NewPinger("rocketscience.gg")
if err != nil {
    panic(err)
}
pinger.Count = 3
err = pinger.Run()
if err != nil {
    panic(err)
}
stats := pinger.Statistics() // get send/receive/duplicate/rtt stats
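As a brief follow-up (not on the slide), the returned Statistics value exposes these
measurements directly, for example:
log.Println("avg rtt:", stats.AvgRtt, "jitter (stddev):", stats.StdDevRtt, "loss %:", stats.PacketLoss)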
HTTP Timestamp with Go
// http_server.go
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    log.Println(time.Now().UnixNano(), r.Method, r.URL.String())
    w.WriteHeader(http.StatusOK)
    io.Copy(w, r.Body)
})
http.ListenAndServe(":8080", nil)
// Client
> curl -d 'testing' localhost:8080
testing
// Server Logs
2022/05/23 19:41:44 1653349304362572711 POST /
TCP Timestamp with Go
// tcp_server.go
ln, _ := net.Listen("tcp", ":8080")
for {
    conn, _ := ln.Accept()
    go func(conn net.Conn) {
        log.Println(time.Now().UnixNano(), conn.RemoteAddr().String())
        io.Copy(conn, conn)
    }(conn)
}
// Client
> netcat localhost 8080
hi
hi
// Server Logs
2022/05/23 20:52:31 1653353551565391099 [::1]:53648
UDP Timestamp with Go
// udp_server.go
conn, _ := net.ListenPacket("udp", ":8080")
for {
    data := make([]byte, 1024)
    n, src, _ := conn.ReadFrom(data)
    log.Println(time.Now().UnixNano(), src.String())
    conn.WriteTo(data[:n], src)
}
// Client
> netcat -u localhost 8080
hi
hi
// Server Logs
2022/05/23 20:57:43 1653353863894400940 [::1]:48194
Problems with time.Time in Go
Every time you call tcpconn.(Read|Write) or udpconn.(ReadFrom|WriteTo),
your Go process makes a Linux system call (send or recv) into the kernel so the
kernel can place the data into the NIC queue to be sent out onto, or read in from,
the wire. These system calls and queues all take time.
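A minimal sketch (not in the slides) of the naive userspace measurement implied
above; conn is assumed to be an established net.Conn to an echo server, and both
time.Now() calls run in userspace.
payload := []byte("ping")
txTime := time.Now()      // userspace Tx timestamp
conn.Write(payload)       // syscall + kernel + NIC Tx queue on the way out
buf := make([]byte, 1024)
conn.Read(buf)            // NIC Rx queue + kernel + syscall on the way back
rtt := time.Since(txTime) // includes all of the above, not just wire time
log.Println("userspace rtt:", rtt)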
Sample Ping Sequence
■ Latency = Rx - Tx in userspace
■ Includes kernel time (green in the slide diagram)
● Go runtime context switching
● OS context switching
■ Includes NIC queue (blue)
● Tx/Rx buffers
■ We want just wire time (orange)
Packet Sequence with Control Messages
■ Full Latency = Rx - Tx in userspace (decomposed in the sketch below)
■ Kernel Tx is when the packet makes it into the kernel from userspace
■ NIC Tx is when the packet gets placed onto the wire (if supported by hardware)
■ Kernel Rx is when the packet enters the kernel from the NIC
■ NIC Rx is when the packet comes in from the wire into the NIC
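As an illustrative decomposition (not from the slides), assuming we had all six
points for one one-way packet as time.Time values:
// Timestamps taken on the sender: userspaceTx, kernelTx, nicTx.
// Timestamps taken on the receiver: nicRx, kernelRx, userspaceRx.
// The wireTime line assumes the two NIC clocks are synchronized (e.g. via PTP).
txKernelDelay := kernelTx.Sub(userspaceTx)  // syscall + kernel processing on the way out
txQueueDelay := nicTx.Sub(kernelTx)         // time sitting in the NIC Tx queue
wireTime := nicRx.Sub(nicTx)                // time actually spent on the wire
rxQueueDelay := kernelRx.Sub(nicRx)         // time sitting in the NIC Rx queue
rxKernelDelay := userspaceRx.Sub(kernelRx)  // kernel processing + syscall on the way in
fullLatency := userspaceRx.Sub(userspaceTx) // what a plain time.Now() measurement gives
log.Println(txKernelDelay, txQueueDelay, wireTime, rxQueueDelay, rxKernelDelay, fullLatency)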
AF_PACKET
■ Requires us to manually construct each layer of our packet and wrap our data
(Layers 1, 2, 3, and 4)
■ The great github.com/mdlayher/socket library makes working with raw
sockets very idiomatic
socket, err := socket.Socket(unix.AF_PACKET, unix.SOCK_RAW, 0, "socket", nil)
socket.ReadFrom(...)
socket.WriteTo(...)
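A hedged sketch (not from the slides) of opening such a socket and binding it to a
single interface; packet(7) expects the protocol in network byte order, and the
socket name string here is arbitrary.
iface, _ := net.InterfaceByName("eth0")
proto := uint16(unix.ETH_P_ALL)
proto = proto<<8 | proto>>8 // htons: packet sockets take the protocol in network byte order
conn, err := socket.Socket(unix.AF_PACKET, unix.SOCK_RAW, int(proto), "afpacket", nil)
if err != nil {
    panic(err)
}
// bind to one interface so we only see (and inject) frames on eth0
if err := conn.Bind(&unix.SockaddrLinklayer{Protocol: proto, Ifindex: iface.Index}); err != nil {
    panic(err)
}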
Manually Creating Packet []byte
■ Use github.com/google/gopacket
■ Allows you to encode/decode packet layers
l1 := layers.Ethernet{SrcMAC: smac, DstMAC: dmac, EthernetType: layers.EthernetTypeIPv4}
l2 := layers.IPv4{Version: 4, TTL: 64, Protocol: layers.IPProtocolUDP, SrcIP: saddr, DstIP: daddr}
l3 := layers.UDP{SrcPort: layers.UDPPort(sport), DstPort: layers.UDPPort(dport)}
l4 := []byte("this is my application data")
l3.SetNetworkLayerForChecksum(&l2) // lets gopacket compute the UDP checksum over the IPv4 pseudo-header
pkt := gopacket.NewSerializeBuffer()
opts := gopacket.SerializeOptions{FixLengths: true, ComputeChecksums: true}
gopacket.SerializeLayers(pkt, opts, &l1, &l2, &l3, gopacket.Payload(l4))
fullpkt := pkt.Bytes()
socket.Write(fullpkt)
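The reverse direction (not shown on the slide) looks roughly like this:
gopacket.NewPacket decodes a raw Ethernet frame back into layers.
p := gopacket.NewPacket(fullpkt, layers.LayerTypeEthernet, gopacket.Default)
if udpLayer := p.Layer(layers.LayerTypeUDP); udpLayer != nil {
    udp := udpLayer.(*layers.UDP)
    log.Println(udp.SrcPort, udp.DstPort, string(udp.Payload))
}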
SO_TIMESTAMPING for Control Messages
Control messages are extra information provided by the Linux kernel about the socket.
Timestamp information is delivered in these control messages using specific types.
These control messages can be read with Recvmsg on the socket ("receive a
message" from the kernel) once SO_TIMESTAMPING is enabled on the socket.
https://www.kernel.org/doc/html/latest/networking/timestamping.html
data := make([]byte, 1024)
cmsg := make([]byte, 1024)
// read incoming data from the RX queue (normally conn.Read)
socket.Recvmsg(data, cmsg, 0)
// read outgoing data from the TX queue (sent with conn.Write)
socket.Recvmsg(data, cmsg, unix.MSG_ERRQUEUE)
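Enabling SO_TIMESTAMPING itself is a plain setsockopt; a minimal sketch using
golang.org/x/sys/unix (fd is assumed to be the raw socket's file descriptor, and
the exact flag set depends on what your kernel and NIC support).
flags := unix.SOF_TIMESTAMPING_TX_HARDWARE | unix.SOF_TIMESTAMPING_RX_HARDWARE |
    unix.SOF_TIMESTAMPING_TX_SOFTWARE | unix.SOF_TIMESTAMPING_RX_SOFTWARE |
    unix.SOF_TIMESTAMPING_SOFTWARE | unix.SOF_TIMESTAMPING_RAW_HARDWARE
if err := unix.SetsockoptInt(fd, unix.SOL_SOCKET, unix.SO_TIMESTAMPING, flags); err != nil {
    panic(err)
}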
Parsing the Timestamps in Control Messages
Socket kernel and NIC timestamp information can be parsed from these control
messages.
msgs, _ := unix.ParseSocketControlMessage(cmsg)
msg := msgs[0]
if msg.Header.Level != unix.SOL_SOCKET || msg.Header.Type != SocketOptTimestamping {
    return ts, fmt.Errorf("no timestamp control messages")
}
var ts SocketTimestamps
ts.Software = time.Unix(int64(binary.LittleEndian.Uint64(msg.Data[0:])),
int64(binary.LittleEndian.Uint64(msg.Data[8:]))) // kernel timestamp
ts.Hardware = time.Unix(int64(binary.LittleEndian.Uint64(msg.Data[32:])),
int64(binary.LittleEndian.Uint64(msg.Data[40:]))) // NIC timestamp (if supported by NIC hardware)
Use blainsmith/fast_afpacket
■ Implements net.PacketConn
■ Sets up the AF_PACKET & SO_TIMESTAMPING options for you
■ Offers convenient RecvTxTimestamps() & RecvRxTimestamps() methods
// create a socket and bind it directly to the interface
// named eth0 and send/recv all IPv4/v6 TCP/UDP traffic!
iface, _ := net.InterfaceByName("eth0")
socket, _ := fastafpacket.Listen(iface, unix.SOCK_RAW, unix.ETH_P_ALL, nil)
Packet Sender/Receiver
for range ticker.C {
    packet, _ := encodePacket(srcmac, srcip, srcport, dstmac, dstip, dstport, data)
    _, _ = conn.WriteTo(packet, &fastafpacket.Addr{HardwareAddr: dstmac})
}
for {
    packet := make([]byte, 1024)
    _, _, ts, _ := conn.RecvTxTimestamps(packet)
    log.Println("tx", ts.Hardware.UnixNano(), ts.Software.UnixNano())
}
// tx 1652138203507654453 1652138202507654453
for {
    packet := make([]byte, 1024)
    _, _, ts, _ := conn.RecvRxTimestamps(packet)
    log.Println("rx", ts.Hardware.UnixNano(), ts.Software.UnixNano(), ts.Hardware.Sub(time.Now()))
}
// rx 1652138204508141040 1652138203508141040 -423.796µs
Use Cases
■ One-way latency NIC A -> NIC B
● Asymmetric measurements
● NIC B -> NIC A might be different
■ Observability into time being spent processing packets
● in/out kernel
● in/out NIC queues
■ Wire only
● Subtract kernel and NIC timestamps from userspace timestamps
● Removes jitter introduced from context switching (both Go runtime and OS) and NIC queuing
■ Target specific NICs
● Target which NIC the packet goes out and which destination NIC on the receiving side
● Expose slow NIC queues when comparing to other NICs on the same machine
Timestamp Delays
Once we have kernel and NIC timestamp information we can calculate the delays
between each of the layers, as sketched after the list.
■ tx.Software.Sub(userspaceTxTime) = outgoing kernel delay
■ tx.Hardware.Sub(userspaceTxTime) = outgoing NIC delay
■ rx.Software.Sub(userspaceRxTime) = incoming kernel delay
■ rx.Hardware.Sub(userspaceRxTime) = incoming NIC delay
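A minimal sketch (not from the slides) mirroring those formulas; tx and rx are the
SocketTimestamps read back for one packet, and the userspace times are captured
around the corresponding write and read calls. With this sign convention the
incoming delays come out negative, since the kernel and NIC stamp the packet before
the userspace read returns (matching the -423µs seen earlier).
outgoingKernelDelay := tx.Software.Sub(userspaceTxTime) // userspace -> kernel on the way out
outgoingNICDelay := tx.Hardware.Sub(userspaceTxTime)    // userspace -> NIC on the way out
incomingKernelDelay := rx.Software.Sub(userspaceRxTime) // negative: kernel saw it before we read it
incomingNICDelay := rx.Hardware.Sub(userspaceRxTime)    // negative: NIC saw it before we read it
log.Println(outgoingKernelDelay, outgoingNICDelay, incomingKernelDelay, incomingNICDelay)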
Timestamping Support
■ Kernel (software) timestamping should be available on most systems, but hardware
timestamps depend on NIC support
■ Mellanox (owned by NVIDIA) NICs have support
■ Use ethtool to check your own hardware
> ethtool -T eth0
Time stamping parameters for eth0:
Capabilities:
hardware-transmit (SOF_TIMESTAMPING_TX_HARDWARE)
software-transmit (SOF_TIMESTAMPING_TX_SOFTWARE)
hardware-receive (SOF_TIMESTAMPING_RX_HARDWARE)
software-receive (SOF_TIMESTAMPING_RX_SOFTWARE)
software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
hardware-raw-clock (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
off (HWTSTAMP_TX_OFF)
on (HWTSTAMP_TX_ON)
Further Reading
■ AF_PACKET: https://man7.org/linux/man-pages/man7/packet.7.html
■ SO_TIMESTAMPING: https://www.kernel.org/doc/html/latest/networking/timestamping.html
■ One-way Delay: https://api.semanticscholar.org/CorpusID:5433866
■ Clock and Time Sync: https://arxiv.org/abs/2106.16140
■ NTP: http://www.ntp.org
Brought to you by
Blain Smith
@blainsmith
blainsmith.com
rocketscience.gg