Have you ever encountered unrealistically large packet sizes while analyzing a tcpdump capture? The reason isn't a network glitch, it's Generic Receive Offload (GRO), and it's a huge win for performance.

GRO is a software technique that significantly reduces CPU usage by cutting down the number of individual packets the CPU has to process. It combines similar packets into one large packet before they are sent up the network stack, which dramatically reduces per-packet processing overhead.

The key point is that the cost of processing a packet is not proportional to its size: the work of inspecting headers, verifying checksums, and passing data up the stack is roughly constant. By combining many small packets, GRO amortizes this fixed cost over a much larger amount of data. In high-throughput scenarios this lets the system handle far more data with the same CPU resources, improving overall performance and freeing the CPU to focus on application-level tasks rather than spending its cycles on packet-by-packet overhead.

PS: You can check whether it's enabled on your machine with: sudo ethtool -k <interface_name> | grep generic-receive-offload

PPS: tcpdump captures packets at a higher level in the network stack, after the kernel has already received them and applied optimizations like GRO. It does not capture packets directly from the NIC's ring buffer. :)

#Networking #Linux #Performance
How Generic Receive Offload (GRO) boosts network performance
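If you'd rather query this programmatically than shell out to ethtool, the kernel still answers the legacy ETHTOOL_GGRO ioctl. A minimal C sketch, with the interface name taken from the command line and error handling kept short:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/ethtool.h>
    #include <linux/sockios.h>

    int main(int argc, char **argv)
    {
        const char *ifname = argc > 1 ? argv[1] : "eth0";
        struct ethtool_value eval = { .cmd = ETHTOOL_GGRO };
        struct ifreq ifr;

        int fd = socket(AF_INET, SOCK_DGRAM, 0);  /* any socket serves as an ioctl handle */
        if (fd < 0) { perror("socket"); return 1; }

        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_data = (char *)&eval;

        if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) { perror("SIOCETHTOOL"); close(fd); return 1; }
        printf("%s: generic-receive-offload: %s\n", ifname, eval.data ? "on" : "off");
        close(fd);
        return 0;
    }

And if you want tcpdump to show on-the-wire packet sizes again, you can temporarily turn the feature off with sudo ethtool -K <interface_name> gro off, at the cost of the CPU savings described above.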
More Relevant Posts
-
Falcon Boot: a way of booting embedded Linux without which a product might not be a "shippable product". Have you ever wondered why some polished Linux products (excluding general-purpose computers) show no boot menu and no boot logs, and why their boot times are under 1-2 seconds? The reason is a boot optimized for user experience: a slow boot means missing the opportunity to capture a moment, and much more.

To boot using Falcon mode, some changes to the compile-time configuration were needed, and some features had to be removed to make space. After working through U-Boot's documentation, Bootlin's slides, and some help from the U-Boot maintainers: voilà. U-Boot optimization alone might not buy more than a second of improvement, but in the bigger scheme it can mean saving a lot of energy across sleep-wake cycles; with no need to keep DRAM powered during sleep, the CPU can sleep tight.

Total boot time (from reset to shell): 2.8 seconds. Let's hope that fraction vanishes by the new year. :) #Linux #Embedded #EmbeddedLinux #uBoot #LinuxFoundation https://lnkd.in/g_UFZjtH
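For context, Falcon mode is switched on at build time. A hedged sketch of the kind of config fragment involved — CONFIG_SPL_OS_BOOT is U-Boot's Falcon-mode switch, while the args address here is purely hypothetical and board-specific:

    CONFIG_SPL_OS_BOOT=y                  # SPL boots the kernel directly, skipping full U-Boot
    CONFIG_SYS_SPL_ARGS_ADDR=0x88000000   # where SPL expects the previously saved device tree / args

The prepared device tree and kernel arguments are typically exported once from the U-Boot prompt with the spl export command, so SPL can reuse them on every subsequent boot without running the full bootloader.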
-
Go 1.25 just introduced container-aware GOMAXPROCS defaults. For most people outside infra this might sound like a minor runtime detail, but it’s a pretty big deal if you’re running Go apps in Kubernetes or any container platform. Before, Go simply set GOMAXPROCS to match the number of CPU cores on the machine. Which meant that if your container had a CPU limit set lower than the machine’s cores, Go would still try to use more threads than it was allowed. The result: the Linux kernel throttled you in 100ms chunks. That’s wasted cycles and ugly tail latency spikes. Now Go looks at the container CPU limits and adjusts automatically. No more mismatched defaults, no more silent throttling ruining your p99. If the orchestrator changes the limit on the fly, Go adapts on the fly too. It’s one of those changes that feels small, but in practice it makes Go apps more predictable and less surprising out of the box. Less time debugging weird latency, more time building the actual product. I’m curious, how often do you explicitly tune GOMAXPROCS in your services, or do you mostly let the runtime handle it?
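The mechanics behind "Go looks at the container CPU limits" are easy to see for yourself. Here is a minimal C sketch of the cgroup v2 arithmetic, assuming a unified hierarchy mounted at /sys/fs/cgroup; the ceiling rounding is one plausible choice, and the Go runtime's exact policy may differ:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char quota[32], period[32];
        /* cpu.max holds "<quota> <period>" in microseconds, or "max <period>" if unlimited */
        FILE *f = fopen("/sys/fs/cgroup/cpu.max", "r");
        if (!f || fscanf(f, "%31s %31s", quota, period) != 2) {
            puts("no cgroup v2 CPU limit found");
            return 1;
        }
        fclose(f);

        if (strcmp(quota, "max") == 0) {
            puts("unlimited: fall back to the machine's logical core count");
        } else {
            long q = atol(quota), p = atol(period);
            printf("CPU limit = %.2f cores -> cap parallelism around %ld\n",
                   (double)q / p, (q + p - 1) / p);
        }
        return 0;
    }

With a limit of 250000/100000 (2.5 CPUs), the old default on a 32-core box meant 32 runnable threads competing for 2.5 CPUs' worth of quota — exactly the recipe for the 100ms throttling stalls described above.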
-
Morning friends. If you ever wondered how to begin the reverse engineering process, here it is. Network analysis and some Windows internals come into play, using TCPView and WinDbg for the win. 😇
-
Decoding Falcon Kernel Messages – What They Really Mean

When Linux servers stall or crash, kernel logs often surface mysterious references like falcon_lsm_pinned, pinnedhook_security_file_permission, or even unload_network_ops_symbols. Here's a quick guide to separating what's noise from what matters:

1. Syscall Hooks – Normal, Non-Blocking
Examples: pinnedhook_security_file_permission, twnotify_sys_write
Falcon uses LSM hooks to monitor syscalls (open, write, exec). These are always expected and not a root cause of stalls.

2. Symbol "Noise" – Diagnostic Only
Example: unload_network_ops_symbols+0x94e8/0x9720 [falcon_lsm_pinned]
This looks like an unload/reload, but it's just the kernel unwinder printing symbol names. Cosmetic only, with no system impact.

3. Driver Lifecycle – Potential Disruption During Reloads
Examples: falcon_lsm_pinned: module loaded, version 15907 / falcon: watchdog timeout detected, reloading kernel driver
These appear during sensor updates or watchdog restarts. They may cause short interruptions while hooks reload.

4. Errors / OOPS – Critical
Examples: BUG: unable to handle kernel NULL pointer dereference … falcon_file_permission / general protection fault in falcon_lsm_pinned
Rare, but if you see this, Falcon is directly in the fault path. These require escalation to CrowdStrike and Red Hat.

Key Takeaway: Most Falcon entries are expected monitoring noise. The only real concern is Category 4 OOPS, which is rare but blocking. #crowdstrike #sosreport
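For quick triage on an affected host, a one-liner along these lines (a sketch — adjust the patterns to your environment) pulls only the Category 4 lines out of the kernel log:

    journalctl -k --no-pager | grep -E 'BUG: unable to handle|general protection fault' | grep -i falcon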
-
🧩 Mapping Kernel Memory to User Space – Simplified

One of the most powerful (and tricky!) aspects of Linux kernel development is mapping kernel memory into user space. This lets applications interact directly with kernel or device memory, avoiding costly data copies and boosting performance. 🚀 Here's a breakdown ⬇️

🔹 Why Map Kernel Memory?
The CPU runs in unprivileged mode in user space, yet user apps sometimes need access to kernel or device memory. Instead of duplicating data, we map the memory directly → saving time, space, and overhead.

🔹 The Key Function: remap_pfn_range()
Maps physical memory pages into a user process's VMA (Virtual Memory Area).
Inputs: the VMA structure, a PFN (Page Frame Number), the size, and protection flags.
Result: user space sees the same memory as the kernel – no extra copies!
👉 Example: used when implementing the mmap() system call inside drivers.

🔹 I/O Memory Mapping: io_remap_pfn_range()
A special version for mapping device I/O memory (like registers); only the PFN source differs.
Common use case: /dev/mem → giving apps direct access to device memory.

🔹 The mmap() File Operation
From the driver side:
1️⃣ The driver implements the mmap() callback.
2️⃣ A user program calls mmap().
3️⃣ The kernel sets up the VMA mapping.
4️⃣ The user process now accesses device/kernel memory like a file, but much faster ⚡.

📝 Takeaway
remap_pfn_range() → RAM → user space.
io_remap_pfn_range() → I/O memory → user space.
mmap() → the bridge between drivers and applications.

These are the backbone of efficient device drivers, reducing overhead and enabling high-performance memory access in embedded and OS development. 💡 Fun fact: every time you call mmap() in user space, the kernel silently sets up these mappings for you. The magic happens behind the scenes. ✨ #Linux #KernelDevelopment #DeviceDrivers #BSP #EmbeddedSystems #kernel
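To make the driver side concrete, here is a minimal sketch of an mmap() callback. The buffer and fops names are hypothetical, the buffer is assumed to be page-aligned kernel RAM allocated elsewhere, and a real driver would also validate the requested size and offset:

    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <linux/module.h>
    #include <asm/io.h>

    static char *demo_buf;  /* hypothetical: allocated elsewhere, e.g. with kmalloc() */

    static int demo_mmap(struct file *filp, struct vm_area_struct *vma)
    {
            unsigned long size = vma->vm_end - vma->vm_start;
            /* Translate the buffer's kernel virtual address into a page frame number. */
            unsigned long pfn = virt_to_phys(demo_buf) >> PAGE_SHIFT;

            /* Wire the physical pages straight into the caller's VMA: after this,
               user-space loads and stores hit the very same memory, no copies. */
            return remap_pfn_range(vma, vma->vm_start, pfn, size, vma->vm_page_prot);
    }

    static const struct file_operations demo_fops = {
            .owner = THIS_MODULE,
            .mmap  = demo_mmap,
    };

For device registers, the same callback would use io_remap_pfn_range() with a PFN derived from the device's physical I/O address instead.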
-
Day 9 – Portability: C Everywhere C is like a universal adapter. Write code once, and with small changes, it can run on desktops, servers, or even microcontrollers. That’s why it’s still taught worldwide and still runs the backbone of technology. Your little “Hello World” in C carries the same DNA as the code running in systems across industries. 👉 Do you think portability matters more for beginners, or is it better to just focus on getting programs to work?
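A small illustration of one habit that makes this work (my own example, not from the lesson): fixed-width types and their format macros keep the same source correct whether int is 16, 32, or 64 bits wide:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t sensor_reading = 42;  /* exactly 32 bits on every platform */

        /* PRIu32 expands to the right printf specifier for this platform,
           whether uint32_t is an unsigned int or an unsigned long. */
        printf("Hello, portable world: %" PRIu32 "\n", sensor_reading);
        return 0;
    }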
-
Diagnosing and fixing TrueNAS performance issues can be a real challenge. I recently transformed a sluggish system delivering just 80MB/s writes into a powerhouse hitting 1.8GB/s, and the root cause wasn't what I expected. It wasn't a single problem, but a combination of consumer-grade drives, a misconfigured cache, and memory pressure. This deep-dive case study details my complete troubleshooting journey, highlighting the importance of systematic diagnostics and how seemingly minor issues can compound into major performance bottlenecks. If you're battling slow storage speeds or just want to learn how to optimize your TrueNAS setup, check out the full story here: https://lnkd.in/efQ8Si_g #TrueNAS #Storage #Performance #Troubleshooting #DIY #Homelab #NAS #IT #DataStorage #Automation #Tech #SystemAdmin #Linux #ITPro
-
When facing performance issues in production apps, every engineer encounters challenges like slow database queries or sluggish automotive ECU responses. The crucial question in such scenarios is often, "Is it the code, CPU, disk, or network?" Linux offers valuable insights to address these concerns if we understand where to focus our attention. In my recent article, I delve into essential Linux performance analysis tools such as vmstat, mpstat, iostat, sar, strace, and perf. I aim to demystify these tools for beginners while ensuring their relevance in practical debugging scenarios. This walkthrough is particularly beneficial for industry applications like: - Resolving high-latency database services - Enhancing performance in automotive control systems - Optimizing C++ backend systems during heavy loads For a detailed exploration of how to troubleshoot application latency issues using Linux performance analysis tools, check out my article here: https://lnkd.in/dfQmxWrH If you've ever found yourself grappling with performance challenges, this guide can assist you in transitioning from guesswork to precise measurement. #Linux #PerformanceEngineering #Cplusplus #Debugging #AutomotiveSoftware #DatabasePerformance #SystemDesign
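For a taste of the workflow the article walks through, here are common invocations of those tools (the PID is a placeholder; exact options vary slightly across distributions):

    vmstat 1                             # run queue, memory, swap, CPU at one-second intervals
    mpstat -P ALL 1                      # per-core utilization: one pegged core vs. balanced load
    iostat -x 1                          # extended disk stats: utilization, await, queue depth
    strace -c -p 1234                    # attach to a process and tally its syscalls
    perf record -g -p 1234 -- sleep 10   # sample call graphs for ten seconds
    perf report                          # browse the hottest code paths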
-
Genode 25.08 is out and brings a whole bunch of deep technical improvements to the framework in preparation for the next Sculpt OS release in October. A highlight of the release is a new kernel scheduler for fairness and low latency on our custom kernel platform. The redesigned scheduler specifically addresses the requirements of Sculpt OS; its design was inspired by borrowed virtual time (Duda and Cheriton, 1999) and takes years of experience with dynamic workloads on diverse (micro-)kernels into account. Further, we updated our drivers ported from Linux to version 6.12 to strengthen our support for recent peripherals and devices. Our block-storage stack underwent thorough performance analysis and optimization. Also, our support for seL4 as the underlying kernel was updated to version 13.0 of that kernel and conditioned for better scalability, i.e., initial support in Sculpt OS. The comprehensive documentation at https://lnkd.in/eQPFMQYP covers all details of these and further changes in the release.
-
A while ago, when I was exploring platform devices in the Linux kernel, I came across an important function: of_platform_default_populate_init(void). It creates platform devices from the Device Tree. I was naively expecting that this function would be called from the kernel's main start-up path, and I was quite stumped at the time: how is this important function never called? After further investigation I saw it was referenced only once in the kernel, just a few lines after its own definition: arch_initcall_sync(of_platform_default_populate_init).

The discovery of initcalls was mind-blowing. An initcall is simply a function that the kernel runs automatically during boot, but not because it's called from some init function. Instead, when you tag a function with a macro like arch_initcall, the build system places its address into a special section of the kernel image. At runtime, the kernel walks through these sections in order and executes each registered function. That's why of_platform_default_populate_init() is executed without ever being explicitly called: the arch_initcall_sync macro ensures it's registered in the architecture initcalls stage, and the kernel takes care of invoking it at exactly the right point in the boot sequence. You can find more about initcalls here: https://lnkd.in/eDRdMwQ7
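Here is a minimal sketch of the pattern with a hypothetical function of my own. The macro plants the function's address in the init section for its level, and the boot code (do_initcalls() in init/main.c) later runs each level in order:

    #include <linux/init.h>
    #include <linux/printk.h>

    /* Never called explicitly anywhere in the source. */
    static int __init demo_setup(void)
    {
            pr_info("demo_setup: invoked automatically at the arch initcall level\n");
            return 0;
    }

    /* Registers demo_setup's address in the arch-level initcall section;
       the kernel invokes it at that stage of boot, then frees the __init memory. */
    arch_initcall(demo_setup);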