tcpdump mailing list archives

Re: Setting BPF_SPECIAL_VLAN_HANDLING on a "dead" handle


From: Guy Harris <gharris () sonic net>
Date: Mon, 7 Jul 2025 13:26:25 -0700

On Jul 7, 2025, at 6:42 AM, Denis Ovsienko <denis () ovsienko info> wrote:

> One thing that can complicate this is that some always-true and
> always-false components are in fact specific to the link-layer type,
> for example, "ip" generates:
> * always-true for DLT_IPV4
> * always-false for DLT_IPV6
> * a load and a comparison for DLT_RAW

Yes.

What I was thinking of was to generate a higher-level intermediate representation in the parser; that IR would be 
link-layer-independent, and would *not* be a form of cBPF or eBPF machine code, so, for example, it wouldn't know about 
particular registers, and operations would not necessarily correspond to particular cBPF or eBPF instructions.  There 
could probably be a bunch of optimizations done to programs in that IR.

A separate pass would, for a given link-layer type, modify the IR code to correspond to code for that link-layer type, 
e.g. replacing higher-level operations such as "compare the destination MAC address against this value" or "compare 
the link layer's protocol field against this type" with code that knows where those fields are in the packet (and, in 
the case of the protocol field, what values correspond to particular protocols), and do further optimizations.
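
As a rough illustration of that link-layer pass, here is a minimal sketch in C of lowering the abstract "ip" test for 
three link-layer types.  All of the names (ll_kind, lowered, lower_ip_test, eval_lowered) are hypothetical and are not 
libpcap's; the LL_* values merely stand in for DLT_IPV4, DLT_IPV6 and DLT_RAW, and the version-nibble check used for 
the raw-IP case is an assumption about what "a load and a comparison" would look like.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for DLT_IPV4, DLT_IPV6 and DLT_RAW. */
enum ll_kind { LL_IPV4, LL_IPV6, LL_RAW };

/* What the link-layer-independent "protocol is IPv4" test can lower to. */
enum lowered_kind { LOW_ALWAYS_TRUE, LOW_ALWAYS_FALSE, LOW_MASKED_BYTE_EQ };

struct lowered {
    enum lowered_kind kind;
    size_t offset;   /* byte offset into the packet (LOW_MASKED_BYTE_EQ only) */
    uint8_t mask;    /* mask applied to the loaded byte */
    uint8_t value;   /* expected value after masking */
};

/* Lower the abstract "ip" test for one link-layer type. */
struct lowered lower_ip_test(enum ll_kind ll)
{
    struct lowered r = { LOW_ALWAYS_FALSE, 0, 0, 0 };

    switch (ll) {
    case LL_IPV4:
        r.kind = LOW_ALWAYS_TRUE;    /* every packet is IPv4 */
        break;
    case LL_IPV6:
        r.kind = LOW_ALWAYS_FALSE;   /* no packet is IPv4 */
        break;
    case LL_RAW:
        /* Assumption: test the IP version nibble in the first byte. */
        r.kind = LOW_MASKED_BYTE_EQ;
        r.offset = 0;
        r.mask = 0xF0;
        r.value = 0x40;
        break;
    }
    return r;
}

/* Evaluate a lowered test against a packet; returns 1 on match, 0 otherwise. */
int eval_lowered(const struct lowered *t, const uint8_t *pkt, size_t len)
{
    switch (t->kind) {
    case LOW_ALWAYS_TRUE:
        return 1;
    case LOW_ALWAYS_FALSE:
        return 0;
    case LOW_MASKED_BYTE_EQ:
        if (t->offset >= len)
            return 0;   /* an out-of-bounds load rejects the packet */
        return (pkt[t->offset] & t->mask) == t->value;
    }
    return 0;
}
```

The always-true and always-false results only show up after this pass, so an optimizer that folds them away would have 
to run (or run again) once the link-layer type is known.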

The final pass would generate machine code for a particular target:

        cBPF for a packet that corresponds to what's on the wire;

        cBPF for a packet that has the outermost VLAN tag removed and put into special metadata;

        etc.

and possible eBPF versions of those if there are advantages to directly handing eBPF to the Linux kernel rather than 
handing it cBPF and letting it translate that to eBPF.
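
To make the difference between the first two targets concrete, here is a small sketch in C that emits a "has a VLAN 
tag" test for each.  The names (cbpf_insn, target, emit_has_vlan_tag) are hypothetical, not libpcap's.  The opcode 
values are the classic BPF encodings, and the metadata offset is the value of SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT 
from Linux's <linux/filter.h>, hard-coded here so the sketch builds anywhere.  It assumes an Ethernet link layer and 
checks only the 802.1Q TPID (0x8100) in the on-the-wire case, which is less than a real code generator would need to 
do.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as a classic BPF instruction. */
struct cbpf_insn {
    uint16_t code;
    uint8_t jt;
    uint8_t jf;
    uint32_t k;
};

/* Classic BPF opcode encodings. */
#define OP_LDH_ABS 0x28   /* BPF_LD | BPF_H | BPF_ABS */
#define OP_LDB_ABS 0x30   /* BPF_LD | BPF_B | BPF_ABS */
#define OP_JEQ_K   0x15   /* BPF_JMP | BPF_JEQ | BPF_K */
#define OP_RET_K   0x06   /* BPF_RET | BPF_K */

/* SKF_AD_OFF (-0x1000) + SKF_AD_VLAN_TAG_PRESENT (48), as a 32-bit value. */
#define LINUX_VLAN_TAG_PRESENT ((uint32_t)(-0x1000 + 48))

/* Hypothetical names for the first two targets listed above. */
enum target { TARGET_CBPF_WIRE, TARGET_CBPF_VLAN_METADATA };

/* Emit a 4-instruction program into out[]; returns the instruction count. */
size_t emit_has_vlan_tag(enum target t, struct cbpf_insn out[4])
{
    if (t == TARGET_CBPF_VLAN_METADATA) {
        /* The tag was stripped from the packet; ask the metadata instead. */
        out[0] = (struct cbpf_insn){ OP_LDB_ABS, 0, 0, LINUX_VLAN_TAG_PRESENT };
        out[1] = (struct cbpf_insn){ OP_JEQ_K, 0, 1, 1 };
    } else {
        /* The tag is in the packet; the TPID sits where the EtherType would be. */
        out[0] = (struct cbpf_insn){ OP_LDH_ABS, 0, 0, 12 };
        out[1] = (struct cbpf_insn){ OP_JEQ_K, 0, 1, 0x8100 };
    }
    out[2] = (struct cbpf_insn){ OP_RET_K, 0, 0, 0xFFFFFFFFu };  /* accept */
    out[3] = (struct cbpf_insn){ OP_RET_K, 0, 0, 0 };            /* reject */
    return 4;
}
```

Only the first two instructions differ between the targets, which is the sort of thing that suggests keeping the 
target-specific knowledge in the last pass.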

(If we can figure out how to eliminate recursive algorithms in favor of iterative ones, that might be an advantage; 
sadly, with all these fuzzers out there, "to iterate is human; to recurse is divine" has turned into "to iterate is 
human; to recurse is to request a ton of 'ZOMG this test gets a stack overflow!!!!111ONE!!!!!!'".

Generating a parse tree in the first pass risks adding a shiny new recursive algorithm to upset fuzzers, although, if 
it makes certain things easier and we can limit the recursion depth so that a fuzzer would have to *really* go crazy 
to provoke a stack overflow, that might be OK.)
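
For what it's worth, here is a minimal sketch in C of walking a parse tree iteratively, with a fixed-size explicit 
stack instead of the C call stack.  The node layout and the names (node, count_nodes, WALK_STACK_MAX) are 
hypothetical, and counting nodes is just a stand-in for whatever a real pass would do at each node.  If the tree needs 
more pending entries than the stack holds, the walk fails cleanly with -1 rather than overflowing anything.

```c
#include <stddef.h>

enum node_kind { NODE_LEAF, NODE_NOT, NODE_AND, NODE_OR };

struct node {
    enum node_kind kind;
    const struct node *left;    /* operand of NOT, left operand of AND/OR */
    const struct node *right;   /* right operand of AND/OR, else NULL */
};

/* Hypothetical limit on the number of pending (not yet visited) nodes. */
#define WALK_STACK_MAX 64

/*
 * Count the nodes reachable from root without recursing.
 * Returns the count, or -1 if the explicit stack would overflow.
 */
long count_nodes(const struct node *root)
{
    const struct node *stack[WALK_STACK_MAX];
    size_t top = 0;
    long count = 0;

    if (root == NULL)
        return 0;
    stack[top++] = root;

    while (top > 0) {
        const struct node *n = stack[--top];

        count++;
        /* Push the right child first so the left child is visited first. */
        if (n->right != NULL) {
            if (top == WALK_STACK_MAX)
                return -1;
            stack[top++] = n->right;
        }
        if (n->left != NULL) {
            if (top == WALK_STACK_MAX)
                return -1;
            stack[top++] = n->left;
        }
    }
    return count;
}
```

A long chain of single-operand nodes never needs more than one stack slot here, so it can be arbitrarily deep; it's 
only a tree that keeps piling up unvisited siblings that hits the limit, and it gets an error return instead of a 
crash.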