Linux kernel TLV meetup, 28.02.2016
kerneltlv.com
Kfir Gollan
• What is the Berkeley Packet Filter?
• Writing BPF filters
• Debugging BPF
• Using BPF in user-space applications
• Advanced features of BPF
• At its base, BPF is a way to perform fast packet filtering at the kernel level.
• Filters are defined by user space.
• Filters are executed in the kernel.
• Invented by Steven McCanne and Van Jacobson in 1990; first published in December 1992.
• Support for BPF in Linux was added by Jay Schulist for the 2.5 development kernel.
• Many features were added later on. For example, a JIT compiler for BPF was added in the 3.0 kernel (2011).
• We want to filter packets at the kernel level.
    • Irrelevant packets are discarded in the kernel without being copied to user space.
    • The performance gains are substantial when capturing in promiscuous mode.
• We want to change filters dynamically, without recompiling the kernel or loading a custom kernel module.
• We want the filters to be architecture independent.
Berkeley Packet Filters
• BPF defines a set of operations that can be performed on the filtered packet. Each operation gets its own opcode.
• BPF was designed to be protocol independent, so it treats packets as raw buffers. It is up to the filter writer to parse the needed packet headers.
• BPF is based on three building blocks:
    • A: a 32-bit wide accumulator
    • X: a 32-bit wide index register
    • M[]: 16 x 32-bit wide misc registers, aka "scratch memory"
• LOAD: copy a value into the accumulator or the index register.
• STORE: copy either the accumulator or the index register to the scratch memory.
• ALU: perform an arithmetic or logic operation on the accumulator register.
• BRANCH: alter the flow of control.
• RETURN: terminate the filter and indicate what portion of the packet to save.
• MISC: various operations that don't fit the other types.
opcode: 16 | jt: 8 | jf: 8
k: 32
• Each instruction is represented by 64 bits.
    • opcode: 16 bits, indicates the instruction type.
    • jt: 8 bits, offset of the next instruction for the true case ("if" block), Jump True. The offset is counted from the instruction that follows the jump.
    • jf: 8 bits, offset of the next instruction for the false case ("else" block), Jump False. Counted the same way as jt.
    • k: 32 bits, generic data, used for various purposes.
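To make the roles of A, X, M[] and the jt/jf/k fields concrete, here is a minimal sketch of an interpreter for a small subset of classic BPF. It is an illustration only, not the kernel's implementation: `run_filter` and `load_be` are hypothetical names, only a handful of opcodes are handled, and it assumes the Linux UAPI header `<linux/filter.h>` for `struct sock_filter` and the `BPF_*` constants.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_* opcode constants */

/* Read a big-endian value of `size` bytes at byte offset `off`.
 * Returns 1 on success, 0 if the read would go past the packet. */
static int load_be(const uint8_t *pkt, size_t len, uint32_t off, size_t size,
                   uint32_t *out)
{
    uint32_t v = 0;

    if (off > len || size > len - off)
        return 0;
    for (size_t i = 0; i < size; i++)
        v = (v << 8) | pkt[off + i];
    *out = v;
    return 1;
}

/* Hypothetical helper: run `prog` over `pkt`.
 * Returns the filter's return value, or 0 (drop) on any error. */
uint32_t run_filter(const struct sock_filter *prog, size_t n,
                    const uint8_t *pkt, size_t len)
{
    uint32_t A = 0, X = 0, M[BPF_MEMWORDS] = {0};

    for (size_t pc = 0; pc < n; pc++) {
        const struct sock_filter *in = &prog[pc];

        switch (in->code) {
        case BPF_LD | BPF_W | BPF_ABS:          /* ld  [k] */
            if (!load_be(pkt, len, in->k, 4, &A)) return 0;
            break;
        case BPF_LD | BPF_H | BPF_ABS:          /* ldh [k] */
            if (!load_be(pkt, len, in->k, 2, &A)) return 0;
            break;
        case BPF_LD | BPF_B | BPF_ABS:          /* ldb [k] */
            if (!load_be(pkt, len, in->k, 1, &A)) return 0;
            break;
        case BPF_LD | BPF_MEM:                  /* ld  M[k] */
            if (in->k >= BPF_MEMWORDS) return 0;
            A = M[in->k];
            break;
        case BPF_ST:                            /* st  M[k] */
            if (in->k >= BPF_MEMWORDS) return 0;
            M[in->k] = A;
            break;
        case BPF_ALU | BPF_AND | BPF_K:         /* and #k */
            A &= in->k;
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:         /* jeq #k, jt, jf */
            /* Offsets are relative to the next instruction; the loop's
             * pc++ supplies the "+1". */
            pc += (A == in->k) ? in->jt : in->jf;
            break;
        case BPF_MISC | BPF_TAX:                /* tax */
            X = A;
            break;
        case BPF_MISC | BPF_TXA:                /* txa */
            A = X;
            break;
        case BPF_RET | BPF_K:                   /* ret #k */
            return in->k;
        default:
            return 0;   /* opcode outside this sketch's subset */
        }
    }
    return 0;           /* ran off the end without a RET */
}
```

Because jt and jf are unsigned offsets added to the program counter, control can only move forward, which is why a plain `for` loop is enough here.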
Instruction   Description
ld            Load word into A
ldh           Load half-word into A
ldb           Load byte into A
ldx           Load word into X
ldxb          Load byte into X
st            Store A into M[]
stx           Store X into M[]
jmp           Jump to label
jeq           Jump on A == k
jgt           Jump on A > k
jge           Jump on A >= k
jset          Jump on A & k
add           A + <x>
sub           A - <x>
mul           A * <x>
div           A / <x>
and           A & <x>
or            A | <x>
xor           A ^ <x>
lsh           A << <x>
rsh           A >> <x>
ret           Return
tax           Copy A into X
txa           Copy X into A
Mode              Description
#k                The literal value stored in k.
#len              The length of the packet.
M[k]              The word at offset k in the scratch memory store.
[k]               The byte, half-word or word at byte offset k in the packet.
[x + k]           The byte, half-word or word at byte offset x+k in the packet.
L                 Jump to label L.
#k, Lt, Lf        Jump to Lt if true, otherwise jump to Lf.
x                 The index register.
4 * ([k] & 0xf)   Four times the value of the low four bits of the byte at offset k in the packet.
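The last addressing mode exists mainly to compute the IPv4 header length: the first byte of an IPv4 header carries the version in its high four bits and the header length, in 32-bit words, in its low four bits. A small sketch of the same computation in C follows; `msh` is a hypothetical helper name, and offset 14 assumes an Ethernet frame with no VLAN tag.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: evaluate the 4 * ([k] & 0xf) addressing mode.
 * Returns -1 if offset k lies outside the packet. */
int msh(const uint8_t *pkt, size_t len, size_t k)
{
    if (k >= len)
        return -1;
    return 4 * (pkt[k] & 0xf);
}
```

For a typical first header byte of 0x45 (IPv4, five 32-bit words) this yields 20, the size of an IPv4 header without options.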
• MAC header
Destination MAC | Source MAC | Type
6 bytes         | 6 bytes    | 2 bytes
• IP header
• To catch all the IP packets over MAC we need to check the type field in the MAC header.
    ldh [12]
    jeq #ETHERTYPE_IP, L1, L2
L1: ret #-1
L2: ret #0
• Load a half-word (2 bytes) from offset 12 of the packet (the type field) into the A register.
• Check whether the A register is equal to #ETHERTYPE_IP (0x0800).
    • If true -> return #-1 (0xffffffff: keep the whole packet)
    • If false -> return #0 (drop the packet)
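For comparison, the same filter can be written as a C array with the `BPF_STMT` and `BPF_JUMP` macros from `<linux/filter.h>`. This is a sketch under the assumption that the Linux UAPI headers are available; the array name `ip_only` is made up for illustration, and `ETH_P_IP` from `<linux/if_ether.h>` stands in for `ETHERTYPE_IP`.

```c
#include <linux/filter.h>    /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/if_ether.h>  /* ETH_P_IP (0x0800) */

/* "Accept IP packets, drop everything else" (array name is illustrative). */
static const struct sock_filter ip_only[] = {
    /* ldh [12]: A = EtherType */
    BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),
    /* jeq #ETH_P_IP: on match fall through (jt = 0), else skip one (jf = 1) */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETH_P_IP, 0, 1),
    /* L1: ret #-1, keep the whole packet */
    BPF_STMT(BPF_RET | BPF_K, 0xffffffff),
    /* L2: ret #0, drop the packet */
    BPF_STMT(BPF_RET | BPF_K, 0),
};
```

The labels L1 and L2 disappear in this form: they become the relative offsets 0 and 1 in the jt and jf fields.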
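    ldh [12]                  ; A = ether.type
    jeq #ETHERTYPE_IP, L1, L3 ; A == ETHERTYPE_IP ?
L1: ld  [26]                  ; A = ip.src
    and #0xffffff00           ; A = A & 0xffffff00
    jeq #0x80037000, L3, L2   ; A == 128.3.112.0 ?
L2: ret #-1
L3: ret #0
• Check whether the packet type is IP; if it is not, drop the packet (L3).
• "Remove" the lowest byte of the source IP by masking it with 0xffffff00.
• Check whether the source IP matches 128.3.112.X. As written, a match jumps to L3 (drop) and any other IP packet reaches L2 (accept).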
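The constant 0x80037000 is simply 128.3.112.0 packed into 32 bits. The sketch below reproduces the mask-and-compare step in C; both function names are hypothetical and the values are in host order, as they are in the accumulator after a BPF word load.

```c
#include <stdint.h>

/* Pack four dotted-quad octets into a 32-bit value, most significant first. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Same test as the `and` + `jeq` pair: is src inside 128.3.112.0/24? */
static int src_in_128_3_112(uint32_t src)
{
    return (src & 0xffffff00u) == ipv4(128, 3, 112, 0);
}
```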
• bpf_asm: a utility used to create BPF binary code (a BPF "assembly" compiler).
• Part of the mainline kernel: tools/net/bpf_asm.c
• Supports two output formats:
    • C style output
{ 0x28, 0, 0, 0x0000000c },
{ 0x15, 0, 1, 0x00000800 },
{ 0x06, 0, 0, 0xffffffff },
{ 0x06, 0, 0, 0000000000 },
    • raw output
4,40 0 0 12,21 0 1 2048,6 0 0 4294967295,6 0 0 0,
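The raw format is easy to read back: the first number is the instruction count, followed by comma-separated groups of `code jt jf k` in decimal. Here is a minimal sketch of a parser for it, inferred from the sample above rather than from the tool's source; `parse_raw_bpf` is a hypothetical name.

```c
#include <stdio.h>
#include <linux/filter.h>   /* struct sock_filter */

/* Hypothetical helper: parse "N,code jt jf k,..." into out[0..N-1].
 * Returns N on success, -1 on malformed input or if N exceeds max. */
int parse_raw_bpf(const char *s, struct sock_filter *out, int max)
{
    int n, used;

    if (sscanf(s, "%d%n", &n, &used) != 1 || n <= 0 || n > max)
        return -1;
    s += used;

    for (int i = 0; i < n; i++) {
        unsigned int code, jt, jf, k;

        if (*s == ',')              /* separator before each group */
            s++;
        if (sscanf(s, "%u %u %u %u%n", &code, &jt, &jf, &k, &used) != 4)
            return -1;
        if (code > 0xffff || jt > 0xff || jf > 0xff)
            return -1;              /* value does not fit its field */
        out[i].code = (unsigned short)code;
        out[i].jt = (unsigned char)jt;
        out[i].jf = (unsigned char)jf;
        out[i].k = k;
        s += used;
    }
    return n;
}
```

Decoding the sample gives 40 = 0x28, 21 = 0x15 and 6 = 0x06, the same opcodes as in the C style output.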
• bpf_dbg: a utility used to debug BPF filters.
• Part of the mainline kernel: tools/net/bpf_dbg.c
• Main features:
    • pcap files as input for filters
    • bpf_asm raw output format for the BPF filter definition
    • single-stepping through filters
    • breakpoints
    • internal status (A, X, M, PC)
    • disassembly of raw BPF back to bpf_asm syntax
struct sock_filter {    /* Filter block */
    __u16 code;         /* Actual filter code */
    __u8  jt;           /* Jump true */
    __u8  jf;           /* Jump false */
    __u32 k;            /* Generic multiuse field */
};
• A filter block is in fact a single BPF instruction.
• An array of filter blocks is used to pass a filter specification to the kernel.
struct sock_filter code[] = {
    { 0x28, 0, 0, 0x0000000c },
    { 0x15, 0, 8, 0x000086dd },
    { 0x30, 0, 0, 0x00000014 },
    …
};
struct sock_fprog {             /* Required for SO_ATTACH_FILTER. */
    unsigned short len;         /* Number of filter blocks */
    struct sock_filter *filter; /* Filter blocks list */
};
• A parameter for setsockopt that allows attaching a filter to a socket.
struct sock_fprog bpf = {…};
setsockopt(fd,
           SOL_SOCKET,
           SO_ATTACH_FILTER,
           &bpf,
           sizeof(bpf));
• SO_ATTACH_FILTER
  Attach a BPF filter to a socket.
  Note: only a single filter can be attached at a given time.
• SO_DETACH_FILTER
  Remove the currently attached filter from the socket.
• SO_LOCK_FILTER
  Lock the filter attached to a socket so that it can no longer be removed or replaced. This is useful for setting a filter and then dropping privileges.
    • Example: create a raw socket, apply a filter, lock it, drop CAP_NET_RAW.
• Choosing the correct socket parameters is critical for making the filter work properly.
    • They determine where in the network stack the filtered buffer starts.
• Selecting the socket domain:
    • AF_PACKET: filtering at L2 (e.g. Ethernet)
    • AF_INET: IPv4 filtering
    • AF_INET6: IPv6 filtering
• Selecting the socket type:
    • SOCK_RAW: raw filtering, no headers are handled by the kernel
    • SOCK_STREAM, SOCK_DGRAM, etc.: headers are handled by the kernel
• libpcap: Packet CAPture library
• Provides an easy-to-use API for packet filtering.
• Supports a high-level filtering format which is converted to BPF.
    • pcap_compile: create a BPF filter
    • pcap_setfilter: attach a filter
• See man 7 pcap-filter for a detailed description of the format. Examples:
ether dst [mac address]
dst net [ip address]
dst portrange [port1]-[port2]
• tcpdump: a user-space packet sniffing program.
• Uses BPF for kernel-level filtering (based on pcap).
• Can dump the generated BPF filter:
    • -d    BPF asm format
    • -dd   C format
    • -ddd  BPF raw format
$ sudo tcpdump -d "ip and udp"
(000) ldh [12]
(001) jeq #0x800 jt 2 jf 5
(002) ldb [23]
(003) jeq #0x11 jt 4 jf 5
(004) ret #65535
(005) ret #0
• JIT: just-in-time translation of BPF instructions into native code.
    • Note that the translation is performed when the filter is attached, via bpf_jit_compile(..).
• BPF instructions are mapped directly to architecture-dependent instructions.
• BPF registers are mapped to physical machine registers.
• Provides a performance gain:
    • A gain of about 50 ns per packet for simple filters (E5540 @ 2.53GHz).
    • The more complex the filter, the larger the gain from JIT.
• Supported on x86, x86-64, powerpc, arm and more.
• Each BPF filter is verified before it is attached to a socket.
    • This is critical: the filters come from user space!
• The following rules are enforced:
    • The filter must not contain references or jumps that are out of range.
    • The filter must contain only valid BPF opcodes.
    • The filter must end with a RET opcode.
    • All jumps are forward: loops are not allowed!
• The verification is implemented in the sk_chk_filter function in net/core/filter.c (up to kernel 3.16), and was modified after seccomp was added.
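The rules above can be sketched as a single pass over the program. This is a simplified illustration, not the kernel's sk_chk_filter: `check_filter` is a hypothetical name and the set of opcodes treated as valid is deliberately limited to a few used in this deck.

```c
#include <stddef.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_* constants */

/* Hypothetical helper: returns 1 if the program passes the checks, else 0. */
int check_filter(const struct sock_filter *prog, size_t n)
{
    if (n == 0 || n > BPF_MAXINSNS)
        return 0;

    for (size_t pc = 0; pc < n; pc++) {
        const struct sock_filter *in = &prog[pc];
        size_t left = n - pc - 1;   /* instructions after this one */

        switch (in->code) {
        case BPF_LD | BPF_W | BPF_ABS:
        case BPF_LD | BPF_H | BPF_ABS:
        case BPF_LD | BPF_B | BPF_ABS:
        case BPF_ALU | BPF_AND | BPF_K:
        case BPF_RET | BPF_K:
            break;
        case BPF_ST:
        case BPF_LD | BPF_MEM:
            if (in->k >= BPF_MEMWORDS)      /* scratch index out of range */
                return 0;
            break;
        case BPF_JMP | BPF_JA:
            if (in->k >= left)              /* target past the end */
                return 0;
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            if (in->jt >= left || in->jf >= left)
                return 0;
            break;
        default:
            return 0;                       /* not a known opcode */
        }
    }
    /* The last instruction must be a return. */
    return BPF_CLASS(prog[n - 1].code) == BPF_RET;
}
```

There is no explicit check for backward jumps: jump offsets are unsigned and relative to the next instruction, so every in-range jump already points forward.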
• The Linux kernel also has a number of BPF extensions that are used along with the class of load instructions.
• The extensions "overload" the k argument with a negative offset plus a particular extension offset.
• The result of such a BPF extension is loaded into A.

Instruction   Description
len           skb->len
proto         skb->protocol
type          skb->pkt_type
poff          Payload start offset
ifidx         skb->dev->ifindex
mark          skb->mark
queue         skb->queue_mapping
rxhash        skb->hash
rand          prandom_u32()
cpu           Executing CPU id
nla           Netlink attributes
hatype        skb->dev->type
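In C, the "negative offset" is spelled `SKF_AD_OFF` and the per-extension offsets are the `SKF_AD_*` constants from `<linux/filter.h>`. The sketch below uses the cpu extension; the array name `cpu0_only` and the filter's purpose are made up for illustration.

```c
#include <linux/filter.h>   /* BPF_STMT, BPF_JUMP, SKF_AD_OFF, SKF_AD_CPU */

/* Illustrative filter: accept only packets processed on CPU 0. */
static const struct sock_filter cpu0_only[] = {
    /* ld cpu: an absolute word load whose k is SKF_AD_OFF + SKF_AD_CPU,
     * a negative value that wraps around when stored in the unsigned k */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_CPU),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, 0xffffffff),  /* CPU 0: keep the packet */
    BPF_STMT(BPF_RET | BPF_K, 0),           /* any other CPU: drop */
};
```

The opcode is the ordinary `ld [k]`; only the out-of-range k tells the kernel that an extension, rather than packet data, is being requested.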
• Extended BPF (eBPF) started as an internal mechanism usable only in kernel context; since kernel 3.18 it is also exposed to user space through the bpf() system call.
• eBPF adds a set of new features:
    • Increased number of registers: 10 instead of 2.
    • Register width increased to 64 bits.
    • Conditional jt/jf targets replaced with jt/fall-through.
    • bpf_call instruction and register passing convention for zero-overhead calls from/to other kernel functions.
• Originally designed to be a "restricted C" language that is architecture independent and JITed in kernel context.
• Eventually used mostly for kernel tracing.
• SECure COMPuting, or seccomp, is a security mechanism available in the Linux kernel.
• It applies BPF filtering to syscalls:
    • filters on the syscall number and its parameters
• It can be used to limit the available syscalls.
    • For example, strict mode allows only read, write, _exit and sigreturn.
• It uses the same BPF filters, but the filtered buffer is different: the filter runs over a description of the syscall rather than over packet data.
• See man 2 seccomp for more details.
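As a rough sketch of what such a filter looks like, the program below loads the syscall number from `struct seccomp_data` and makes one syscall fail with EPERM while allowing all others. The names `deny_getppid` and `install_deny_getppid` are hypothetical, getppid is picked only because it is harmless, and the example skips the architecture check that a real filter should perform first.

```c
#include <errno.h>
#include <stddef.h>          /* offsetof */
#include <sys/prctl.h>       /* prctl, PR_SET_NO_NEW_PRIVS, PR_SET_SECCOMP */
#include <sys/syscall.h>     /* __NR_getppid */
#include <linux/filter.h>    /* struct sock_filter, struct sock_fprog */
#include <linux/seccomp.h>   /* struct seccomp_data, SECCOMP_* */

static struct sock_filter deny_getppid[] = {
    /* A = seccomp_data.nr, the syscall number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
    /* getppid: fail with EPERM instead of running the syscall */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    /* everything else: allow */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical helper: install the filter on the calling thread.
 * Returns 0 on success, -1 with errno set on failure. */
int install_deny_getppid(void)
{
    struct sock_fprog prog = {
        .len = sizeof(deny_getppid) / sizeof(deny_getppid[0]),
        .filter = deny_getppid,
    };

    /* Required for unprivileged processes before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

The instruction encoding is the same `struct sock_filter` used for sockets; what changes is the buffer the loads index into and the meaning of the return value.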



Editor's Notes

  • #4: I used the following LWN articles for this slide: BPF: the universal in-kernel virtual machine: https://lwn.net/Articles/599755/ A JIT for packet filters: http://lwn.net/Articles/437981/
  • #6: Image taken from: http://www.tiger1997.jp/report/activity/securityreport_20131111.html
  • #8: Information taken from: https://www.kernel.org/doc/Documentation/networking/filter.txt
  • #9: Taken from: The BSD Packet Filter: A New Architecture for User-level Packet Capture http://www.tcpdump.org/papers/bpf-usenix93.pdf
  • #11: Note: this is only a subset of the BPF instruction set. Taken from https://www.kernel.org/doc/Documentation/networking/filter.txt
  • #21: https://www.kernel.org/doc/Documentation/networking/filter.txt
  • #22: https://www.kernel.org/doc/Documentation/networking/filter.txt
  • #26: taken from https://blog.cloudflare.com/bpf-the-forgotten-bytecode/
  • #28: JIT benchmark can be found here https://lwn.net/Articles/437986/
  • #29: http://lxr.free-electrons.com/source/net/core/filter.c?v=3.16#L1230
  • #31: http://www.brendangregg.com/blog/2015-05-15/ebpf-one-small-step.html