Linux Network Concepts and Data Structures: sock, socket, and Their Relationships

Understanding how Linux handles network communication requires a handful of key concepts and data structures. At the center are sockets, which give applications an interface to the network, and the kernel structures that back each socket.

1. The socket System Call and File Descriptors

Linux exposes most I/O resources through file descriptors, and network connections are no exception. When an application wants to communicate over the network, it typically starts by calling the socket() system call.

  • socket() System Call: This call creates an endpoint for communication. It returns a file descriptor (an integer) that the application can then use to refer to this communication endpoint, just like it would a regular file.

  • File Descriptor (fd): This is a small, non-negative integer used to access an I/O resource. For sockets, the file descriptor acts as a handle to the kernel's internal representation of the socket.

2. The struct socket (Kernel-side Representation)

While the application sees a file descriptor, the kernel maintains a more complex data structure to represent the socket. This is primarily struct socket.

struct socket: This is a core kernel data structure that represents a network socket. It acts as an abstraction layer, providing a generic interface for various network protocols (TCP, UDP, raw IP, etc.).

Key Fields within struct socket:

  • type: Specifies the socket type (e.g., SOCK_STREAM for TCP, SOCK_DGRAM for UDP, SOCK_RAW for raw IP).

  • state: Represents the current state of the socket (e.g., SS_UNCONNECTED, SS_CONNECTING, SS_CONNECTED).

  • flags: Various flags related to the socket's behavior.

  • ops: A pointer to a struct proto_ops, which defines the protocol-specific operations for this socket type.

  • sk: A pointer to a struct sock, which links the generic socket to protocol-specific data.

  • file: A pointer to the struct file associated with this socket's file descriptor, linking the socket to the VFS layer.
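
These fields are not directly visible from user space, but the socket type chosen at creation time can be read back with getsockopt() and SO_TYPE. The sketch below assumes Linux with glibc; socket_type_of is a hypothetical helper name, and the value it returns is whatever the kernel recorded for the socket, not a direct read of the structure.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: return the socket type the kernel recorded for fd
 * (for example SOCK_STREAM or SOCK_DGRAM), or -1 on error. */
int socket_type_of(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) != 0)
        return -1;
    return type;
}
```

For a descriptor created with socket(AF_INET, SOCK_DGRAM, 0), this is expected to return SOCK_DGRAM.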

3. The struct sock (Protocol-Specific Representation)

struct sock is arguably the most important data structure for understanding network communication within the Linux kernel. It holds all the protocol-specific information for a given socket. There is one per active network connection (or listening socket).

struct sock: This structure contains a vast amount of information related to a network connection, tailored to the specific protocol (TCP, UDP, etc.). It lives within the protocol stack.

Key Fields within struct sock:

  • sk_family: Address family (e.g., AF_INET for IPv4, AF_INET6 for IPv6).

  • sk_protocol: Protocol (e.g., IPPROTO_TCP, IPPROTO_UDP).

  • sk_daddr, sk_rcv_saddr: Destination and source IP addresses.

  • sk_dport, sk_num: Destination and source port numbers.

  • sk_receive_queue: Queue for incoming data waiting to be read.

  • sk_write_queue: Queue for outgoing data waiting to be sent.

  • Accept queue: Queue for incoming connection requests (for listening sockets). For TCP it is kept in the enclosing struct inet_connection_sock as icsk_accept_queue.

  • sk_state: Current TCP state (e.g., TCP_ESTABLISHED, TCP_LISTEN).

  • sk_rcvbuf, sk_sndbuf: Receive and send buffer sizes.

  • sk_rmem_alloc, sk_wmem_alloc: Memory currently charged to the receive and send queues.

4. struct proto and struct proto_ops (Protocol Operations)

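Linux's networking stack is highly modular. The struct proto and struct proto_ops structures facilitate this modularity.

  • struct proto: This structure defines the operations of a specific transport protocol (e.g., tcp_prot for TCP, udp_prot for UDP). It contains function pointers to common operations such as connect, sendmsg, and recvmsg.

  • struct proto_ops: This structure defines the socket-layer operations for a socket type within an address family, for example inet_stream_ops for TCP sockets or inet_dgram_ops for UDP sockets. Its handlers receive the socket system calls and typically forward them to the struct proto of the underlying struct sock.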
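
The two-level dispatch can be pictured with a small user-space model. This is a simplified sketch, not kernel code: every toy_ name, the single sendmsg operation, and the byte counter are assumptions made for illustration, and the real structures carry many more operations and different signatures.

```c
#include <stddef.h>

struct toy_sock;
struct toy_socket;

/* Transport-level operations, loosely modeled on struct proto. */
struct toy_proto {
    const char *name;
    int (*sendmsg)(struct toy_sock *sk, const char *buf, size_t len);
};

/* Socket-layer operations, loosely modeled on struct proto_ops. */
struct toy_proto_ops {
    int (*sendmsg)(struct toy_socket *sock, const char *buf, size_t len);
};

/* Protocol-specific state, loosely modeled on struct sock. */
struct toy_sock {
    const struct toy_proto *prot;
    size_t bytes_queued;
};

/* Generic socket, loosely modeled on struct socket. */
struct toy_socket {
    const struct toy_proto_ops *ops;
    struct toy_sock *sk;
};

/* Largest UDP payload over IPv4: 65535 - 20 (IP header) - 8 (UDP header). */
#define TOY_UDP_MAX_PAYLOAD 65507

/* Stream protocol: any amount of data is simply queued. */
int toy_tcp_sendmsg(struct toy_sock *sk, const char *buf, size_t len)
{
    (void)buf;
    sk->bytes_queued += len;
    return (int)len;
}

/* Datagram protocol: each call is one datagram with a size limit. */
int toy_udp_sendmsg(struct toy_sock *sk, const char *buf, size_t len)
{
    (void)buf;
    if (len > TOY_UDP_MAX_PAYLOAD)
        return -1;
    sk->bytes_queued += len;
    return (int)len;
}

/* Socket-layer handler: forwards to the protocol attached to the sock. */
int toy_inet_sendmsg(struct toy_socket *sock, const char *buf, size_t len)
{
    return sock->sk->prot->sendmsg(sock->sk, buf, len);
}

const struct toy_proto toy_tcp_prot = { "TCP", toy_tcp_sendmsg };
const struct toy_proto toy_udp_prot = { "UDP", toy_udp_sendmsg };
const struct toy_proto_ops toy_inet_ops = { toy_inet_sendmsg };

/* Entry point standing in for the system-call layer. */
int toy_send(struct toy_socket *sock, const char *buf, size_t len)
{
    return sock->ops->sendmsg(sock, buf, len);
}
```

The caller of toy_send() never names a protocol. Swapping the toy_proto pointer in the toy_sock is enough to change behavior, which is the point of keeping the two tables separate.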

5. sk_buff (Socket Buffer)

sk_buff is the fundamental data structure used to pass network packets through the Linux kernel's networking stack. It's a highly optimized structure for managing packet data and metadata.

struct sk_buff: Represents a network packet. It contains:

  • data: Pointer to the actual packet data (headers and payload).

  • protocol: The protocol of the packet.

  • dev: The network device it originated from or is destined for.

  • len: Total length of the packet.

  • cb: Control buffer for protocol-specific data.

  • Link pointers (next, prev) for chaining sk_buffs in queues.
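
One reason the structure tracks a data pointer separately from the buffer itself is that headers can be prepended without copying the payload. The model below is a simplified user-space sketch: the toy_ names are hypothetical, the head, tail, and end pointers are additions beyond the fields listed above, and the functions only imitate the idea behind the kernel's skb_reserve, skb_put, and skb_push helpers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for struct sk_buff. */
struct toy_skb {
    struct toy_skb *next, *prev; /* queue linkage (unused in this sketch) */
    unsigned char *head;         /* start of the allocated buffer */
    unsigned char *data;         /* start of the valid packet bytes */
    unsigned char *tail;         /* end of the valid packet bytes */
    unsigned char *end;          /* end of the allocated buffer */
    unsigned int len;            /* number of valid bytes */
    char cb[48];                 /* scratch space for the current layer */
};

/* Allocate an empty buffer with room for size bytes; NULL on failure. */
struct toy_skb *toy_alloc_skb(size_t size)
{
    struct toy_skb *skb = calloc(1, sizeof(*skb));

    if (!skb)
        return NULL;
    skb->head = malloc(size);
    if (!skb->head) {
        free(skb);
        return NULL;
    }
    skb->data = skb->tail = skb->head;
    skb->end = skb->head + size;
    return skb;
}

void toy_free_skb(struct toy_skb *skb)
{
    if (skb) {
        free(skb->head);
        free(skb);
    }
}

/* Leave n bytes of headroom in an empty buffer for headers added later.
 * Returns 0 on success, -1 if the buffer is not empty or too small. */
int toy_skb_reserve(struct toy_skb *skb, size_t n)
{
    if (skb->len != 0 || (size_t)(skb->end - skb->tail) < n)
        return -1;
    skb->data += n;
    skb->tail += n;
    return 0;
}

/* Append n bytes at the tail; returns where to write them, or NULL. */
unsigned char *toy_skb_put(struct toy_skb *skb, size_t n)
{
    unsigned char *start = skb->tail;

    if ((size_t)(skb->end - skb->tail) < n)
        return NULL;
    skb->tail += n;
    skb->len += (unsigned int)n;
    return start;
}

/* Prepend n bytes in the headroom; returns the new data pointer, or NULL. */
unsigned char *toy_skb_push(struct toy_skb *skb, size_t n)
{
    if ((size_t)(skb->data - skb->head) < n)
        return NULL;
    skb->data -= n;
    skb->len += (unsigned int)n;
    return skb->data;
}
```

A typical sequence in this model is to reserve headroom, put the payload, then push one header per layer on the way down the stack, with len growing at each step while the payload bytes stay where they are.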

Summary of Relationships:

  • socket() is the system call, invoked from user space, that creates the initial kernel objects.

  • File Descriptors are the user-space handles to kernel sockets.

  • struct socket is the generic kernel representation of a socket, linking to the VFS and the protocol-specific data.

  • struct sock is the core, protocol-specific data structure holding all the details of a network connection or listening endpoint. It's the "engine room" of the socket.

  • struct proto_ops provides the interface (function pointers) for generic socket operations, allowing the struct socket to interact with the underlying protocol.

  • struct proto defines general protocol-level operations.

  • sk_buff is the data unit that flows through the network stack, managed by the struct sock's queues.

This layered approach allows Linux to support a wide variety of network protocols and configurations efficiently, while providing a consistent interface to user-space applications.

