Why the Kernel Is Always There — Even When It’s Not Running

(Part 4 of the series: The Kernel Is Not a Process)

In earlier parts of this series, we explored how the Linux kernel is not a task. It is not scheduled like a process, and it doesn’t have a PID or a runqueue slot. It is entered—by system calls, by interrupts, by traps—and it executes code in response to these events.

But this brings us to a more subtle question:

If the kernel is not a process, not a thread, and not a scheduled entity, how can it be always present in the system?

The answer lies in how the kernel is structured into memory, how it is protected by hardware, and how the CPU treats it as a privileged substrate—one that exists beneath every process, yet remains untouchable unless the system explicitly transitions to it.


The Kernel Is Always Mapped

The Linux kernel is permanently loaded into physical memory during early boot. But more importantly, it is also mapped into the virtual address space of every user process.

On 64-bit architectures, the kernel typically occupies the upper portion of the virtual address space. For example, on x86_64:

  • User space: 0x0000000000000000 to 0x00007fffffffffff
  • Kernel space: 0xffff800000000000 and above

Each region spans 128 TiB within the 48-bit virtual address space.

Each process shares the same kernel mapping. This is not duplication: the kernel portion of every process's page tables points to the same physical pages, giving a globally consistent virtual region that the hardware MMU translates on every access.

However, this mapping comes with a catch:

The kernel’s address range is inaccessible when the CPU is in ring 3.

Protected by Hardware: Enter Ring 0 or Fault

The kernel’s memory pages are marked in the page table with access flags:

  • Supervisor-only (privileged)
  • Executable, read-only, or writable depending on the section
  • Global (shared across all address spaces)

If a user-space thread tries to access kernel memory directly—whether by dereferencing a pointer or jumping to a kernel address—the CPU will raise a page fault. This is enforced at the hardware level by the User/Supervisor (U/S) bit in the page-table entries; complementary features such as SMEP and SMAP guard the reverse direction, preventing kernel mode from executing or inadvertently accessing user pages.

No matter what tricks user space might attempt, the only way to reach kernel memory is to enter ring 0 through a controlled entry point such as syscall or int 0x80 on x86, or svc on ARM, whose handlers the kernel itself installs.

This mapping strategy—kernel mapped, but protected—allows for:

  • Fast transitions (historically, no page-table switch on syscall; the Meltdown-era KPTI mitigation reintroduced a partial switch)
  • Consistent access to per-thread kernel stacks
  • A unified execution substrate shared across all processes


Per-Thread Kernel Stacks: One for Each Context

Every thread in Linux is assigned its own kernel-mode stack, separate from the user-mode stack. This stack is allocated at thread creation and is used during any transition into kernel mode.

When the CPU enters the kernel (on syscall or interrupt), it switches to this kernel stack:

  • Not shared with user space
  • Protected from overflows via guard pages
  • Used for control flow, local variables, and stack frames

This design ensures that even though the kernel is not scheduled as a thread, it has a safe execution context in every thread that might enter it.

There is no “kernel thread” waiting around to service syscalls. Instead, the kernel stack and code are ready and mapped, waiting for the CPU to transfer control.


No Scheduling Required — Yet the Kernel Schedules

The kernel is not a scheduled entity—but it is the author of scheduling.

When a syscall like nanosleep(), poll(), or read() causes a thread to block, it is the kernel that:

  • Puts the thread to sleep
  • Decides when to wake it
  • Chooses which thread or process should run next

Yet the kernel itself doesn’t need a scheduler time slice. Its code executes in the context of whatever caused the transition—whether a user thread entering via a system call or an interrupt handler responding in non-threaded context.

Once the kernel finishes its work, it doesn’t remain in control. It either returns to user space (for user threads) or yields to the scheduler (for kernel threads), resuming later as needed.


Always Present, Never Running

This is the true nature of the Linux kernel:

  • It’s always mapped into memory—part of every process’s address space.
  • It’s always protected—accessible only in kernel mode (ring 0).
  • It’s always prepared—with stacks, entry points, and handlers ready to respond.

But it’s never “running” in the traditional sense.

You won’t see it in top or ps. You can’t send it a signal. It doesn’t loop, wait, or schedule itself like a user process. It doesn’t need to.

And yet, every file read, every packet sent, every interrupt handled—all of it flows silently through the kernel. It doesn’t run alongside your programs; it enables them from below.


Conclusion: Presence Without Execution

Across this series, we’ve examined the Linux kernel from multiple perspectives.

The picture that emerges is unlike any process, daemon, or service. The kernel does not loop or idle. It does not launch itself. It does not wait to be scheduled — because it is the one doing the scheduling.

Its memory is always resident. Its code is always mapped. Yet it only becomes active in hardware-controlled moments: when a thread crosses the boundary, when a device interrupts, when a fault occurs.

Sometimes the kernel runs a system call. Sometimes it responds to an IRQ. Sometimes it defers work to a kernel thread that wakes quietly, does its job, and yields again.

The kernel is not the thing you see. It is the thing that makes everything else visible.

It does not live in user space. But user space exists because it does.

It has no lifecycle. It defines the lifecycle.

Moon Hee Lee

Linux Systems Engineer | Linux kernel Bug Fixing


For the full series, see The Kernel In the Mind 🐧 https://www.linkedin.com/pulse/kernel-mind-moon-hee-lee-miwze
