The Birth of Threads: From Fork to clone and Beyond
[Image: Process vs Thread]


In response to a request I couldn’t turn down, I’ll try—with Allah’s help—to share what I know about threads, more formally known as Lightweight Processes.

But listen, if we’re going to talk about threads, I don’t think it makes sense to jump in from the middle. We gotta roll back the clock a bit—way back actually—to the days before time-sharing systems even existed.

Back in the Day: Batch Systems and the Road to Time Sharing

Back then, computing systems were built around Batch Processing. You had a bunch of executable programs loaded in memory, and each program would run one after the other in a queue-like fashion. No sharing. No multitasking. You wait your turn.

But in 1961, MIT demonstrated the first time-sharing computing system: CTSS (Compatible Time-Sharing System). Not long after, a joint project between Bell Labs, MIT, and GE gave us MULTICS (MULTiplexed Information and Computing Service). Eventually, that path led to the birth of the legendary UNIX, thanks to the great work of Ken Thompson and Dennis Ritchie (yeah, the guy behind the C language—thanks to him UNIX became portable and no longer tied to a specific computer architecture).

Alright, I know I’m rambling, but this context matters. Because Time-Sharing is the foundation of the OS you and I use today—Linux, Windows, macOS… all of them.

What is Time Sharing Anyway?

It means that multiple processes take turns using the CPU. Each one gets a slice of time (a time slice), then the OS switches to another process. This switching mechanism—called Context Switching—is what makes concurrency possible. You know, the thing we all love because it makes multitasking feel real.

But here’s the catch: for the OS to switch from one process to another, it needs to save the state of the current process so it can resume from the exact same spot later.

That state includes data stored in the CPU registers—especially two of the most critical ones:

  • SP (Stack Pointer) → points to the top of the process’s stack in memory
  • PC (Program Counter) → points to the instruction that’s going to execute next

All that state and info is stored in a structure commonly known as the PCB (Process Control Block). It’s just a C struct packed with fields like the process ID (PID), a pointer to its address space layout, an array of file descriptors, and so on.

In Linux, it’s known as task_struct. You’ll find similar names in other UNIX-like systems—check out the repos for Linux, Minix, NetBSD, FreeBSD, etc. Take a peek, poke around the structs, and may Allah grant us knowledge and understanding.

Alright, So Where Do Threads Come In?

So far, everything’s nice and clean. But then developers started asking a very valid question:

"Why can’t I make some of my functions run concurrently, like side by side, without spinning up a whole new heavyweight process for each one?"

See, every time you call fork() to create a new process, it’s not cheap. You’re creating a whole new task_struct, a whole new memory layout… The OS has to reserve real memory for all those duplicated structures.

Yes, things got a little better with the introduction of Copy-on-Write (CoW)—you should definitely read about that—but still, spawning full processes just for internal functions in your app? That’s overkill.

You don’t really need a full-blown process. All you need is a separate execution context: its own stack, its own ID (what Linux calls a TID), and a few lightweight fields. The rest? You actually want to share it all with the parent process—same memory, same files, same everything.

But wait, there’s more: Even if you go through the trouble of forking processes, now you’ve got to figure out how to communicate between them. That’s when you open the lovely can of worms called IPC (Inter-Process Communication). And it’s a whole drama.

The Hero Enters: clone()

That’s where the Linux syscall clone() comes in (other OSs have similar syscalls with slightly different names). What clone() does is basically clone the current process, but with fine-grained control.

You tell it which parts to share and which to separate using flags. In the case of threads, you typically want to share:

  • The address space
  • File descriptors
  • Filesystem info
  • Signal handlers

Now instead of a full-fat process, you’ve got yourself a Lightweight Process, aka a Thread.

And because threads share so much with their parent, context switching is faster, memory usage is lower, and you don’t need clunky IPC to communicate—they all live in the same memory space.

Real-World Use and Libraries

Of course, this whole thing can get messy if you manage threads manually. That’s why most languages give you libraries or runtimes to handle this cleanly. In C, you’ve got pthreads (POSIX threads). In .NET or Java, you’ve got thread pools and built-in concurrency management. High-level runtimes usually handle scheduling, resource sharing, and even load balancing for you.


So, that's the journey of threads—why we needed them, how they emerged, and their significance in today's systems.

From the era of straightforward batch jobs to the complex multi-threaded applications we run now, this evolution has been about achieving more with fewer resources. Threads offer that balance: enabling concurrency without the overhead of full processes.

Hope I didn't go on for too long. Some awesome deep-dive resources are in the first comment for anyone curious enough to explore further.

Take care, and see you around 👋

Adel Elzalabany

System Architect | GETGroup

4mo

Great post! Let me justify why fork(2) remains appealing, which is evident in popular web and database servers:

1. The privilege of being able to terminate a child process makes memory leaks tolerable.
2. While threads share an address space, processes can still share memory using anonymous mmap(2) or the shm* calls.
3. Preforking lets you maintain high performance while preserving the ability to perform planned restarts of child processes.

You will also find that a lot of threaded code uses POSIX queues to avoid complex thread synchronization. And even with fork you can share current and future file descriptors, so there are few limits on what you can do with fork-based code. So while threads will indeed provide the best performance, forking remains a minor overhead with great benefits, as long as you make the correct decisions in your concurrency architecture.

Abdulrahman Nader

Developer @ GEO-Solutions | Learning C, x86 & Linux | Hopes to be a systems programmer

4mo

- GNU libc's pthread_create Implementation — how pthread_create is implemented using clone(): https://github.com/bminor/glibc/blob/master/nptl/pthread_create.c#L234
- Launching Linux Threads and Processes with clone() — in-depth explanation of how clone() is used to create threads and processes in Linux, with practical examples and a breakdown of flag implications: https://eli.thegreenplace.net/2018/launching-linux-threads-and-processes-with-clone/
- Raw Linux Threads via System Calls — a tutorial showing how to create threads in Linux directly through system calls, bypassing standard libraries and using clone() directly: https://nullprogram.com/blog/2015/05/15/
- Understanding clone() and Thread Creation — a Stack Overflow discussion on the flags to use with clone() to replicate pthread_create() behavior, offering practical developer advice: https://stackoverflow.com/questions/12153530/what-flags-should-i-have-to-set-in-clone2-so-that-it-will-work-same-as-pthread
- Threading From Scratch: Syscalls, Memory, and Your First Thread — a blog series guiding you through building a threading library from the ground up, covering syscalls, memory, and thread creation: https://lnkd.in/djWaBmRB
